Deep Learning with Dynamic Computation Graphs (ICLR 2017)

This is a paper from Google that is under submission to ICLR 2017. Here is the OpenReview link for the paper. The paper PDF as well as the paper reviews are openly available there. What a concept!

This paper was of interest to me because I wanted to learn about dynamic computation graphs. Unfortunately, almost all machine learning/deep learning (ML/DL) frameworks operate on static computation graphs and can't handle dynamic computation graphs. (Dynet and Chainer are exceptions.)

Using dynamic computation graphs allows dealing with recurrent neural networks (RNNs) better, among other use cases. (Here is a great article about RNNs and LSTMs. Another good writeup on RNNs is here.) TensorFlow already supports RNNs, but only by adding padding to ensure that all input data are of the same size, i.e., the maximum size in the dataset/domain. Even then, this support works only for linear-chain RNNs, not for tree RNNs, which are better suited for more advanced natural language processing.
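As a rough illustration of that padding approach (my own toy example in plain NumPy, with made-up token ids, not code from the paper): every sequence in a batch is padded up to the longest length so the batch can be fed to a static-graph RNN as one dense tensor.

```python
import numpy as np

# Toy token-id sequences of different lengths (hypothetical data).
sequences = [[3, 7, 1], [4, 2], [5, 9, 8, 6, 1]]

# Pad every sequence with zeros up to the longest length in the batch,
# so the whole batch becomes a single dense [batch, max_len] tensor.
max_len = max(len(s) for s in sequences)
padded = np.zeros((len(sequences), max_len), dtype=np.int32)
for i, seq in enumerate(sequences):
    padded[i, :len(seq)] = seq

print(padded)
# [[3 7 1 0 0]
#  [4 2 0 0 0]
#  [5 9 8 6 1]]
```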

This was a really tough paper to read. It was definitely above my level as a beginner. The paper assumed a lot of background from the reader: familiarity with TensorFlow execution and operators, some understanding of programming language concepts, and familiarity with RNNs. The dynamic batching idea introduced in the paper is complex, but it is explained briefly (and perhaps a bit poorly?) in one page. Even when I gave the paper all my attention and tried to form several hypotheses about the dynamic batching idea, I was unable to make progress. In the end, I got help from a friend who is an expert in deep learning.

I skipped reading the second part of the paper, which introduces a combinator library for NNs. The library is relevant because it was instrumental in implementing the dynamic batching idea introduced in the first part of the paper. This second part looked interesting, but the functional programming language concepts discussed were hard for me to follow.

The dynamic batching idea

This paper introduces the dynamic batching idea to emulate dynamic computation graphs (DCGs) of arbitrary shapes and sizes on top of TensorFlow, which only supports static computation graphs.

Batching is important because GPUs crave batching, particularly when dealing with text data, where each item is small. (Images are already large enough to fill and keep the GPU busy, but that is not the case for text data.)

However, the challenge for batching when using DCGs is that the graph of operations is not static and can be different for every input. The dynamic batching algorithm fixes batching for DCGs. Given a set of computation graphs as input, each with a different size and topology, the dynamic batching algorithm rewrites the graphs by batching together all instances of the same operation that occur at the same depth in the graph. (Google is really into graph rewriting.)

The dynamic batching algorithm takes as input a batch of multiple input graphs and treats them as a single disconnected graph. Source nodes are constant tensors, and non-source nodes are operations. Scheduling is performed using a greedy algorithm (I omit some of the more detailed steps in the paper; a toy sketch of the scheduling follows the list):
  • Assign a depth, d, to each node in the graph. Nodes with no dependencies (constants) are assigned depth zero. Nodes whose dependencies all have depth zero are assigned depth one, and so on.
  • Batch together all nodes invoking the same operation at the same depth into a single node.
  • Concatenate all outputs which have the same depth and tensor type. The order of concatenation corresponds to the order in which the dynamic batching operations were enumerated.
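To make these scheduling steps concrete, here is a minimal pure-Python sketch under my own toy graph representation (the node names and the "embed"/"merge" op labels are hypothetical, not from the paper) that assigns depths and then groups nodes by (depth, operation):

```python
from collections import defaultdict

# Toy disconnected graph: each node maps to (op, inputs). Constants have the
# op "const" and no inputs; names and ops are made up for this sketch.
graph = {
    "a": ("const", []), "b": ("const", []), "c": ("const", []),
    "d": ("embed", ["a"]), "e": ("embed", ["b"]), "f": ("embed", ["c"]),
    "g": ("merge", ["d", "e"]),
    "h": ("merge", ["g", "f"]),
}

# Step 1: assign a depth to every node. Constants get depth 0; every other
# node sits one level above its deepest input.
memo = {}
def depth(node):
    if node not in memo:
        _, inputs = graph[node]
        memo[node] = 0 if not inputs else 1 + max(depth(i) for i in inputs)
    return memo[node]

# Step 2: batch together all nodes that invoke the same op at the same depth.
batches = defaultdict(list)
for node in graph:
    batches[(depth(node), graph[node][0])].append(node)

for (d, op), nodes in sorted(batches.items()):
    print(f"depth {d}, op {op!r}: {nodes}")
# depth 0, op 'const': ['a', 'b', 'c']
# depth 1, op 'embed': ['d', 'e', 'f']
# depth 2, op 'merge': ['g']
# depth 3, op 'merge': ['h']
```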
Each dynamic operation is instantiated once in the static dataflow graph. The inputs to each operation are tf.gather ops, and the outputs are fed into tf.concat ops. These TensorFlow ops are then placed inside a tf.while_loop. Each iteration of the loop evaluates all of the operations at a particular depth. The loop maintains state variables for each tensor type t, and feeds the output of concat for tensor type t and iteration d into the input of the gathers at tensor type t and iteration d+1.
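Here is a minimal sketch of that gather/op/concat pattern, assuming a precomputed toy schedule and a Dense layer standing in for the tree-RNN cell (both are my assumptions, not the paper's code); to keep it short, the loop over depths is unrolled in Python rather than expressed as a tf.while_loop as in the paper.

```python
import tensorflow as tf

# One static instantiation of the batched operation (a stand-in for a
# tree-RNN cell); every node in every input tree that uses this op shares it.
cell = tf.keras.layers.Dense(2, name="cell")

# State tensor for one tensor type: row i holds the output of node i.
# The depth-0 rows are the constant/leaf tensors.
state = tf.constant([[1., 0.], [0., 1.], [1., 1.]])

# Hypothetical schedule produced by the greedy scheduler: at each depth,
# which existing state rows feed each batched node invocation.
schedule = [
    [[0, 1], [1, 2]],  # depth 1: two nodes, each combining two children
    [[3, 4]],          # depth 2: one node combining the two depth-1 outputs
]

# Each iteration gathers the inputs for all nodes at depth d, runs the op
# once for the whole depth, and concatenates the results onto the state
# tensor so that depth d+1 can gather from them.
for pairs in schedule:
    left = tf.gather(state, [p[0] for p in pairs])
    right = tf.gather(state, [p[1] for p in pairs])
    outputs = cell(tf.concat([left, right], axis=1))
    state = tf.concat([state, outputs], axis=0)

print(state.shape)  # (6, 2): 3 leaves + 2 depth-1 nodes + 1 root
```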

Experimental results

The experimental results emphasize the importance of batching, especially on GPUs, where it can enable speedups of up to 120x. The speedup ratio denotes the ratio between the per-tree time for dynamic batching on random shapes ("full dynamic") versus manual batching with a batch size of 1.

Dynamic batching instantiates each operation only once and invokes it once for each depth, so the number of kernel invocations is proportional to the tree depth, log(n) rather than n, where n is the tree size. Dynamic batching thus achieves substantial speedups even at batch size 1, because it batches operations at the same depth within a single tree.
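As a back-of-the-envelope illustration (my own numbers, assuming a balanced binary tree with a single operation type):

```python
import math

# A balanced binary tree over n leaves has roughly 2n - 1 nodes, so evaluating
# it node by node costs ~2n - 1 kernel invocations. Dynamic batching instead
# invokes the op once per depth, i.e. ~log2(n) + 1 times.
for n_leaves in (64, 256, 1024):
    per_node = 2 * n_leaves - 1
    per_depth = int(math.log2(n_leaves)) + 1
    print(n_leaves, per_node, per_depth)
# 64 127 7
# 256 511 9
# 1024 2047 11
```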

Limitations

Dynamic batching works on a single machine; it is not distributed. Dynamic batching requires an all-to-all broadcast, so it doesn't scale to distributed machines.

This Google paper doesn't mention or discuss Dynet and Chainer, but Dynet and Chainer are single-machine ML/DL frameworks that support dynamic computation graphs. On one hand, Dynet & Chainer are most likely not good at batching, so the dynamic batching method here makes a contribution. On the other hand, since Dynet & Chainer support dynamic computation graphs natively (rather than by emulating them on static computation graphs as dynamic batching does), they are most likely more expressive than what dynamic batching can achieve. In fact, another limitation of the dynamic batching approach is that it requires all operations that might be used to be specified in advance. Each input/output may have a different type, but all types must be fixed and fully specified in advance.
