# Expression graphs

The design of the deep learning framework in Marian is based on reverse-mode [auto-differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) (also known as backpropagation) with dynamic computation graphs. Computation graphs allow a great deal of freedom in network architectures, and they can deal with complicated structures like conditions and loops. Dynamic declaration, which means a new graph is created for each training instance (a training example or a batch), is also advantageous: it allows handling of variably sized inputs, as well as cases where the graph may change depending on the results of previous steps. Compared to static declaration, dynamic computation graphs can be expensive to create and optimise. Marian uses careful memory management to remove overhead in computation graph construction, and supports efficient execution on both CPU and GPU.

The main implementation of computation graphs is in the [src/graph](dir_src_graph) directory.

Building blocks for graphs:
- [graph construction](#graph-construction)
- [node types](#node-types)
- [graph execution](#graph-execution)

## Graph construction

What is a computation graph? All the numerical computations are expressed as a computation graph. A computation graph (or graph for short) is a series of operations arranged into a graph of nodes. To put it simply, a graph is just an arrangement of nodes that represent what you want to do with the data.

**Example 1** Suppose you want to calculate the expression `z=x*y+sin(x)`. The computation graph of this expression looks something like Figure 1.

![fig1](images/graph_example1.jpg "Figure 1 An example of computation graph")
*Figure 1 An example of computation graph*

In Marian, the `ExpressionGraph` class is the main implementation of a computation graph. An `ExpressionGraph` object keeps a record of data (tensors) and all operations in a directed graph consisting of `Node` objects. A `Node` is the basic unit of a graph. It can be an operation (e.g., `dot()`) or a tensor. Each operation in a graph is a `NaryNodeOp` (a child of the `Node` class), and each operation defines its forward and backward steps. Apart from operations, a `Node` can also be a constant tensor (`ConstantNode`) or a parameter tensor (`ParamNode`).

To create a graph, we use the `New<>` shortcut in place of regular constructors:

```cpp
// create a graph
auto graph = New<ExpressionGraph>();
```

After creating a graph, we also need to initialise the graph object with device options using `setDevice()` and workspace memory using `reserveWorkspaceMB()`; otherwise the program will crash.

```cpp
// initialise graph with device options
// here we specify device no. 0
// device type can be DeviceType::cpu or DeviceType::gpu
graph->setDevice({0, DeviceType::cpu});
// preallocate workspace memory (MB) for the graph
graph->reserveWorkspaceMB(128);
```

The _workspace memory_ is the memory available for the forward and backward steps of the training procedure. It does not include the model parameters and optimizer state, which are allocated outside the workspace; hence you cannot allocate all device memory to the workspace.

To populate a graph, Marian offers a set of shortcut functions that implement the common expression operators for a neural network (see [src/graph/expression_operators.h](file_src_graph_expression_operators.h)), such as `affine()`. These functions construct the corresponding operation nodes in the graph and link them with other nodes. E.g., `affine()` constructs an `AffineNodeOp` node in the graph. Thus, building a graph turns into the simple task of defining expressions using these functions.
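Note that these shortcut functions only *describe* the computation. The following is a minimal sketch of this lazy behaviour, assuming a graph initialised as above; the shapes and values are arbitrary and chosen only for illustration:

```cpp
// Expressions are built lazily: each call below only adds a node to the
// graph and returns a handle to it; no numerical computation happens
// until the graph is executed with graph->forward().
auto a = graph->constant({2, 3}, inits::fromValue(1.0f));
auto b = tanh(a);  // adds a TanhNodeOp node; b's value is not available yet
```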
**Building graph of Example 1 using Marian**

The following code builds the graph in Example 1 with inputs `x=2` and `y=3`.

```cpp
// create and initialise a graph object
auto graph = New<ExpressionGraph>();
graph->setDevice({0, DeviceType::cpu});
graph->reserveWorkspaceMB(8);

// add input node x
auto x = graph->constant({1,1}, inits::fromValue(2));
// add input node y
auto y = graph->constant({1,1}, inits::fromValue(3));

// define expression
auto mulOp = x*y;
auto sinOp = sin(x);
auto z = mulOp + sinOp;
// You can also define the whole expression in one line:
// auto z = x*y + sin(x);
```

For the above example, `constant()` is used to construct a constant node (a tensor) in the graph as the input. We will give more details about this function in the next section [**Node types**](#node-types). The operators `*`, `+` and the function `sin()` add the corresponding operation nodes (e.g., `MultNodeOp` and `SinNodeOp`) to the graph.

To check the graph, Marian offers the `graphviz()` function, which generates the graph layout in Graphviz format for visualisation. This visualisation might not be practical for real-size graphs due to the enormous number of nodes and layers. You can print the graph layout on the console by running the following code:

```cpp
// print the graph layout on console
std::cout << graph->graphviz() << std::endl;
```

The graph layout of Example 1 is shown in Figure 2.

*Figure 2 Graph layout of Example 1*

## Node types

A node in a graph can be one of three types: a constant (`ConstantNode`), a parameter (`ParamNode`) or an operation (`NaryNodeOp`).

### ConstantNode

`ConstantNode` is used to store data whose value does not change during training, e.g., the input data. To construct a constant node in the graph, we use the `constant()` function in the `ExpressionGraph` class, specifying the shape of the node and how it is initialised:

```cpp
// construct a constant node in the graph
auto x = graph->constant({N, NUM_FEATURES}, inits::fromVector(inputData));
```

For the above example, the shape of the constant node is `{N, NUM_FEATURES}`, and the value of the constant node is initialised from the vector `inputData`. `inits::fromVector()` returns a `NodeInitializer`, a functor used to initialise a tensor by copying from the given vector. More functions used to initialise a node can be found in the [src/graph/node_initializers.h](namespace_marian__inits) file.

Marian also provides shortcut functions to construct special constant nodes, such as `ones()` and `zeros()`:

```cpp
// construct a constant node filled with 1
auto ones = graph->ones({10,10});
// construct a constant node filled with 0
auto zeros = graph->zeros({10,10});
```

### ParamNode

`ParamNode` is used to store model parameters whose values can change during training, such as weights and biases. In addition to the shape and the element type, we need to specify whether a `ParamNode` object is _trainable_. If a parameter node is _trainable_, its value will be tracked and updated during the training procedure. For a `ParamNode`, the default value of `trainable_` is `true`; it can be changed with the `Node::setTrainable()` function.

To construct a parameter node in the graph, we use the `param()` function in the `ExpressionGraph` class. For a parameter node, we also need to specify a name:

```cpp
// construct a parameter node called W1 in the graph
auto W1 = graph->param("W1", {NUM_FEATURES, 5}, inits::uniform(-0.1f, 0.1f));
```

The parameter node `W1` has the shape `{NUM_FEATURES, 5}` and is initialised with random numbers drawn from the uniform distribution `Uniform(-0.1, 0.1)`.

### NaryNodeOp

`NaryNodeOp` is the base class that defines the operations in a graph; it mainly covers unary and binary operators. Each `NaryNodeOp` defines its forward operations in `Node::forwardOps()` and its backward operations in `Node::backwardOps()`.
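To make the pairing of forward and backward steps concrete, here is a framework-independent sketch (plain C++, not Marian's actual `NodeOps` machinery) of what a `SinNodeOp`-like node computes in each direction:

```cpp
#include <cmath>

// Illustrative sketch only, not Marian code.
// Forward:  val = sin(x)
// Backward: accumulate dL/dx += cos(x) * dL/dval, where dL/dval ("adj")
//           is the gradient flowing in from the node's parent.
struct SinNodeSketch {
  float x    = 0.f;  // the child node's value
  float val  = 0.f;  // this node's value, filled by forward()
  float grad = 0.f;  // gradient of the loss w.r.t. x, accumulated by backward()

  void forward() { val = std::sin(x); }
  void backward(float adj) { grad += std::cos(x) * adj; }
};
```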
In the current version of Marian, we provide a set of common operations (inherited from `NaryNodeOp`) used to build a neural network, such as `AffineNodeOp` (affine transformation), `CrossEntropyNodeOp` (cross-entropy loss function) and `TanhNodeOp` (tanh activation function). As mentioned earlier, Marian implements a set of APIs that make it easy to add operations to the graph. E.g., we can use `affine()` to perform an affine transformation and then `tanh()` to apply the tanh activation function to the result:

```cpp
// perform the affine transformation x*W1+b1
// and then apply the tanh activation function
auto h = tanh(affine(x, W1, b1));
```

In the above example, `affine()` and `tanh()` actually add `AffineNodeOp` and `TanhNodeOp` nodes to the graph. More shortcut functions used to add operations to the graph can be found in the [src/graph/expression_operators.h](file_src_graph_expression_operators.h) file.

## Graph execution

Once you have finished building a graph by adding all the nodes, you can perform the actual computation.

### Forward pass

The forward pass refers to the calculation process. It traverses all nodes from the input layer (leaves) to the output layer (root). The forward pass is performed by calling the function `forward()`, which mainly does two things:

- allocates memory for each node (`Node::allocate()`)
- computes the tensor for each node by performing the required operations (`Node::forward()`); the resulting tensor is stored in the `val_` attribute of each node

**Forward pass of Example 1**

To run the forward pass of Example 1, you can run the following code:

```cpp
// Perform the forward pass on the nodes of the graph
graph->forward();

// get the computation result of z
std::vector<float> w;
z->val()->get(w);
std::cout << "z=" << w[0] << std::endl;
```

### Backward pass

The backward pass refers to backpropagation: it traverses the nodes from the output layer (root) back to the input layer (leaves) and computes the gradient of each trainable node, which can be retrieved with `Node::grad()`. The backward pass is performed by calling `backward()`; alternatively, `backprop()` runs the forward pass and the backward pass together:

```cpp
// Perform the forward pass and then the backward pass
graph->backprop();
// This function is equal to the following code:
/*
graph->forward();
graph->backward();
*/
```

**Backward pass of modified Example 1**

As shown in Figure 2, there is no trainable node in the graph of Example 1; this means we cannot perform a backward pass on this graph. To demonstrate the backward pass, we modify Example 1 by changing the constant node `x` into a parameter node (i.e., changing `constant()` to `param()`):

```cpp
// add parameter node x
auto x = graph->param("x", {1,1}, inits::fromValue(2));
```

The resulting graph is also different, as displayed in Figure 3.

![fig3](images/example1_dot2.png "Figure 3 Graph layout of modified Example 1")
*Figure 3 Graph layout of modified Example 1*

To perform the backward pass of modified Example 1, you can run the following code:

```cpp
// Perform the backward pass on the trainable nodes of the graph
graph->backward();

// get the gradient of node x
std::vector<float> b;
x->grad()->get(b);
std::cout << "dz/dx=" << b[0] << std::endl;
```

To train a model, we also need a loss function and an optimiser. Continuing with modified Example 1, suppose we expect `z` to take the value 6. We can add a constant node holding the expected value and define the loss as the absolute difference between the expected and computed values:

```cpp
// add a constant node holding the expected value of z
auto actual = graph->constant({1,1}, inits::fromValue(6));
// define loss function
auto loss = abs(actual-z);
```

The graph is then changed to Figure 4.

![fig4](images/example1_dot3.png "Figure 4 Graph layout of modified Example 1 with loss function")
*Figure 4 Graph layout of modified Example 1 with loss function*
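For this small graph, the expected numbers are easy to verify by hand. Below is a quick standalone check (plain C++, independent of Marian) of the values computed in the examples above:

```cpp
#include <cmath>
#include <cstdio>

// Hand-check of the toy example with x=2 and y=3 (radians for sin/cos):
//   z     = x*y + sin(x) = 6 + sin(2)  ~ 6.9093
//   dz/dx = y + cos(x)   = 3 + cos(2)  ~ 2.5839
//   loss  = |6 - z|                    ~ 0.9093
int main() {
  double x = 2.0, y = 3.0;
  double z = x * y + std::sin(x);
  std::printf("z=%.4f dz/dx=%.4f loss=%.4f\n",
              z, y + std::cos(x), std::abs(6.0 - z));
  return 0;
}
```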
The purpose of the optimiser is to adjust the parameters to fit the data. In Marian, there are three built-in optimiser classes: `Sgd`, `Adagrad` and `Adam`.

`Sgd` is an optimiser based on [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent): in each iteration, it updates a parameter `w` according to the rule `w = w - learning_rate * gradient`. `Adagrad` implements the [Adagrad algorithm](https://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), an optimiser with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. `Adam` is an implementation of the [Adam algorithm](https://arxiv.org/abs/1412.6980), a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.

We use `Optimizer<>` to set up an optimiser with a learning rate:

```cpp
// Choose optimizer (Sgd, Adagrad, Adam) and initial learning rate
auto opt = Optimizer<Sgd>(0.01);
```

After an iteration of backpropagation, we can call the `update()` function to update the parameters:

```cpp
// update parameters in the graph
opt->update(graph);
```

**Set up an optimiser for modified Example 1**

Continuing with modified Example 1, we choose `Sgd` as the optimiser and update the parameter `x`:

```cpp
// set up an Sgd optimiser with learning rate 0.005
auto opt = Optimizer<Sgd>(0.005);
// update parameters
opt->update(graph);

// get the new value of x
std::vector<float> v;
x->val()->get(v);
std::cout << "x=" << v[0] << std::endl;
```

We can also retrieve the object holding all parameter nodes of a graph by calling `params()`:

```cpp
// get the object holding all parameter nodes of the graph
auto parameters = graph->params();
```

Besides, Marian provides APIs to support gradient checkpointing. This method trades compute for memory: instead of keeping all intermediate results for the backward pass, it reruns a forward-pass segment for each checkpoint segment during the backward pass. Currently, Marian only supports setting checkpoint nodes manually by calling `Node::markCheckpoint()` or `checkpoint()`. To enable the gradient-checkpointing mode for a graph, we use `setCheckpointing()`:

```cpp
// enable gradient-checkpointing for a graph
graph->setCheckpointing(true);
```

We can also save and load the parameters of a graph in Marian. Calling `save()` saves all parameters of the graph into a file (in `.npz` or `.bin` format), and `load()` loads model parameters into the graph (either from an array of `io::Items`, a file or a buffer).

```cpp
// specify the filename
std::string filename = "my_model.npz";
// save all the parameters into a file
graph->save(filename);
// load model from a file
graph->load(filename);
```
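Putting these pieces together, a minimal training-loop sketch for modified Example 1 could look as follows. This is only an illustration: it assumes the graph, the parameter `x`, the expression `z` and the `loss` node defined above, and the learning rate and number of iterations are arbitrary.

```cpp
// set up the optimiser
auto opt = Optimizer<Sgd>(0.005);

for(int step = 0; step < 100; ++step) {
  // forward pass + backward pass; gradients are computed
  // for the trainable nodes (here: x)
  graph->backprop();
  // apply the gradients to the parameters
  opt->update(graph);
}

// save the trained parameters
graph->save("my_model.npz");
```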