# Layers
In a typical deep neural network, the highest-level building blocks, which perform different kinds of transformations on their inputs, are called layers. A layer wraps a group of nodes and performs a specific mathematical computation, offering a shortcut for building a more complex neural network.
In Marian, for example, the `mlp::dense` layer represents a fully connected layer, which implements the operation `output = activation(input * weight + bias)`. A dense layer in the graph can be constructed with the following code:
```cpp
// add input node x
auto x = graph->constant({120, 5}, inits::fromVector(inputData));
// construct a dense layer in the graph
auto layer1 = mlp::dense()
    ("prefix", "layer1")                // prefix name is layer1
    ("dim", 5)                          // output dimension is 5
    ("activation", (int)mlp::act::tanh) // activation function is tanh
    .construct(graph)->apply(x);        // construct this layer in the graph
                                        // and link node x as the input
```
The options are passed to the layer as `(key, value)` pairs, where `key` is a predefined option and `value` is the option value. Then `construct()` is called to create a layer instance in the graph, and `apply()` to link the input with this layer.
Alternatively, the same layer can be created by defining nodes and operations directly:
```cpp
// construct the same dense layer using nodes and operations directly
auto W1 = graph->param("W1", {5, 5}, inits::glorotUniform()); // weight: 5 input features, 5 outputs
auto b1 = graph->param("b1", {1, 5}, inits::zeros());         // bias: one row of 5
auto h  = tanh(affine(x, W1, b1));                            // h = tanh(x * W1 + b1)
```
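Both snippets define the same computation. To verify this, the graph can be executed with a forward pass; the sketch below is a minimal example assuming the nodes from the two snippets above, and uses Marian's `debug()` helper to print node values during execution:

```cpp
// request that node values be printed during the forward pass
debug(layer1, "dense layer output"); // output of the mlp::dense layer
debug(h, "manually built output");   // output of the manually built layer
// run the graph to compute all node values
graph->forward();
```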
There are four categories of layers implemented in Marian, described in the sections below.
## Convolution layer
To use a convolution layer, you first need to install NVIDIA cuDNN. The convolution layer supported by Marian is a 2D convolution layer. This layer creates a convolution kernel which is convolved with the input. The options that can be passed to a convolution layer are the following:
| Option Name | Definition | Value Type | Default Value |
| --- | --- | --- | --- |
| `prefix` | Prefix name (used to form the parameter names) | | |
| `kernel-dims` | The height and width of the kernel | | |
| `kernel-num` | The number of kernels | | |
| `paddings` | The height and width of paddings | | |
| `strides` | The height and width of strides | | |
Example:
```cpp
// construct a convolution layer
auto conv_1 = convolution(graph)         // pass the graph pointer to the layer
    ("prefix", "conv_1")                 // prefix name is conv_1
    ("kernel-dims", std::make_pair(3,3)) // kernel is 3*3
    ("kernel-num", 32)                   // number of kernels is 32
    .apply(x);                           // link node x as the input
```
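The remaining options from the table can be passed in the same way. A minimal sketch adding explicit paddings and strides (the values here are illustrative assumptions, using the same pair format as `kernel-dims`):

```cpp
// construct a convolution layer with explicit paddings and strides
auto conv_2 = convolution(graph)
    ("prefix", "conv_2")
    ("kernel-dims", std::make_pair(3,3)) // 3*3 kernel
    ("kernel-num", 32)                   // 32 kernels
    ("paddings", std::make_pair(1,1))    // pad 1 pixel in height and width
    ("strides", std::make_pair(2,2))     // move the kernel 2 pixels per step
    .apply(x);
```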
## MLP layers
Marian offers `mlp::mlp`, which creates a multilayer perceptron (MLP) network. It is a container which can stack multiple layers using the `push_back()` function. There are two types of MLP layers provided by Marian: `mlp::dense` and `mlp::output`.
The `mlp::dense` layer, as introduced before, is a fully connected layer, and it accepts the following options:
| Option Name | Definition | Value Type | Default Value |
| --- | --- | --- | --- |
| `prefix` | Prefix name (used to form the parameter names) | | |
| `dim` | Output dimension | | |
| `layer-normalization` | Whether to normalise the layer output or not | | |
| `nematus-normalization` | Whether to use Nematus layer normalisation or not | | |
| `activation` | Activation function | | |
The available activation functions for MLP layers are `mlp::act::linear`, `mlp::act::tanh`, `mlp::act::sigmoid`, `mlp::act::ReLU`, `mlp::act::LeakyReLU`, `mlp::act::PReLU`, and `mlp::act::swish`.
Example:
```cpp
// construct a mlp::dense layer
auto dense_layer = mlp::dense()
    ("prefix", "dense_layer")              // prefix name is dense_layer
    ("dim", 3)                             // output dimension is 3
    ("activation", (int)mlp::act::sigmoid) // activation function is sigmoid
    .construct(graph)->apply(x);           // construct this layer in the graph
                                           // and link node x as the input
```
The `mlp::output` layer is used, as the name suggests, to construct an output layer. You can tie embedding layers to the `mlp::output` layer using `tieTransposed()`, or set shortlisted words using `setShortlist()`. The general options of the `mlp::output` layer are listed below:
| Option Name | Definition | Value Type | Default Value |
| --- | --- | --- | --- |
| `prefix` | Prefix name (used to form the parameter names) | | |
| `dim` | Output dimension | | |
| `vocab` | File path to the factored vocabulary | | |
| `output-omit-bias` | Whether this layer has a bias parameter | | |
| `lemma-dim-emb` | Re-embedding dimension of lemma in factors, must be used with | | |
| `output-approx-knn` | Parameters for LSH-based output approximation, i.e., | | None |
Example:
```cpp
// construct a mlp::output layer
auto last = mlp::output()
    ("prefix", "last") // prefix name is last
    ("dim", 5);        // output dimension is 5
```
Finally, an example showing how to create a `mlp::mlp` network containing multiple layers:
```cpp
// construct a mlp::mlp network
auto mlp_network = mlp::mlp()                // construct an mlp container
    ("prefix", "mlp_network")                // prefix name is mlp_network
    .push_back(mlp::dense()                  // add a dense layer
        ("prefix", "dense")                  // prefix name is dense
        ("dim", 5)                           // dimension is 5
        ("activation", (int)mlp::act::tanh)) // activation function is tanh
    .push_back(mlp::output()                 // add an output layer
        ("dim", 5))                          // dimension is 5
    .construct(graph);                       // construct the stacked layers in the graph
```
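Once constructed, the whole network can be applied to an input in a single call, with each stacked layer consuming the previous layer's output. A minimal sketch, reusing the input node `x` from the first example and assuming `apply()` accepts a single expression as in the dense-layer example:

```cpp
// run x through the dense layer and then the output layer
auto y = mlp_network->apply(x);
```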
## RNN layers
Marian offers `rnn::rnn` for creating a recurrent neural network (RNN). Just like `mlp::mlp`, `rnn::rnn` is a container which can stack multiple layers using the `push_back()` function. Unlike `mlp` layers, Marian only provides cell-level APIs to construct RNNs. RNN cells only process a single timestep instead of a whole batch of input sequences. There are two types of RNN layers provided by Marian: `rnn::cell` and `rnn::stacked_cell`.
The `rnn::cell` is the base component of an RNN, and `rnn::stacked_cell` is a stack of `rnn::cell`s. The option specific to the `rnn::cell` layer is listed below:
| Option Name | Definition | Value Type | Default Value |
| --- | --- | --- | --- |
| `type` | Type of RNN cell | | |
There are nine types of RNN cells provided by Marian: `gru`, `gru-nematus`, `lstm`, `mlstm`, `mgru`, `tanh`, `relu`, `sru`, and `ssru`. The general options for all RNN cells are the following:
| Option Name | Definition | Value Type | Default Value |
| --- | --- | --- | --- |
| `dimInput` | Input dimension | | |
| `dimState` | Dimension of hidden state | | |
| `prefix` | Prefix name (used to form the parameter names) | | |
| `layer-normalization` | Whether to normalise the layer output or not | | |
| `dropout` | Dropout probability | | |
| `transition` | Whether it is a transition layer | | |
| `final` | Whether it is an RNN final layer or hidden layer | | |
**Note**: Not all the options listed above are available for all cells. For example, the `final` option is only used for `gru` and `gru-nematus` cells.
Example for `rnn::cell`:
```cpp
// construct a rnn cell
auto rnn_cell = rnn::cell()
    ("type", "gru")        // type of rnn cell is gru
    ("prefix", "gru_cell") // prefix name is gru_cell
    ("final", false);      // this cell is a hidden layer, not the final layer
```
Example for `rnn::stacked_cell`:
```cpp
// construct a stack of rnn cells
auto highCell = rnn::stacked_cell();
// add rnn cells into the stack
for(size_t j = 1; j <= 512; j++) {
  auto paramPrefix = "cell" + std::to_string(j);
  highCell.push_back(rnn::cell()("prefix", paramPrefix));
}
```
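A `rnn::stacked_cell` is then used like a single cell: it can be pushed into an `rnn::rnn` container, where its sub-cells are executed in sequence within each timestep. A minimal sketch reusing `highCell` from above (the surrounding option values are illustrative):

```cpp
// use the stack of cells as the cell of an rnn container
auto stacked_rnn = rnn::rnn(
        "type", "gru",           // type of the sub-cells
        "prefix", "stacked_rnn", // prefix name is stacked_rnn
        "dimInput", 10,          // input dimension is 10
        "dimState", 5)           // dimension of hidden state is 5
    .push_back(highCell)         // add the stacked cell
    .construct(graph);           // construct this rnn container in the graph
```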
The list of available options for `rnn::rnn` layers:
| Option Name | Definition | Value Type | Default Value |
| --- | --- | --- | --- |
| `type` | Type of RNN layer | | |
| `direction` | RNN direction | | |
| `dimInput` | Input dimension | | |
| `dimState` | Dimension of hidden state | | |
| `prefix` | Prefix name (used to form the parameter names) | | |
| `layer-normalization` | Whether to normalise the layer output or not | | |
| `nematus-normalization` | Whether to use Nematus layer normalisation or not | | |
| `dropout` | Dropout probability | | |
| `skip` | Whether to use skip connections | | |
| `skipFirst` | Whether to use skip connections for the layer(s) with | | |
Example for `rnn::rnn()`:
```cpp
// construct a rnn::rnn() container
auto rnn_container = rnn::rnn(
        "type", "gru",                // type of rnn cell is gru
        "prefix", "rnn_layers",       // prefix name is rnn_layers
        "dimInput", 10,               // input dimension is 10
        "dimState", 5,                // dimension of hidden state is 5
        "dropout", 0,                 // dropout probability is 0
        "layer-normalization", false) // do not normalise the layer output
    .push_back(rnn::cell())           // add a rnn::cell in this rnn container
    .construct(graph);                // construct this rnn container in the graph
```
Marian provides four RNN directions in the `rnn::dir` enumerator: `rnn::dir::forward`, `rnn::dir::backward`, `rnn::dir::alternating_forward`, and `rnn::dir::alternating_backward`.
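The direction is passed to the container like any other option; casting the enumerator to `int`, as done for `activation` above, is an assumption based on how the other examples pass enum options. A minimal sketch for a right-to-left RNN:

```cpp
// construct an rnn container that processes the sequence backwards
auto rnn_backward = rnn::rnn(
        "type", "gru",                        // type of rnn cell is gru
        "prefix", "rnn_backward",             // prefix name is rnn_backward
        "direction", (int)rnn::dir::backward, // process timesteps right to left
        "dimInput", 10,                       // input dimension is 10
        "dimState", 5)                        // dimension of hidden state is 5
    .push_back(rnn::cell())                   // add a rnn::cell
    .construct(graph);                        // construct this rnn container in the graph
```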
For `rnn::rnn()`, you can use `transduce()` to map the input state to the output state. An example for `transduce()`:
```cpp
auto output = rnn.construct(graph)->transduce(input);
```
## Embedding layer
Marian provides `embedding`, a shortcut to construct a regular embedding layer for word embeddings. The following options are available for `embedding` layers:
| Option Name | Definition | Value Type | Default Value |
| --- | --- | --- | --- |
| `dimVocab` | Size of vocabulary | | |
| `dimEmb` | Size of embedding vector | | |
| `dropout` | Dropout probability | | |
| `inference` | Whether it is used for inference | | |
| `prefix` | Prefix name (used to form the parameter names) | | |
| `fixed` | Whether this layer is fixed (not trainable) | | |
| `dimFactorEmb` | Size of factored embedding vector | | |
| `factorsCombine` | Which strategy is chosen to combine the factor embeddings; it can be | | |
| `vocab` | File path to the factored vocabulary | | |
| `embFile` | Paths to the factored embedding vectors | | |
| `normalization` | Whether to normalise the layer output or not | | |
Example to construct an embedding layer:
```cpp
// construct an embedding layer
auto embedding_layer = embedding()
    ("prefix", "embedding") // prefix name is embedding
    ("dimVocab", 1024)      // vocabulary size is 1024
    ("dimEmb", 512)         // size of embedding vector is 512
    .construct(graph);      // construct this embedding layer in the graph
```
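The constructed layer can then map word indices to embedding vectors. A minimal sketch, assuming an `applyIndices()` method that takes vocabulary indices and the desired output shape, with the (assumed) shape convention {number of words, batch size, embedding size}:

```cpp
// embed a sequence of 4 word ids as 4 vectors of size 512
std::vector<WordIndex> ids = {2, 15, 7, 3}; // hypothetical vocabulary indices
auto embedded = embedding_layer->applyIndices(ids, {4, 1, 512});
```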