Fast Neural Machine Translation in C++
Basic example for training: Scripts for training Edinburgh’s WMT16 system, adapted from the Romanian-English sample from https://github.com/rsennrich/wmt16-scripts. The resulting system should be competitive with, or even slightly better than, the system reported in the accompanying paper.
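For orientation, a hedged sketch of what such a training invocation looks like; the file names and hyperparameter values below are illustrative placeholders, not the exact settings of the example scripts:

```bash
# Sketch of a basic Marian training run; paths and values are placeholders.
# --type amun selects the Nematus-compatible shallow RNN architecture;
# --mini-batch-fit sizes mini-batches to fill the workspace given via -w (MB).
../build/marian \
    --type amun \
    --model model/model.npz \
    --train-sets data/corpus.bpe.ro data/corpus.bpe.en \
    --vocabs model/vocab.ro.yml model/vocab.en.yml \
    --mini-batch-fit -w 3000 \
    --layer-normalization --dropout-rnn 0.2 \
    --early-stopping 5 \
    --valid-freq 10000 --save-freq 10000 --disp-freq 1000 \
    --valid-metrics cross-entropy \
    --valid-sets data/newsdev2016.bpe.ro data/newsdev2016.bpe.en \
    --log model/train.log --valid-log model/valid.log
```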
Training a transformer model: An example of training a Google-style transformer model, as introduced in "Attention Is All You Need" (Vaswani et al., 2017).
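A hedged sketch of the corresponding training command; the flag values mirror commonly used base-transformer settings and are placeholders rather than the example’s exact configuration:

```bash
# Sketch of transformer training; all paths and values are placeholders.
../build/marian \
    --type transformer \
    --model model/model.npz \
    --train-sets data/corpus.bpe.en data/corpus.bpe.de \
    --vocabs model/vocab.ende.yml model/vocab.ende.yml \
    --enc-depth 6 --dec-depth 6 --transformer-heads 8 \
    --transformer-dropout 0.1 --label-smoothing 0.1 \
    --learn-rate 0.0003 --lr-warmup 16000 --lr-decay-inv-sqrt 16000 \
    --optimizer-params 0.9 0.98 1e-09 \
    --tied-embeddings-all \
    --mini-batch-fit -w 10000 --maxi-batch 1000 \
    --sync-sgd --devices 0 1 2 3 \
    --early-stopping 10 \
    --valid-freq 5000 --save-freq 5000 --disp-freq 500 \
    --valid-metrics ce-mean-words perplexity \
    --valid-sets data/valid.bpe.en data/valid.bpe.de \
    --exponential-smoothing --seed 1111
```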
Training on raw texts with built-in SentencePiece: The example shows how to use Taku Kudo’s SentencePiece and Matt Post’s SacreBLEU to greatly simplify training and evaluation by providing reversible, hidden preprocessing and reproducible evaluation.
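A hedged sketch of how the two tools fit together, assuming placeholder file names; vocabularies ending in .spm trigger Marian’s built-in SentencePiece handling:

```bash
# Giving the vocabularies the .spm extension makes Marian train SentencePiece
# models directly on the raw (untokenized) training data; --dim-vocabs sets
# the SentencePiece vocabulary sizes. Paths and values are placeholders.
../build/marian \
    --type s2s \
    --model model/model.npz \
    --train-sets data/corpus.ro data/corpus.en \
    --vocabs model/vocab.ro.spm model/vocab.en.spm \
    --dim-vocabs 32000 32000 \
    --mini-batch-fit -w 3000

# SacreBLEU supplies the test set and scores the decoder output (which the
# built-in SentencePiece already detokenizes), keeping evaluation reproducible.
sacrebleu -t wmt16 -l ro-en --echo src > newstest2016.ro
../build/marian-decoder -m model/model.npz \
    -v model/vocab.ro.spm model/vocab.en.spm < newstest2016.ro > output.en
sacrebleu -t wmt16 -l ro-en < output.en
```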
Reconstructing Edinburgh’s WMT17 English-German system: The scripts show how to train a complete WMT-grade system following the description of Edinburgh’s WMT submission for en-de.
Reconstructing a top WMT17 system with Marian’s Transformer model: The scripts show how to train a complete WMT-grade system, better (!) than the original submission, based on Google’s Transformer model and the description of Edinburgh’s WMT submission for en-de. This example combines the reconstruction of Edinburgh’s WMT2017 en-de system in Marian with the transformer training example above.
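WMT-grade systems like these are typically decoded as ensembles of several independently trained models; marian-decoder accepts multiple models and combines their scores at each decoding step. A hedged sketch with placeholder file names:

```bash
# Sketch of ensemble decoding: the models passed to -m must share the same
# vocabularies; file names are placeholders for independently trained runs.
../build/marian-decoder \
    -m model1.npz model2.npz model3.npz model4.npz \
    -v vocab.ende.yml vocab.ende.yml \
    --beam-size 12 --normalize 0.6 \
    < newstest2017.bpe.en > newstest2017.out.de
```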
Translating with Amun: The scripts demonstrate how to translate with Amun using Edinburgh’s German-English WMT2016 systems, both as a single model and as an ensemble.
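A hedged sketch of a minimal Amun call, assuming placeholder file and vocabulary names; real setups typically keep these settings in a YAML config passed via -c:

```bash
# Minimal Amun invocation sketch: -m, -s and -t name the model and the
# source/target vocabularies. The input must be preprocessed (tokenized,
# truecased, BPE-segmented) exactly as the model's training data was.
echo "ein beispielsatz ." | ./amun -m model.npz -s vocab.de.json -t vocab.en.json
```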
The training with Marian guide is an overview aimed at newcomers to machine translation. It covers the general model training pipeline and provides a foundation for better understanding the other examples and tutorials presented here.
The Machine Translation Marathon 2019 Tutorial shows how to do efficient neural machine translation with the Marian toolkit by optimizing speed, accuracy, and resource usage for training and decoding of NMT models.
The Machine Translation Marathon 2018 Labs is a Marian tutorial that covers downloading and compiling Marian, translating with a pretrained model, preparing training data, and training a basic NMT model; it also contains a list of exercises introducing different features and model architectures available in Marian.