Class Corpus

Inheritance Relationships

Base Type

public marian::data::CorpusBase

Class Documentation

class Corpus : public marian::data::CorpusBase

Public Functions

Corpus(Ptr<Options> options, bool translate = false, size_t seed = Config::seed)
Corpus(std::vector<std::string> paths, std::vector<Ptr<Vocab>> vocabs, Ptr<Options> options, size_t seed = Config::seed)
SentenceTuple next()

Iterates over the sentence tuples in the corpus.

A sentence tuple is skipped silently, with no warning, if any sentence in the tuple (e.g. the source or the target) exceeds the maximum allowed sentence length in words, unless the option “max-length-crop” is provided.

Return

A tuple representing parallel sentences.
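
The skip rule described for next() can be illustrated with a small standalone sketch. This is not Marian's implementation: the aliases Sentence and Tuple, the function keepOrCrop and its parameters are hypothetical names made up for this example. It also assumes that “max-length-crop” truncates an over-long sentence to the maximum length, which the text above does not state explicitly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Marian types.
using Sentence = std::vector<std::string>;  // one sentence as a list of words
using Tuple = std::vector<Sentence>;        // e.g. {source, target}

// Returns true if the tuple should be emitted, false if it should be skipped.
// When maxLengthCrop is set, over-long sentences are truncated in place
// (assumed behavior) and the tuple is always kept.
bool keepOrCrop(Tuple& tuple, std::size_t maxLength, bool maxLengthCrop) {
  for (auto& sentence : tuple) {
    if (sentence.size() > maxLength) {
      if (!maxLengthCrop)
        return false;              // skip the whole tuple, no warning issued
      sentence.resize(maxLength);  // crop to the first maxLength words
    }
  }
  return true;
}
```

With a maximum length of 2, a tuple whose source has three words is rejected when maxLengthCrop is false, and kept with the source cut to two words when it is true. A single over-long sentence is enough to drop the entire tuple, so the parallel sentences never get out of alignment.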

void shuffle()
void reset()
void restore(Ptr<TrainingState> ts)
iterator begin()
iterator end()
std::vector<Ptr<Vocab>> &getVocabs()
CorpusBase::batch_ptr toBatch(const std::vector<Sample> &batchVector)