Class CorpusBase
Defined in File corpus_base.h
Inheritance Relationships
Base Types
public marian::data::DatasetBase< SentenceTuple, CorpusIterator, CorpusBatch > (Template Class DatasetBase)
public marian::data::RNGEngine (Class RNGEngine)
Derived Types
public marian::data::Corpus (Class Corpus)
public marian::data::CorpusNBest (Class CorpusNBest)
public marian::data::CorpusSQLite (Class CorpusSQLite)
Class Documentation
-
class CorpusBase : public marian::data::DatasetBase<SentenceTuple, CorpusIterator, CorpusBatch>, public marian::data::RNGEngine
Subclassed by marian::data::Corpus, marian::data::CorpusNBest, marian::data::CorpusSQLite
Public Types
-
typedef SentenceTuple Sample
Public Functions
-
CorpusBase(const std::vector<std::string> &paths, const std::vector<Ptr<Vocab>> &vocabs, Ptr<Options> options, size_t seed = Config::seed)
-
virtual ~CorpusBase()
Protected Functions
-
void initEOS(bool training)
Determine whether an EOS symbol should be added to the input.
-
void addWordsToSentenceTuple(const std::string &line, size_t batchIndex, SentenceTupleImpl &tup) const
Helper function that converts a line of text into words using the i-th vocabulary (i = batchIndex) and adds them to the sentence tuple.
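The idea of picking a vocabulary by stream index can be illustrated with a minimal, self-contained sketch. It does not use Marian's Vocab or SentenceTupleImpl; ToyVocab, encodeLine, unkId, and whitespace tokenization are all assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; not Marian types.
using WordId = std::uint32_t;
using ToyVocab = std::unordered_map<std::string, WordId>;

// Split `line` on whitespace and map each token to an id using the
// vocabulary that belongs to input stream `batchIndex`.
// Tokens missing from that vocabulary are mapped to `unkId`.
std::vector<WordId> encodeLine(const std::string& line,
                               std::size_t batchIndex,
                               const std::vector<ToyVocab>& vocabs,
                               WordId unkId) {
  assert(batchIndex < vocabs.size());
  const ToyVocab& vocab = vocabs[batchIndex];

  std::vector<WordId> ids;
  std::istringstream in(line);
  std::string token;
  while (in >> token) {
    auto it = vocab.find(token);
    ids.push_back(it != vocab.end() ? it->second : unkId);
  }
  return ids;
}
```

In this sketch each input stream has its own vocabulary, so the same token can receive different ids depending on batchIndex.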
-
void addAlignmentToSentenceTuple(const std::string &line, SentenceTupleImpl &tup) const
Helper function that parses a line of word alignments and adds them to the sentence tuple.
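This page does not state the alignment line format. The sketch below assumes Pharaoh-style `src-tgt` index pairs separated by whitespace (for example `0-0 1-2`), as produced by tools such as fast_align. AlignmentPoint and parseAlignmentLine are hypothetical names, not part of Marian.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alignment point: (source index, target index).
using AlignmentPoint = std::pair<std::size_t, std::size_t>;

// Parse whitespace-separated "src-tgt" tokens into index pairs.
// Throws std::invalid_argument if a token has no dash, or if the
// dash is the first or last character of the token.
std::vector<AlignmentPoint> parseAlignmentLine(const std::string& line) {
  std::vector<AlignmentPoint> points;
  std::istringstream in(line);
  std::string token;
  while (in >> token) {
    const std::size_t dash = token.find('-');
    if (dash == std::string::npos || dash == 0 || dash + 1 == token.size())
      throw std::invalid_argument("malformed alignment token: " + token);
    points.emplace_back(std::stoul(token.substr(0, dash)),
                        std::stoul(token.substr(dash + 1)));
  }
  return points;
}
```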
-
void addWeightsToSentenceTuple(const std::string &line, SentenceTupleImpl &tup) const
Helper function that parses a line of weights and adds them to the sentence tuple.
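As a rough illustration of weight parsing, the sketch below assumes that a weights line holds whitespace-separated floating-point numbers. That format, and the name parseWeightsLine, are assumptions rather than something this page confirms.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse a line of whitespace-separated numbers into a vector of floats.
// std::stof throws std::invalid_argument for a token that does not
// start with a number.
std::vector<float> parseWeightsLine(const std::string& line) {
  std::vector<float> weights;
  std::istringstream in(line);
  std::string token;
  while (in >> token)
    weights.push_back(std::stof(token));
  return weights;
}
```

A line with a single number would then yield one weight for the whole sentence, while a longer line would yield one weight per listed position.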
-
void addAlignmentsToBatch(Ptr<CorpusBatch> batch, const std::vector<Sample> &batchVector)
-
void addWeightsToBatch(Ptr<CorpusBatch> batch, const std::vector<Sample> &batchVector)
Protected Attributes
-
std::vector<bool> addEOS_
Determines whether an EOS symbol should be added.
By default this is true for every sequence, but it should be false for classifier labels, for instance. The flag is set per input stream, hence a vector.
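A per-stream flag like this could be applied as in the following sketch. WordId, kEosId, and finishSequence are hypothetical names chosen for illustration; the actual EOS id and the place where Marian appends it are not specified on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical word id type and EOS id; assumptions for this sketch.
using WordId = std::uint32_t;
constexpr WordId kEosId = 0;

// Append EOS to `words` only if the flag for this input stream is set.
std::vector<WordId> finishSequence(std::vector<WordId> words,
                                   std::size_t streamIndex,
                                   const std::vector<bool>& addEOS) {
  assert(streamIndex < addEOS.size());
  if (addEOS[streamIndex])
    words.push_back(kEosId);
  return words;
}
```

With flags `{true, false}`, a text stream at index 0 gets a trailing EOS, while a label stream at index 1 is passed through unchanged.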
-
size_t pos_ = {0}
-
size_t maxLength_ = {0}
-
bool maxLengthCrop_ = {false}
-
bool rightLeft_ = {false}
-
bool tsv_ = {false}
-
size_t tsvNumInputFields_ = {0}
-
int weightFileIdx_ = {-1}
Index of the weights file in paths_ and files_; -1 means no weights file was provided.
-
int alignFileIdx_ = {-1}
Index of the alignment file in paths_ and files_; -1 means no alignment file was provided.
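The -1 sentinel used by both indices can be checked as in this minimal sketch. optionalPath is a hypothetical helper, and the plain vector of strings only stands in for paths_.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return a pointer to the path at `fileIdx`, or nullptr when the index
// is negative, meaning that no such file was provided.
const std::string* optionalPath(const std::vector<std::string>& paths,
                                int fileIdx) {
  if (fileIdx < 0)
    return nullptr;
  const std::size_t idx = static_cast<std::size_t>(fileIdx);
  assert(idx < paths.size());
  return &paths[idx];
}
```

Callers would test the result against nullptr before reading weights or alignments, so the absence of an optional file is handled in one place.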
-
typedef SentenceTuple