Class CorpusBase
Defined in File corpus_base.h
Inheritance Relationships
Base Types
public marian::data::DatasetBase< SentenceTuple, CorpusIterator, CorpusBatch > (Template Class DatasetBase)
public marian::data::RNGEngine (Class RNGEngine)
Derived Types
public marian::data::Corpus (Class Corpus)
public marian::data::CorpusNBest (Class CorpusNBest)
public marian::data::CorpusSQLite (Class CorpusSQLite)
Class Documentation
-
class CorpusBase : public marian::data::DatasetBase<SentenceTuple, CorpusIterator, CorpusBatch>, public marian::data::RNGEngine
Subclassed by marian::data::Corpus, marian::data::CorpusNBest, marian::data::CorpusSQLite
Public Types
-
typedef SentenceTuple Sample
Public Functions
-
CorpusBase(const std::vector<std::string> &paths, const std::vector<Ptr<Vocab>> &vocabs, Ptr<Options> options, size_t seed = Config::seed)
-
virtual ~CorpusBase()
Protected Functions
-
void initEOS(bool training)
Determines whether an EOS symbol should be added to the input.
-
void addWordsToSentenceTuple(const std::string &line, size_t batchIndex, SentenceTupleImpl &tup) const
Helper function that converts a line of text into words using the i-th vocabulary (where i is batchIndex) and adds them to the sentence tuple.
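The conversion described above can be illustrated with a minimal sketch. It is not Marian's implementation: `ToyVocab`, `kUnkId`, and `encodeLine` are hypothetical names, whitespace tokenization is an assumption, and a plain hash map stands in for the real `Vocab` class. The point is only that the stream index selects which vocabulary encodes the line.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy vocabulary (token -> id); a stand-in, not Marian's Vocab.
using ToyVocab = std::unordered_map<std::string, std::size_t>;

constexpr std::size_t kUnkId = 1;  // assumed id for out-of-vocabulary tokens

// Encode `line` with the vocabulary that belongs to stream `batchIndex`.
std::vector<std::size_t> encodeLine(const std::string& line,
                                    std::size_t batchIndex,
                                    const std::vector<ToyVocab>& vocabs) {
  const ToyVocab& vocab = vocabs.at(batchIndex);  // throws if out of range
  std::vector<std::size_t> ids;
  std::istringstream in(line);
  std::string token;
  while (in >> token) {  // assumption: tokens are whitespace-separated
    const auto it = vocab.find(token);
    ids.push_back(it != vocab.end() ? it->second : kUnkId);
  }
  return ids;
}
```

With two vocabularies, calling `encodeLine(line, 1, vocabs)` encodes the line with the second one, and any token missing from it maps to `kUnkId`.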
-
void addAlignmentToSentenceTuple(const std::string &line, SentenceTupleImpl &tup) const
Helper function parsing a line with word alignments and adding them to the sentence tuple.
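As a rough illustration of alignment parsing, the sketch below assumes the widely used `source-target` pair format (for example `0-0 1-2 2-1`); this page does not state the format the class expects. `AlignmentPoint` and `parseAlignmentLine` are hypothetical names and do not correspond to Marian's own alignment types.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one alignment link: (source index, target index).
using AlignmentPoint = std::pair<std::size_t, std::size_t>;

// Parse a line such as "0-0 1-2 2-1" into (source, target) index pairs.
// Assumption: pairs are whitespace-separated and written as "src-tgt".
std::vector<AlignmentPoint> parseAlignmentLine(const std::string& line) {
  std::vector<AlignmentPoint> points;
  std::istringstream in(line);
  std::string token;
  while (in >> token) {
    const auto dash = token.find('-');
    if (dash == std::string::npos || dash == 0 || dash + 1 == token.size())
      throw std::invalid_argument("malformed alignment token: " + token);
    points.emplace_back(std::stoul(token.substr(0, dash)),
                        std::stoul(token.substr(dash + 1)));
  }
  return points;
}
```

An empty line yields an empty vector, and a token without both indices is rejected instead of being silently skipped.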
-
void addWeightsToSentenceTuple(const std::string &line, SentenceTupleImpl &tup) const
Helper function parsing a line of weights and adding them to the sentence tuple.
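A weights line could be parsed along the following lines. This sketch assumes whitespace-separated floating-point values, with a single value meaning one weight for the whole sentence; the page itself does not specify the format. `parseWeightsLine` is a hypothetical helper, not part of the class.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse a line such as "0.5 1 2" into a vector of weights.
// Assumption: weights are whitespace-separated decimal numbers.
std::vector<float> parseWeightsLine(const std::string& line) {
  std::vector<float> weights;
  std::istringstream in(line);
  float w = 0.0f;
  while (in >> w)  // stops at end of input or at the first non-numeric token
    weights.push_back(w);
  return weights;
}
```

Under this assumption a line with one number produces a one-element vector, which a caller could treat as a sentence-level weight.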
-
void addAlignmentsToBatch(Ptr<CorpusBatch> batch, const std::vector<Sample> &batchVector)
-
void addWeightsToBatch(Ptr<CorpusBatch> batch, const std::vector<Sample> &batchVector)
Protected Attributes
-
std::vector<bool> addEOS_
Determines whether an EOS symbol should be added.
By default this is true for every sequence, but it should be false for, e.g., classifier labels. It is set per input stream, hence a vector.
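The per-stream flag can be pictured with a small sketch. The names `kEosId` and `finishSequence` are hypothetical, and the EOS id of 0 is an assumption made for the example. One flag per input stream decides whether the end-of-sequence id is appended.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kEosId = 0;  // assumed EOS id for this sketch

// Append EOS to `words` only if the flag for stream `streamIndex` is set.
std::vector<std::size_t> finishSequence(std::vector<std::size_t> words,
                                        std::size_t streamIndex,
                                        const std::vector<bool>& addEOS) {
  if (addEOS.at(streamIndex))  // throws if the stream index is out of range
    words.push_back(kEosId);
  return words;
}
```

With flags `{true, false}`, a text stream at index 0 gets EOS appended, while a label stream at index 1 is returned unchanged.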
-
size_t pos_ = {0}
-
size_t maxLength_ = {0}
-
bool maxLengthCrop_ = {false}
-
bool rightLeft_ = {false}
-
bool tsv_ = {false}
-
size_t tsvNumInputFields_ = {0}
-
int weightFileIdx_ = {-1}
Index of the file with weights in paths_ and files_; -1 means no weights file provided.
-
int alignFileIdx_ = {-1}
Index of the file with alignments in paths_ and files_; -1 means no alignment file provided.
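Both index attributes use -1 as a "not provided" sentinel, so code that reads them needs to check the sign before indexing. The sketch below shows that check with a hypothetical free function `optionalPath`; the class itself may handle the case differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the path stored at `fileIdx`, or an empty string when `fileIdx`
// is negative, i.e. when no such file was provided.
std::string optionalPath(const std::vector<std::string>& paths, int fileIdx) {
  if (fileIdx < 0)
    return {};
  return paths.at(static_cast<std::size_t>(fileIdx));  // throws if too large
}
```

Checking the sign before the cast matters: converting -1 to `size_t` directly would produce a very large index instead of signalling a missing file.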
-
typedef SentenceTuple