Class CorpusBatch
Defined in File corpus_base.h
Inheritance Relationships
Base Type
public marian::data::Batch
(Class Batch)
Derived Type
public marian::data::BertBatch
(Class BertBatch)
Class Documentation
- class CorpusBatch : public marian::data::Batch
Batch of source(s) and target sentences with additional information, such as guided alignments and sentence- or word-level weighting.
Subclassed by marian::data::BertBatch
Public Functions
- Ptr<SubBatch> operator[](size_t i) const
Access the i-th subbatch storing a source or target sentence.
The order of subbatches is: 1st source sentence, 2nd source sentence, …, target sentence.
- Return
Pointer to the requested element.
- Parameters
i: position of the element to return
- size_t size() const
The number of sentences in the batch.
- size_t words(int which = 0) const
The total number of words in the batch (not counting masked-out words).
Pass which=0 for source words and -1 for target words.
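The `which` convention can be illustrated with a small self-contained sketch. `ToySubBatch` and `ToyBatch` are hypothetical stand-ins, not Marian's classes, and treating a negative `which` as an index counted from the end is an assumption that merely matches the documented "-1 for target" behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a subbatch: one mask entry per position,
// where 0 marks a masked-out (padding) position.
struct ToySubBatch {
  std::vector<float> mask;

  // Count only positions that are not masked out.
  std::size_t words() const {
    std::size_t n = 0;
    for (float m : mask)
      if (m != 0.f) ++n;
    return n;
  }
};

// Hypothetical stand-in for a batch: source streams first, target last.
struct ToyBatch {
  std::vector<ToySubBatch> subBatches;

  // which >= 0 indexes from the front, which < 0 from the back (assumption),
  // so 0 selects the first source stream and -1 the target stream.
  std::size_t words(int which = 0) const {
    const auto n = static_cast<std::ptrdiff_t>(subBatches.size());
    const std::ptrdiff_t idx = which >= 0 ? which : which + n;
    assert(idx >= 0 && idx < n);
    return subBatches[static_cast<std::size_t>(idx)].words();
  }
};
```

With one source stream and one target stream, `words()` and `words(0)` both report the source count, while `words(-1)` reports the target count.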
- size_t width() const
The width of the source mini-batch, i.e. the length of the longest source sentence; shorter sentences are padded to this length.
- size_t sizeTrg() const
The number of sentences in the target subbatch.
- size_t wordsTrg() const
The total number of target words in the batch (not counting masked-out words).
- size_t widthTrg() const
The target width (=max length) of the mini-batch.
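As a rough illustration of "width (=max length)", the sketch below computes the width of a toy mini-batch and a padding mask for it. The helpers `batchWidth` and `paddingMask` are hypothetical names, and the position-major layout of the mask is an assumption of this sketch rather than a documented property of the class.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: the width of a mini-batch is the length of its
// longest sentence.
std::size_t batchWidth(const std::vector<std::vector<int>>& sentences) {
  std::size_t width = 0;
  for (const auto& s : sentences)
    width = std::max(width, s.size());
  return width;
}

// Hypothetical helper: build a mask with width * batchSize entries,
// 1 for real words and 0 for padding.
std::vector<float> paddingMask(const std::vector<std::vector<int>>& sentences) {
  const std::size_t size = sentences.size();
  const std::size_t width = batchWidth(sentences);
  std::vector<float> mask(width * size, 0.f);
  for (std::size_t j = 0; j < size; ++j)
    for (std::size_t k = 0; k < sentences[j].size(); ++k)
      mask[k * size + j] = 1.f;  // position-major layout (assumption)
  return mask;
}
```

Summing the mask gives the word count "not counting masked-out words", while `width * size` gives the number of slots including padding.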
- size_t sets() const
The number of source and target streams (subbatches) in the batch.
- std::vector<Ptr<Batch>> split(size_t n, size_t sizeLimit)
Splits the batch into batches of equal size (except for the last one).
- Return
Vector of pointers to the new sub-batches (or nullptrs where the sub-batches run out)
- See
marian::data::SubBatch::split(size_t n)
- Parameters
n: number of sub-batches to split into
sizeLimit: clip batch content to the first sizeLimit sentences in the batch
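The splitting rule described above can be sketched on a plain list of sentence ids. This is a minimal sketch under stated assumptions, not the library's implementation: `Chunk` and `splitSketch` are hypothetical names, the chunk size is taken as the ceiling of the clipped size divided by `n`, and `std::shared_ptr` stands in for `Ptr`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Chunk = std::vector<int>;  // hypothetical: sentence ids of one sub-batch

// Split `sentences` into n chunks of equal size (the last non-empty one may
// be smaller) after clipping to the first sizeLimit sentences. Slots for
// which no sentences remain stay nullptr.
std::vector<std::shared_ptr<Chunk>> splitSketch(const Chunk& sentences,
                                                std::size_t n,
                                                std::size_t sizeLimit) {
  assert(n > 0);
  const std::size_t total = std::min(sentences.size(), sizeLimit);
  const std::size_t chunkSize = (total + n - 1) / n;  // ceil(total / n)

  std::vector<std::shared_ptr<Chunk>> out(n);  // all nullptr initially
  std::size_t slot = 0;
  for (std::size_t pos = 0; pos < total; pos += chunkSize, ++slot) {
    const std::size_t len = std::min(chunkSize, total - pos);
    out[slot] = std::make_shared<Chunk>(sentences.begin() + pos,
                                        sentences.begin() + pos + len);
  }
  return out;
}
```

For ten sentences and `n = 4`, this yields chunks of 3, 3, 3 and 1 sentences; with `sizeLimit = 5`, only the first five sentences are used and the fourth slot is left as `nullptr`.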
- const std::vector<WordAlignment> &getGuidedAlignment() const
- void setGuidedAlignment(std::vector<WordAlignment> &&aln)
- void debug(bool printIndices = false)
Prints the batch in a readable form on stderr for debugging.
Public Static Functions
- static Ptr<CorpusBatch> fakeBatch(const std::vector<size_t> &lengths, const std::vector<Ptr<Vocab>> &vocabs, size_t batchSize, Ptr<Options> options)
Creates a batch filled with fake data.
Used to determine the size of the batch object. With guided alignments and multiple encoders, those multiple source streams are expected to have the same lengths.
- Return
Fake batch of the same size as the real batch.
- Parameters
lengths: List of subbatch sizes.
batchSize: Number of sentences in the batch.
options: Options with “guided-alignment” and “data-weighting”.
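The idea of a fake batch, an object with the same shape as a real batch but with placeholder content, can be sketched as follows. `FakeStream` and `fakeBatchSketch` are hypothetical names; the vocabularies and options are left out, and the placeholder word id 0 is an arbitrary choice for this sketch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one stream (subbatch) of a batch.
struct FakeStream {
  std::size_t width;          // sentence length for this stream
  std::size_t size;           // number of sentences
  std::vector<unsigned> ids;  // width * size placeholder word ids
  std::vector<float> mask;    // width * size entries, all 1 (nothing masked)
};

// Build one stream per entry in `lengths`, each holding `batchSize`
// sentences of exactly that length, filled with placeholder data.
std::vector<FakeStream> fakeBatchSketch(const std::vector<std::size_t>& lengths,
                                        std::size_t batchSize) {
  std::vector<FakeStream> streams;
  streams.reserve(lengths.size());
  for (std::size_t len : lengths) {
    FakeStream s;
    s.width = len;
    s.size = batchSize;
    s.ids.assign(len * batchSize, 0u);  // placeholder id (arbitrary)
    s.mask.assign(len * batchSize, 1.f);
    streams.push_back(std::move(s));
  }
  return streams;
}
```

Because every slot is filled, such an object occupies as much memory as the largest real batch of the same dimensions, which is what makes it usable for size estimation.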
-
Ptr<SubBatch>