Class SubBatch
Defined in File corpus_base.h
Class Documentation
-
class
SubBatch
Batch of sentences represented as word indices with masking.
Public Functions
-
SubBatch
(size_t size, size_t width, const Ptr<const Vocab> &vocab) Creates an empty subbatch of the specified size.
- Parameters
size
: Number of sentences
width
: Number of words in the longest sentence
-
Words &
data
() Flat vector of word indices.
The order of indices is \(idx_{0,0}, idx_{0,1},\dots,idx_{0,s-1}, \dots, idx_{w-1,0},idx_{w-1,1},\dots,idx_{w-1,s-1}\), where \(idx_{i,j}\) is the word at position \(i\) of sentence \(j\), \(w\) is the number of words (width), and \(s\) is the number of sentences (size).
-
size_t
locate
(size_t batchIdx, size_t wordPos) const Computes the flat index into the data() and mask() vectors for a given batch index and word position within the sentence.
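The index arithmetic implied by the ordering documented for data() can be sketched as below. This is a minimal illustration, not the library's code: `flatIndex`, `packWordMajor`, and `padId` are hypothetical names, and the formula `wordPos * batchSize + batchIdx` is an assumption derived from the word-major order described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the documented API): maps a sentence
// index and a word position to a flat offset, assuming word-major order,
// i.e. all sentences' words at position 0 come first, then position 1, etc.
inline std::size_t flatIndex(std::size_t batchIdx,
                             std::size_t wordPos,
                             std::size_t batchSize) {
  return wordPos * batchSize + batchIdx;
}

// Hypothetical helper: packs ragged sentences into a padded, word-major
// flat vector of length size * width. padId fills the unused slots.
inline std::vector<unsigned> packWordMajor(
    const std::vector<std::vector<unsigned>>& sentences,
    std::size_t width,
    unsigned padId) {
  const std::size_t size = sentences.size();
  std::vector<unsigned> flat(size * width, padId);
  for (std::size_t b = 0; b < size; ++b) {
    for (std::size_t w = 0; w < sentences[b].size() && w < width; ++w) {
      flat[flatIndex(b, w, size)] = sentences[b][w];
    }
  }
  return flat;
}
```

Under these assumptions, packing the two sentences `{1, 2, 3}` and `{4}` with width 3 and `padId` 0 yields `{1, 4, 2, 0, 3, 0}`: the words of different sentences at the same position sit next to each other in memory.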
-
size_t
batchSize
() const The number of sentences in the batch.
-
size_t
batchWidth
() const The number of words in the longest sentence in the batch.
-
size_t
batchWords
() const The total number of words in the batch (not counting masked-out words).
-
std::vector<Ptr<SubBatch>>
split
(size_t n, size_t sizeLimit) const Splits the batch into sub-batches of equal size (except for the last).
- Return
Vector of pointers to new sub-batches (or nullptrs where the sub-batches run out)
- See
marian::data::Batch::split(size_t n)
- Parameters
n
: Number of sub-batches to split into
sizeLimit
: Pretend the batch only has this many sentences. Used for mini-batch size ramp-up.
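One way to picture the split is as a partition of sentence indices into n contiguous ranges. The sketch below is an illustration under stated assumptions, not the library's implementation: `splitRanges` is a hypothetical helper, the chunk size is assumed to be the ceiling of the effective size divided by n, the effective size is assumed to be the smaller of the batch size and `sizeLimit`, and an empty range stands in for a nullptr entry.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the documented API): returns n
// half-open [begin, end) sentence ranges. All non-empty ranges have the
// same length except possibly the last; an empty range (begin == end)
// plays the role of a nullptr entry in the documented return value.
inline std::vector<std::pair<std::size_t, std::size_t>> splitRanges(
    std::size_t batchSize, std::size_t n, std::size_t sizeLimit) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (n == 0) return ranges;
  // Assumption: sizeLimit caps how many sentences are considered.
  const std::size_t effective = std::min(batchSize, sizeLimit);
  // Ceiling division, so that n chunks cover all effective sentences.
  const std::size_t chunk = (effective + n - 1) / n;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t begin = std::min(i * chunk, effective);
    const std::size_t end = std::min(begin + chunk, effective);
    ranges.emplace_back(begin, end);
  }
  return ranges;
}
```

With these assumptions, 10 sentences split 4 ways give ranges of lengths 3, 3, 3, and 1, while 3 sentences split 4 ways give three single-sentence ranges followed by one empty range.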
-
void
setWords
(size_t words)
Public Static Functions
-
static size_t
locate
(size_t batchIdx, size_t wordPos, size_t batchSize)
-