Class SubBatch

Class Documentation

class SubBatch

Batch of sentences represented as word indices with masking.

Public Functions

SubBatch(size_t size, size_t width, const Ptr<const Vocab> &vocab)

Creates an empty subbatch of specified size.

Parameters
  • size: Number of sentences

  • width: Number of words in the longest sentence

Words &data()

Flat vector of word indices.

The order of indices is \(idx_{0,0}, idx_{0,1},\dots,idx_{0,s-1}, \dots, idx_{w-1,0},idx_{w-1,1},\dots,idx_{w-1,s-1}\), where the first subscript is the word position, the second is the sentence index, \(w\) is the number of words (width), and \(s\) is the number of sentences (size).

const Words &data() const
size_t locate(size_t batchIdx, size_t wordPos) const

Computes the flat index into the data() and mask() vectors for a given batch index and word position within the sentence.
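
The ordering documented for data() implies a word-major layout: the entries for word position 0 of every sentence come first, then those for position 1, and so on. A minimal sketch of that arithmetic, assuming zero-based indices; `flatIndex` is a hypothetical stand-alone helper written for illustration, not the library's implementation:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the documented layout of data() and mask():
// all sentences' entries for word position 0 come first, then position 1, ...
// Assumes zero-based indices with batchIdx < batchSize.
std::size_t flatIndex(std::size_t batchIdx, std::size_t wordPos, std::size_t batchSize) {
  assert(batchIdx < batchSize);
  return wordPos * batchSize + batchIdx;
}
```

With a batch of 3 sentences, the second word (wordPos = 1) of the first sentence (batchIdx = 0) lands at flat index 3, directly after the three first words.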

std::vector<float> &mask()

Flat masking vector; 0 is used for masked words.

See

data()
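
To illustrate how data() and mask() line up, the sketch below pads a set of sentences of unequal length to a common width and fills both flat vectors in the same word-major order. `WordIndex`, `FlatBatch`, `makeFlatBatch`, and the padding value are assumptions made for this example; they are not part of the documented class.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the word index type; the real Words type
// is not described on this page.
using WordIndex = unsigned int;

struct FlatBatch {
  std::size_t size = 0;         // number of sentences
  std::size_t width = 0;        // length of the longest sentence
  std::vector<WordIndex> data;  // word-major flat word indices
  std::vector<float> mask;      // 1 for real words, 0 for masked (padding) slots
};

// Pads every sentence to the longest one and fills data/mask in word-major order.
// padId is an assumed filler value for masked positions.
FlatBatch makeFlatBatch(const std::vector<std::vector<WordIndex>>& sentences,
                        WordIndex padId = 0) {
  FlatBatch b;
  b.size = sentences.size();
  for (const auto& s : sentences)
    b.width = std::max(b.width, s.size());
  b.data.assign(b.size * b.width, padId);
  b.mask.assign(b.size * b.width, 0.0f);
  for (std::size_t i = 0; i < b.size; ++i) {
    for (std::size_t j = 0; j < sentences[i].size(); ++j) {
      std::size_t k = j * b.size + i;  // word position first, then sentence
      b.data[k] = sentences[i][j];
      b.mask[k] = 1.0f;
    }
  }
  return b;
}
```

For the two sentences {5, 6, 7} and {8}, this yields data = [5, 8, 6, 0, 7, 0] and mask = [1, 1, 1, 0, 1, 0]: the shorter sentence is masked out at word positions 1 and 2.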

const std::vector<float> &mask() const
const Ptr<const Vocab> &vocab() const

Accessor for the vocab_ field.

size_t batchSize() const

The number of sentences in the batch.

size_t batchWidth() const

The number of words in the longest sentence in the batch.

size_t batchWords() const

The total number of words in the batch (not counting masked-out words).
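
As an illustration of what "not counting masked-out words" means, the count can be derived from a mask vector by counting its non-zero entries. This is only a sketch of the definition: `countUnmasked` is a hypothetical helper, and the class itself may store the count (see setWords()) rather than recompute it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts unmasked entries, assuming the mask holds 0
// for masked words and a non-zero value (typically 1) for real words.
std::size_t countUnmasked(const std::vector<float>& mask) {
  std::size_t n = 0;
  for (float m : mask)
    if (m != 0.0f)
      ++n;
  return n;
}
```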

std::vector<Ptr<SubBatch>> split(size_t n, size_t sizeLimit) const

Splits the subbatch into smaller sub-batches of equal size (except possibly the last).

Return

Vector of pointers to the new sub-batches (or nullptrs where there are no more sub-batches to return)

See

marian::data::Batch::split(size_t n)

Parameters
  • n: number of sub-batches to split into

  • sizeLimit: Pretend the batch only has this many sentences. Used for MB-size ramp-up.
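
One plausible reading of these parameters can be sketched by computing only the sentence count of each returned slot. The sketch assumes the chunk size is the ceiling of min(batch size, sizeLimit) divided by n, and uses 0 to stand in for a nullptr entry; `splitSizes` is a hypothetical helper, and the actual splitting logic may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the number of sentences assigned to each of
// the n slots. A 0 plays the role of a nullptr entry (no sub-batch left).
// Assumption: chunk size = ceil(min(batchSize, sizeLimit) / n).
std::vector<std::size_t> splitSizes(std::size_t batchSize, std::size_t n,
                                    std::size_t sizeLimit) {
  std::vector<std::size_t> sizes(n, 0);
  if (n == 0)
    return sizes;
  std::size_t total = std::min(batchSize, sizeLimit);  // apply the ramp-up limit
  std::size_t chunk = (total + n - 1) / n;             // ceiling division
  std::size_t pos = 0;
  for (std::size_t i = 0; i < n && pos < total; ++i) {
    sizes[i] = std::min(chunk, total - pos);  // last chunk may be smaller
    pos += sizes[i];
  }
  return sizes;
}
```

Under these assumptions, 10 sentences split 4 ways give chunks of 3, 3, 3, and 1, while 4 sentences split 3 ways give 2, 2, and an empty third slot, which corresponds to the nullptr case described under Return.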

void setWords(size_t words)

Public Static Functions

static size_t locate(size_t batchIdx, size_t wordPos, size_t batchSize)