Class FactoredVocab

Nested Relationships

Nested Types

  • Struct FactoredVocab::CSRData

Inheritance Relationships

Base Type

  • public marian::IVocab

Class Documentation

class FactoredVocab : public marian::IVocab

Public Functions

size_t load(const std::string &factoredVocabPath, size_t maxSizeUnused = 0)
virtual void create(const std::string &vocabPath, const std::vector<std::string> &trainPaths, size_t maxSize)
virtual const std::string &canonicalExtension() const
const std::vector<std::string> &suffixes() const
Word operator[](const std::string &word) const
Words encode(const std::string &line, bool addEOS = true, bool inference = false) const
std::string decode(const Words &sentence, bool ignoreEos = true) const
std::string surfaceForm(const Words &sentence) const
const std::string &operator[](Word id) const
virtual size_t size() const
virtual std::string type() const
virtual Word getEosId() const
virtual Word getUnkId() const
std::string toUpper(const std::string &line) const
std::string toEnglishTitleCase(const std::string &line) const
void transcodeToShortlistInPlace(WordIndex *ptr, size_t num) const
WordIndex getUnkIndex() const
virtual void createFake()
Word randWord() const
size_t factorVocabSize() const
size_t virtualVocabSize() const
size_t lemmaSize() const
FactoredVocab::CSRData csr_rows(const Words &words) const
void lemmaAndFactorsIndexes(const Words &words, std::vector<IndexType> &lemmaIndices, std::vector<float> &factorIndices) const

Decodes the lemma index and the factor indexes of each word and outputs them separately.

Given the word indexes of a batch, it fills two output structures: one holding the lemma indexes and one holding the factor information.

  • [in] words: vector of words

  • [out] lemmaIndices: lemma index for each word

  • [out] factorIndices: factor usage information for each word (1 if the factor is used, 0 if not)
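
The split described above can be sketched with a toy model. This is a minimal illustration, not Marian's implementation: `ToyWord`, `splitLemmaAndFactors`, the `IndexType` alias, and the flat layout with one slot per factor per word are all assumptions made for the example. Only the element types of the two output vectors mirror the documented signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using IndexType = std::uint32_t;  // assumption: stands in for Marian's IndexType

// Hypothetical stand-in for a decoded factored word: one lemma index plus the
// indexes of the factors attached to it (numbered 0..totalFactors-1).
struct ToyWord {
  IndexType lemma;
  std::vector<std::size_t> factors;
};

// Fills lemmaIndices with one entry per word and factorIndices with
// totalFactors entries per word: 1.0f if the factor is used, 0.0f if not.
void splitLemmaAndFactors(const std::vector<ToyWord>& words,
                          std::size_t totalFactors,
                          std::vector<IndexType>& lemmaIndices,
                          std::vector<float>& factorIndices) {
  lemmaIndices.clear();
  lemmaIndices.reserve(words.size());
  factorIndices.assign(words.size() * totalFactors, 0.0f);
  for (std::size_t w = 0; w < words.size(); ++w) {
    lemmaIndices.push_back(words[w].lemma);
    for (std::size_t f : words[w].factors) {
      assert(f < totalFactors);  // factor index must be in range
      factorIndices[w * totalFactors + f] = 1.0f;
    }
  }
}
```

For two words with lemmas 7 and 3, where only the first word uses factors 0 and 2 out of three, the sketch yields lemma indexes `{7, 3}` and usage flags `{1, 0, 1, 0, 0, 0}`.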

size_t getNumGroups() const
std::pair<size_t, size_t> getGroupRange(size_t g) const
size_t getTotalFactorCount() const

Auxiliary function that returns the total number of factors (excluding lemmas) in a factored vocabulary.

  • [return] number of factors

Word factors2word(const std::vector<size_t> &factors) const
void word2factors(Word word, std::vector<size_t> &factors) const
Word lemma2Word(size_t factor0Index) const
Word expandFactoredWord(Word word, size_t groupIndex, size_t factorIndex) const
bool canExpandFactoredWord(Word word, size_t groupIndex) const
size_t getFactor(Word word, size_t groupIndex) const
bool lemmaHasFactorGroup(size_t factor0Index, size_t g) const
const std::string &getFactorGroupPrefix(size_t groupIndex) const
const std::string &getFactorName(size_t groupIndex, size_t factorIndex) const
std::string decodeForDiagnostics(const Words &sentence) const
std::string word2string(Word word) const
Word string2word(const std::string &w) const
bool tryGetFactor(const std::string &factorGroupName, size_t &groupIndex, size_t &factorIndex) const

Public Static Functions

static bool isFactorValid(size_t factorIndex)
static Ptr<FactoredVocab> tryCreateAndLoad(const std::string &path)

Public Static Attributes

constexpr size_t FACTOR_NOT_APPLICABLE = (SIZE_MAX - 1)
constexpr size_t FACTOR_NOT_SPECIFIED = (SIZE_MAX - 2)
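
The two constants sit at the very top of the `size_t` range, so they can act as sentinels that never collide with a real factor index. The sketch below shows one way a validity check such as `isFactorValid` could use them. The comparison rule and the name `isFactorValidSketch` are assumptions for illustration; only the two constant values come from the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values copied from the listing above.
constexpr std::size_t FACTOR_NOT_APPLICABLE = (SIZE_MAX - 1);
constexpr std::size_t FACTOR_NOT_SPECIFIED = (SIZE_MAX - 2);

// Assumption: an index counts as a real factor only if it lies below both
// sentinels, i.e. below the smaller one.
constexpr bool isFactorValidSketch(std::size_t factorIndex) {
  return factorIndex < FACTOR_NOT_SPECIFIED;
}

static_assert(FACTOR_NOT_SPECIFIED < FACTOR_NOT_APPLICABLE,
              "the check above relies on this ordering");
```

Under this rule an ordinary index such as 0 is valid, while both sentinel values are rejected.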
struct CSRData

Public Members

Shape shape
std::vector<float> weights
std::vector<IndexType> indices
std::vector<IndexType> offsets
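
The member names suggest the conventional compressed sparse row (CSR) layout, although the listing does not spell this out. The sketch below assumes that layout: `offsets[r]` and `offsets[r + 1]` delimit the entries of row `r`, `indices` holds their column positions, and `weights` holds their values. `ToyCSR`, `toDense`, the `IndexType` alias, and the reduction of `Shape` to a row and column count are hypothetical names and simplifications for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using IndexType = std::uint32_t;  // assumption: stands in for Marian's IndexType

// Hypothetical mirror of the listed members; Shape is reduced to rows and cols.
struct ToyCSR {
  std::size_t rows, cols;          // stands in for `shape`
  std::vector<float> weights;      // nonzero values, stored row by row
  std::vector<IndexType> indices;  // column index of each nonzero value
  std::vector<IndexType> offsets;  // offsets[r]..offsets[r + 1] delimit row r
};

// Expands the CSR triple into a dense row-major matrix.
std::vector<float> toDense(const ToyCSR& m) {
  assert(m.offsets.size() == m.rows + 1);
  std::vector<float> dense(m.rows * m.cols, 0.0f);
  for (std::size_t r = 0; r < m.rows; ++r) {
    for (IndexType k = m.offsets[r]; k < m.offsets[r + 1]; ++k) {
      dense[r * m.cols + m.indices[k]] = m.weights[k];
    }
  }
  return dense;
}
```

With `offsets = {0, 2, 3}`, `indices = {0, 2, 1}`, and all weights equal to 1 in a 2 x 3 matrix, the first row has ones in columns 0 and 2 and the second row has a one in column 1.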