Class GraphGroup
Defined in File graph_group.h
Inheritance Relationships
Derived Types
- public marian::AsyncGraphGroup (Class AsyncGraphGroup)
- public marian::SingletonGraph (Class SingletonGraph)
- public marian::SyncGraphGroup (Class SyncGraphGroup)
Class Documentation
- class GraphGroup
Base class for managing the training process across one or multiple GPUs, or even multiple machines with multiple GPUs.
Subclassed by marian::AsyncGraphGroup, marian::SingletonGraph, marian::SyncGraphGroup
Public Functions
- void initGraphsAndOpts()
- virtual ~GraphGroup()
- void increaseCostScaleFactor()
- void decreaseCostScaleFactor()
- void load()
- void save(bool isFinal = false)
- void swapWithSmoothed()
- bool isMainProcess() const
- void barrier() const
- void validate()
- void finalize()
- float checkNanOrNorm(size_t i, size_t begin, size_t end)
- float computeNormalizationFactor(float gNorm, size_t updateTrgWords)
This function computes a normalization factor that is applied to the gradient before an update. Depending on various settings this will return a normalizer that can perform a combination of the following (a sketch follows below):
- apply a cost scaling factor if cost scaling is enabled
- normalize the gradient by the number of words in a batch if requested (turning ce-sum into ce-mean). @TODO: once fp16 stability issues are proven to not be caused by this, remove.
- re-scale the gradient based on a dynamic running average of gradient norms
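The following sketch shows one way these pieces could combine into a single factor, assuming the returned value is later used as a divisor on the raw gradient; all names below are illustrative and this is not Marian's actual implementation.

#include <cstddef>

// Illustrative sketch only: combines the three behaviours described above.
float computeNormalizationFactorSketch(float gNorm,                 // current gradient norm
                                       std::size_t updateTrgWords,  // target words in this update
                                       bool costScaling,            // cost scaling enabled?
                                       float costScalingFactor,     // factor the loss was multiplied by
                                       bool normalizeByWords,       // turn ce-sum into ce-mean?
                                       bool dynamicScaling,         // dynamic gradient scaling enabled?
                                       float dynamicScalingFactor,  // allowed multiple of the running norm
                                       float runningAvgNorm) {      // running average of gradient norms
  float normalizer = 1.f;

  // 1) undo the cost-scaling that was applied to the loss (fp16 training)
  if (costScaling)
    normalizer *= costScalingFactor;

  // 2) normalize by the number of target words (ce-sum -> ce-mean) if requested
  if (normalizeByWords && updateTrgWords > 0)
    normalizer *= static_cast<float>(updateTrgWords);

  // 3) if the current gradient norm exceeds the running average by more than the
  //    allowed factor, grow the normalizer so the effective update is damped
  if (dynamicScaling && runningAvgNorm > 0.f && gNorm > dynamicScalingFactor * runningAvgNorm)
    normalizer *= gNorm / (dynamicScalingFactor * runningAvgNorm);

  return normalizer;
}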
- Ptr<data::BatchStats> collectStats(Ptr<ExpressionGraph> graph, Ptr<models::ICriterionFunction> model, const std::vector<Ptr<Vocab>> &vocabs, double multiplier = 1.)
Determine the maximal batch size that can fit into the given workspace so that reallocation does not happen. Rather, adjust the batch size based on the statistics collected here. Activated with --mini-batch-fit. In a multi-GPU scenario, the first GPU is used to determine the size. The actual allowed size is then determined by multiplying it with the number of devices, which is passed in as the ‘multiplier’ (see the example below).
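As a concrete illustration of the ‘multiplier’ mechanism described above (the helper function and its names are made up for this example and are not part of Marian's API):

#include <cstddef>
#include <iostream>

// Illustrative only: under --mini-batch-fit the maximal batch size is measured once
// on the first GPU and then multiplied by the number of devices, which is what the
// 'multiplier' argument carries.
std::size_t allowedBatchSize(std::size_t maxFitOnFirstGpu, std::size_t numDevices) {
  double multiplier = static_cast<double>(numDevices); // e.g. 4 GPUs -> multiplier = 4.0
  return static_cast<std::size_t>(maxFitOnFirstGpu * multiplier);
}

int main() {
  // If 512 sentences fit into the workspace of GPU 0 and training uses 4 GPUs,
  // mini-batches of up to 2048 sentences would be allowed.
  std::cout << allowedBatchSize(512, 4) << "\n"; // prints 2048
}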
- void setTypicalTrgBatchWords(size_t typicalTrgBatchWords)
- double getTypicalTrgBatchWords()
- void updateAverageTrgBatchWords(size_t trgBatchWords)
Protected Functions
- size_t numberOfInputFiles()
Protected Attributes
- Ptr<ICommunicator> comm_
- ShardingMode shardingMode_ = {ShardingMode::global}
- std::vector<Ptr<ExpressionGraph>> graphs_
- std::vector<Ptr<OptimizerBase>> optimizerShards_
- bool finalized_ = {false}
- double typicalTrgBatchWords_ = {0}
- bool mbRoundUp_ = {true}
- bool costScaling_ = {false}
- float costScalingFactor_ = {1.f}
- size_t costScalingFreq_ = {2000}
- float costScalingMultiplier_ = {2.f}
- float costScalingFactorMinimum_ = {1.f}
- size_t noNanSeen_ = {0}
- size_t nanSeen_ = {0}
- bool checkGradientNan_ = {false}
- bool dynamicGradientScaling_ = {false}
- float dynamicGradientScalingFactor_ = {2.f}
- bool dynamicGradientScalingUseLogs_ = {false}
- size_t dynamicGradientScalingFadeout_ = {0ul}
- void
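The cost-scaling members above (costScalingFactor_, costScalingMultiplier_, costScalingFreq_, costScalingFactorMinimum_, noNanSeen_, nanSeen_) suggest a conventional dynamic loss-scaling scheme for mixed-precision training. The standalone sketch below shows how such a policy typically behaves; it is an assumption for illustration, not Marian's actual implementation.

#include <algorithm>
#include <cstddef>

// Illustrative dynamic cost-scaling policy (not Marian's code).
struct CostScaleSketch {
  float factor          = 1.f;   // current scale applied to the loss, cf. costScalingFactor_
  float multiplier      = 2.f;   // grow/shrink step, cf. costScalingMultiplier_
  float minimum         = 1.f;   // never scale below this, cf. costScalingFactorMinimum_
  std::size_t freq      = 2000;  // grow after this many clean updates, cf. costScalingFreq_
  std::size_t noNanSeen = 0;     // consecutive updates without NaN/Inf
  std::size_t nanSeen   = 0;     // updates discarded due to NaN/Inf

  void afterGradientCheck(bool sawNan) {
    if (sawNan) {
      ++nanSeen;
      noNanSeen = 0;
      // back off the scale (cf. decreaseCostScaleFactor()), but stay above the minimum
      factor = std::max(minimum, factor / multiplier);
    } else {
      ++noNanSeen;
      // after 'freq' consecutive clean updates, scale up again (cf. increaseCostScaleFactor())
      if (noNanSeen % freq == 0)
        factor *= multiplier;
    }
  }
};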