Gradient calculation formula of word2vec #34

Open
DamirTenishev opened this issue Aug 5, 2024 · 0 comments

At line 523 of word2vec.c there is this formula:

g = (1 - vocab[word].code[d] - f) * alpha;

Can you please help me understand its logic?
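
For context, here is a minimal, self-contained C sketch of what the quoted line does for a single Huffman-tree node. The vector values and dimensionality are made-up illustrations, and a direct expf() call stands in for the original's precomputed expTable; the names neu1, syn1, alpha, and code follow the original source.

```c
#include <stdio.h>
#include <math.h>

#define LAYER1_SIZE 4  /* illustrative embedding size; the real default is larger */

int main(void) {
  /* hypothetical hidden (context) vector and inner-node vector */
  float neu1[LAYER1_SIZE] = {0.10f, -0.20f, 0.30f, 0.05f};
  float syn1[LAYER1_SIZE] = {0.20f,  0.10f, -0.10f, 0.40f};
  float alpha = 0.025f;  /* learning rate */
  int code = 1;          /* Huffman code bit for this node: 0 or 1 */

  /* f = sigmoid(dot(neu1, syn1)); the original looks this up in expTable */
  float s = 0.0f;
  for (int c = 0; c < LAYER1_SIZE; c++) s += neu1[c] * syn1[c];
  float f = 1.0f / (1.0f + expf(-s));

  /* the line in question from the hierarchical-softmax update */
  float g = (1 - code - f) * alpha;
  printf("s = %f  f = %f  g = %f\n", s, f, g);

  /* the inner-node update that follows in the original loop */
  for (int c = 0; c < LAYER1_SIZE; c++) syn1[c] += g * neu1[c];
  return 0;
}
```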

Since f is the sigmoid of the dot product between the embedding and the context vectors, in the case of hierarchical softmax we want it to be as close as possible to the turn (0 or 1) in the Huffman tree that has to be taken for this previous word (embedding) and the current word's node index (context). In that case we would just need

g = (vocab[word].code[d] - f) * alpha;

Taking into account that vocab[word].code[d] can only be 0 or 1, the "1 - vocab[word].code[d]" term simply inverts the left/right node labels; what is its purpose?
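
To restate the question in equation form (standard sigmoid cross-entropy algebra, not something taken from the repository): for a node score $s$ with prediction $f = \sigma(s)$ and binary target $t$,

$$L = -\bigl[\, t \log f + (1 - t) \log (1 - f) \,\bigr], \qquad \frac{\partial L}{\partial s} = f - t,$$

so a gradient step with learning rate $\alpha$ uses $g = (t - f)\,\alpha$. The line in the source therefore corresponds to the target $t = 1 - \texttt{vocab[word].code[d]}$, while the formula proposed above corresponds to $t = \texttt{vocab[word].code[d]}$; the question is why the inverted target is used.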

I summarized some details here: https://datascience.stackexchange.com/questions/129865/intuition-behind-g-variable-calculation-in-the-original-word2vec-implementation
