Layer Normalization

TLDR; The authors propose a new normalization scheme called "Layer Normalization" that works especially well for recurrent networks. Layer Normalization is similar to Batch Normalization, but its statistics depend only on a single training case rather than on the whole mini-batch. As a result, it is well suited to variable-length sequences and small batches. In Layer Normalization, all hidden units in a layer share the same normalization terms (one mean and one standard deviation), while different training cases get different terms. The authors' experiments show that RNNs with Layer Normalization converge faster, and sometimes to better solutions, than RNNs with batch normalization or without normalization. For CNNs, Batch Normalization still performs better.
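To make the "single training case" point concrete, here is a minimal plain-Python sketch, assuming one layer's summed inputs a_1..a_H for one example. It computes the shared mean mu = (1/H) * sum(a_i) and standard deviation sigma = sqrt((1/H) * sum((a_i - mu)^2)), then applies a per-unit gain and bias. The function names `layer_norm` and `batch_norm` and the small `eps` constant are illustrative assumptions, not the authors' code; `batch_norm` is included only for contrast and omits its learned scale and shift.

```python
import math
from typing import List, Sequence


def layer_norm(a: Sequence[float], gain: Sequence[float], bias: Sequence[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize the summed inputs `a` of ONE training case across its H hidden units.

    All H units share the same mean and standard deviation, so nothing here
    depends on any other example in the batch.
    """
    H = len(a)
    mu = sum(a) / H
    # eps is an assumed small constant for numerical stability.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in a) / H + eps)
    # Each unit keeps its own adaptive gain and bias after the shared normalization.
    return [g * (x - mu) / sigma + b for x, g, b in zip(a, gain, bias)]


def batch_norm(batch: Sequence[Sequence[float]], eps: float = 1e-5) -> List[List[float]]:
    """For contrast: statistics are computed per hidden unit ACROSS the batch.

    Learned scale and shift are omitted to keep the comparison short.
    """
    n = len(batch)
    H = len(batch[0])
    out = [list(row) for row in batch]
    for j in range(H):
        col = [row[j] for row in batch]
        mu = sum(col) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in col) / n + eps)
        for i in range(n):
            out[i][j] = (batch[i][j] - mu) / sigma
    return out


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    h = layer_norm(a, gain=[1.0] * 4, bias=[0.0] * 4)
    print([round(x, 3) for x in h])  # → [-1.342, -0.447, 0.447, 1.342]
    # With a batch of size 1, batch statistics collapse: every output is zero.
    print(batch_norm([a]))  # → [[0.0, 0.0, 0.0, 0.0]]
```

The batch-of-one case shows why the per-example statistics matter: batch normalization has nothing to normalize against, while layer normalization still produces a zero-mean, unit-variance output. In a recurrent network, the same per-example computation would be applied to the summed inputs at each time step, so sequence lengths elsewhere in the batch do not affect it.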
