Grokking

Can we observe grokking on modular addition in a toy example? This is inspired by https://arxiv.org/abs/2301.05217, but uses an MLP instead of a transformer.
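
To make the task concrete, here is a minimal sketch of how a modular addition dataset for an MLP might be built. The function names, the modulus `p=113`, the 30% training fraction, and the concatenated one-hot input encoding are all assumptions chosen for illustration; the actual experiment may use different values or a different encoding.

```python
import random


def make_modular_addition_data(p=113, train_frac=0.3, seed=0):
    """Enumerate every pair (a, b) with label (a + b) % p, then split at random.

    p and train_frac are illustrative assumptions, not values from the text.
    """
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(examples)
    n_train = int(train_frac * len(examples))
    return examples[:n_train], examples[n_train:]


def one_hot_pair(a, b, p):
    """Concatenate the one-hot codes of a and b into a length-2p input vector.

    This encoding is one hypothetical choice for feeding the pair to an MLP.
    """
    x = [0.0] * (2 * p)
    x[a] = 1.0
    x[p + b] = 1.0
    return x


if __name__ == "__main__":
    train, test = make_modular_addition_data()
    print(len(train), len(test))
```

Holding out most of the pairs is what makes grokking observable in this kind of setup: the model can memorize the small training split early, so any later rise in held-out accuracy reflects generalization rather than memorization.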