What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129
See Figure 6, columns B and C (classes vs. labels): subcircuit B delays the phase change with the number of classes, whereas subcircuit C delays it (dramatically) with the number of labels.