Weekly AI Observations: Linear Attention
On my commute I listened to a podcast episode featuring Songlin Yang, nicknamed "the mother of linear attention," in which she patiently explained the mechanics behind the architecture. Linear attention is red-hot right now, not least because several frontier Chinese model labs are doubling down on it. I figured it was worth retelling the story in my own words.

How Is a Token Generated?

To keep things simple, let's focus on the inference stage of a large language model, the everyday scenario where a user asks a question…
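To make that setup concrete before going further, here is a minimal sketch of the inference loop: the model scores every vocabulary entry for the next position, picks a token, appends it to the sequence, and repeats. The `model` and `tokenizer` objects and their Hugging-Face-style interface are assumptions for illustration, not anything described in the episode.

```python
import torch

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    # Hypothetical HF-style interfaces: tokenizer.encode/decode, model(...).logits.
    # Encode the user's question into token ids, shape (1, seq_len).
    ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(max_new_tokens):
        # One forward pass scores every vocabulary entry at each position.
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
        # Greedy decoding: take the highest-scoring token at the last position.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        # Append the new token and loop; each step re-reads the whole prefix.
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0])
```

Greedy argmax is the simplest choice here; real deployments typically sample with temperature or top-p, but the token-by-token structure of the loop is the same.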