MiniMax launched M3 on June 1 as what it calls the first open-weight model combining frontier coding, a 1-million-token context and native multimodal processing. Open-weight means you can download it, inspect it, adapt it.

The model uses MiniMax’s Sparse Attention architecture. That’s the meaningful innovation. Sparse Attention cuts per-token compute at 1M context to one-twentieth of the previous generation’s. It delivers 9.7 times faster prefill and 15.6 times faster decode.

Prefill and decode are the two phases of language model inference. You have a prompt. The model processes all of it first (prefill). Then it generates text one token at a time (decode). The faster both go, the faster you get your answer. Speed matters for interactive AI. It affects whether the system feels responsive or sluggish.
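To make the two phases concrete, here is a minimal latency sketch in Python. The baseline throughputs, the prompt size and the output size are hypothetical numbers chosen only for illustration; the 9.7x and 15.6x factors are the figures quoted above. It also assumes throughput stays constant across the whole prompt, which real systems do not guarantee.

```python
def response_latency(prompt_tokens, output_tokens, prefill_tok_per_s, decode_tok_per_s):
    """Seconds spent processing the prompt (prefill) plus generating the reply (decode)."""
    prefill_s = prompt_tokens / prefill_tok_per_s
    decode_s = output_tokens / decode_tok_per_s
    return prefill_s + decode_s


# Hypothetical baseline throughputs, not measured values.
BASE_PREFILL = 5_000.0  # prompt tokens processed per second
BASE_DECODE = 20.0      # tokens generated per second

# Speedups quoted in the article.
PREFILL_SPEEDUP = 9.7
DECODE_SPEEDUP = 15.6

# Hypothetical request: a 1M-token prompt and a 500-token reply.
baseline = response_latency(1_000_000, 500, BASE_PREFILL, BASE_DECODE)
faster = response_latency(
    1_000_000, 500, BASE_PREFILL * PREFILL_SPEEDUP, BASE_DECODE * DECODE_SPEEDUP
)

print(f"baseline: {baseline:.1f} s, with quoted speedups: {faster:.1f} s")
```

With a prompt this long, prefill dominates the wait, which is why the prefill speedup matters so much for long-context use.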

M3 has 229.9 billion total parameters. That sounds enormous. But it’s a Mixture-of-Experts model where only 9.8 billion parameters activate per token. The rest stay silent. That sparsity keeps inference cost reasonable relative to capacity.
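The gap between total and active parameters can be checked with a few lines of arithmetic. This sketch uses only the two figures quoted above; the variable names are illustrative.

```python
TOTAL_PARAMS_B = 229.9   # total parameters, in billions (from the article)
ACTIVE_PARAMS_B = 9.8    # parameters activated per token, in billions (from the article)

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B

print(f"{active_fraction:.1%} of parameters are active per token")
print(f"roughly 1 in {ratio:.0f} parameters is used for any given token")
```

In other words, each token touches a little over 4 percent of the model.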

One million token context means M3 can read an entire book, remember every detail and reference any part of it mid-conversation. Previous models maxed out at hundreds of thousands of tokens. MiniMax is claiming order-of-magnitude better context handling.
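For a rough sense of scale, the sketch below estimates whether a text fits in a 1M-token window. The 1.3 tokens-per-word ratio is an assumption, a common rule of thumb for English that varies by tokenizer and language, and the book length is hypothetical. Neither number comes from MiniMax.

```python
TOKENS_PER_WORD = 1.3        # assumption: rough rule of thumb for English text
CONTEXT_WINDOW = 1_000_000   # context size quoted in the article


def estimated_tokens(word_count: int) -> int:
    """Approximate token count for a text with the given number of words."""
    return round(word_count * TOKENS_PER_WORD)


def fits_in_context(word_count: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the estimated token count fits inside the context window."""
    return estimated_tokens(word_count) <= window


novel_words = 100_000  # hypothetical book length
print(estimated_tokens(novel_words), fits_in_context(novel_words))
```

Under these assumptions, a single book uses only a fraction of the window, leaving room for the conversation around it.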

The model supports frontier coding. It understands images and video natively without conversion. It’s trained on data from across languages and use cases.

MiniMax is a Chinese AI company positioning M3 as the first domestic model offering frontier capabilities. The open-weight release makes it accessible to researchers globally. Download it, run it locally, no API dependency.