Chinese artificial intelligence startup Z.ai, formerly known as Zhipu AI, has announced the immediate release of GLM-5.2, a 753-billion-parameter large language model designed specifically for long-horizon autonomous coding and engineering work.

The company said the model is available through Hugging Face, the Z.ai API, and more than 20 third-party coding environments. GLM-5.2 supports a context window of up to one million tokens and is offered through enterprise subscription plans starting at $12.60 per month.
A notable aspect of the release is Z.ai’s decision to publish the model’s core weights under the MIT open-source license. The licensing approach allows organizations to download the model, customize or fine-tune it, and deploy it on their own infrastructure if they choose, with operating costs largely tied to computing resources and electricity consumption.
The announcement positions GLM-5.2 as an option for enterprises seeking greater control over how advanced AI systems are deployed and managed. By making the weights openly available, Z.ai is giving businesses the ability to run the model in local environments or virtual machines rather than relying exclusively on externally hosted services.
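For teams weighing self-hosting, a back-of-the-envelope estimate of the weight footprint is a useful starting point. The sketch below uses only the 753-billion-parameter figure from the announcement. The storage precisions (16-bit, 8-bit, 4-bit) are illustrative assumptions, not formats Z.ai has confirmed, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Rough weight-storage estimate for a self-hosted deployment.
# Only the parameter count comes from the announcement; the precisions
# below are assumed for illustration.
PARAMS = 753_000_000_000

# Assumed bytes per parameter for three common storage formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(params: int, bytes_per_param: float) -> float:
    """Return the size of the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_footprint_gb(PARAMS, width):,.1f} GB")
```

At 16 bits per parameter the weights alone come to about 1.5 TB, which is why hardware and electricity are likely to dominate the cost of running the model locally.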
At the center of the model’s architecture is a new optimization called IndexShare. According to Z.ai, large language models handling extremely long contexts typically face substantial computational demands because attention mechanisms must be repeatedly recalculated across many layers.
IndexShare addresses that challenge by sharing a single indexer across each group of four sparse attention layers. The company said this approach cuts per-token floating-point operations (FLOPs) by a factor of 2.9 when operating at the model’s maximum one-million-token context length.
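The idea of sharing an indexer across layers can be illustrated with a toy example. This is a minimal sketch of one possible reading of the description, in which the token selection is computed once per group of four layers and then reused. It is not Z.ai's implementation: the function names, the fixed score list, and the top-k selection rule are all hypothetical.

```python
from typing import List, Sequence, Tuple


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Hypothetical indexer: positions of the k highest scores, in position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_layers(
    scores: Sequence[float], num_layers: int, k: int, share_every: int
) -> Tuple[int, List[List[int]]]:
    """Walk through the layers, re-running the indexer only once per group.

    Returns the number of indexer calls and the token indices each layer attends to.
    """
    indexer_calls = 0
    selected: List[int] = []
    per_layer: List[List[int]] = []
    for layer in range(num_layers):
        if layer % share_every == 0:
            # First layer of a group: compute a fresh selection.
            selected = top_k_indices(scores, k)
            indexer_calls += 1
        # Remaining layers in the group reuse the same selection.
        per_layer.append(selected)
    return indexer_calls, per_layer


toy_scores = [0.1, 0.9, 0.3, 0.8, 0.2]  # made-up relevance scores for 5 tokens
shared_calls, shared_sel = run_layers(toy_scores, num_layers=8, k=2, share_every=4)
baseline_calls, _ = run_layers(toy_scores, num_layers=8, k=2, share_every=1)
print(shared_calls, baseline_calls)
```

In this toy setup the shared variant runs the indexer twice across eight layers, while the per-layer baseline runs it eight times. The 2.9x figure reported by Z.ai refers to total per-token FLOPs at full context length and cannot be derived from this sketch.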
GLM-5.2 also includes an upgraded Multi-Token Prediction layer intended to improve speculative decoding performance. Z.ai reported that the enhancement can increase accepted token length by as much as 20% during inference.
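For readers unfamiliar with the metric, accepted token length in speculative decoding measures how many drafted tokens the main model confirms per verification step. The sketch below assumes the simplest greedy variant, where drafted tokens are accepted up to the first mismatch with the main model's own choices. The function names and token IDs are hypothetical, and GLM-5.2's actual acceptance rule is not described in the announcement.

```python
from typing import List, Sequence, Tuple


def accepted_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count drafted tokens that match the target model before the first mismatch."""
    count = 0
    for drafted, verified in zip(draft, target):
        if drafted != verified:
            break
        count += 1
    return count


def mean_accepted_length(rounds: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average accepted length over several (draft, target) verification rounds."""
    lengths = [accepted_length(draft, target) for draft, target in rounds]
    return sum(lengths) / len(lengths)


# Made-up token IDs: the first round diverges at the third token,
# the second round is accepted in full.
rounds = [
    ([5, 7, 9, 2], [5, 7, 1, 2]),
    ([3, 3, 8, 4], [3, 3, 8, 4]),
]
print(mean_accepted_length(rounds))
```

A longer average accepted length means fewer verification passes are needed to produce the same output, which is where the inference speedup comes from.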
Users can also choose between reasoning settings through selectable Thinking Modes. The “Max” mode is designed for situations that require the highest level of reasoning capability, while the “High” mode is intended to balance performance with lower latency and greater token efficiency.
The release combines large-scale model capacity, extended context handling, and open-weight availability. For enterprises still assessing how and where to run powerful coding-focused AI models, GLM-5.2 adds another option in a market increasingly focused on both performance and deployment flexibility, with an emphasis on long-context engineering workloads and direct access to the underlying weights.



