Google’s Gemini 3.5 Pro is approaching its general availability launch after being previewed at Google I/O in May, with the company targeting a public release date later this month. The model, currently in limited preview on Google’s Vertex AI platform, is designed to replace Gemini Ultra as the company’s most capable publicly available model for complex reasoning and long-context tasks.

The defining feature of Gemini 3.5 Pro is its context window of two million tokens, the largest offered by any commercially available AI model. In practical terms, this allows users to submit the equivalent of several large books, multiple months of code commits, or thousands of research papers in a single request.

A feature called Deep Think, available in a reasoning-focused configuration of the model, applies step-by-step chain-of-thought reasoning to complex tasks. Google demonstrated Deep Think solving multi-step mathematics problems and reviewing lengthy contract documents during the I/O keynote. The feature targets professional users in legal, financial, and scientific fields.

Gemini 3.5 Pro also introduces expanded multimodal capabilities, processing text, images, audio, and video in a single unified request. This makes it suited for analysing video footage alongside transcript data, or reviewing engineering diagrams in the context of technical specifications.

The launch comes as Apple announced at WWDC 2026 that iOS 27 will use Gemini models to power its overhauled Siri assistant, representing one of the largest commercial deployments of Google’s AI technology. That partnership will put Gemini-backed features directly in the hands of hundreds of millions of iPhone users.

Google said Gemini 3.5 Pro would be priced at $3.50 per million input tokens and $10.50 per million output tokens on Vertex AI. The Google AI developer platform hosts the preview documentation. Related AI model coverage includes the Microsoft Build 2026 announcements and Anthropic’s Claude research access policy shift. Nvidia’s RTX Spark chip is the hardware side of the same AI model deployment wave.