NVIDIA launched Cosmos 3 on June 1 as an open foundation model for physical AI: AI systems, such as robots, that understand the physical world and act on that understanding.

The model is built on a mixture-of-transformers architecture. That phrase matters less than what it does: Cosmos 3 handles visual understanding, video generation, action prediction and world simulation inside one system. Show it a scene, and it understands what’s happening. Ask it to predict the next moment, and it can. Ask it to generate a sequence of actions to accomplish a goal, and it tries.

Cosmos 3 was trained on billions of samples of text, images, video, sound and action trajectories, sourced from NVIDIA’s own facilities and partnerships. That dataset is one of the largest physical AI datasets ever assembled.

Cosmos 3 ships in two versions. Nano pairs 8 billion parameters for reasoning with 8 billion for generation. Super pairs 32 billion with 32 billion. Nano runs on consumer hardware; Super needs considerably more compute and memory. Both are open-weight, meaning you can download the weights and run the models yourself.

This is different from closed API models, where you send your data to someone else’s server. When you run Cosmos 3 locally, your data stays on your machine. Because the weights are accessible, researchers and roboticists can inspect the model and adapt it to their own work.

NVIDIA says Cosmos 3 cuts physical AI training cycles from months to days. That’s not a small claim. If it holds, robot development accelerates and the barrier to entry drops: smaller teams could build systems that would once have taken large labs years.

Physical AI matters because robots are coming. They’re not here at scale yet, but the economic argument is obvious. Labor is expensive. Robots don’t sleep. If they work, they can be deployed at a scale human labor can’t match. The companies that solve perception and control problems first will have advantages that compound.