Nvidia’s new genAI model helps robots think like humans
Nvidia has developed a generative AI (genAI) model to help robots make human-like decisions by analyzing surrounding scenes.
The Cosmos Reason model in robots can take in information from video and graphics input, analyze the data, and use its understanding to make decisions.
Cosmos Reason, announced on Monday, helps robots “think like humans do” and make decisions with “just common sense,” said Rev Lebaredian, vice president of Omniverse and simulation technologies.
The model is lightweight at 7 billion parameters and can be used in a variety of physical devices such as installed cameras, traffic signals, and instruments in factories.
“Every smart IoT device that can see, from cameras to traffic lights, every home or industrial robot, will have reasoning,” Lebaredian said.
Companies can develop video AI agents, which will act on massive amounts of data gathered and analyzed from recorded video data and livestreams. “These video agents will soon be everywhere, automating traffic monitoring, improving safety, and enhancing video inspection in everything from industrial facilities to entire cities,” Lebaredian said.
Cosmos Reason is what Nvidia calls a “vision language model” (VLM): it takes in visual input along with text and reasons about what it sees. That’s different from typical text-based models, which generate images, videos, or text from a written prompt.
Image: Nvidia’s Cosmos Reason VLM is designed to help robots make better decisions. (Credit: Nvidia)
OpenAI and other companies have released VLMs, but Cosmos Reason can do deeper reasoning on a long tail of unseen scenarios, Lebaredian said. The model can establish a prior understanding of scenarios, take physical interactions into account, and then infer complex interactions or motivations of objects and actors in a scene. It can also understand new and unseen experiences.
For example, robots will be able to connect the dots of making toast, understanding that toast requires butter and a toaster — and a plate on which to serve the food.
Today’s AI robot models have two types of technology underpinning their activity. The VLM interprets instructions and plans actions, while a “vision language action” model allows for fast actions and muscle memory.
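The split between a slower planning tier and a faster action tier can be pictured with a small Python sketch. It is only an illustration and not Nvidia's code: `plan_task`, `execute_step`, and `run` are hypothetical names, and the hard-coded toast plan stands in for what a real reasoning model would infer from instructions and camera input.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One high-level step in a plan, e.g. 'put bread in toaster'."""
    description: str


def plan_task(instruction: str) -> List[Step]:
    """Hypothetical stand-in for the slow reasoning tier (the VLM).

    A real system would query a model with the instruction and video frames;
    this lookup table only illustrates the shape of the output.
    """
    known_plans = {
        "make toast": [
            "put bread in toaster",
            "wait for toast",
            "place toast on plate",
            "spread butter",
        ],
    }
    return [Step(text) for text in known_plans.get(instruction, [])]


def execute_step(step: Step) -> str:
    """Hypothetical stand-in for the fast action tier (vision language action)."""
    return f"done: {step.description}"


def run(instruction: str) -> List[str]:
    """Plan once with the reasoning tier, then hand each step to the action tier."""
    return [execute_step(step) for step in plan_task(instruction)]


if __name__ == "__main__":
    for line in run("make toast"):
        print(line)
```

In this sketch the planner is called once per instruction and the action tier once per step, which mirrors the idea that reasoning happens less often than movement. An instruction the planner does not recognize simply yields an empty plan.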
Cosmos Reason is open-source and now available for download, the company said, but it will only work on Nvidia’s hardware.
The company sells the Jetson Thor DGX computer for robots and said its new RTX Pro 6000 GPUs will be in high-end servers. The company also announced new RTX Pro 4000 and 2000 GPUs for high-end desktops. The new GPUs are based on the Blackwell architecture.
Nvidia is grouping its world-building and simulation products under the Omniverse product line. Cosmos Reason is one of many models developed by the company to improve productivity in factories, warehouses, robots, vehicles, and other physical settings.
Omniverse products involve creating digital copies of physical products in the real world. Information in the virtual world is used to create synthetic data to train vision language models.