At Meta, artificial intelligence is no longer content with writing poems or sorting images. With V-JEPA 2, the company wants to go further: to help machines understand the world the way we do every day, by observing it. This new version of the V-JEPA model can predict what will happen in a scene, anticipate movements, or even plan actions in an unknown environment, like a robot that can guess that an egg cooked in a pan is supposed to end up on a plate.
An AI that learns like a child (or almost)
Meta's ambition is to develop what the company calls "world models": AI systems capable of mentally simulating the consequences of an action before performing it. "We believe these models will usher in a new era for robotic agents, capable of interacting in the real world without requiring massive amounts of training data," explains Yann LeCun, Meta's chief AI scientist.

To acquire this form of common sense, V-JEPA 2 was trained at a very large scale: more than a million hours of video, with no human commentary or annotations, were used to build its first level of understanding. The model is based on an architecture called JEPA (Joint Embedding Predictive Architecture), which separates the encoding of a situation (from video) from the prediction of what will happen next. The system learns to anticipate an action before it takes place: on the Epic-Kitchens dataset, for example, it can guess what a person will do in their kitchen one second later. Better still, once aligned with a language model, V-JEPA 2 excels at tasks such as answering questions about a video.
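To make that idea concrete, here is a deliberately simplified sketch of the joint-embedding predictive principle, written in PyTorch. The encoder, predictor, tensor shapes and loss below are illustrative assumptions, not Meta's actual architecture: the point is only that the prediction happens in embedding space rather than pixel space, which is why no annotations are needed.

```python
# Illustrative sketch only, NOT Meta's released code. The networks, shapes
# and names here are assumptions used to show the general JEPA idea:
# predict the *embedding* of the future video, not its pixels.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 256  # assumed embedding size for this sketch

class Encoder(nn.Module):
    """Maps a small (flattened) video clip to a compact embedding."""
    def __init__(self, in_dim=3 * 4 * 32 * 32, emb_dim=EMB_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.GELU(),
                                 nn.Linear(512, emb_dim))

    def forward(self, clip):
        return self.net(clip.flatten(1))

class Predictor(nn.Module):
    """Predicts the embedding of the future clip from the past clip's embedding."""
    def __init__(self, emb_dim=EMB_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim, 512), nn.GELU(),
                                 nn.Linear(512, emb_dim))

    def forward(self, z_past):
        return self.net(z_past)

encoder, predictor = Encoder(), Predictor()
target_encoder = Encoder()                       # target branch, kept gradient-free
target_encoder.load_state_dict(encoder.state_dict())

past_clip = torch.randn(8, 3, 4, 32, 32)         # batch of "what happened"
future_clip = torch.randn(8, 3, 4, 32, 32)       # batch of "what happens next"

z_past = encoder(past_clip)
z_future_pred = predictor(z_past)
with torch.no_grad():                            # targets come from the frozen branch
    z_future = target_encoder(future_clip)

# The loss lives in embedding space: the model learns to anticipate how the
# scene evolves without reconstructing pixels and without any labels.
loss = F.mse_loss(z_future_pred, z_future)
loss.backward()
```

The real model is, of course, far larger and trained on the million hours of video mentioned above, but the training signal keeps the same shape: compare the predicted representation of the future with the representation actually computed from it.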
But it is above all in robotics that the model shows concrete results. After a second training phase using only 62 hours of data from robots in action, V-JEPA 2 is able to plan simple movements: grabbing an object, moving it, placing it in another location, even if that object or location was never seen during training.
One of the most interesting aspects is that the robot does not need to be trained in its final environment. Thanks to a standardized dataset, Meta can transfer the model directly to its own laboratory robots, without specific adaptation. The robot simply needs to observe the current scene and be given the visual goal to reach (for example, an image of the object placed in the desired location) in order to imagine scenarios and choose the most promising action.
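That planning loop can be sketched in a few lines. Everything below is a hypothetical illustration rather than Meta's implementation: the encoder and predictor are toy stand-ins, and names such as `plan`, `ACT_DIM` or `N_CANDIDATES` are assumptions. What it captures is the idea described above: encode the current scene and the goal image, imagine the outcome of many candidate action sequences, and keep the most promising one.

```python
# Illustrative "imagine-then-pick" planning loop, NOT Meta's code.
# The two linear layers stand in for a scene encoder and an
# action-conditioned predictor operating in embedding space.
import torch
import torch.nn as nn

EMB_DIM, ACT_DIM, N_CANDIDATES, HORIZON = 256, 7, 128, 5

encoder = nn.Linear(3 * 64 * 64, EMB_DIM)           # stand-in scene encoder
predictor = nn.Linear(EMB_DIM + ACT_DIM, EMB_DIM)   # stand-in action-conditioned predictor

def plan(current_image, goal_image):
    """Return the first action of the candidate sequence whose imagined
    outcome lands closest to the goal image's embedding."""
    with torch.no_grad():
        z0 = encoder(current_image.flatten())
        z_goal = encoder(goal_image.flatten())
        # Sample random candidate action sequences ("imagined scenarios").
        actions = torch.randn(N_CANDIDATES, HORIZON, ACT_DIM)
        z = z0.expand(N_CANDIDATES, -1)
        for t in range(HORIZON):
            z = predictor(torch.cat([z, actions[:, t]], dim=-1))
        # Score each scenario by how close its final embedding is to the goal.
        scores = -(z - z_goal).pow(2).sum(dim=-1)
        best = scores.argmax()
    return actions[best, 0]   # execute only the first action, then replan

current = torch.randn(3, 64, 64)   # current camera view
goal = torch.randn(3, 64, 64)      # image of the object in the desired spot
first_action = plan(current, goal)
```

Executing only the first action and then replanning from the new observation is a classic model-predictive control pattern; the article's "imagine scenarios and choose the most promising action" describes this kind of loop.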
Meta claims success rates between 65% and 80% on these pick-and-place tasks, even in unfamiliar environments. V-JEPA 2 is also said to be 30 times faster than Nvidia's Cosmos model, by Meta's own criteria.