General Intuition Turns Video Game Clips into Spatial-Reasoning AI

A new frontier in AI research rises from the world of video games

General Intuition, a startup born out of Medal, a platform for uploading and sharing video game clips, has emerged with a bold mission: teach AI agents to reason about space and time using a vast trove of gaming videos. The company has closed a $133.7 million seed round led by Khosla Ventures and General Catalyst, with participation from Raine. The aim is to train foundation models and autonomous agents that understand how objects and entities move through space and time, a capability known as spatial-temporal reasoning.

Why video game data? A “data moat” for robust agents

Medal streams more than 2 billion videos per year from about 10 million monthly active users across tens of thousands of games. General Intuition believes this data moat—coupled with the edge-case richness of user-generated clips—offers a richer, more varied training signal than traditional sources like Twitch or YouTube. CEO Pim de Witte explains that players inject a natural bias into the data: clips often capture both triumphs and failures, near-misses and subtle mistakes, which are precisely the scenarios where robust agents must perform well under diverse conditions.

That insight has attracted notable attention from industry players; reports suggest OpenAI considered acquiring Medal for around half a billion dollars late last year. While deal chatter isn’t a formal endorsement, it signals the strategic value of Medal’s repository for building next-generation AI systems.

From gaming clips to real-world robotics and search-and-rescue

General Intuition is not chasing consumer-grade products but aiming for capabilities that generalize beyond entertainment. The company's early technical progress indicates its models can infer how to act in environments they have not seen before, relying only on visual input that mirrors what a human player, or a camera, would perceive. Agents navigate space by issuing controller inputs, learning motion, collision, and interaction dynamics without relying on text or symbolic world models.
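
General Intuition has not published its architecture, so the following PyTorch snippet is only a rough sketch of the pixels-in, controller-out setup described above: a convolutional encoder over stacked frames feeding a recurrent core that predicts controller-action logits. Every name and dimension here (FrameStackPolicy, 84x84 inputs, 12 discrete actions) is an illustrative assumption, not the company's model.

```python
# Minimal sketch, assuming a vision-only agent: frames in, controller actions out.
import torch
import torch.nn as nn

class FrameStackPolicy(nn.Module):
    def __init__(self, n_frames: int = 4, n_actions: int = 12):
        super().__init__()
        # Convolutional encoder over stacked grayscale frames: the agent's
        # only observation, much as a human player sees the screen.
        self.encoder = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # A recurrent core gives the policy memory across timesteps, so it
        # can track motion and anticipate collisions rather than react to
        # single static frames.
        self.core = nn.GRU(input_size=64 * 7 * 7, hidden_size=512, batch_first=True)
        # Head outputs logits over discrete controller inputs
        # (buttons, stick directions).
        self.head = nn.Linear(512, n_actions)

    def forward(self, frames: torch.Tensor, hidden=None):
        # frames: (batch, time, n_frames, 84, 84)
        b, t = frames.shape[:2]
        z = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, hidden = self.core(z, hidden)
        return self.head(out), hidden

policy = FrameStackPolicy()
obs = torch.randn(1, 8, 4, 84, 84)  # one clip: 8 timesteps of 4 stacked 84x84 frames
action_logits, _ = policy(obs)
print(action_logits.shape)  # torch.Size([1, 8, 12])
```

In this framing, Medal's clips would serve as supervision: the video provides the observations, and the recorded player inputs provide target actions, with no text labels required.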

The team’s stated objective is twofold: first, to generate simulated training worlds that can be used to develop agents further; second, to enable autonomous navigation in unfamiliar physical spaces. The emphasis on spatial-temporal reasoning (reasoning about where objects are and how they will move over time) addresses a blind spot in many current AI approaches, which tend to favor language or static perception over dynamic, real-world understanding.
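
To make "spatial-temporal reasoning" concrete, here is a deliberately tiny toy example, not the company's method: given an object's recent positions, extrapolate where it will be and when it would reach an obstacle. A learned system would acquire such dynamics from raw video rather than a hand-coded velocity rule; the numbers below are made up for illustration.

```python
# Toy illustration of the kind of question spatial-temporal reasoning answers.
import numpy as np

# Observed 2D positions of a moving object over four timesteps (invented data).
track = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5]])

# Constant-velocity extrapolation: estimate velocity from the last step
# and roll the position forward over a short horizon.
velocity = track[-1] - track[-2]
horizon = 3
future = track[-1] + velocity * np.arange(1, horizon + 1)[:, None]
print(future)  # predicted positions at t+1 .. t+3

# A simple "collision" check against a wall at x = 5: find the first
# predicted step, if any, that crosses it.
hits = future[:, 0] >= 5.0
crossed = int(np.argmax(hits)) if hits.any() else None
print("first predicted wall contact step:", crossed)
```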

A practical approach that diverges from traditional world models

General Intuition is pursuing world-model-style research, but with a strategic twist. Unlike groups that commercialize their world models directly as products (for example, models used to train agents or generate content), General Intuition focuses on applications that sidestep copyright concerns and align with real-world deployments. De Witte notes that the goal isn’t to produce models that infringe on game developers’ rights, but to power bots and non-player characters that can scale their behavior to any difficulty level and adapt to a broad spectrum of tasks.

Moritz Baier-Lentz, co-founder of General Intuition and a partner at Lightspeed Ventures, adds that the value lies in a flexible, liquidity-generating agent that can adjust its performance to sustain engagement and retention in gaming contexts. Beyond games, the same spatial-temporal reasoning could guide how robots and drones interpret and move through real environments, with clear implications for search-and-rescue missions and disaster response.

The path to AGI and beyond

De Witte frames spatial-temporal reasoning as a missing piece in the broader quest for artificial general intelligence. He argues that while language models excel at text-based tasks, much of what matters about the physical world (how things move, collide, and interact in space and time) requires a non-textual, perceptual understanding that humans exploit naturally. By training agents on visual inputs drawn from real gameplay, General Intuition hopes to capture the rich spatial intuition that language alone cannot convey.

The company’s roadmap includes advancing toward agents that can autonomously navigate novel environments and, ultimately, contribute to a broader AGI stack by integrating spatial-temporal competencies with other modalities and planning capabilities.

Commercial outlook and future milestones

In the near term, General Intuition plans to expand its engineering and research teams, deepen its training regimes, and pursue two primary milestones: creating diverse simulated worlds for training a wide range of agents and enabling reliable autonomous navigation in unfamiliar real-world settings. While the product focus centers on gaming applications and resilient robots, the underlying technology targets broader use cases, including search-and-rescue drones, where robust perception and motion understanding can save lives.

As the AI landscape evolves, General Intuition’s approach—leveraging a massive, user-generated video corpus to cultivate spatial-temporal intelligence—will be watched closely by researchers and practitioners seeking practical breakthroughs on the path to more capable, general-purpose AI systems.