General Intuition Lands $134M Seed to Teach Agents Spatial Reasoning With Video Game Clips

A new frontier in AI: spatial-temporal reasoning from game videos

General Intuition, a new AI lab born from Medal’s clip-sharing platform, has raised $133.7 million in seed funding to advance agents that understand how objects move through space and time. The core idea: leverage Medal’s enormous trove of gaming videos—2 billion videos per year from 10 million monthly users across thousands of games—to train foundation models and autonomous agents capable of spatial-temporal reasoning. The backers include Khosla Ventures, General Catalyst, and Raine, signaling strong investor confidence in the potential of video-game data to fuel more capable AI systems.

Medal itself is a hub where players upload and share clips. General Intuition’s leadership argues that this data moat, particularly the way gamers capture edge cases and varied scenarios, offers richer training material than traditional sources like Twitch or YouTube. CEO Pim de Witte explains that gaming naturally links perception to action: players see a first-person view, interpret the scene, and convert that understanding into controller inputs. That tight coupling between perception and action is what the startup hopes to codify into robust, general agents.

From clips to capable agents: how the approach works

General Intuition’s research centers on training agents through purely visual inputs, using consumer-style game footage rather than symbolic descriptions or real-world sensors. The agents observe what a human player would see and act through standard controller inputs. The claim is that this data stream alone can teach spatial-temporal reasoning, without robot-specific sensors or labels, enabling transfer to physical systems such as robotic arms, drones, and autonomous vehicles.
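
The company has not published its architecture, but the pipeline it describes, pixels in and controller inputs out, maps naturally onto behavior cloning. The sketch below shows what a minimal version of that setup could look like in PyTorch; the VisionToAction model, the frame stacking, the 16-action vocabulary, and every hyperparameter are illustrative assumptions, not General Intuition’s actual design.

```python
# Hypothetical behavior-cloning sketch; General Intuition has not published
# its architecture, so all names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class VisionToAction(nn.Module):
    """Maps a stack of recent RGB frames to logits over controller inputs."""

    def __init__(self, num_frames: int = 4, num_actions: int = 16):
        super().__init__()
        # Simple convolutional encoder over channel-stacked 84x84 frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * num_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.policy = nn.Linear(64 * 7 * 7, num_actions)  # logits per action

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3 * num_frames, 84, 84)
        return self.policy(self.encoder(frames))

model = VisionToAction()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# One supervised step: predict the human player's recorded controller input
# from pixels alone, as a clip-plus-inputs dataset would supply it.
frames = torch.randn(8, 12, 84, 84)          # dummy batch of 4-frame stacks
actions = torch.randint(0, 16, (8,))         # recorded inputs as class labels
loss = loss_fn(model(frames), actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The appeal of this framing is that supervision comes for free: every clip already pairs what the player saw with what the player pressed.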

Remarkably, the company says its models can understand environments they were not explicitly trained on and predict plausible actions within them. It attributes this to the raw, uncurated realism of game clips, which surface both common patterns and unusual edge cases that are invaluable for training resilient agents. Early results suggest the models learn to navigate space, avoid obstacles, and anticipate the consequences of actions, all from seeing the world through a gamer’s eyes.
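
As a toy illustration of that zero-shot claim, the continuation below runs the hypothetical policy from the previous sketch on footage it never trained on; again, everything here is assumed for illustration rather than drawn from the company’s methods.

```python
# Continuing the hypothetical sketch above: a zero-shot check on footage from
# a game the policy never saw during training. No fine-tuning, no new labels.
import torch

model.eval()
with torch.no_grad():
    unseen_frames = torch.randn(1, 12, 84, 84)  # stand-in for an unfamiliar game
    logits = model(unseen_frames)
    action = logits.argmax(dim=-1)              # most plausible controller input
print(f"predicted action id: {action.item()}")
```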

Two milestones on the horizon

General Intuition has two major near-term goals. First, it intends to generate new simulated worlds that can be used to train additional agents, expanding the diversity and difficulty of training environments. Second, the team aims to enable autonomous navigation in completely unfamiliar physical settings, a capability crucial for real-world robotics and search-and-rescue operations.
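
The first milestone is, in effect, a world model: a network that takes the current view plus an action and predicts what comes next, so it can be rolled forward to generate fresh simulated footage. The company has not described its approach, but a minimal action-conditioned predictor, with every architectural choice here assumed, might look like this:

```python
# Hypothetical action-conditioned world model, not General Intuition's design:
# given the current frame and a controller action, predict the next frame.
# Rolling it forward yields synthetic footage for training further agents.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    def __init__(self, num_actions: int = 16, embed_dim: int = 64):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, embed_dim)
        self.encode = nn.Sequential(                      # 84x84 -> 21x21
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, embed_dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(                      # 21x21 -> 84x84
            nn.ConvTranspose2d(embed_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        z = self.encode(frame)                            # (B, embed_dim, 21, 21)
        a = self.action_embed(action)[:, :, None, None]   # broadcast over space
        return self.decode(z + a)                         # predicted next frame

world = NextFramePredictor()
frame = torch.randn(2, 3, 84, 84)                         # current views
action = torch.randint(0, 16, (2,))                       # chosen controller inputs
next_frame = world(frame, action)                         # (2, 3, 84, 84)
```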

Commercial strategy: distinct from other world models

Unlike some rivals that market their world models as universal engines for training, General Intuition treats world models as a foundational tool rather than a standalone product. The founders emphasize a pragmatic stance: their core value lies in applying spatial-temporal reasoning to concrete use cases, particularly in gaming, with bots and non-player characters that scale difficulty to keep players engaged. By avoiding direct competition with game developers, the team seeks to reduce copyright friction while still delivering sophisticated agents that improve player experience and retention.

Impact across industries

In gaming, the promise is bots and NPCs that adapt to players rather than follow fixed scripts, potentially boosting engagement. But the broader vision extends to real-world systems. Human operators already rely on video-game-style interfaces to control drones and robots, so agents that interpret space and time from visual input alone could make those systems more responsive and more general. The humanitarian angle, supporting search-and-rescue missions in GPS-denied or chaotic environments, provides a north star for the company: de Witte’s background in humanitarian work informs its mission.

Why this could matter for AGI

General Intuition frames spatial-temporal reasoning as a missing piece of the AI puzzle. While large language models (LLMs) excel at textual understanding, they can lose crucial spatial context when translating language into real-world action. The team argues that combining vision-only input with controller-driven interaction, grounded in human-like perception, could bridge a critical gap on the road toward artificial general intelligence.

Looking ahead

With substantial seed capital and a data-driven moat, General Intuition is betting that game-clip datasets can yield agents that not only perform tasks in digital worlds but also generalize to the physical realm. The coming months will reveal how effectively the approach scales, how it handles copyright considerations, and whether the model’s spatial reasoning generalizes to novel, real-world environments beyond gaming simulations.