What is and isn’t Google Genie: A Week After the Armchair Analysis
We have all seen the stories: the latest World Model demo from Google sent game publisher and game engine stocks tumbling, with Unity (NYSE: U) as the poster child of the impact, down 35% since the public reveal. You can prompt Project Genie with “I’m a caterpillar storming the beaches of Normandy during WWII” and something will be generated to that effect. It’s the end of gaming. It’s GTA VII before we even get GTA VI.
A week and thousands of hot-take LinkedIn posts later, it’s time to really analyze what World Models will or won’t change in the video game industry. Are world models real-time frame generation videos cosplaying as games? Is this the future of entertainment? Why are the smartest investors pulling out of game stocks?
Project Genie’s introduction truly creates “Wow” moments
What are World Models and Project Genie?
World Models are video generation models trained to simulate a consistent world. They generate videos frame-by-frame of a simulated environment and can be driven by input from the user. The original use case for World Models was training autonomous systems and robots. Need an autonomous vehicle model to train on specific dangerous situations? World Model. Need robots to learn to solve whatever it is robots are doing to overthrow humanity? World Model.
World Models simulate consistent virtual environments by producing frame-by-frame videos that can respond to input or change.
Project Genie is the latest public demo of Google’s World Model, Genie 3. Genie is described as a world-generating model that provides interactive simulation; its main purpose is to train AI agents to navigate 3D worlds. Project Genie lets users generate and explore virtual worlds similar to a video game. It has three phases:
1. World sketching: Use text or images to create a world and a character, and describe how you want to explore the world. “I want to be a banana exploring a frozen tundra gliding on ice skates in 3rd person perspective.”
2. World exploration: Navigate the world in real time using predefined controls: w, a, s, d or the arrow keys for movement, and space for jumping/actions.
3. World remixing: Build on top of your generated world, or on curated worlds, through prompts that let you modify the character and environment.
Project Genie is a publicly facing interactive demo that allows users to use text or image prompts to create a character in a virtual world and use movement controls to explore the world.
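To make the exploration phase concrete, here is a minimal sketch of how a handful of keyboard inputs could map to the per-frame action signal a world model conditions on. The mapping and action names are invented for illustration; Google has not published Genie's actual control scheme.

```python
# Hypothetical mapping from Project Genie's predefined keys to the
# discrete action tokens a world model might condition each frame on.
KEY_TO_ACTION = {
    "w": "move_forward",
    "a": "move_left",
    "s": "move_backward",
    "d": "move_right",
    " ": "jump",
}

def actions_for_frame(pressed_keys: set[str]) -> list[str]:
    """Translate the currently pressed keys into action tokens that
    would be appended to the model's context for the next frame."""
    return sorted(KEY_TO_ACTION[k] for k in pressed_keys if k in KEY_TO_ACTION)

print(actions_for_frame({"w", " "}))  # forward movement plus a jump
```

The point of the sketch: unlike a game engine, which resolves these inputs against hard-coded physics, a world model just conditions the next generated frame on them and hopes the result looks right.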
Are game studios and game engines cooked?
No, not anytime soon. World Models are currently real-time video generation models that need carefully curated training before their responses to player input resemble a video game. World Models have a few obstacles in the way of broad usefulness:
Insanely expensive to run for generating and rendering the simulation
Have to be trained on (likely) millions of hours of video game footage
Can only maintain the simulated world for minutes before losing consistency or exceeding the context window
You can recreate BotW for a $250 Google AI Ultra subscription
Game studios will continue to leverage whatever technology is optimal to craft amazing worlds. Players prefer to leave it to expert game developers to create the latest and greatest virtual worlds to explore and inhabit. The goal posts for video game greatness are constantly moving. The second the models catch up, that becomes the baseline and game makers will find a new way to differentiate their creations.
Game engines have more to gain from these models than to lose. If World Models do become a cost-effective way to produce visuals, game engines lower one of the largest barriers keeping aspiring game developers from producing video games, wildly increasing their potential customer base. World Models could also speed up content creation: imagine training a local model on a preset of environment rules the art team has carefully curated, then letting the World Model generate 50 options for designers to explore and pick their favorite.
Genie is “interactive video,” not “interactive systems.” It will accelerate prototyping and content workflows before it replaces engines.
Who’s cooked?
Honestly, if anyone is cooked, it is investors in this technology today. Right now I am stealing Scott Galloway’s prediction that today’s AI companies will struggle to hold significant margins on model performance alone. If these World Models do become a big deal, every major foundation-model player will be able to replicate that success, creating a race to the bottom on customer pricing. If we could look behind the wizard’s curtain, we’d likely see the unit economics make little sense long term: the cost per minute of interaction versus the subscription price paid for access.
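The unit-economics worry can be put in back-of-envelope terms. The numbers below are entirely hypothetical (neither Google nor OpenAI publishes per-minute inference costs); only the $250 subscription price comes from the post.

```python
# Back-of-envelope unit economics for a world-model subscription.
# Assumption: generating one minute of interactive video costs the
# provider a fixed amount in GPU time (the $0.50 figure is invented).
subscription_per_month = 250.0   # Google AI Ultra price
assumed_cost_per_minute = 0.50   # hypothetical inference cost

break_even_minutes = subscription_per_month / assumed_cost_per_minute
print(f"Break-even usage: {break_even_minutes:.0f} minutes/month")
```

Under these assumptions a subscriber who plays more than roughly eight hours a month makes the service cash-negative, and heavier engagement only deepens the loss.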
Google’s Project Genie demo reminds me of OpenAI’s Sora video service. It requires an expensive subscription, the content is quickly turning into forgettable memes, and the service likely loses more money the more users engage with it.
There doesn’t seem to be a viable path for sustained margins and profitability in utilizing World Models for consumer entertainment.
What I’m excited about with World Models
Enough bashing Project Genie and World Models, what is exciting about them?
World Models won’t displace video games in the short term but can create amazing interactive experiences. My immediate thought is bespoke walking simulators for music. Instead of using text or images as the world creation context, use music. Create a playlist and explore a world that changes based on the key, tempo, rhythm, melody, and timbre of the song.
Another use is rapid prototyping: exploring art direction, perspective, thematic wrapper, etc. faster than ever before by modifying a preset world based on a game in pre-production. Turn reference images or mood boards into real-time visuals for immediate feedback.
I’m also excited about solving the problems of finite context windows and lack of persistent states. Currently, World Models suffer from:
Compounding errors, or drift, where a small mistake keeps expanding until it ruins the simulated world
Limited memory, where high processing costs keep context windows small relative to other models
Lossy tokenization, where detailed visual and structural information is lost in compression
Lack of persistent state, where models dynamically interpret the world rather than relying on hard-coded truths
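The drift problem above is easy to see with arithmetic. A toy model (my simplification, not a measured property of Genie): if each autoregressively generated frame preserves the world with some fixed probability, fidelity decays exponentially with frame count.

```python
# Toy illustration of compounding error in autoregressive generation.
# Assumption: each frame independently preserves the world state with
# probability `per_frame_accuracy`, so fidelity decays geometrically.

def remaining_fidelity(per_frame_accuracy: float, frames: int) -> float:
    """Fraction of the original world still faithfully represented
    after `frames` generation steps, under the toy model above."""
    return per_frame_accuracy ** frames

fps = 24
one_minute = fps * 60  # 1,440 frames at 24 fps

# Even a 99.5%-accurate frame predictor degrades badly within a minute.
print(f"After 1 minute: {remaining_fidelity(0.995, one_minute):.4%}")
```

This is why the demos stay consistent for minutes rather than hours: without an external source of truth, every frame inherits and amplifies the errors of the frames before it.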
Imagine a World Model that can efficiently read rules or world properties from some sort of RAG for World Models, allowing you to enforce rules, have the world use game systems, and maintain persistent state such as a narrative or player progression. This is the true unlock for breaking away from dream-like experiences toward a new form of interactive medium.
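What might "RAG for World Models" look like? A minimal sketch, under heavy assumptions: a persistent store holds hard-coded rules and game state, and before each frame the relevant rules are retrieved and injected into the model's conditioning context. Every name here (`WorldRuleStore`, `build_frame_prompt`) is invented for illustration; no such API exists.

```python
# Hypothetical "RAG for World Models": a persistent rule/state store
# queried every frame, so ground truth survives across generations.
from dataclasses import dataclass, field

@dataclass
class WorldRuleStore:
    rules: dict = field(default_factory=dict)   # rule key -> rule text
    state: dict = field(default_factory=dict)   # persistent game state

    def relevant_rules(self, action: str) -> list:
        # Naive keyword retrieval; a real system would use embeddings.
        return [text for key, text in self.rules.items() if key in action]

store = WorldRuleStore(
    rules={
        "jump": "Gravity is 9.8 m/s^2; max jump height is 2 m.",
        "door": "The castle door opens only if the player holds the key.",
    },
    state={"has_key": False, "health": 100},
)

def build_frame_prompt(action: str) -> str:
    """Combine the player's action with retrieved rules and persistent
    state, so each generated frame is conditioned on hard truths."""
    rules = "\n".join(store.relevant_rules(action))
    return f"STATE: {store.state}\nRULES:\n{rules}\nACTION: {action}"

print(build_frame_prompt("player tries to jump toward the door"))
```

Because the store lives outside the model's context window, rules and progression survive no matter how long the session runs, which is exactly what today's purely autoregressive world models cannot do.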
Again, Genie is currently “interactive video,” not “interactive systems.” It will accelerate prototyping and content workflows before it replaces engines.