Every March, everything seems to converge at once—Nvidia's GPU Technology Conference (GTC), the Game Developers Conference (GDC), ETH SF, and Women's History Month. It's the kind of overlap that makes you feel like you should be everywhere, seeing everything, keeping up with it all.
This year, I made a different call.
Instead of running between conferences, I stayed in—heads down building my app and refining something I've been thinking a lot about lately: a personal Life Operating System. Not just productivity, but how I structure time, attention, inputs, and outputs—how I actually play my days.
And ironically, stepping back made one thing clearer than ever: a lot of what we’re watching unfold in AI started with games.
Not only chess or Go—but environments, simulations, feedback loops. Systems where intelligence emerges through play.
So instead of another conference recap, this is the post I’ve been wanting to write:
GameAI, the simulated multiverse, and how ideas like self-play and reinforcement learning connect all the way to Game Theory Optimal (GTO) poker—and even how I think about building my own systems for life.
Because once you see it, it’s hard to unsee:
We didn’t only teach AI how to play games.
We used games to understand intelligence itself.
And maybe that says something about the game we’re in, too.
In This Post
PART 1: Why GameAI? — The Origin Story of Modern AI
PART 2: The Simulated Multiverse
PART 3: Reinforcement Learning as Self-Development
PART 4: Playing Game Theory Optimal
BONUS: Poker, Strategy, and Decision-Making Resources
PART 1: WHY GAMEAI? — The Origin Story of Modern AI
“How Did AI Come To Be Today?”
Setting aside the founder stories and the very technical deep dives into the paper "Attention Is All You Need" and the transformer architecture, let's talk about OpenAI Gym and its predecessors in GameAI to understand the early foundations of modern AI.
OpenAI Gym (2016) as the foundation: AI learned intelligence by playing games.
Chess
Deep Blue (1997): search-heavy symbolic engineering milestone, not modern deep RL. Read the 1999 paper.
AlphaZero (2017): showed chess could be mastered from rules alone with self-play RL and neural search, without handcrafted chess knowledge. Read the 2017 paper.
Go
AlphaGo (2016): the breakthrough that made deep RL famous to the broader world. Read the Nature article.
AlphaGo Zero (2017): removed reliance on human expert data; self-play became the headline method. Read the Nature article.
StarCraft
AlphaStar (2019): pushed gameAI into complex real-time multi-agent strategy. Oriol Vinyals of Google DeepMind gave a great lecture in UC Berkeley's AI course here. Read the Nature article.
Poker
Libratus (2017): major breakthrough in heads-up no-limit poker, an imperfect-information game.
Benchmarks
ALE (2012/2013): standardized Atari as a benchmark. Read paper.
OpenAI Gym (2016): standardized the environment interface used across RL research. Read paper and old OpenAI blog. This was all the rage when it first came out—and a huge breakthrough for me.
RLHF
Ziegler et al. (2019): preference-based fine-tuning for language models.
Ouyang et al. (2022): InstructGPT popularizes RLHF in LLM alignment. Read paper here.
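To make "standardized environment interface" concrete, here is a minimal sketch of the Gym-style reset/step loop. The coin-guessing environment is invented for illustration—this is the shape of the interface, not the real Gym API:

```python
import random

class CoinFlipEnv:
    """A toy environment following the Gym-style reset/step interface.

    Illustrative only: the environment and its dynamics are invented
    for demonstration, not taken from the actual OpenAI Gym library.
    """

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.t = 0
        return 0  # trivial observation

    def step(self, action):
        # Reward the agent when its guess (0 or 1) matches a fair coin.
        self.t += 1
        coin = random.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        done = self.t >= self.episode_length
        return coin, reward, done, {}  # observation, reward, done, info

def run_episode(env):
    """The classic RL interaction loop: observe, act, get reward, repeat."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = random.randint(0, 1)  # a random "policy"
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```

The point of Gym was that every environment—CartPole, Atari, robotics simulators—exposed this same tiny loop, so any agent could be dropped into any game.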
More recommended reading from these academic heavyweights: check out my quick wrap-up of last year's UC Berkeley Agentic AI course, featuring guest lectures by Oriol Vinyals (creator of AlphaStar, the AI that beat professional players at one of my all-time favorite games, StarCraft), my ultimate favorite AI researcher on poker, Noam Brown (known for Libratus in 2017 and Pluribus in 2019, the AIs that beat professional poker players), and my new favorite researcher, Professor Peter Stone (head of Sony AI), whose work used gameAI simulators to beat professional racecar drivers in Gran Turismo (see his lecture here). Also see my post on the QCon 2024 talk, where I break down spatial intelligence and how this all relates.
So what does all of this have to do with the idea of a simulated reality?
PART 2: THE SIMULATED MULTIVERSE
When I first started writing my book Creating Augmented and Virtual Realities: Theory and Practice for Next-Generation Spatial Computing in 2018, I wanted it to cover two key areas: computer vision, AI, and data visualization (my favorite topics), and cross-platform development (programming apps across different Head-Mounted Displays, or HMDs). At the time, Nicolas Meuleau (head of Deep Learning Research at Unity) and Arthur Juliani (Deep Learning Engineer) presented me with their chapter idea on AI agents within a VR setting. The graphic below introduces their framework for understanding AI agents within an XR environment. It shows the architecture of what they call the "decision layer," which plans a behavior and takes various actions (namely motion), both by AI and Non-Player Characters (NPCs) in XR applications and video games.
Figure 10-1 from the chapter of former Deep Learning team at Unity, Nicolas Meuleau and Arthur Juliani “Character AI and Behaviors” of our O’Reilly Media Book, Creating Augmented and Virtual Realities: Theory and Practice for Next-Generation Spatial Computing
If you know me, you know I've been obsessed with the idea that life is a game—long before AI made that framing mainstream. Years ago, I remember a conversation with DJ Q-Bert about how life is like a video game, very similar to the entire framework of the book Reality Is Broken: Why Games Make Us Better and How They Can Change the World by Jane McGonigal.
The “meta” realization: if we are characters in a game, and OpenAI Gym showed us AI can level up by playing — what does that mean for how we play our own lives?
PART 3: REINFORCEMENT LEARNING AS SELF-DEVELOPMENT
“Upgrading Your Engine”
Part of the reason I named this Substack—and my broader brand—Creating Your Reality (aside from referencing my first technical book, Creating Augmented and Virtual Realities) is that I kept encountering the idea everywhere: you create your reality.
“We create our reality by how we choose to act and how we choose to respond to things outside ourselves from within ourselves.”
This idea shows up across modern creator culture as well. TikTok creator @cherrylimesoda frames it in a more “meta” way: if we are all characters playing a game—echoing early environments like OpenAI Gym—then we are constantly updating our own “software” through self-belief, self-development, and self-talk. In other words, a form of self-play.
A similar framework comes from Courtney Alexis Cho:
The subconscious stores self-talk as “reality data” → looped input → looped outcome.
She puts it simply:
“Reality is the PDF. Your identity is the file. If you don’t like what’s being printed, you don’t throw away the printer—you change the document.”
In this framing:
Our identity = subconscious inputs
Our reality = the output
The Reinforcement Learning Parallel
What Cho is describing maps surprisingly well to reinforcement learning (RL).
RL—especially in its modern form with Reinforcement Learning with Human Feedback (RLHF)—is about optimizing behavior through repeated signals and feedback loops.
Now apply that to ourselves:
Our self-talk = feedback signal
Our habits = training data
Our outcomes = model outputs
Reframed:
What we repeat, our system optimizes for.
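That loop is, quite literally, how the simplest RL algorithms work. A minimal sketch, with invented "habits" and payoff numbers purely for illustration: an agent's value estimates drift toward whichever option gets reinforced more often.

```python
import random

# A minimal two-armed bandit: value estimates drift toward whichever
# "habit" is rewarded more often. All names and numbers are invented
# for illustration.
random.seed(0)

true_reward = {"habit_a": 0.2, "habit_b": 0.8}  # hidden payoff probabilities
estimates = {"habit_a": 0.0, "habit_b": 0.0}    # the agent's learned values
step_size = 0.1
epsilon = 0.1  # occasional exploration

for _ in range(5000):
    if random.random() < epsilon:
        choice = random.choice(list(estimates))       # explore
    else:
        choice = max(estimates, key=estimates.get)    # exploit
    reward = 1.0 if random.random() < true_reward[choice] else 0.0
    # Incremental update: the estimate moves toward the observed reward.
    estimates[choice] += step_size * (reward - estimates[choice])
```

After enough repetitions the system prefers the option it was rewarded for—we become what we repeatedly optimize for.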
Reprogramming the System
Across these philosophies, the mechanism is consistent:
Imagination rewrites internal state
Emotion acts as the programming language of the subconscious
Repeated patterns reinforce behavior
We can overwrite the code
This isn’t about wishful thinking—it’s about changing our information architecture.
Overriding negative loops isn’t “delusion.”
It’s updating the system.
The Deeper Insight
There’s an important distinction here:
The subconscious isn’t looking for proof
It’s looking for patterns
And yet, we live in a world governed by systems, math, and proofs.
So in a strange way:
Our beliefs become the “proof” we submit to our own system.
The Synthesis
Both Bruce Lee and Courtney Cho are pointing to the same core idea:
We are not just reacting to reality—we are participating in shaping it.
If life behaves like a game, then:
We are the player
We are the model
And we are also the training loop
To level up, we don’t just try harder.
We:
design better feedback loops
choose better inputs
and take actions that reinforce the identity we want to become
Because in the end, like any learning system—
we become what we repeatedly optimize for.
PART 4: PLAYING GAME THEORY OPTIMAL (GTO)
“From AI to the Poker Table”
Many people swear by this technique—one of my favorite poker players to watch is Randy Lew (known as @nanonoko on X/Twitter), who plays 24 online poker tables simultaneously! This is pattern matching at work (for context, managing 6 codebases and 3–7 foundation models at a time is a lot for me personally—and far more cognitive overhead in context-switching than most can handle).
Through AI and pattern matching, we have been able to simplify and optimize our software engineering workloads (app development and programming) by offloading cognitive load to a machine—the same way GTO instincts can be distilled down to a few key actions for a human, enabling them to monitor and judge gameplay across many simultaneously running tables.
The deeper point: GTO isn’t just poker strategy. It’s a decision-making framework for life.
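As a concrete taste of what "GTO" means computationally, here is a regret-matching sketch—a textbook technique related to (but far simpler than) what the poker bots use, not code from Libratus itself. In rock-paper-scissors, the game-theory-optimal strategy is the uniform 1/3 mix, and the average strategy from self-play converges toward it:

```python
import random

# Regret matching on rock-paper-scissors: a standard way to approximate
# a game-theory-optimal (Nash) strategy through self-play. Illustrative
# sketch only; real poker solvers are far more elaborate.
random.seed(1)

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def strategy_from_regrets(regrets):
    # Play each action in proportion to its positive accumulated regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS
    return [p / total for p in positive]

def sample(strategy):
    r, cum = random.random(), 0.0
    for i, p in enumerate(strategy):
        cum += p
        if r < cum:
            return i
    return ACTIONS - 1

regrets = [[0.0] * ACTIONS, [0.0] * ACTIONS]
strategy_sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]

for _ in range(20000):
    strats = [strategy_from_regrets(r) for r in regrets]
    moves = [sample(s) for s in strats]
    for p in range(2):
        opp, got = moves[1 - p], payoff(moves[p], moves[1 - p])
        for a in range(ACTIONS):
            # Regret: how much better action a would have done this round.
            regrets[p][a] += payoff(a, opp) - got
            strategy_sums[p][a] += strats[p][a]

# The average strategy converges toward the uniform (1/3, 1/3, 1/3) mix.
avg = [s / sum(strategy_sums[0]) for s in strategy_sums[0]]
```

The lesson carries over to life decisions: against strong opposition, an unexploitable baseline comes from balanced, well-mixed choices rather than from any single "best" move.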
Erin Jerri Malonzo Pañgilinan is a software engineer and computational designer. She is the lead author of the O'Reilly Media book Creating Augmented and Virtual Realities: Theory and Practice for Next-Generation Spatial Computing, which debuted as the #1 book in Amazon's Game Programming category and has been translated into Chinese and Korean, with distribution in more than 42 countries.
She was previously a fellow in the University of San Francisco (USF) Data Institute’s Deep Learning Program (2017–2018) and the inaugural Data Ethics cohort (2020) through fast.ai.
She is currently working on new books and software applications exploring the intersection of AI, spatial computing/XR, and web3.
Erin earned her BA from the University of California, Berkeley and is a proud Silicon Valley native born and raised.