Research & Methodology
The Science Behind the Games
Behavioral economics. Reinforcement learning. DSGE modeling. Cognitive psychology. UI research. These are not decorative influences — they are architectural constraints.
01
Abstract
This document describes the methodological foundations underlying Screwcap Holdings' portfolio of browser-based games. Our design philosophy is predicated on the proposition that entertainment and education are not in tension — that genuinely calibrated challenge, grounded in real economic and behavioral models, produces both higher engagement and measurable transfer of skill.
The Screwcap stack draws from five disciplines: reinforcement learning (DoubleFives PPO agent), DSGE modeling (TheChair, FlyMacroPilot), behavioral economics (Gold Digger), cognitive psychology (difficulty scaling), and UI/UX research (interface friction). These are not marketing claims. They are architectural constraints embedded in how each game is built.
02
AI Methodology: DoubleFives PPO Agent
The AI powering DoubleFives is a Proximal Policy Optimization (PPO) agent trained via deep reinforcement learning in a four-player adversarial domino environment. PPO is a policy gradient method: it is the algorithm used to train OpenAI Five, and earlier methods from the same family were used to train AlphaGo's policy network. It is among the most battle-tested policy gradient methods in the field.
The agent uses an Actor-Critic architecture with a 3-layer MLP policy network. Seven distinct AI personalities were developed by training separate policy heads with modified reward functions — introducing different weightings on aggressive play, defensive blocking, and partner cooperation. These produce genuinely distinct behavioral profiles, not cosmetic variation.
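The idea of deriving personalities from reweighted rewards can be sketched as follows. This is a minimal illustration, not the production reward function: the personality names, the weight values, and the three event signals (`own_points`, `opponent_blocked`, `partner_points`) are all hypothetical stand-ins for whatever the training environment actually emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    aggression: float   # weight on points the agent scores itself
    blocking: float     # weight on leaving an opponent without a legal move
    cooperation: float  # weight on points scored by the agent's partner


# Hypothetical personalities; names and values are illustrative only.
PERSONALITIES = {
    "aggressive": RewardWeights(aggression=1.5, blocking=0.5, cooperation=0.5),
    "defensive": RewardWeights(aggression=0.75, blocking=1.5, cooperation=0.75),
    "team_player": RewardWeights(aggression=0.75, blocking=0.5, cooperation=1.5),
}


def shaped_reward(weights, own_points, opponent_blocked, partner_points):
    """Combine raw game events into one scalar reward for a given personality."""
    return (
        weights.aggression * own_points
        + weights.blocking * (1.0 if opponent_blocked else 0.0)
        + weights.cooperation * partner_points
    )
```

Under these assumed weights, the same game event is valued differently by each personality, which is what pushes separately trained policy heads toward different styles of play.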
Model weights are exported via ONNX and run entirely in-browser. No game moves are transmitted to a server. The AI runs locally on the player's device.
Primary Citations
- Schulman et al. (2017). "Proximal Policy Optimization Algorithms." arXiv:1707.06347. The foundational PPO paper. Our implementation follows the clipped surrogate objective.
- OpenAI et al. (2019). "Dota 2 with Large Scale Deep Reinforcement Learning." arXiv:1912.06680. PPO at OpenAI Five scale.
- Vinyals et al. (2019). "Grandmaster level in StarCraft II using multi-agent reinforcement learning." Nature, 575, 350–354.
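The clipped surrogate objective from Schulman et al. (2017), referenced in the first citation above, can be sketched in plain Python. The function name and list-based inputs are illustrative, and the clip range of 0.2 is the value used in the paper's experiments, assumed here rather than taken from the DoubleFives training configuration.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction the advantage favors.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage; the clipping only matters once the policy has moved away from the one that collected the data.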
03
Economic Model Design: TheChair & FlyMacroPilot
Both TheChair and FlyMacroPilot are built on a Dynamic Stochastic General Equilibrium (DSGE) model — the workhorse framework of modern monetary policy analysis, used by the Federal Reserve, ECB, and major academic institutions. FlyMacroPilot implements three historically calibrated scenarios — 1929, 2008, and the Volcker Shock — with parameters drawn from FRED.
Both games implement the Taylor Rule as a policy benchmark:
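As published in Taylor (1993), the rule sets the nominal policy rate to i = π + r* + 0.5(π − π*) + 0.5y, where π is inflation, π* the inflation target, r* the neutral real rate, and y the output gap, all in percent. The sketch below uses Taylor's original values (r* = 2, π* = 2, both weights 0.5) as defaults; whether TheChair and FlyMacroPilot use these exact coefficients is an assumption, and the function and parameter names are illustrative.

```python
def taylor_rate(inflation, output_gap,
                neutral_real_rate=2.0, inflation_target=2.0,
                inflation_weight=0.5, output_weight=0.5):
    """Nominal policy rate (percent) prescribed by a Taylor-type rule.

    Defaults follow Taylor (1993); they are assumptions here, not the
    games' confirmed calibration.
    """
    return (
        inflation
        + neutral_real_rate
        + inflation_weight * (inflation - inflation_target)
        + output_weight * output_gap
    )
```

With inflation on target and a closed output gap, the rule returns the neutral nominal rate (inflation plus r*), which gives the player a reference point against which to judge their own rate decision.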
Primary Citations
- Smets, F. & Wouters, R. (2007). "Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach." American Economic Review, 97(3), 586–606.
- Taylor, J.B. (1993). "Discretion versus policy rules in practice." Carnegie-Rochester Conference Series on Public Policy, 39, 195–214. The Taylor Rule.
- Galí, J. & Gertler, M. (1999). "Inflation dynamics: A structural econometric analysis." Journal of Monetary Economics, 44(2), 195–222.
04
Behavioral Economics in Game Design
Gold Digger's prediction market framing exploits the well-documented gap between objective probability and subjective probability weighting identified by Kahneman and Tversky. Players systematically overweight small probabilities — the game makes this miscalibration visible through immediate feedback. The pedagogical goal is probability calibration.
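One way to make the overweighting of small probabilities concrete is the one-parameter weighting function from Tversky and Kahneman's later (1992) cumulative prospect theory paper, with their reported estimate of γ = 0.61 for gains. That paper is not among the citations here, and whether Gold Digger models players with this particular form is an assumption; the sketch only illustrates the shape of the bias.

```python
def decision_weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function.

    Maps an objective probability p in [0, 1] to a subjective decision
    weight. gamma=0.61 is the estimate reported for gains; gamma=1.0
    gives perfectly calibrated weights.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    numerator = p ** gamma
    denominator = (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)
    return numerator / denominator


def calibration_gap(p, gamma=0.61):
    """Positive when the probability is overweighted, negative when underweighted."""
    return decision_weight(p, gamma) - p
```

Under this form, a 1% event receives a decision weight several times its true probability while a 90% event is underweighted, which is the gap that immediate feedback in a prediction market is meant to expose.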
Sister Wendy's three difficulty modes are designed according to Csikszentmihalyi's Flow theory — maintaining challenge at the boundary between anxiety and boredom. "Merciless" is the optimal-play policy head; "Forgiving" introduces deliberate suboptimal play to keep novice players in flow.
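Deliberate suboptimal play can be sketched as a move selector that occasionally passes over the policy's top-ranked move. Everything specific here is hypothetical: the blunder rates, the name of the middle mode, and the `(move, score)` input format are placeholders rather than the shipped configuration.

```python
import random

# Hypothetical per-mode probabilities of skipping the best move.
# "merciless" always plays the top-ranked move, matching the text;
# "standard" is a placeholder name for the unnamed middle mode.
BLUNDER_RATE = {"merciless": 0.0, "standard": 0.15, "forgiving": 0.35}


def choose_move(scored_moves, mode, rng=random):
    """Pick a move from (move, score) pairs, where a higher score is better.

    With probability BLUNDER_RATE[mode], return a random non-best move;
    otherwise return the best one.
    """
    ranked = sorted(scored_moves, key=lambda pair: pair[1], reverse=True)
    if len(ranked) > 1 and rng.random() < BLUNDER_RATE[mode]:
        return rng.choice(ranked[1:])[0]
    return ranked[0][0]
```

Tuning a single rate per mode keeps difficulty adjustable without retraining the policy, though a production system might instead pick among near-best moves so that mistakes look plausible.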
Primary Citations
- Kahneman, D. & Tversky, A. (1979). "Prospect Theory: An Analysis of Decision under Risk." Econometrica, 47(2), 263–292.
- Thaler, R.H. & Sunstein, C.R. (2008). Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press.
- Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. Harper & Row.
05
Elo Rating Methodology
The DoubleFives AI is rated using a modified Elo system adapted for four-player adversarial play. The standard formula:
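In its standard two-player form, the Elo expected score and rating update are as follows, where R_A and R_B are the two ratings, S_A is the actual result (1 for a win, 0.5 for a draw, 0 for a loss), and K is the update factor. How the four-player adaptation aggregates opponents is not specified here, so only the textbook form is shown.

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}},
\qquad
R_A' = R_A + K \,(S_A - E_A)
```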
Our implementation uses K=32 and bootstraps the ladder from a baseline of 600 (novice human). The AI's current Elo rating of 1,120 represents a strong intermediate level, challenging but beatable. This is intentional: we want players to feel they can improve, because they can.
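A minimal sketch of how these parameters interact, using the textbook two-player Elo update. The K-factor and baseline come from the text; the function names are illustrative, and the four-player modification is not modeled.

```python
K_FACTOR = 32          # from the text
BASELINE_RATING = 600  # from the text: novice human starting point


def expected_score(rating, opponent_rating):
    """Probability-like expected result for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def updated_rating(rating, opponent_rating, score, k=K_FACTOR):
    """New rating after one game; score is 1.0 win, 0.5 draw, 0.0 loss."""
    return rating + k * (score - expected_score(rating, opponent_rating))
```

Under these assumptions, a novice at 600 who beats the 1,120-rated AI gains roughly 30 points in one game, because the expected score for that matchup is under 5%.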
06
UI/UX Design Principles
Screwcap interfaces minimize extraneous cognitive load while preserving germane cognitive load — the effortful engagement required to learn the underlying model. A player struggling with TheChair's rate decisions should be struggling with the macroeconomics, not the UI.
Primary Citations
- Nielsen, J. (1994). "10 Usability Heuristics for User Interface Design." Nielsen Norman Group.
- Sweller, J. (1988). "Cognitive Load During Problem Solving: Effects on Learning." Cognitive Science, 12(2), 257–285.
- Fitts, P.M. (1954). "The information capacity of the human motor system in controlling the amplitude of movement." Journal of Experimental Psychology, 47(6), 381–391.
07
Raw Training Metrics
The following log reflects the DoubleFives PPO training run as of the latest checkpoint on TEDDY (RTX 3090).
08
Citing Screwcap Research
If you reference Screwcap's AI or economic modeling work in academic or journalistic contexts:
@misc{screwcap2026research,
  author = {{Screwcap Holdings LLC}},
  title  = {Research \& Methodology: Behavioral Game Design and PPO-Trained AI Opponents},
  year   = {2026},
  url    = {https://screwcap.games/research},
  note   = {Technical report. Screwcap Holdings LLC, Wisconsin, USA.}
}

For collaboration inquiries or classroom use: play@screwcapholdings.com