
Research & Methodology

The Science Behind the Games

Behavioral economics. Reinforcement learning. DSGE modeling. Cognitive psychology. UI research. These are not decorative influences — they are architectural constraints.

Behavioral Economics · PPO / Reinforcement Learning · DSGE Modeling · Cognitive Load Theory

01

Abstract

This document describes the methodological foundations underlying Screwcap Holdings' portfolio of browser-based games. Our design philosophy is predicated on the proposition that entertainment and education are not in tension — that genuinely calibrated challenge, grounded in real economic and behavioral models, produces both higher engagement and measurable transfer of skill.

The Screwcap stack draws from five disciplines: reinforcement learning (DoubleFives PPO agent), DSGE modeling (TheChair, FlyMacroPilot), behavioral economics (Gold Digger), cognitive psychology (difficulty scaling), and UI/UX research (interface friction). These are not marketing claims. They are architectural constraints embedded in how each game is built.

02

AI Methodology: DoubleFives PPO Agent

The AI powering DoubleFives is a Proximal Policy Optimization (PPO) agent trained via deep reinforcement learning in a four-player adversarial domino environment. PPO is the policy gradient algorithm used to train OpenAI Five, and it belongs to the same broad family as the policy gradient methods used to train AlphaGo's policy network. It is among the most widely used and tested policy gradient methods in the field.

The agent uses an Actor-Critic architecture with a 3-layer MLP policy network. Seven distinct AI personalities were developed by training separate policy heads with modified reward functions — introducing different weightings on aggressive play, defensive blocking, and partner cooperation. These produce genuinely distinct behavioral profiles, not cosmetic variation.
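As a rough illustration of how modified reward functions can yield different playing styles, the sketch below weights three per-turn signals differently for each personality. The class name, personality names, signal definitions, and every number are hypothetical, chosen only for illustration; the actual DoubleFives reward terms are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical per-personality weights on three shaping terms."""
    aggression: float   # weight on points the agent scores itself
    blocking: float     # weight on turns where an opponent is blocked
    cooperation: float  # weight on points scored by the partner


# Illustrative personalities; names and values are invented for this sketch.
PERSONALITIES = {
    "attacker": RewardWeights(aggression=1.0, blocking=0.2, cooperation=0.3),
    "blocker": RewardWeights(aggression=0.4, blocking=1.0, cooperation=0.3),
    "team_player": RewardWeights(aggression=0.5, blocking=0.3, cooperation=1.0),
}


def shaped_reward(weights, own_points, opponent_blocked, partner_points):
    """Combine raw per-turn signals into one scalar training reward."""
    return (
        weights.aggression * own_points
        + weights.blocking * (1.0 if opponent_blocked else 0.0)
        + weights.cooperation * partner_points
    )


# The same turn is valued differently by each personality.
turn = dict(own_points=0, opponent_blocked=True, partner_points=5)
rewards = {name: shaped_reward(w, **turn) for name, w in PERSONALITIES.items()}
```

Because each policy head is optimized against its own scalar reward, the same board position can favor different moves for different personalities, which is the mechanism behind behavioral rather than cosmetic variation.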

Model weights are exported via ONNX and run entirely in-browser. No game moves are transmitted to a server. The AI runs locally on the player's device.

03

Economic Model Design: TheChair & FlyMacroPilot

Both TheChair and FlyMacroPilot are built on a Dynamic Stochastic General Equilibrium (DSGE) model, the workhorse framework of modern monetary policy analysis, used by the Federal Reserve, the ECB, and major academic institutions. FlyMacroPilot implements three historically calibrated scenarios (1929, 2008, and the Volcker Shock) with parameters drawn from FRED, the Federal Reserve Bank of St. Louis economic database.

Both games implement the Taylor Rule as a policy benchmark:

$$ i_t = r^* + \pi^* + 1.5(\pi_t - \pi^*) + 0.5(y_t - \bar{y}_t) $$
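In the rule above, the policy rate is the neutral real rate plus the inflation target, adjusted by 1.5 times the inflation gap and 0.5 times the output gap. A minimal sketch of that arithmetic follows. The function name and the default values of 2.0 for the neutral real rate and the inflation target are assumptions for illustration, not parameters taken from either game.

```python
def taylor_rate(inflation, output_gap,
                neutral_real_rate=2.0, inflation_target=2.0):
    """Taylor Rule benchmark policy rate; all quantities in percent.

    output_gap is actual output minus potential output (y_t - ybar_t).
    The 2.0 defaults are illustrative assumptions, not game parameters.
    """
    return (
        neutral_real_rate
        + inflation_target
        + 1.5 * (inflation - inflation_target)
        + 0.5 * output_gap
    )


# Inflation on target with a closed output gap gives r* + pi*.
at_target = taylor_rate(inflation=2.0, output_gap=0.0)

# Inflation two points above target with output one point below potential.
overheating = taylor_rate(inflation=4.0, output_gap=-1.0)
```

Because the inflation coefficient exceeds 1, the benchmark rate rises by more than any increase in inflation, so the real rate tightens when inflation overshoots.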

04

Behavioral Economics in Game Design

Gold Digger's prediction market framing exploits the well-documented gap between objective probability and subjective probability weighting identified by Kahneman and Tversky. Players systematically overweight small probabilities — the game makes this miscalibration visible through immediate feedback. The pedagogical goal is probability calibration.
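The overweighting of small probabilities can be illustrated with the one-parameter weighting function from Tversky and Kahneman's cumulative prospect theory. This is a sketch of the general phenomenon, not Gold Digger's internal model, which is not described here. The function name is hypothetical, and gamma = 0.61 is the published estimate for gains, used only as an illustrative default.

```python
def probability_weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function.

    Maps an objective probability p to a subjective decision weight.
    gamma = 1.0 reproduces p exactly; gamma < 1.0 overweights small
    probabilities and underweights large ones.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    numerator = p ** gamma
    denominator = (numerator + (1.0 - p) ** gamma) ** (1.0 / gamma)
    return numerator / denominator


# Compare objective probabilities with their subjective weights.
for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    print(f"p = {p:.2f}  weight = {probability_weight(p):.3f}")
```

Under these assumptions a 1% event receives a decision weight several times larger than 0.01, which is the kind of gap that immediate feedback on predictions can expose.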

Sister Wendy's three difficulty modes are designed according to Csikszentmihalyi's Flow theory — maintaining challenge at the boundary between anxiety and boredom. "Merciless" is the optimal-play policy head; "Forgiving" introduces deliberate suboptimal play to keep novice players in flow.
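One simple way to introduce deliberate suboptimal play is to have the opponent pass over its best move with some fixed probability. The sketch below shows that idea only. The function, the rate table, and the 0.3 value are hypothetical; the source does not say how "Forgiving" is actually implemented.

```python
import random


def choose_move(ranked_moves, mistake_rate, rng):
    """Pick a move from a list sorted best-first.

    With probability mistake_rate the best move is skipped and one of
    the remaining moves is chosen at random; otherwise the best move
    is played. A single legal move is always played as-is.
    """
    if not ranked_moves:
        raise ValueError("no legal moves")
    if len(ranked_moves) > 1 and rng.random() < mistake_rate:
        return rng.choice(ranked_moves[1:])
    return ranked_moves[0]


# Hypothetical mistake rates per mode; the values are invented.
MISTAKE_RATE = {"Merciless": 0.0, "Forgiving": 0.3}

rng = random.Random(42)  # seeded so the example is reproducible
moves = ["best", "second", "third"]
merciless_pick = choose_move(moves, MISTAKE_RATE["Merciless"], rng)
forgiving_pick = choose_move(moves, MISTAKE_RATE["Forgiving"], rng)
```

A fixed rate is the crudest option. Adjusting the rate from the player's recent results would track the anxiety and boredom boundary more closely, at the cost of extra tuning.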

05

Elo Rating Methodology

The DoubleFives AI is rated using a modified Elo system adapted for four-player adversarial play. The standard formula:

$$ E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}} $$

Our implementation uses K=32 and bootstraps the ladder from 600 (novice human). The current AI Elo of 1,120 represents a strong intermediate level, challenging but beatable. This is intentional: we want players to feel they can improve, because they can.
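To make the numbers concrete, the sketch below applies the standard two-player expected-score formula and a K=32 update to a 600-rated novice facing the 1,120-rated AI. It covers only the textbook head-to-head case: the four-player modification is not specified here, and the function names are illustrative.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (standard Elo)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating, expected, actual, k=32):
    """Rating after one game; actual is 1.0 for a win, 0.0 for a loss."""
    return rating + k * (actual - expected)


novice, ai = 600, 1120

e_novice = expected_score(novice, ai)  # novice's expected score
e_ai = expected_score(ai, novice)      # the two expectations sum to 1

# An upset win earns the novice nearly the full K points.
novice_after_win = update_rating(novice, e_novice, actual=1.0)
# An expected loss costs the novice very little.
novice_after_loss = update_rating(novice, e_novice, actual=0.0)
```

With a 520-point gap the novice is expected to score just under 5%, so a win is worth about 30 rating points while a loss costs under 2.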

06

UI/UX Design Principles

Screwcap interfaces minimize extraneous cognitive load while preserving germane cognitive load — the effortful engagement required to learn the underlying model. A player struggling with TheChair's rate decisions should be struggling with the macroeconomics, not the UI.

07

Raw Training Metrics

The following log reflects the DoubleFives PPO training run as of the latest checkpoint on TEDDY (RTX 3090).

// DoubleFives PPO — Training Summary
algorithm:          Proximal Policy Optimization (PPO v12)
total_games_played: 586,000,000+
current_elo:        1,120 ↑ climbing
gpu:                NVIDIA RTX 3090 (24GB VRAM)
framework:          PyTorch 2.x + Stable-Baselines3
architecture:       Actor-Critic, 3-layer MLP (256→128→64)
personalities:      7 distinct policy heads
deployment_format:  ONNX (client-side, no latency)
status:             Training in progress, Phase 2 underway

08

Citing Screwcap Research

If you reference Screwcap's AI or economic modeling work in academic or journalistic contexts:

@misc{screwcap2026research,
  author    = {Screwcap Holdings LLC},
  title     = {Research \& Methodology: Behavioral Game Design
               and PPO-Trained AI Opponents},
  year      = {2026},
  url       = {https://screwcap.games/research},
  note      = {Technical report. Screwcap Holdings LLC, Wisconsin, USA.}
}

For collaboration inquiries or classroom use: play@screwcapholdings.com