A Counterfactual Roadmap for the College-Sim Engine

Three near-zero-cost narrative outputs that turn the simulator into a literature-grounded counterfactual engine.

The gap

The marketing site at college-monte-carlo.com/research2/ currently ships roughly fifty narrative pages on college admissions. Every one of them derives its numbers from some external source: Common Data Set filings, IPEDS, federal Scorecard data, NACAC reports, the Chetty Opportunity Insights tables, or specific peer-reviewed papers we cite directly. None of them derive numbers from runs of the simulator that powers the rest of the site.

That is a missed opportunity, and a slightly strange one. The whole point of an agent-based admissions simulator is that it can answer questions no real CDS can answer: what would happen if a particular mechanism were turned off? A real Common Data Set tells you Princeton's acceptance rate. It cannot tell you what Princeton's admitted-class academic mean would look like with legacy preference set to zero. Only a calibrated simulation can produce that counterfactual, and we have a calibrated simulation sitting right here.

This roadmap is about closing that gap with the cheapest three extensions available. None of them require new model mechanics. None of them require re-running calibration. All three are translations of methodological moves that already exist in the matching-markets ABM literature, applied to college-sim's existing classic-mode admission engine. Each produces a publishable narrative output of the form "with X turned off, outcome Y shifts by Z."

We organize the three by source paper. The full comparison of college-sim against the broader CAS/ABM literature lives in /research2/MATCHING-MARKETS-COMPARISON.md, which surveys eight candidate extensions; the three below are the top three on a cost-versus-leverage basis.

Extension 1 — Reardon mechanism shutoff

The methodological move that made Reardon, Kasman, Klasik, and Baker (2016) the canonical college-sorting ABM was deceptively simple: turn each of five SES-correlated mechanisms on or off, one at a time, and read off how much of the SES-by-tier enrollment gradient each mechanism explains. The five Reardon mechanisms are the achievement gap, application enhancement, information quality, portfolio size, and idiosyncratic valuation.

College-sim already encodes all five. The achievement gap is in our income-correlated SAT and GPA distributions. Application enhancement is the EC-quality and essay-base channels that vary with parental education. Information quality is in the demonstrated-interest weighting that disadvantages students from low-counseling environments. Portfolio size is the per-student application-count distribution. Idiosyncratic valuation is the Gaussian noise added to each student-college fit score. We have the mechanisms; what we lack is the page that turns each one off and reports the result.

The output will be a Markdown table per mechanism, structured roughly like this:

| Mechanism nullified | Tier-1 enrollment share, Q1 students | Tier-1 enrollment share, Q4 students | Gap closed |
| --- | --- | --- | --- |
| (baseline) | a | b | n/a |
| Achievement gap | a' | b' | Δ |
| Application enhancement | a'' | b'' | Δ' |
| Information quality | a''' | b''' | Δ'' |

Numbers stay blank in this roadmap; they will populate when the harness runs. Because our calibration deltas were fit with mechanisms on, we will publish each shutoff in two variants — a calibrated version (what production users see) and a "bare" version with deltas removed (a sensitivity check that shows what the model would say without per-college tuning). The page will document this prominently as a methodology note.
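The scenario grid described above can be sketched as follows. This is a minimal illustration, not the harness itself: `buildShutoffScenarios`, `gapClosed`, the `useCalibrationDeltas` flag, and the `q1Share`/`q4Share` field names are all hypothetical, and the definition of "gap closed" (the fraction of the baseline Q4-minus-Q1 gap that disappears) is an assumption, since the roadmap does not define the column.

```javascript
// Mechanism keys follow the names used for the COUNTERFACTUAL mechanisms map.
const MECHANISMS = [
  "achievement_gap",
  "app_enhancement",
  "info_quality",
  "portfolio_size",
  "idiosyncratic_valuation",
];

// Build a baseline plus one scenario per nullified mechanism, once with
// calibration deltas applied ("calibrated") and once without ("bare").
// The useCalibrationDeltas flag is a hypothetical name.
function buildShutoffScenarios(mechanisms = MECHANISMS) {
  const allOn = Object.fromEntries(mechanisms.map((m) => [m, 1.0]));
  const scenarios = [];
  for (const useCalibrationDeltas of [true, false]) {
    scenarios.push({
      nullified: null, // baseline row
      useCalibrationDeltas,
      mechanisms: { ...allOn },
    });
    for (const m of mechanisms) {
      scenarios.push({
        nullified: m,
        useCalibrationDeltas,
        mechanisms: { ...allOn, [m]: 0.0 }, // only this mechanism is off
      });
    }
  }
  return scenarios;
}

// ASSUMPTION: "gap closed" is the share of the baseline gap that is removed.
// Each argument is { q1Share, q4Share } for Tier-1 enrollment.
function gapClosed(baseline, scenario) {
  const baseGap = baseline.q4Share - baseline.q1Share;
  const newGap = scenario.q4Share - scenario.q1Share;
  return baseGap === 0 ? NaN : (baseGap - newGap) / baseGap;
}
```

With five mechanisms this yields twelve scenarios: a baseline and five shutoffs in each of the two variants.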

The deeper background on Reardon's framework lives in our literature review at /research2/reardon_2016_college_sorting.html.

Extension 2 — Dickerson price-of-fairness sweep

Dickerson, Procaccia, and Sandholm (2014) introduced the price of fairness in the context of kidney exchange. The framing is portable: any allocation rule that includes a fairness adjustment incurs some cost in aggregate efficiency, and that cost is measurable. In their original setting, the price-of-fairness number on UNOS pilot data turned out to be under five percent — a meaningful empirical result that reframed an ethics debate as a quantitative tradeoff.

College admissions has fairness adjustments too — they're called hooks. Legacy, recruited athlete, donor, first-generation, Pell-eligible, URM, Asian-American (in the post-SFFA framing), geographic diversity, gender balance, and major oversubscription all enter the admission logit as multipliers. Each of them, by construction, redirects seats toward students who would not have received them on academic-index grounds alone. The question Dickerson's framing makes available to us is: what is the academic-mean cost of each hook, measured in admitted-class index points?

The sweep is mechanical. For each of the ten hook multipliers, scale its strength from 1.0 (current calibration) down to 0.0 (full shutoff), holding all others fixed. Per scenario, report the change in three quantities: admitted-class mean academic index, admitted-class composition (URM share, low-income share, first-gen share), and overall shutout rate. The output is a per-hook table:

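| Hook nullified | Δ admitted-class mean SAT | Δ URM share | Δ first-gen share | Class fill |
| --- | --- | --- | --- | --- |
| Legacy | +Δ₁ | … | … | … |
| Recruited athlete | +Δ₂ | … | … | … |
| First-generation | … | … | … | … |
| (and so on for ten hooks) | … | … | … | … |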
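As a rough sketch of the sweep, the scenario list could be generated as below. The function names, the number of intermediate steps, and the metric field names are hypothetical. Only the four hook keys that the roadmap spells out are listed; the remaining six would be appended under whatever keys the simulator actually uses.

```javascript
// Only the hook keys named explicitly in the roadmap; the other six
// are not spelled out there, so they are deliberately left off.
const HOOKS = ["legacy", "athlete", "donor", "first_gen"];

// For each hook, step its strength from just below 1.0 down to 0.0 while
// every other hook stays at its calibrated strength of 1.0.
// `steps` (number of grid points below 1.0) is an illustrative choice.
function buildHookSweep(hooks = HOOKS, steps = 4) {
  const baseline = Object.fromEntries(hooks.map((h) => [h, 1.0]));
  const scenarios = [];
  for (const hook of hooks) {
    for (let i = 1; i <= steps; i++) {
      const strength = 1 - i / steps;
      scenarios.push({
        hook,
        strength,
        hooks: { ...baseline, [hook]: strength },
      });
    }
  }
  return { baseline, scenarios };
}

// Per-metric change relative to the baseline run. Both arguments are plain
// objects of numeric metrics (field names are up to the harness).
function deltaFromBaseline(baselineMetrics, scenarioMetrics) {
  return Object.fromEntries(
    Object.keys(baselineMetrics).map((k) => [
      k,
      scenarioMetrics[k] - baselineMetrics[k],
    ])
  );
}
```

The baseline is returned separately so it is simulated once rather than once per hook.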

This produces the kind of headline claim that journalists and policy researchers ask us for and we currently cannot supply with our own numbers: "Removing legacy preference at this peer set raises admitted-class mean SAT by N points while leaving URM share unchanged." Whether the number is large or small is itself the point — Dickerson's contribution was that the price of fairness, once measured, was usually smaller than rhetoric implied.

The fuller methodology context is at /research2/dickerson_2014_price_of_fairness.html.

Extension 3 — Mennle-Seuken manipulation gain

Mennle and Seuken (2014) compared deferred-acceptance, classic Boston, and adaptive Boston as school-choice mechanisms, and made manipulation gain — the welfare a strategic agent extracts over a truthful agent — into a measurable property of the mechanism rather than a hand-waved concern. Their headline result was that adaptive Boston cuts manipulation gain by roughly half relative to classic Boston while keeping welfare close.

College-sim's mechanism is more complicated than the canonical DA-versus-Boston comparison: decentralized, six-round, with binding ED. But the Mennle-Seuken question — who benefits how much from strategic play? — translates cleanly. We can simulate the same student population under four different strategy regimes:

Truthful: each student applies in the round and to the schools their honest preference ordering implies.
ED-aggressive: each student applies ED to their top reach, accepting binding commitment for the boost.
RD-safeties: each student declines ED, applies broadly in RD, optimizes for any-acceptance over best-acceptance.
Oracle: post-hoc best of the above three per student — an envelope, not a literal best response.

For every (archetype × structural-position) cell — six archetypes by four structural positions, twenty-four cells — we report the median committed tier under each regime, plus the gain over truthful. The output is a heatmap of strategy sensitivity: which cells benefit most from gaming ED, which cells lose, and where the oracle envelope sits above truthful play.
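The per-cell aggregation can be sketched as follows, under several stated assumptions: the record shape and the names `strategySensitivity`, `tiers`, and `gainOverTruthful` are hypothetical; a lower tier number is taken to mean a better outcome (tier 1 best); and every student is assumed to commit somewhere under every regime. The oracle is computed per student as the best of the three simulated regimes, which matches the "envelope" framing rather than a true best response.

```javascript
// Median of a non-empty numeric array.
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Hypothetical regime keys for the three simulated strategies.
const REGIMES = ["truthful", "edAggressive", "rdSafeties"];

// students: [{ archetype, position, tiers: { truthful, edAggressive, rdSafeties } }]
// ASSUMPTION: lower tier number = better committed outcome.
function strategySensitivity(students) {
  const cells = new Map();
  for (const s of students) {
    const key = `${s.archetype}|${s.position}`;
    if (!cells.has(key)) {
      cells.set(key, { truthful: [], edAggressive: [], rdSafeties: [], oracle: [] });
    }
    const cell = cells.get(key);
    for (const r of REGIMES) cell[r].push(s.tiers[r]);
    // Oracle envelope: post-hoc best of the three regimes for this student.
    cell.oracle.push(Math.min(...REGIMES.map((r) => s.tiers[r])));
  }

  const out = {};
  for (const [key, cell] of cells) {
    const truthfulMedian = median(cell.truthful);
    out[key] = {};
    for (const r of [...REGIMES, "oracle"]) {
      const medianTier = median(cell[r]);
      // Positive gain means the regime lands in a better (lower) tier.
      out[key][r] = { medianTier, gainOverTruthful: truthfulMedian - medianTier };
    }
  }
  return out;
}
```

Because the oracle takes each student's minimum, its median can never be worse than the truthful median, so the oracle gain is non-negative in every cell.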

The point is not to recommend strategies. The point is to show, with our own numbers, which student profiles the current decentralized US mechanism rewards for sophistication. That is the closest thing we can produce to the kind of fairness diagnostic the school-choice literature has built around DA.

We are framing this as ex-post strategy-sensitivity with an oracle envelope, not literal Mennle-Seuken best-response — full best-response is undefined under stochastic admissions and is out of scope. The deeper context is at /research2/mennle_seuken_2014_da_vs_boston.html.

How they wire in (lightly technical)

All three extensions share a single new configuration object on the simulator side. A COUNTERFACTUAL object carries a mechanisms map ({achievement_gap: 1.0, app_enhancement: 1.0, info_quality: 1.0, portfolio_size: 1.0, idiosyncratic_valuation: 1.0}), a hooks map ({legacy: 1.0, athlete: 1.0, donor: 1.0, first_gen: 1.0, ...}), and a strategy_override field for the manipulation-gain regimes. Each gating site in sim.js multiplies its current effect by the relevant strength: 1.0 is a no-op, 0.0 is full shutoff, and intermediate values interpolate linearly on log-multipliers, which is geometric interpolation on the multipliers themselves.
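One way to read the interpolation rule for a multiplicative effect is sketched below: scaling the log of a multiplier by a strength s is the same as raising the multiplier to the power s, so strength 1.0 returns the calibrated multiplier and strength 0.0 returns 1 (no effect). The helper names `scaledMultiplier` and `effectiveHook` are hypothetical, the `null` default for strategy_override is an assumption, and the hooks map lists only the keys the roadmap names.

```javascript
// Shape of the configuration object as described in the text.
const COUNTERFACTUAL = {
  mechanisms: {
    achievement_gap: 1.0,
    app_enhancement: 1.0,
    info_quality: 1.0,
    portfolio_size: 1.0,
    idiosyncratic_valuation: 1.0,
  },
  // Remaining hook keys are not spelled out in the roadmap.
  hooks: { legacy: 1.0, athlete: 1.0, donor: 1.0, first_gen: 1.0 },
  strategy_override: null, // ASSUMPTION: null means "no override"
};

// Interpolate on the log-multiplier: log(m_eff) = strength * log(m),
// i.e. m_eff = m ** strength. Requires a strictly positive multiplier.
function scaledMultiplier(multiplier, strength) {
  if (!(multiplier > 0)) {
    throw new RangeError("multiplier must be positive");
  }
  return Math.exp(strength * Math.log(multiplier));
}

// Hypothetical gating-site helper: look up a hook's strength (defaulting
// to 1.0 when the key is absent) and scale the calibrated multiplier.
function effectiveHook(name, calibratedMultiplier, cf = COUNTERFACTUAL) {
  const strength = cf.hooks[name] ?? 1.0;
  return scaledMultiplier(calibratedMultiplier, strength);
}
```

Defaulting missing keys to 1.0 keeps an empty or partial configuration equivalent to the production behavior, which is the property the no-op convention is meant to guarantee.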

These extensions are classic-mode only. Hook multipliers are already gated by a SIM_MODE !== 'personalized' check in the admission path, so they apply only in classic mode; the harness will hardcode classic mode and document the constraint on each output page. Personalized-mode counterfactuals would need a separate plumbing pass and are out of scope here.

A new harness, research/build_counterfactuals.cjs, mirroring the existing research/calibrate_v*.cjs pattern, will pre-compute results across a fixed seed list per scenario so that each scenario can be compared with the baseline seed by seed (paired differences). Output goes to JSON files under data/counterfactuals/. The Reardon and Dickerson pages render as static Markdown with inline scripts that fetch the JSON and draw a small table; the Mennle-Seuken page becomes a small D3 dashboard for the heatmap.
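The paired-difference idea can be sketched as below: the baseline and the scenario are run on the same seeds, and the per-seed differences are summarized, so seed-to-seed noise that is common to both runs cancels. `pairedDifference` and the `runSim(seed, scenario)` callback are hypothetical stand-ins for whatever entry point the harness ends up exposing; the callback is assumed to return a single numeric outcome.

```javascript
// runSim(seed, scenario) -> number is a hypothetical simulator entry point.
// Running both configurations on identical seeds makes the differences
// paired, so shared random variation drops out of the comparison.
function pairedDifference(runSim, baseline, scenario, seeds) {
  const diffs = seeds.map(
    (seed) => runSim(seed, scenario) - runSim(seed, baseline)
  );
  const n = diffs.length;
  const mean = diffs.reduce((sum, d) => sum + d, 0) / n;
  // Sample variance of the per-seed differences (undefined for n < 2).
  const variance =
    n > 1
      ? diffs.reduce((sum, d) => sum + (d - mean) ** 2, 0) / (n - 1)
      : NaN;
  return { diffs, mean, stdErr: Math.sqrt(variance / n) };
}
```

The returned object is plain data, so it can be serialized directly with `JSON.stringify` for the files under data/counterfactuals/.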

Why these three first

The three extensions share three properties. First, zero model changes: every gating site is a multiplication against an existing effect, and the strategy override is a single override point at the top of the list-building function. Second, zero calibration redo: the same thresholdDelta values that ship in production today serve all three; the methodology note on the Reardon page documents the calibrated-versus-bare distinction so readers know what they are looking at. Third, directly publishable narrative outputs: each extension produces a small, well-posed table or heatmap with a self-contained interpretation.

That combination (no plumbing risk, no calibration risk, immediate narrative value) is rare. Most simulator extensions cost more than they pay back. These three close the marketing site's counterfactual gap at roughly the cost of writing three pages.

What's not on this roadmap (and why)

The longer extension list of eight includes five more candidates of varying ambition. Three are worth flagging here because we expect questions about them.

A reinforcement-learning admissions-policy designer — wrapping the simulator as a Gym environment and letting an RL agent search over per-college threshold-and-yield trajectories — is research-grade work, weeks not days, and requires non-trivial new infrastructure. Background at /research2/santos_2024_school_segregation.html.

Replacing the v8/v9 fixed-point calibration loop with neural-ratio-estimation simulation-based inference, returning full Bayesian posteriors over each thresholdDelta, is methodologically the strongest next move on calibration. It is also a model change in spirit: the chancing tool would gain credible intervals, and the production pipeline would gain a Python dependency. Background at /research2/dignum_2025_school_choice.html.

Adaptive admissions-deadline timing — using deep RL to choose when to clear each round, rather than treating ED, EA, EDII, and RD as fixed — is the ride-hailing-derived idea from Bao et al. (2025). It is interesting and would require restructuring the round loop. Background at /research2/bao_2025_timing_the_match.html.

All three are deferred to research-grade future work. The three extensions above ship first.

Status

Design complete as of 2026-05. Implementation has not yet begun. The three output pages and the underlying JSON artifacts will appear under /research2/ and /data/counterfactuals/ once the harness lands. To be notified when the first counterfactual numbers go live, subscribe via the homepage signup form.