Reardon et al. (2016): Decomposing the Mechanisms of College Sorting

The closest academic precedent to college-sim — a two-sided agent-based model of US four-year college sorting that turns each socioeconomic mechanism on and off in turn to read off how much of the SES gradient each one explains.

The research question

Why do students from high-resource families end up so heavily concentrated at selective colleges, while students from comparable-on-paper but lower-resource families do not? The descriptive fact is well-established: Reardon, Kasman, Klasik, and Baker (2016) open by noting that students in the top family-income decile enroll in top-tier colleges at roughly eight times the rate of students in the bottom decile. The interesting question is not whether sorting by socioeconomic status (SES) exists, but which channels produce it. Is the gradient mostly an academic-achievement gap that the admissions process simply preserves? Or is it driven by what happens during application — information, portfolio construction, essay polish, financial calculations — independent of the achievement that students bring to the table?

Reardon and his coauthors argue that you cannot answer that question with regression on observational data alone, because the mechanisms are deeply intertwined. A high-resource student tends to have higher tested achievement and better information and a longer application list and polished essays. Disentangling the marginal contribution of any one channel requires a model in which you can hold the others fixed.

Their answer is an agent-based simulation calibrated against two public datasets. The Education Longitudinal Study of 2002 (ELS:2002), maintained by NCES, follows a nationally representative cohort of US tenth-graders through college entry; it supplies the joint distribution of family resources, academic achievement, and enrollment outcomes that the model is fit against. IPEDS — the Integrated Postsecondary Education Data System — supplies the supply side: roughly 1,500 four-year colleges with their selectivity, capacity, and enrollment characteristics. Students and colleges are each modeled as decision-making agents, and the simulation iterates three stages — application, admission, and enrollment — until a market-clearing outcome is reached.

The methodological move that makes the paper canonical, and the reason it is the strongest academic precedent for college-sim, is what Reardon et al. do with that model: a systematic mechanism decomposition.

Method: turning mechanisms on and off

The authors identify five distinct, theoretically grounded channels through which family resources can drive sorting between colleges. Each channel is encoded as a switch in the simulation. By running the model with all five channels active and then re-running it with one channel "turned off" (set to its SES-neutral value), they can read the marginal contribution of each mechanism to the SES-by-college-quality gradient directly off the simulation output.

The five mechanisms are:

Achievement gap. High-resource students tend to score higher on standardized achievement measures, partly because of investments in K–12 schooling and home environment. Turning this off equalizes achievement across the SES distribution while preserving everything else.
Application enhancement. High-resource students invest in essay coaching, test prep, application strategy, and the cultivation of recommenders — investments that raise their apparent caliber to admissions readers above what their underlying achievement alone would predict. Turning this off removes the resource-linked premium on perceived application quality.
Information quality. High-resource students typically have less noisy estimates of college quality than their lower-resource peers: they know which colleges exist, what fit means, what selectivity looks like, and how aid actually works. Turning this off gives every student perfect information about every college.
Portfolio size. High-resource students apply to more colleges, increasing the chance of any one acceptance and giving them more options at the enrollment stage. Turning this off equalizes portfolio size across SES.
Idiosyncratic valuation. Even with perfect information, students disagree about which college is best — they have personal taste shocks, family ties, geographic preferences, and social networks. Turning this off forces every student to agree on a single quality ranking.

Reardon et al. then run the simulation across 100 stochastic replicates per condition, and for each condition they read out the SES gradient: the difference in mean enrolled-college quality between the top and bottom resource quartiles. The shift in that gradient when a mechanism is removed is the mechanism's marginal contribution. They check robustness with Latin hypercube sampling over the parameter space, to verify that results are not artifacts of one specific parameter setting.
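The shutoff-and-compare loop described above can be sketched in a few lines of JavaScript. This is a toy harness, not Reardon et al.'s model and not college-sim code: `runOnce` is a hypothetical stand-in that skips the matching market entirely and includes only two of the five mechanisms, and every coefficient in it is an invented assumption. What it does show is the measurement logic: the same seeds are reused across conditions, the gradient is the top-quartile minus bottom-quartile mean, and a mechanism's contribution is the baseline gradient minus the gradient with that mechanism switched off.

```javascript
// Toy mechanism-shutoff harness. All names and numbers are illustrative assumptions.
const MECHANISMS = ["achievement", "enhancement"];

// Small seeded PRNG (mulberry32) so every condition sees the same random draws.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: one standard-normal draw from two uniforms.
function standardNormal(rand) {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Stand-in for one simulation run: returns { ses, quality } per student.
function runOnce(switches, seed, n = 2000) {
  const rand = mulberry32(seed);
  const students = [];
  for (let i = 0; i < n; i++) {
    const ses = standardNormal(rand);
    const noise = standardNormal(rand);
    // Mechanism on: achievement correlates with SES. Off: SES-neutral.
    const achievement = switches.achievement
      ? 0.5 * ses + Math.sqrt(0.75) * noise
      : noise;
    // Mechanism on: resources add a premium to apparent caliber.
    const boost = switches.enhancement ? 0.3 * ses : 0;
    const quality = achievement + boost + 0.5 * standardNormal(rand);
    students.push({ ses, quality });
  }
  return students;
}

// SES gradient: mean enrolled quality, top SES quartile minus bottom quartile.
function sesGradient(students) {
  const sorted = [...students].sort((a, b) => a.ses - b.ses);
  const q = Math.floor(sorted.length / 4);
  const mean = (xs) => xs.reduce((s, x) => s + x.quality, 0) / xs.length;
  return mean(sorted.slice(-q)) - mean(sorted.slice(0, q));
}

// Average the gradient over replicates, reusing the same seeds per condition.
function meanGradient(switches, replicates = 20) {
  let total = 0;
  for (let r = 0; r < replicates; r++) {
    total += sesGradient(runOnce(switches, 1000 + r));
  }
  return total / replicates;
}

// Baseline with everything on, then one re-run per mechanism switched off.
function decompose() {
  const allOn = Object.fromEntries(MECHANISMS.map((m) => [m, true]));
  const baseline = meanGradient(allOn);
  const contributions = {};
  for (const m of MECHANISMS) {
    contributions[m] = baseline - meanGradient({ ...allOn, [m]: false });
  }
  return { baseline, contributions };
}

const result = decompose();
console.log(result);
```

Reusing seeds across conditions (common random numbers) means the difference between two conditions reflects the switched mechanism rather than sampling noise, which is why a modest number of replicates is enough in this toy setting.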

Headline results

The paper's central finding is more nuanced than either popular narrative ("it's all the achievement gap") or its competitor ("it's all institutional bias"). Reardon and colleagues find:

The resources-achievement link explains the largest share of sorting. When the SES-achievement correlation is removed, the SES gradient in college quality drops by roughly half compared to the all-mechanisms-on baseline. That makes achievement the single most powerful channel — but it is not the whole story.
Information quality has an outsized role for low-SES students in particular. When the information-quality gap is removed (every student gets the same accuracy of beliefs about colleges), the SES gradient shrinks substantially. The reproductions of the model find this is qualitatively the single most influential non-achievement mechanism, consistent with the paper's emphasis that information asymmetry is doing real work.
Idiosyncratic valuation is a moderator, not an amplifier. Counter-intuitively, when student taste shocks are removed and everyone agrees on a single ranking, sorting increases. Personal preferences mute sorting because they sometimes lead high-resource students to pick a less-selective college they happen to like.
Application enhancement and portfolio size matter, but less. Both have non-negligible but smaller effects in the decomposition.

The honest summary: sorting is a multi-mechanism phenomenon, not a single-cause one. Achievement is dominant, information is the most policy-actionable secondary lever, and idiosyncratic preference is a stabilizer rather than a driver of inequality.

What college-sim already does

college-sim implements all five of Reardon's mechanisms, in roughly the same conceptual locations he places them. The mapping is concrete enough to be worth a table.

| Reardon mechanism | Where it lives in college-sim |
| --- | --- |
| Achievement gap | `generateStudents()` (sim.js:2237) and `correlatedGpaSat()` (sim.js:2208) |
| Application enhancement | `computeAdmissionScore()` (sim.js:3873) |
| Information quality | `buildCollegeLists()` (sim.js:2529) |
| Portfolio size | `buildCollegeLists()` (sim.js:2529) |
| Idiosyncratic valuation | `studentFinalDecisions()` (sim.js:4919) |

The achievement gap is generated by generateStudents() in tandem with correlatedGpaSat() — the latter draws GPA and SAT as correlated normals (Cholesky-factored) within each archetype, with by-school distributional anchors that give students at well-resourced feeder high schools systematically higher academic indices.
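For readers unfamiliar with Cholesky-factored draws, the two-variable case is small enough to write out by hand. The sketch below is not the actual `correlatedGpaSat()` from sim.js: the function name `correlatedPair`, the parameter names, and the means, standard deviations, and correlation are all hypothetical. It only illustrates the technique, using the fact that the Cholesky factor of the correlation matrix [[1, ρ], [ρ, 1]] is [[1, 0], [ρ, √(1 − ρ²)]].

```javascript
// Hypothetical sketch of correlated GPA/SAT draws; not the sim.js implementation.

// Small seeded PRNG (mulberry32) so the example is reproducible.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: one standard-normal draw from two uniforms.
function standardNormal(rand) {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Multiply two independent normals by the 2x2 Cholesky factor, then rescale.
function correlatedPair(rand, { rho, gpaMean, gpaSd, satMean, satSd }) {
  const z1 = standardNormal(rand);
  const z2 = standardNormal(rand);
  const x1 = z1;
  const x2 = rho * z1 + Math.sqrt(1 - rho * rho) * z2;
  return { gpa: gpaMean + gpaSd * x1, sat: satMean + satSd * x2 };
}

// Pearson correlation of the drawn pairs, to check the target rho is recovered.
function sampleCorrelation(pairs) {
  const n = pairs.length;
  const meanGpa = pairs.reduce((s, p) => s + p.gpa, 0) / n;
  const meanSat = pairs.reduce((s, p) => s + p.sat, 0) / n;
  let cov = 0, varGpa = 0, varSat = 0;
  for (const p of pairs) {
    cov += (p.gpa - meanGpa) * (p.sat - meanSat);
    varGpa += (p.gpa - meanGpa) ** 2;
    varSat += (p.sat - meanSat) ** 2;
  }
  return cov / Math.sqrt(varGpa * varSat);
}

const rand = mulberry32(42);
// Illustrative anchors for one archetype; these values are made up.
const params = { rho: 0.6, gpaMean: 3.3, gpaSd: 0.4, satMean: 1100, satSd: 180 };
const draws = Array.from({ length: 20000 }, () => correlatedPair(rand, params));
const r = sampleCorrelation(draws);
console.log("sample correlation:", r.toFixed(3));
```

A real implementation would presumably also clamp GPA and SAT to their valid ranges; clamping is omitted here because it slightly distorts the realized correlation.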

Application enhancement lives inside computeAdmissionScore(), where essayQ and ec_quality enter the admission logit and raise the student's apparent caliber. Students who can afford coaching and structured extracurricular (EC) investment look stronger to readers than their academic numbers alone would predict, which is exactly the channel Reardon models.
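To make the channel concrete, here is a minimal sketch of how essay and EC terms could enter an admission logit, and what neutralizing them looks like. It assumes the terms are additive in the logit, which is an assumption for illustration rather than a description of `computeAdmissionScore()`. The function name `admitProbability`, the field names `academicIndex`, `ecQuality`, and `selectivity`, the option names, and all coefficients are hypothetical.

```javascript
// Hypothetical sketch; coefficients and field names are illustrative assumptions.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Admission probability from a logit with additive essay and EC terms.
// With neutralizeEnhancement set, both terms are replaced by a neutral value,
// which mimics switching the application-enhancement mechanism off.
function admitProbability(student, college, opts = {}) {
  const { neutralizeEnhancement = false, neutral = 0 } = opts;
  const essayQ = neutralizeEnhancement ? neutral : student.essayQ;
  const ecQuality = neutralizeEnhancement ? neutral : student.ecQuality;
  const logit =
    1.5 * student.academicIndex +
    0.4 * essayQ +
    0.4 * ecQuality -
    college.selectivity;
  return sigmoid(logit);
}

// Same academic index, with and without the resource-linked polish.
const coached = { academicIndex: 1.0, essayQ: 1.0, ecQuality: 1.0 };
const college = { selectivity: 2.0 };
const withPolish = admitProbability(coached, college);
const withoutPolish = admitProbability(coached, college, {
  neutralizeEnhancement: true,
});
console.log({ withPolish, withoutPolish });
```

The gap between the two probabilities is the per-student premium that the shutoff removes; aggregated by SES, it becomes the mechanism's contribution to the gradient.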

Information quality and portfolio size both live in buildCollegeLists(). Utility is computed as prestige + fit + 5·log(P_admit) with a lognormal noise term K. The K term is the information-quality knob: a student with a tight K knows which colleges they can actually get into, while a student with a wide K is choosing semi-blindly. The same function caps the number of applications per student, which is the portfolio-size channel.
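A minimal sketch of that list-building step follows. The utility formula prestige + fit + 5·log(P_admit) is taken from the description above, but how K enters is not specified there, so this sketch assumes K multiplies the student's perceived admit probability, with K = exp(σ·z) and z standard normal. The function name `buildList`, the `sigma` and `cap` options, and the sample colleges are all hypothetical.

```javascript
// Hypothetical sketch; assumes the lognormal noise K scales perceived P_admit.

// Box-Muller transform using Math.random (unseeded, for illustration only).
function standardNormal() {
  const u1 = 1 - Math.random(); // in (0, 1], avoids log(0)
  const u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Score every college with noisy beliefs, then keep the top `cap` of them.
// sigma is the information-quality knob; cap is the portfolio-size knob.
function buildList(colleges, { sigma, cap }, drawNormal = standardNormal) {
  const scored = colleges.map((c) => {
    const K = Math.exp(sigma * drawNormal()); // lognormal; exactly 1 when sigma is 0
    const perceived = Math.min(1, c.pAdmit * K); // keep the belief a valid probability
    const utility = c.prestige + c.fit + 5 * Math.log(perceived);
    return { name: c.name, utility };
  });
  scored.sort((a, b) => b.utility - a.utility);
  return scored.slice(0, cap).map((c) => c.name);
}

// Made-up colleges; pAdmit must be strictly positive.
const colleges = [
  { name: "A", prestige: 9, fit: 1, pAdmit: 0.1 },
  { name: "B", prestige: 6, fit: 2, pAdmit: 0.5 },
  { name: "C", prestige: 3, fit: 3, pAdmit: 0.9 },
  { name: "D", prestige: 8, fit: 0, pAdmit: 0.05 },
];

// Tight K (sigma = 0): the student ranks on true admit chances.
const informed = buildList(colleges, { sigma: 0, cap: 2 });
// Wide K: the ranking can change from run to run.
const noisy = buildList(colleges, { sigma: 1.5, cap: 2 });
console.log({ informed, noisy });
```

In a counterfactual run under these assumptions, equalizing `sigma` across students would switch off the information-quality channel, and equalizing `cap` would switch off the portfolio-size channel.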

Idiosyncratic valuation lives in studentFinalDecisions(), where each student's final-choice utility includes a uniform personal-noise term, a Chetty-derived income-conditioned yield term, and an in-state preference. Together these break the assumption that every admitted student would rank colleges identically.

One scaling caveat that distinguishes college-sim from Reardon: MODEL_SCALE = 0.013 (sim.js:1451). college-sim runs roughly 4,000 modelled students against ~300,000 national applicants, with phantom-applicant scaling absorbing the difference. Reardon's ~1,500 colleges and ELS-cohort student count let him scale directly to the empirical distribution; college-sim trades that one-to-one realism for production-speed runs in the browser.

What Reardon does that we don't (yet)

The architectural gap is not the mechanisms themselves — those exist in college-sim — but the systematic on/off decomposition that turns the model into a measurement instrument. As of today, college-sim has no built-in diagnostic that says "freeze first-gen status to neutral, re-run, and report the shift in the SES-by-tier acceptance gradient."

This is a near-zero-cost extension. Each mechanism is already a single multiplier or additive term. The roadmap entry is a COUNTERFACTUAL configuration object that the run loop checks before applying each mechanism, and a results pane that diffs the SES gradient against an all-mechanisms-on baseline. The forward-link for that work is the CAS/ABM extensions roadmap, where mechanism-shutoff is the roadmap's Tier-1 (low-cost, high-leverage) addition. Once it ships, college-sim can speak in Reardon's own currency: "removing the legacy hook closes X percentage points of the gap at Tier-1 colleges; removing the information-quality gap closes Y."
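One possible shape for that configuration object is sketched below. Nothing here exists in college-sim today: the field names on `COUNTERFACTUAL`, the helper names `mechanismValue` and `gradientShift`, and the numbers in the usage lines are all assumptions made for illustration.

```javascript
// Hypothetical shape for the roadmap's COUNTERFACTUAL object.
// true means "shut this mechanism off for the run".
const COUNTERFACTUAL = {
  achievementGap: false,
  applicationEnhancement: false,
  informationQuality: true,
  portfolioSize: false,
  idiosyncraticValuation: false,
};

// The run loop would call this before applying each mechanism:
// it returns the SES-neutral value when the mechanism is shut off.
function mechanismValue(name, actual, neutral, counterfactual = COUNTERFACTUAL) {
  return counterfactual[name] ? neutral : actual;
}

// The results pane would diff a counterfactual gradient against the baseline.
function gradientShift(baselineGradient, counterfactualGradient) {
  const closed = baselineGradient - counterfactualGradient;
  return { closed, shareOfBaseline: closed / baselineGradient };
}

// Usage with made-up numbers: a low-resource student's belief noise falls to
// the neutral level because informationQuality is shut off.
const noiseSd = mechanismValue("informationQuality", 0.8, 0.2);
// portfolioSize is still on, so the student's actual cap is kept.
const appCap = mechanismValue("portfolioSize", 12, 6);
// Invented gradients, in percentage points, purely to show the report format.
const shift = gradientShift(20, 15);
console.log(
  `removing the information-quality gap closes ${shift.closed} percentage points ` +
    `(${Math.round(shift.shareOfBaseline * 100)}% of the baseline gradient)`
);
```

Keeping the switch lookup in one helper means each mechanism site changes by a single line, which fits the near-zero-cost framing above.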

Run the reproduction yourself

A simplified Node.js reproduction of Reardon's mechanism-decomposition lives at /research2/cas-abm-references/reproductions/01-reardon-college-sorting/README.md, with a top-level overview of all six matching-market reproductions at /research2/cas-abm-references/reproductions/SUMMARY.md. The reproduction is a single dependency-free main.js that runs in under a second and uses 5,000 synthetic students against 50 synthetic colleges. It does not reproduce Reardon's absolute numbers — those depend on real ELS:2002 and IPEDS data — but it does reproduce the qualitative shape of his findings: the SES-by-college-quality gradient is positive in every condition, the achievement gap is the single most powerful mechanism, information quality is the next most powerful and has an outsized effect on low-SES students, and idiosyncratic valuation is a moderator that widens the gap when removed. The reproduction is intended as a worked example mineable for the college-sim counterfactual extension, not as a full replication of the Stata original on CoMSES.

Citation

Reardon, S. F., Kasman, M., Klasik, D., & Baker, R. (2016). Agent-Based Simulation Models of the College Sorting Process. Journal of Artificial Societies and Social Simulation, 19(1), 8. https://www.jasss.org/19/1/8.html. Stata replication code archived at the CoMSES Net computational model library (formerly OpenABM).