This module is the technical apex of Course 3. Mapmakers have long defended highly skewed maps as the natural result of geographic constraints: Democratic voters cluster in cities ("political geography"), so proportional representation is geometrically impossible. Ensemble analysis undercut that defense. Using Markov Chain Monte Carlo (MCMC) algorithms, mathematicians can generate a large sample of maps, for example 100,000, drawn without partisan data and in compliance with all state rules. If the enacted map is more extreme than 99,999 of those maps, the "accident of geography" defense collapses. The result is strong statistical evidence, though not by itself proof of intent, that the map was drawn for partisan advantage.
In This Module
- Covers: Markov Chain Monte Carlo algorithms, the Recombination (ReCom) method, and building a mathematical baseline of fair districting outcomes.
- Why it matters: This is currently the most devastating weapon in the civil rights data scientist's arsenal. If you do not understand ensemble modeling, you cannot lead a major redistricting challenge.
- After this module, the reader can: Explain how ensemble algorithms repeatedly modify district assignments to generate large distributions of alternative maps, and how those distributions expose gerrymandered outcomes as statistical outliers.
Reading List
Conceptual
-
A high-level explanation of the baseline problem. Duchin argues that you cannot simply compare a map to strict proportionality (e.g., "50% vote should equal 50% seats") because physical geography limits what is possible. Instead, you must compare a map to the universe of possible maps for that specific state. The ensemble of generated maps, not proportionality, serves as the baseline for fairness.
-
An accessible summary from the Duke mathematics team that pioneered much of the use of MCMC in state supreme courts. This introduces the concept of the "bell curve" of maps. They plot 24,000 random district configurations and show the map enacted by the NC legislature sitting entirely off the far edge of the curve, visually demonstrating how statistically unlikely the result is.
Methods
-
The hard mechanics. Older MCMC methods swapped single precincts one at a time on the border of a district, which often led to wildly non-compact shapes. The ReCom method solves this by fusing two adjacent districts together, drawing a random spanning tree through the fused super-district, and cutting one edge of that tree so that the two resulting pieces are contiguous and population-balanced. Districts produced this way tend to be far more compact. This is the industry standard for modern modeling.
Technical Reference
-
GerryChain is the open-source Python ecosystem developed specifically for running ReCom ensembles. As a technical practitioner, you must review the documentation to understand the programmatic structure of a Markov Chain run: defining the initial partition (seed map), setting the constraints (population limits, Voting Rights Act (VRA) limits), and declaring the updaters (tracking partisan shifts).
Key Concepts
Why must gerrymandering analysis compare an enacted map to the universe of possible maps?
Moon Duchin argues that strict proportionality ("50% vote = 50% seats") is not a valid baseline because physical geography constrains possible configurations. Instead, analysts must generate thousands of random neutral maps following all state rules. The resulting ensemble becomes the baseline for fairness, allowing courts to determine whether the enacted map is a statistical outlier.
How did Duke mathematicians use MCMC ensembles to prove gerrymandering in North Carolina?
Herschlag, Mattingly, and colleagues generated 24,000 random district configurations using only North Carolina's non-partisan criteria. The enacted legislature's map fell entirely off the far edge of the bell curve, more hostile to the minority party than 99.9% of neutrally drawn alternatives. That is strong statistical evidence that the outcome was not an accident of geography.
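The outlier comparison described above reduces to a simple count: what share of ensemble maps is at least as extreme as the enacted map on the chosen metric? The sketch below is a minimal illustration, assuming the metric is seats won by the map-drawing party. The seat counts are invented for the example and are not the North Carolina figures; the function name is hypothetical.

```python
def share_at_least_as_extreme(ensemble_values, enacted_value):
    """Share of ensemble maps whose metric is >= the enacted map's metric."""
    count = sum(1 for value in ensemble_values if value >= enacted_value)
    return count / len(ensemble_values)


# Hypothetical ensemble of 1,000 maps: seats won -> number of maps with that result.
seat_counts = {6: 50, 7: 250, 8: 400, 9: 250, 10: 49, 11: 1}
ensemble = [seats for seats, n_maps in seat_counts.items() for _ in range(n_maps)]

enacted_seats = 11  # hypothetical enacted-map result
share = share_at_least_as_extreme(ensemble, enacted_seats)
print(f"{share:.1%} of ensemble maps give the party {enacted_seats} or more seats")
```

A small share means the enacted map sits in the tail of the distribution. It says nothing by itself about why the map sits there, which is why the ensemble's constraints have to mirror the state's legal criteria.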
How does the ReCom method improve upon older MCMC redistricting algorithms?
Older MCMC methods swapped single precincts at district borders, producing non-compact shapes. ReCom fuses two adjacent districts into a super-district, draws a random spanning tree through the combined geography, and cuts one edge of that tree so that the two resulting pieces are contiguous and population-balanced. This tends to produce compact, more realistic, legally defensible map samples and has become the industry standard.
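The fuse, tree, cut sequence can be sketched in plain Python. This is a toy illustration, not GerryChain's implementation: it assumes the two fused districts form a 4x6 grid of equal-population precincts, draws the tree with Kruskal's algorithm on shuffled edges (a simple choice that does not sample spanning trees uniformly), and redraws the tree when no edge gives a balanced split. All function names are hypothetical.

```python
import random


def random_spanning_tree(nodes, edges, rng):
    """Kruskal's algorithm over shuffled edges; returns a tree adjacency dict."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = {n: [] for n in nodes}
    shuffled = list(edges)
    rng.shuffle(shuffled)
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            tree[u].append(v)
            tree[v].append(u)
    return tree


def balanced_cuts(tree, pop, root, tolerance):
    """Node sets below each tree edge whose removal leaves two balanced halves."""
    ideal = sum(pop.values()) / 2
    order, parent_of, stack = [], {root: None}, [root]
    while stack:  # depth-first order: parents before their descendants
        n = stack.pop()
        order.append(n)
        for m in tree[n]:
            if m not in parent_of:
                parent_of[m] = n
                stack.append(m)
    subtree_pop = dict(pop)
    members = {n: {n} for n in tree}
    cuts = []
    for n in reversed(order):  # children are finished before their parent
        p = parent_of[n]
        if p is None:
            continue
        # If one side is within tolerance of half, the other side is too.
        if abs(subtree_pop[n] - ideal) <= tolerance * ideal:
            cuts.append(frozenset(members[n]))
        subtree_pop[p] += subtree_pop[n]
        members[p] |= members[n]
    return cuts


def recom_step(nodes, edges, pop, tolerance, rng, max_tries=100):
    """Split the fused region into two contiguous, population-balanced districts."""
    for _ in range(max_tries):
        tree = random_spanning_tree(nodes, edges, rng)
        cuts = balanced_cuts(tree, pop, root=nodes[0], tolerance=tolerance)
        if cuts:
            side_a = rng.choice(cuts)
            return side_a, frozenset(nodes) - side_a
    raise RuntimeError("no balanced cut found; loosen the tolerance")


# Toy fused super-district: a 4x6 grid of precincts, one person each.
ROWS, COLS = 4, 6
nodes = [(r, c) for r in range(ROWS) for c in range(COLS)]
edges = [((r, c), (r, c + 1)) for r in range(ROWS) for c in range(COLS - 1)]
edges += [((r, c), (r + 1, c)) for r in range(ROWS - 1) for c in range(COLS)]
pop = {n: 1 for n in nodes}

district_a, district_b = recom_step(nodes, edges, pop, tolerance=0.1, rng=random.Random(7))
print(len(district_a), len(district_b))
```

Both halves are contiguous by construction, because removing one edge from a tree always leaves two connected pieces. Compactness is a tendency of this procedure rather than a guarantee.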
What is GerryChain and how is it used for redistricting litigation?
GerryChain is an open-source Python library for running ReCom ensemble simulations. Practitioners define the initial partition, set population constraints and VRA thresholds, and declare "updaters" tracking partisan metrics across each generated map. A typical litigation run generates 10,000–100,000 samples, producing a statistical distribution against which the enacted map is measured. Because the code is open source, opposing experts can rerun the analysis, and that reproducibility is something courts expect of expert testimony.
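The three pieces of a chain run named above (initial partition, constraints, updaters) can be seen in a toy chain written with the standard library only. This is deliberately not GerryChain code and does not use its API. It assumes 12 precincts in a row, so that districts are consecutive runs, and it uses a simple boundary-shift proposal rather than ReCom. All data and names are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical precinct data: population and party A votes, west to east.
POP = [100, 110, 90, 100, 105, 95, 100, 100, 90, 110, 100, 100]
VOTES_A = [60, 70, 30, 40, 55, 35, 65, 45, 30, 60, 52, 48]


def districts(cuts):
    """Turn cut points such as (4, 8) into index ranges, one per district."""
    bounds = (0, *cuts, len(POP))
    return [range(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def is_valid(cuts, tolerance):
    """Constraint: districts are non-empty and within tolerance of ideal population."""
    bounds = (0, *cuts, len(POP))
    if not all(a < b for a, b in zip(bounds, bounds[1:])):
        return False
    ideal = sum(POP) / (len(cuts) + 1)
    return all(
        abs(sum(POP[i] for i in d) - ideal) <= tolerance * ideal
        for d in districts(cuts)
    )


def seats_for_a(cuts):
    """Updater: number of districts where party A holds a majority."""
    return sum(
        1
        for d in districts(cuts)
        if 2 * sum(VOTES_A[i] for i in d) > sum(POP[i] for i in d)
    )


def propose(cuts, rng):
    """Proposal: shift one randomly chosen district boundary by one precinct."""
    new_cuts = list(cuts)
    new_cuts[rng.randrange(len(cuts))] += rng.choice((-1, 1))
    return tuple(new_cuts)


def run_chain(seed_plan, steps, tolerance, rng):
    """Walk from the seed plan, recording the updater's value at every step."""
    state, history = seed_plan, []
    for _ in range(steps):
        candidate = propose(state, rng)
        if is_valid(candidate, tolerance):  # invalid proposals leave the plan unchanged
            state = candidate
        history.append(seats_for_a(state))
    return history


seed_plan = (4, 8)  # initial partition: three districts of four precincts each
history = run_chain(seed_plan, steps=5000, tolerance=0.3, rng=random.Random(1))
print(Counter(history))
```

The printed tally is the toy analogue of the seat distribution an ensemble produces. A real run swaps in actual precinct geography, a ReCom proposal, and the state's legal constraints.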
Goal: Formally specify the MCMC generation limits for your Methodology Portfolio.
You are moving quickly toward court-ready analysis. If you tell a court you "randomly generated" 100,000 maps, opposing counsel will attack the word "random." You must specify the strict legal parameters that constrained the algorithm.
- State Population Constraints: The algorithm must generate districts with equal populations. State the exact allowed deviation (e.g., "The algorithm will restrict generation to maps with less than 0.5% population deviation at the congressional scale").
- State Preservation Rules: Most states require maps to keep counties whole where possible. Define this constraint explicitly: "The MCMC process will penalize proposals that split counties."
- Define the Iteration Volume: Enter into the log the number of chain steps you intend to run (usually 10,000, 50,000, or 100,000). The number should be large enough that the resulting distribution is stable, meaning that longer runs or different seed maps do not materially change it.
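One way to make these three entries concrete is to record them as a single structured object in the log, alongside a check that a given plan satisfies the stated population limit. The sketch below is a suggestion under stated assumptions, not a required format: the class name, field names, district populations, and the added random seed field (included so a run can be repeated) are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EnsembleSpec:
    """Hypothetical record of the parameters that constrain an ensemble run."""
    max_population_deviation: float  # 0.005 means 0.5% from the ideal district size
    county_rule: str
    total_steps: int
    random_seed: int  # assumption: logged so the run can be reproduced


def max_deviation(district_pops):
    """Largest relative deviation of any district from the ideal population."""
    ideal = sum(district_pops) / len(district_pops)
    return max(abs(p - ideal) / ideal for p in district_pops)


spec = EnsembleSpec(
    max_population_deviation=0.005,
    county_rule="penalize proposals that split counties",
    total_steps=100_000,
    random_seed=2024,
)
print(json.dumps(asdict(spec), indent=2))

# Hypothetical district populations for one generated plan.
plan_pops = [761_000, 762_500, 759_800, 760_700]
plan_ok = max_deviation(plan_pops) <= spec.max_population_deviation
print("within population limit:", plan_ok)
```

Logging the specification before the run, rather than after, makes it easier to show that the parameters were not tuned to produce a desired result.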