The MAGIC Criteria

Draft created: 16 February 2015
Last updated: 16 February 2015
Status: First draft

For more than 40 years, Robert Abelson taught statistics to Yale psychology students. Toward the end of his career, he published Statistics as Principled Argument (1995), a book-shaped attempt to reframe the deeply misunderstood field of statistics. P-values, z-scores, and t-tests aren’t mechanistic tools for quantifying the world, Abelson writes; they’re tools for making sense of the world. And to make sense of the world, we need to make compelling (but principled) arguments about the information we observe.

Abelson packs his book with real-life anecdotes and far more nuance than what you’ll find below. I won’t try to summarize it. Instead, I’d like to share one of the book’s most interesting and broadly applicable concepts: the “MAGIC” criteria.

If the goal of statistics is to make compelling arguments about the world, how do we know whether we’ve succeeded? That is, what makes an argument compelling? Abelson presents an elegant (if cheesily named) framework to guide us.

“There are several properties of data, and its analysis and presentation, that govern its persuasive force,” he writes. “We label these by the acronym MAGIC, which stands for magnitude, articulation, generality, interestingness, and credibility.” In slightly more detail:

Magnitude
“The strength of a statistical argument is enhanced in accord with the quantitative magnitude of support for its qualitative claim.” Oversimplified: Bigger is better. Magnitude, of course, is relative. The difference between two outcomes in an experiment might seem large on its own terms (a big “effect size”) or large relative to a modest intervention (a small “cause size”).
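Abelson’s book predates this kind of illustration, but one common way to put “bigger is better, relative to the data” into numbers is a standardized effect size such as Cohen’s d: the difference in means divided by the pooled standard deviation. A minimal sketch (the sample data is invented for illustration):

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized effect size: difference in means divided by
    the pooled sample standard deviation of the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return mean_diff / pooled_sd

# The same raw gap in means looks big or small depending on the
# spread of the data — magnitude is relative.
tight = cohens_d([10, 11, 12], [13, 14, 15])  # gap of 3, little spread → |d| = 3.0
loose = cohens_d([5, 11, 17], [8, 14, 20])    # same gap of 3, wide spread → |d| = 0.5
```

Both comparisons have the same three-point difference in means, but the first is a far more persuasive magnitude because the variation within each group is small.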

Articulation
“By articulation, we refer to the degree of comprehensible detail in which conclusions are phrased.” Under another acronym, Abelson might have called this precision or detail. An argument that says, “Trains A and B run at different speeds,” is less compelling than one that says, “Train A runs faster than Train B between Chicago and New York, but slower between New York and Boston.”

Generality
“Generality denotes the breadth of applicability of the conclusions.” One way to accrue generality: Take multiple approaches to answering the same question. Another: Apply the same approach in different contexts.

Interestingness
“For a statistical story to be theoretically interesting, it must have the potential, through empirical analysis, to change what people believe about an important issue.” Change what people believe. About an important issue.

Credibility
“Credibility refers to the believability of a research claim. It requires both methodological soundness, and theoretical coherence.” The burden of proof, at least at the outset, is on the investigator.

The MAGIC criteria aren’t revolutionary, or even very surprising. But they synthesize — efficiently, memorably, and non-tyrannically — a potent handful of ideas.

In recent months, I’ve applied the MAGIC criteria to my own work. I’ve found that the framework helps me interrogate my analyses more systematically — Could this be more generalizable? More precise? — and find ways to improve them. The mnemonic also nudges me to heed Abelson’s broader theme: that data analysis is a tool for understanding the world, not an end unto itself. Perhaps you’ll find it useful, too.