In Defense of “Noisy Data”: Deirdre McCloskey’s Campaign against Significance Testing
Deirdre McCloskey’s faith in pragmatism and empirical research also fuels Size Matters, her unpublished work that attacks a small but essential rivet that holds together much of accepted modern economic research: significance testing. The idea comes from the field of statistics and holds, roughly, that an effect is significant if a result that large would turn up by chance alone less than 5 percent of the time. The book, which Professor McCloskey coauthored with her former student Stephen T. Ziliak, now a professor at Roosevelt University in Chicago, explores how economists have come to mistake statistical significance for economic significance, and how that confusion has produced vast amounts of economic research that is irrelevant or that can lead to poor economic policy. They argue that instead of relying on the on–off switch of statistical significance, economists should calculate the costs of being wrong and use those costs to help set significance levels. Statistical significance testing, say Professors McCloskey and Ziliak, is too mechanical to be used as a decision-making tool; the considered judgment of scientists, which is messier and less formulaic, would ultimately yield better results.
For example, in the early 1980s, the state of Illinois developed a marginal-wage subsidy program designed to reduce the time that workers would need to collect unemployment insurance. The researchers who tested the pilot program determined that the average benefit-to-cost ratio was 4.3 to 1, meaning that Illinois reduced its spending on unemployment insurance by $4.30 for every $1 spent on the subsidy. However, the researchers determined that the finding was not statistically significant because there was too much “noise” in the data: Instead of meeting the 5 percent significance level, the data showed a 12 percent probability that the reduction in unemployment insurance spending had been caused by something other than the marginal-wage subsidy.
“These nitwits said: ‘No. Not worth it! It’s insignificant.’ It’s a perfectly clear example of this confusion between fit and substance,” says Professor McCloskey. “If you’re the governor of the state, you’re a sophisticated person, you don’t ask for guarantees. What you want to know is whether it’s worth doing this program.” It obviously is, says Professor McCloskey, who adds that any reasonable interpretation of the data would conclude that the possibility of a four-to-one benefit-to-cost ratio was too valuable to discard because of a 12 percent risk of being wrong. The correct course of action, contend Professors McCloskey and Ziliak, would have been to continue the wage subsidy while conducting empirical research to determine the other factors that might have influenced a drop in unemployment spending.
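The arithmetic behind that judgment can be sketched in a few lines of Python. This is only an illustration, not a calculation from the book or the Illinois study. It takes the article's rough reading at face value and assumes an all-or-nothing outcome: with 88 percent probability the subsidy saves $4.30 per dollar spent, and with 12 percent probability it saves nothing. Strictly speaking, a significance level is not the probability of being wrong, and the function names are hypothetical.

```python
def expected_net_savings(benefit_per_dollar, prob_no_effect):
    """Expected savings per subsidy dollar, net of the dollar spent.

    Assumes (for illustration only) that the program either delivers the
    full measured benefit or delivers nothing at all.
    """
    prob_effect = 1.0 - prob_no_effect
    return prob_effect * benefit_per_dollar - 1.0


def break_even_risk(benefit_per_dollar):
    """Largest chance of 'no effect' at which the program still pays for itself."""
    return 1.0 - 1.0 / benefit_per_dollar


# Figures quoted in the article: a 4.3-to-1 ratio and a 12 percent risk.
net = expected_net_savings(4.30, 0.12)
risk_limit = break_even_risk(4.30)

print(f"Expected net savings per $1 spent: ${net:.2f}")
print(f"Program breaks even until the risk of no effect reaches {risk_limit:.0%}")
```

Under these simplified assumptions the expected return stays well above the dollar spent, and the risk of no effect would have to be several times larger than 12 percent before the program stopped paying for itself.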
The misuse of significance testing extends into other realms, like medicine, with sometimes devastating costs. Epidemiologists blame the test for the abandonment of such treatments as flutamide, which has shown promise for patients with advanced prostate cancer. A controversial National Institutes of Health study on women’s health concluded that eating a low-fat diet does not reduce cancer risk even though subjects who avoided fat clearly showed a 9 percent lower risk of contracting breast cancer; critics of the study point out that the finding was dismissed because the results just missed the threshold of statistical significance — not because the low-fat diet was ineffective. “If more women had been studied or the study had gone on just a little longer, the data very likely would have been statistically meaningful and announced as such,” one biostatistician said in a February 2006 article in the Wall Street Journal.
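The biostatistician's point, that the same effect can cross the significance threshold simply because more subjects are studied, can be illustrated with a standard two-proportion z-test. The sketch below is not a reanalysis of the NIH study: the 4 percent baseline rate and the group sizes are hypothetical numbers chosen for illustration, and only the 9 percent relative reduction comes from the article.

```python
import math


def two_sided_p_value(rate_control, rate_treated, n_per_group):
    """Two-sided p-value from a pooled two-proportion z-test, equal group sizes."""
    pooled = (rate_control + rate_treated) / 2.0
    std_error = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n_per_group)
    z = abs(rate_control - rate_treated) / std_error
    # For a standard normal variable, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


BASELINE_RATE = 0.04                          # hypothetical, not from the study
TREATED_RATE = BASELINE_RATE * (1.0 - 0.09)   # 9 percent lower risk, as in the article

# Hypothetical group sizes: the effect is identical in every row.
for n in (10_000, 20_000, 25_000, 40_000):
    p = two_sided_p_value(BASELINE_RATE, TREATED_RATE, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>6}: p = {p:.3f} ({verdict} at the 5 percent level)")
```

The size of the effect never changes from one row to the next; only the verdict does, which is the distinction between fit and substance that Professors McCloskey and Ziliak draw.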
Reprint No. 06211
Andrea Gabor is the author of several books, including The Capitalist Philosophers: The Geniuses of Modern Business — Their Lives, Times, and Ideas (Three Rivers Press, 2002). She is a professor of business journalism at Baruch College/CUNY.