Last month, a National Bureau of Economic Research working paper made headlines across the internet when it claimed to demonstrate that so-called “Right to Carry” (RTC) laws increased violent and property crime rates above where they would have been without the passage of such laws. Now, most science reporting is done by people with zero technical background in the advanced statistical techniques used by the paper’s authors, so I was a bit skeptical it actually said what they were claiming it said. Fortunately, I DO have such a technical background, and for several years now I’ve been following with great interest the academic arguments about the effects of legal guns on crime rates. And after having read the paper in question (Right-to-Carry Laws and Violent Crime: A Comprehensive Assessment Using Panel Data and a State-Level Synthetic Controls Analysis. Donohue, Aneja, and Weber. 2017), I’ve come to the conclusion that I was both right and wrong. Wrong in that the paper’s authors drew the conclusion stated by the journalists—they do, in fact, claim their data shows RTC laws increase crime. But right in that the data doesn’t actually show that when you read it with a more critical eye. Therefore, I’m going to take this opportunity to teach a lesson in why you shouldn’t trust paper abstracts or jump to the “conclusions” section, but should instead examine the data and analysis yourself.
Disclaimer: I am a firearms enthusiast and active in the firearms community at large. However, I am also a scientist, and absolutely made my very best efforts to set that bias aside in reading this paper, and give it the benefit of the doubt. Whether I succeeded or not is up to you to decide, but I believe my objections to the authors’ conclusions are based solely on methodological grounds and will stand up to the scrutiny of any objective observer. Unfortunately, I cannot say the same about Professor Donohue and his co-authors, as their own personal bias against guns is quite evident from their concluding paragraphs. Because of that bias, I firmly believe this paper is a perfect example of “Lies, Damn Lies, and Statistics.”
The paper itself really divides into two sections: a standard multiple regression analysis and a newer counterfactual method called “synthetic control analysis.” The authors claim both analyses show that RTC laws increase crime. I disagree, at least as to the degree of confidence they place in that conclusion. Let’s look at each in turn.
First, the regression analysis. The meat of this analysis is a comparison of four different models (plus three variations of those models), for a total of seven specifications. Multiple regression is a powerful tool for analyzing observational data: it attempts to control for several variables at once to see what impact each had on the target dependent variable. In this paper, Donohue et al. build their own model specification (DAW) and compare it to three pre-existing models from other researchers (BC, LM, MM). They look at the effects of states’ passage of RTC laws on three dependent variables: murder rates, violent crime rates, and property crime rates. The key contribution of their research is the extended data set: where previous research stopped at the year 2000, this paper examines how the results change when the models are fed an additional 14 years of data, covering 1977–2014.
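To make the setup concrete, here is a minimal sketch (in Python, using statsmodels) of the kind of two-way fixed-effects panel regression these specifications build on. Everything below — the data, the variable names, the adoption years — is fabricated for illustration only; the paper’s actual models use real UCR crime data and many additional covariates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fabricated panel: 33 "states" observed annually from 1977 to 2014.
rng = np.random.default_rng(0)
rows = []
for s_idx in range(33):
    adopt = int(rng.integers(1985, 2005))  # hypothetical RTC adoption year
    for y in range(1977, 2015):
        rows.append({
            "state": f"S{s_idx:02d}",
            "year": y,
            "rtc": int(y >= adopt),  # post-adoption dummy
            # fabricated rate: state level + national trend + noise
            "violent_rate": 400 + 5 * s_idx + 0.5 * (y - 1977)
                            + rng.normal(0, 20),
        })
panel = pd.DataFrame(rows)

# Two-way fixed effects: C(state) and C(year) dummies absorb persistent
# state differences and nationwide year shocks; the "rtc" coefficient is
# the estimated average post-adoption shift in the rate.
model = smf.ols("violent_rate ~ rtc + C(state) + C(year)", data=panel).fit()
print(f"rtc coefficient: {model.params['rtc']:.2f} "
      f"(p = {model.pvalues['rtc']:.3f})")
```

The real specifications differ mainly in which controls sit alongside the `rtc` term — that is exactly the specification sensitivity discussed below.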
The problem here is that the authors claim their panel data analysis consistently shows a statistically significant increase in violent crime when using the longer time horizon ending in 2014. This is a problem because, quite bluntly, it does not. The DAW specification (their new, original model built for this analysis) DOES find an increase in violent crime and property crime rates (though not murder, which they acknowledge). But the spline model of the same variables finds no statistically significant correlation whatsoever. They even acknowledge this in their paper: “RTC laws on average increased violent crime by 9.5 percent and property crime by 6.8 percent in the years following adoption according to the dummy model, but again showed no statistically significant effect in the spline model.” (DAW 8). But then they never mention it again or seek to address why the spline model (an alternative method that is often preferred over polynomial interpolation for technical reasons) produces such different results. The spline model derives from the 2004 National Research Council report, and the authors used it earlier in the paper (sans other regressors) to show that the NRC’s tentative finding of a decrease in crime rates associated with RTC laws disappears when the data set is extended to 2014. But when they re-run it with their own variables, the lack of statistical significance is mentioned in a single line and never brought up again.
In fact, the spline model is run comparatively for all four regression specifications, and the only cases in which it finds ANY statistical significance are the two the authors themselves discredit as methodologically unsound (LM and MM in their original versions). But this point is never addressed: the “Dummy Variable Model” specifications and the spline models dramatically disagree no matter WHAT set of variables is chosen. This, to me, strongly suggests that any conclusions drawn from the panel data regression analysis are highly suspect, and that the choice of specification deserves further review before the results can be believed one way or the other. Regression analysis is always extremely sensitive to specification; results can shift dramatically based on which variables are included, which are omitted, and how they are specified. Unfortunately, the paper does not seem to discuss any testing for functional form misspecification (such as a Ramsey RESET test), so it is unclear whether the authors compared their chosen specification to other potential functional forms. There is no discussion, for example, of whether the dummy or spline models are better, and why. This is a huge gap in the analysis that I would like to see addressed before I am willing to accept any conclusions drawn from it.*
Additionally, panel data suffers from some of the same limitations as cross-sectional data, including the need for large samples to be credible. In this case, the analysis only looked at 33 states (those that passed RTC laws between 1977 and 2004), making any conclusions drawn from the limited N=33 data set tentative at best. This is not necessarily the authors’ fault: much of the relevant data is only available at the state level, so it is much harder to do a broader assessment with more data points (e.g., by county). But it certainly does increase the grain of salt with which the analysis should be taken. Despite that, the authors seem quite willing to draw sweeping conclusions when they should, by rights, be far more cautious about conclusive claims.**
The second part of the paper is even more problematic. In short, the authors build a counterfactual model of each state that passed an RTC law in the specified time period, then compare the predicted crime rates in those simulated states against the observed crime rates of their real-world counterparts. This is certainly an interesting statistical technique, and it is mathematically ingenious. It might even be a useful tool for certain applications. Unfortunately, counterfactual analysis, no matter how refined, suffers from a fundamental flaw: by its very nature, it assumes the effects of a single event can be assessed in isolation. In reality, as I’ve discussed before, human social systems are complex systems. One major legal change will have dramatic effects across the board; that policy in turn drives many decisions down the line, so plucking out the one policy of interest and assuming all post-counterfactual decisions would remain the same is blatantly ridiculous. It’s the statistical equivalent of saying, “If only Pickett’s Charge had succeeded, the South would have won the Civil War.” Well, no: everything that happened AFTER Pickett’s Charge would have been completely different, so we can only make the vaguest guesses about what MAY have happened.
But that is precisely what the authors attempt to do here, and they put the stamp of mathematical certainty on it to boot. They built a model of each RTC state over the target period by matching several key crime-rate-related variables against control states without RTC laws, then compared each model’s predicted crime rates against the actual reported crime rates to make a causal claim about the RTC laws’ effects on those rates. They judged their models to be good fits by how well they tracked the fluctuations in crime rates in the years prior to RTC adoption (the counterfactual point); if the fit was close enough, they declared the model a good predictor. But that fails to account for the cascading changes that would have occurred AFTER the counterfactual point, by the nature of a complex system. The entire analysis rests on an incredibly flawed assumption, and thus NO conclusive answers can be derived from it. At best, it raises an interesting question.
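For the curious, the mechanical core of a synthetic control is just a constrained least-squares fit: find nonnegative donor-state weights, summing to one, that reproduce the treated state’s pre-adoption outcome path. Here is a stripped-down sketch on fabricated numbers (real implementations also match on covariates, not just the outcome path):

```python
import numpy as np
from scipy.optimize import minimize

# Fabricated pre-period outcomes: four "donor" states and one "treated"
# state, each observed for 10 pre-adoption years.
rng = np.random.default_rng(3)
donors = rng.normal(500, 30, size=(4, 10))
treated = (0.5 * donors[0] + 0.3 * donors[1] + 0.2 * donors[2]
           + rng.normal(0, 2, 10))

# Synthetic-control fit: nonnegative donor weights summing to one that
# best reproduce the treated state's pre-period path.
def loss(w):
    return np.sum((treated - w @ donors) ** 2)

res = minimize(loss, np.full(4, 0.25), method="SLSQP",
               bounds=[(0, 1)] * 4,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print(np.round(res.x, 2))  # approximately the 0.5 / 0.3 / 0.2 / 0.0 mixture
```

Note what the math does and does not give you: the optimizer will always find *some* best-fitting weights, and a good pre-period fit says nothing about whether the weighted donors remain a valid counterfactual after the policy change — which is exactly the objection raised above.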
The paper isn’t worthless, by any means. The panel data analysis does a good job showing that NO specification, including John Lott’s original model from which he built his flawed “More Guns, Less Crime” thesis, supports a claim that RTC laws decrease crime rates. But that’s about all it does. It hints at the possibility RTC laws may increase violent and property crime rates (though not murder). It certainly doesn’t conclusively demonstrate that claim, but it raises enough doubt that other researchers should tackle it in much more depth. Similarly, the counterfactual “synthetic controls” analysis by no means proves a causal relationship between RTC laws and crime rates, for the reasons explained above, but it raises an interesting question that should be examined further.
No, the problem is that the authors pay only lip service to the limitations of their analysis and instead make sweeping claims their data does not necessarily support: “The fact that two different types of statistical data—panel data regression and synthetic controls—with varying strengths and shortcomings and with different model specifications both yield consistent and strongly statistically significant evidence that RTC laws increase violent crime constitutes persuasive evidence that any beneficial effects from gun carrying are likely substantially outweighed by the increases in violent crime that these laws stimulate.” (DAW, 39). But the panel data regression is equivocal, given the discrepancies between the Dummy Variable and spline models, and less than solid, given the low N value for cross-sectional comparisons; and the synthetic controls analysis rests on a flawed assumption about the nature of the social systems being modeled.
These limitations, combined with the many other papers looking at other types of regressions (such as the impacts of gun ownership in general on violent crime rates) that have been unable to find statistically significant correlations between legal gun prevalence and violent crime rates, make me extremely skeptical of this paper. To be fair, it has yet to undergo peer review (it’s a working paper, after all), and it’s certainly possible many of my objections will be rectified in the final published version. But right now, the best I can say for the data is that it raises some questions worth answering. And it certainly doesn’t support the authors’ claim that their analysis is persuasive evidence of anything. At least, not nearly as persuasive as they’d have you believe.
That’s why I said, at the beginning, never trust an abstract or a conclusion section: read the analysis for yourself, and only then see what the authors have to say about it. Because there’s a great deal of truth to the old saying, “There are three kinds of lies: lies, damned lies, and statistics.” Statistics are a powerful tool. But even with the best intentions they’re easily manipulated, and even more easily misunderstood.
*For those of you who don’t speak “stats geek,” what this paragraph means is that the authors compared two different types of models, which reached dramatically different conclusions, and then essentially ignored that fact and moved on. Nowhere in the paper itself or any of the appendices do they discuss why they chose one over the other, or why they specified their models the way they did versus other options. It isn’t damning, but it smells suspiciously like a Jedi handwave: “This IS what our data says, trust us.”
**Again, for the non-statisticians: larger data sets tend to produce more reliable estimates. The larger your data set, the more likely it is that your model’s estimates approach reality. Small data sets are inherently less reliable, and 33 observations per year in the panel data is a tiny data set.
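A quick Monte Carlo (entirely fabricated numbers, purely illustrative) makes the point: the spread of an estimator shrinks roughly with the square root of the sample size, so small samples leave far more room for the estimate to land away from the truth.

```python
import numpy as np

# Estimate a mean (true value 100, sd 15 -- fabricated) from repeated
# samples of size 33 versus 3300, and compare the spread of the estimates.
rng = np.random.default_rng(4)
spread = {}
for n in (33, 3300):
    estimates = [rng.normal(100.0, 15.0, n).mean() for _ in range(2000)]
    spread[n] = float(np.std(estimates))
    print(n, round(spread[n], 2))  # spread shrinks roughly as 15/sqrt(n)
```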
The original paper is available here for anyone who cares to examine it for themselves: http://www.nber.org/papers/w23510