Recent Research: Quantitative Methods for Policy Research

Facilitation of Research Networks and Best Practices

Charles F. Manski
On November 17, IPR economist Charles F. Manski
lectured at the National Academy of Sciences on
uncertainty in policy analysis and scientific research.

Gun Laws and Crime Rates
How do right-to-carry laws in the United States affect crime rates? Though gun laws have become a source of heated public debate in the wake of mass shootings, little is known about whether such laws deter crime or lead to more of it. Looking at right-to-carry (RTC) gun laws, which allow individuals to carry concealed handguns, IPR economist Charles F. Manski and John Pepper of the University of Virginia find no academic consensus on their effects: Despite dozens of studies using the same datasets, researchers have arrived at very different conclusions. In the Review of Economics and Statistics, Manski and Pepper highlight the role of varying assumptions used in such analyses and explain the importance of discussing how these assumptions affect statistical results. Manski and Pepper conducted their own original analysis of how RTC laws affected crime in Virginia, Maryland, and Illinois, finding the effects vary. Under some assumptions, RTC laws appear to have no effect on crime rates. Under others, RTC laws seem to increase rates for certain crimes, decrease them for some crimes, and have varying effects for others. While the results provide no easy answer, they highlight why researchers using the same data can arrive at such vastly different results and how different assumptions shape findings. Manski is the Board of Trustees Professor in Economics.
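The point that assumptions drive conclusions can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a simplified difference-in-differences comparison, and the crime rates, the function names, and the `max_trend_gap` parameter are all hypothetical. The idea is that if the treated and comparison states are allowed to differ in their underlying trends by some bounded amount, the estimated effect becomes an interval, and its sign may or may not be determined.

```python
def did_estimate(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences: change in treated minus change in control."""
    return (treated_after - treated_before) - (control_after - control_before)


def effect_bounds(did, max_trend_gap):
    """Bounds on the effect when the treated and control trends may differ
    by at most `max_trend_gap` (0 recovers the usual point estimate)."""
    if max_trend_gap < 0:
        raise ValueError("max_trend_gap must be non-negative")
    return (did - max_trend_gap, did + max_trend_gap)


# Hypothetical crime rates per 100,000 residents (illustrative only).
did = did_estimate(treated_before=300.0, treated_after=290.0,
                   control_before=310.0, control_after=305.0)

for gap in (0.0, 3.0, 10.0):
    low, high = effect_bounds(did, gap)
    if low > 0:
        verdict = "increase"
    elif high < 0:
        verdict = "decrease"
    else:
        verdict = "sign not identified"
    print(f"max trend gap {gap:>4}: effect in [{low}, {high}] -> {verdict}")
```

With these made-up numbers, a strict equal-trends assumption yields a decrease of 5, while allowing trends to differ by up to 10 yields an interval that spans zero.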

Measuring Probabilistic Expectations
Prior to the 1990s, economists largely shied away from trying to collect data on individuals’ expectations. Instead they either tried to infer expectations from choice behavior or assumed that people hold particular expectations. More recently, economists have set out to measure individuals’ probabilistic expectations for uncertain events, adding to a growing body of evidence on the expectations held by large populations. In an IPR working paper, Manski outlines the history of how economists came to measure probabilistic expectations, moving from reliance on assumptions to actual data measurement. He turns to applications of empirical expectations data for three types of macroeconomic events: expectations for stock (equity) returns, expectations for inflation, and professional forecasts by macroeconomists. In original research that captured Americans’ expectations for their equity returns in the coming year in probabilistic terms, Manski found that empirical data on stock performance expectations differed starkly from traditional surveys that classified respondents as feeling more bullish or bearish about the stock market. Going one step further, Manski discusses how equity expectations data might provide insight into portfolio decisions. The research highlights how measuring, rather than assuming, expectations can better inform studies of expectations formation in macroeconomic policy analysis. While he applauds economists’ willingness to study the topic, Manski also outlines his concerns over a lack of empirical research and the potential proliferation of models in that void.

Three Education Research Challenges
Education research has gained scientific rigor, momentum, and respectability since 2000, according to IPR education researcher and statistician Larry Hedges. In the Journal of Research on Educational Effectiveness, he details the impact of the U.S. Department of Education’s Institute of Education Sciences (IES), created in 2002, on the establishment of rigorous experimental methods to study education, as well as the development of well-trained researchers and the availability of adequate funding. Hedges warns, however, that the methods and practice of education research must now meet three major challenges. The first, shared with other branches of scientific research, is the replication crisis, which he expects to affect education research soon. The second challenge is to better support the generalizability of education research. Hedges cautions researchers against assuming their results are generalizable and notes that while employing random, or probability, sampling is the “gold standard” to ensure generalizable results, it is often not possible. The final challenge is to match research designs to the complexity of the interventions being studied. He offers some examples of designs for multiple possible treatment components that are applied to some subjects and not others, or to some subjects as needed during the experiment. Hedges concludes that education researchers need to improve their methods and respond to the challenges he outlines, and he recommends open-mindedness and humility, rather than defensiveness and condescension, when communicating research findings to the public and policymakers. Hedges is the Board of Trustees Professor of Statistics and Social Policy and of Psychology.

Examining the 'Replicability Crisis'
From medicine to economics, many research fields are facing a “replicability crisis” today, in which researchers have not been able to replicate major findings from existing studies. While scholars agree on the importance of replicability in research, there is little consensus on how to evaluate how well a series of studies replicates another. Across research fields, different papers define replication differently and use different criteria to measure replication without any clear standards of what replicability means. Researchers find little consistency when it comes to designing replication studies. In an ongoing research project, Hedges is working to develop a coherent statistical framework for studying replication. Doing so will allow for a more systematic approach to responding to the so-called replicability crisis, from more consistently defining what it means to be “replicable” to trying to generate greater consistency in the design of replication studies and their statistical analyses. Hedges has several journal articles set to be published on the topic. This project is supported by IES.

Replicability in Clinical Psychology
Concern about replicability in experimental psychology is growing, yet not all areas of psychology have fully participated in the conversation. In Perspectives on Psychological Science, psychologist and IPR associate Jennifer Tackett and her colleagues explore why clinical psychology and related fields, such as counseling, psychiatry, epidemiology, social work, and school psychology, have remained largely outside of the discussion of replicability. The authors point out that in contrast to other areas, such as social psychology, clinicians are used to relying on correlational, rather than experimental, effects. Because they often work with small numbers of subjects who cannot be easily recruited, they expect noisy and imperfect data. Tackett and her co-authors suggest methods for adapting the lessons learned about replicability and questionable research practices in other domains of psychological research to the specific conditions of clinical psychology. They review and ultimately recommend measures for clinical science researchers, such as reducing questionable research practices, using preregistration and open data approaches, and increasing statistical power. They also note that the replicability movement can learn from clinical science, offering seven recommendations that include accommodating sensitive data and thinking “meta-analytically.”

Data Use, Quality, and Cost in Policy Research

Beth Tipton
Elizabeth Tipton of Columbia University,
a former IPR graduate research assistant,
lectured on how to design studies with
generalizability on April 26.

Economics of Scaling Up
When researchers conduct randomized controlled trials (RCTs) of social programs, the hope is that smaller-scaled programs that appear promising in initial RCTs can then be implemented at a larger scale with a larger set of resources. But how can it be known whether a social program will be equally successful at scale without having been tested at scale? IPR economist Jonathan Guryan proposes a way to measure the economics of scaling up in an IPR working paper, along with Jonathan Davis, Kelly Hallberg, and Jens Ludwig of the University of Chicago. Their model focuses on one scale-up challenge specifically: For most social programs, labor is a key input, yet as social programs scale up and seek new hires, a search problem can arise. This results in a situation where, as programs are scaled, either the average cost of labor must go up or program quality will decline at fixed costs. For instance, a tutoring program that is being scaled up will eventually face a labor supply problem where the program will either need to start offering higher pay to attract high-quality tutors, or will have to accept lower-quality tutors at the same pay. While acknowledging that exact costs of scale-up cannot necessarily be known, Guryan and his co-authors show that it is possible to create and test a program at a smaller scale while still learning about the input supply curves, such as the labor supply curve, facing the provider at a much larger scale. This can be done by randomly sampling the inputs the provider would have hired if it operated at a larger scale. Highlighting the specific case of a Chicago tutoring program they are evaluating and trying to scale up, the researchers show how scale-up experiments can provide insights and confidence that cannot be derived from simply scaling up without an experiment at a modest scale.
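The sampling idea can be sketched in a few lines. This is a simplified illustration, not the working paper's model: it assumes the provider can rank applicants by a single quality score, and the simulated pool, the function names, and the pilot and target sizes are all hypothetical. A small pilot staffed by a random sample of the people who would be hired at full scale reflects the quality the program would face at that scale, whereas a pilot staffed by only the top applicants overstates it.

```python
import random
import statistics


def hire_top(applicant_quality, slots):
    """Business as usual: fill the slots with the highest-ranked applicants."""
    return sorted(applicant_quality, reverse=True)[:slots]


def hire_as_if_at_scale(applicant_quality, slots, target_scale, rng):
    """Scale-up experiment: take the pool that WOULD be hired at `target_scale`
    and randomly sample `slots` of them for the small pilot."""
    would_hire_at_scale = hire_top(applicant_quality, target_scale)
    return rng.sample(would_hire_at_scale, slots)


rng = random.Random(0)
# Hypothetical applicant pool: quality scores are simulated, not real data.
pool = [rng.gauss(0.0, 1.0) for _ in range(5000)]

pilot_slots, target_scale = 50, 2000
top_only = hire_top(pool, pilot_slots)
scaled_sample = hire_as_if_at_scale(pool, pilot_slots, target_scale, rng)
at_scale = hire_top(pool, target_scale)

print("mean quality, top-50 pilot:      ", round(statistics.mean(top_only), 2))
print("mean quality, random-50 of 2000: ", round(statistics.mean(scaled_sample), 2))
print("mean quality, all 2000 at scale: ", round(statistics.mean(at_scale), 2))
```

In this sketch the randomly sampled pilot tracks the at-scale average, while the top-50 pilot sits well above it.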

Protecting Privacy of Student Data
Over the last decade, the U.S. Department of Education has invested more than $600 million to help states improve their statewide longitudinal data systems (SLDSs), including student background data and assessment data. SLDSs today include longitudinal data on millions of students, which should make them a rich source of data for education researchers. However, the Family Educational Rights and Privacy Act (FERPA) has placed restrictions on accessing these data, making it difficult for independent researchers to use them. A research project led by Hedges, and supported by the NSF and IES, continues to examine the balance between protecting privacy and allowing for effective education research. A previous study led by Hedges, which used SLDS data from 11 states and 5 million students, found substantially different results for analyses using masked data, in which some original data are scrambled or hidden to protect sensitive information, versus unmasked data. Subsequent analyses have shown that the data masking procedures used in several states have led to the deletion of a large, non-random portion of the SLDS data, which could have implications for research findings based on these datasets. Hedges’ current project evaluates several approaches to statistical disclosure control that could both make SLDS data available to researchers and comply with standards set out by FERPA. He and his collaborators argue that these approaches must meet three criteria: 1) they must assure confidentiality; 2) they must preserve information about relationships in the data; and 3) the protected data must be open to conventional analyses by social science researchers. As a next step, the researchers will evaluate how well two competing general approaches meet the above criteria.
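Why masking can change results is easy to see in a toy case. The sketch below assumes one simple masking rule, suppressing any school-by-subgroup cell below a minimum size; this rule is an illustrative assumption, not a description of any state's actual procedure, and the records, field names, and threshold are hypothetical.

```python
from collections import defaultdict
import statistics


def suppress_small_cells(records, min_cell_size):
    """Drop every record whose (school, subgroup) cell has fewer than
    `min_cell_size` students, a simple form of cell suppression."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["school"], rec["subgroup"])].append(rec)
    return [rec for cell in cells.values() if len(cell) >= min_cell_size
            for rec in cell]


# Hypothetical records: subgroup "y" is small in every school and scores lower.
records = []
for school in ("s1", "s2", "s3"):
    records += [{"school": school, "subgroup": "x", "score": 80} for _ in range(12)]
    records += [{"school": school, "subgroup": "y", "score": 60} for _ in range(3)]

masked = suppress_small_cells(records, min_cell_size=10)
print(len(records), statistics.mean(r["score"] for r in records))
print(len(masked), statistics.mean(r["score"] for r in masked))
```

Because the suppressed cells are not a random subset of students, the masked data drop the smaller subgroup entirely and the average score rises from 76 to 80 in this made-up example.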

Improvements to Experimental Design and Quality

Bruce Spencer and Larry Hedges
Following a presentation at a Fay Lomax Cook
Monday Colloquium, IPR statistician Bruce
Spencer (right) follows up on a point about his
work on the South African census with IPR
education researcher and statistician Larry Hedges.

Census Design, Costs, Accuracy
As the 2020 U.S. Census approaches, bureau officials must finalize census design, which means determining what operational programs will be used to collect census data. These decisions might include whether to build address lists using in-office technologies or by canvassing in the field, whether to collect data via paper forms or online, and whether to use administrative records and/or third-party data to follow up with people who do not answer—known as non-response follow-up. In making these decisions, the Census Bureau must consider the outputs and accuracy of different types of operational programs. For example, in terms of output, how many housing units designated for non-response follow-up can be classified as vacant based on administrative records, without any need for in-person follow-up? Additionally, in terms of accuracy, what fraction of these housing units will actually be occupied and therefore mistakenly classified as vacant? Since the exact accuracy of each program cannot be known ahead of time, it must instead be forecasted. In a project supported by the National Science Foundation (NSF) and the U.S. Census Bureau, IPR statistician Bruce Spencer is collaborating with Census Bureau researchers to forecast the accuracy parameters of different census operational programs at both the national and state levels. This will ultimately help specify error distributions for the state population counts. Additionally, in an IPR working paper with former IPR graduate research assistant Zachary Seeskin, now at NORC, the two researchers contrast the costs of attaining accuracy with the consequences of imperfect accuracy for census data. They detail how inaccuracy rates in the 2020 Census have the potential to cause quite large distortions. For instance, an average error of 2 percent for state populations could result in expected federal funding shifts of more than $50 billion over 10 years and expected shifts in the apportionment of as many as seven House seats.
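The sensitivity of House apportionment to count errors can be illustrated with a small sketch. It implements the method of equal proportions used to apportion House seats, but everything else is an assumption for illustration: the four states, their populations, the 10-seat chamber, and the 2 percent undercount applied to a single state are hypothetical, and this is not the forecasting model described above.

```python
import heapq
import math


def apportion(populations, seats):
    """Method of equal proportions (Huntington-Hill): every state gets one
    seat, then each remaining seat goes to the state with the highest
    priority value population / sqrt(n * (n + 1)), n = seats already held."""
    if seats < len(populations):
        raise ValueError("need at least one seat per state")
    alloc = {state: 1 for state in populations}
    # heapq is a min-heap, so store negated priorities.
    heap = [(-pop / math.sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        alloc[state] += 1
        n = alloc[state]
        heapq.heappush(heap, (-populations[state] / math.sqrt(n * (n + 1)), state))
    return alloc


# Hypothetical populations for four made-up states (not census figures).
true_pop = {"A": 5_000_000, "B": 2_620_000, "C": 1_500_000, "D": 450_000}
# The same states with a hypothetical 2 percent undercount in state B only.
counted = dict(true_pop, B=true_pop["B"] * 98 // 100)

print(apportion(true_pop, 10))
print(apportion(counted, 10))
```

With these made-up figures, the undercount moves one seat from state B to state C, even though no other state's count changed.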

Methods for Subnational Analyses
How do national institutions, social differences, and other important social and political variables both shape and respond to the dynamics at the local level? To answer this complex question, IPR political scientist Rachel Beatty Riedl proposes four different types of research designs to illuminate subnational variation in relation to national-level structures within and between countries. Pointing to examples of each design type, she looks specifically at the national-local interaction between political party competitiveness and local differences in ethnicity, religion, and socioeconomic status (SES) to assess how different religious groups are politically engaged in Kenya, Uganda, and Tanzania. She delineates a two-level interaction approach that focuses on the relationship between national and local institutions, as well as a hierarchical model that examines how influences between levels, from the most local to the national, work both from the bottom to the top and the top to the bottom. Riedl also notes that cross-border comparisons between countries can be used as quasi-experiments. She finds that by using all of these methods, she is able to uncover the intricate relations and influences that lead to differences in how various religious groups in these countries respond to political conditions. The research appears in American Behavioral Scientist.

Single-Case Design Effect Sizes
The identification of effective educational interventions is central to educational policy. Randomized experiments are the first design considered in studying effectiveness, but they are not always feasible or ethical. Researchers then turn to quasi-experimental methods to estimate effects. In research supported by IES, Hedges focuses on single-case design (SCD), a type of quasi-experiment often used in the fields of autism, learning disorders, school psychology, special education, developmental disorders, and remedial education, as well as in speech, language, and hearing research. He is developing standardized effect-size metrics for SCDs that will allow researchers to use the same measurement scale as is used in other experimental designs—allowing easy comparison between effects in different types of experiments. This project extends earlier work done with the late William Shadish of the University of California, Merced, on SCD effect size estimators.
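What it means to put effects on a common scale can be shown with the familiar standardized mean difference for a two-group design. The sketch below is not the single-case estimator under development, which has to handle repeated measurements on a small number of cases; it is the standard between-groups statistic with the usual approximate small-sample correction, and the data and function name are hypothetical.

```python
import math
import statistics


def hedges_g(treatment, control):
    """Standardized mean difference with the small-sample correction."""
    n_t, n_c = len(treatment), len(control)
    df = n_t + n_c - 2
    pooled_var = ((n_t - 1) * statistics.variance(treatment)
                  + (n_c - 1) * statistics.variance(control)) / df
    d = (statistics.mean(treatment) - statistics.mean(control)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * df - 1)  # approximate bias-correction factor
    return correction * d


# Hypothetical outcome scores for a tiny two-group study.
treated = [2, 4, 6]
control = [1, 3, 5]
print(hedges_g(treated, control))

# Rescaling the outcome (say, points to tenths of points) leaves g unchanged,
# which is what makes effects comparable across differently scaled measures.
print(hedges_g([10 * x for x in treated], [10 * x for x in control]))
```

Because the difference in means is divided by the pooled standard deviation, the result does not depend on the units of the outcome measure.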

The Promise of Administrative Data
In recent years, data collection, storage, and analysis capabilities have grown exponentially. Taking advantage of these improvements, governments across the world have developed large-scale comprehensive datasets ranging from health and education data to tax programs and workforce information. Governments have established statistics offices dedicated to maintaining and using data to produce official statistics about their populations and about specific programs. This has been especially true in the field of education, where, for instance, the U.S. Department of Education has invested over $750 million to help states build, populate, and maintain longitudinal education datasets. In Education Finance and Policy, IPR education economist David Figlio, IPR research associate Krzysztof Karbownik, and Kjell Salvanes of the Norwegian School of Economics explain how these data, though collected for administrative purposes, present exciting new research opportunities. The researchers outline the challenges associated with the use of administrative data in education research, as well as its immense promise. Existing challenges include the fact that administrative data are collected for administrative purposes rather than for research, meaning that the variables captured in administrative data are not always conducive to answering specific research questions. They also acknowledge technical barriers to accessing and analyzing administrative datasets. The benefits of administrative data in education include the reduced need for expensive data collection via surveys as well as the ability to analyze population data. Looking ahead, the authors acknowledge practical and technical issues that limit the use of administrative data today, and call on their fellow researchers to highlight the potential for mutually beneficial collaboration between researchers, practitioners, and policymakers. Figlio is the Orrington Lunt Professor of Education and Social Policy and of Economics.

Interdisciplinary Training in Methodological Innovation

IES-Sponsored Research Training
IPR faculty emeritus Thomas D. Cook, a social psychologist, led the 2017 Summer Research Training Institute on Quasi-Experimental Designs, sponsored by IES’ National Center for Education Research. Held July 31–August 11 in Evanston, the workshop presented a variety of quasi-experimental designs, which employ methods other than randomization to compare groups. The 2017 session placed special emphasis on analysis of data gleaned from the quasi-experimental designs and employed both lectures and hands-on training. Participants were able to improve their methodological skills while working with education researchers from around the country. Other workshop faculty included former IPR graduate research assistant Vivian Wong, now at the University of Virginia, and former IPR postdoctoral fellows Coady Wing, now at Indiana University, Peter Steiner of the University of Wisconsin–Madison, and Stephen West of Arizona State University.

Promoting Methodological Innovation
During the year, IPR hosted four speakers from a variety of disciplines as part of the Q-Center Colloquia Series. Organized by Hedges and IPR graduate research assistant Jacob Schauer, the series showcases and promotes discussion of methodological innovation. Speakers included Victoria Stodden of the University of Illinois at Urbana-Champaign, who presented a new framework for statistical analysis called CompareML. Daniel Almirall of the University of Michigan discussed adapting sequential multiple assignment randomized trials, or SMART, a type of multi-stage randomized trial design, to stable clusters of individuals. Another presentation, by Linda Collins of Pennsylvania State University, examined an alternative to the randomized controlled trials so frequently used in the development and evaluation of health and educational interventions; the framework, known as multiphase optimization strategy, or MOST, includes additional steps before the RCT. Rebecca Maynard of the University of Pennsylvania examined model cases of evidence use—and misuse—in policy development studies and suggested guidance in sorting, sifting, and implementing evidence in meaningful ways.