Recent Research: Quantitative Methods for Policy Research

Interdisciplinary Training in Methodological Innovation

At the 10th annual Cluster-Randomized Trials (CRT) Summer Training Institute led by IPR statistician Larry Hedges, participants learned how to apply CRT methods to education research studies.

IES-Sponsored Research Training
The 10th Summer Research Training Institute on Cluster-Randomized Trials (CRT), sponsored by the Institute of Education Sciences (IES) and its National Center for Education Research (NCER) in the U.S. Department of Education, took place from July 18–28 in Evanston. Organized by IPR statistician Larry Hedges, Board of Trustees Professor of Statistics and Education and Social Policy, and Spyros Konstantopoulos of Michigan State University, the institute seeks to provide researchers from around the country with a rigorous methodological framework and perspective. The sessions encompassed a range of topics in the design and execution process, from relevant statistical software to more conceptual challenges, such as framing results. The institute culminated in a mock proposal process, allowing groups to receive feedback from their fellow participants and institute faculty, thereby improving their readiness to apply for competitive IES grants.

IPR faculty emeritus Thomas D. Cook ran another IES/NCER Summer Research Training Institute on Design and Analysis of Quasi-Experiments in Education. Other organizers included former IPR graduate research assistant Vivian Wong, now at the University of Virginia, and former IPR postdoctoral fellows Coady Wing, now at Indiana University, and Peter Steiner at the University of Wisconsin–Madison. The 2016 institute exposed participants to a variety of quasi-experimental designs, which are distinct in their use of methods other than randomization to compare groups. Working closely with workshop leaders and fellow attendees to understand and analyze these designs, participants were able to hone their methodological skills while networking with other education researchers from around the nation.

Facilitation of Research Networks and Best Practices

Matched Administrative Data Network
With support from the National Science Foundation (NSF), IPR Director David Figlio, an education economist, continues to lead a national effort to bring scholars, policymakers, and administrators together to develop “next-generation” datasets that link administrative data, such as welfare and school records, to population data, such as birth certificates and death records. While creating these large-scale datasets requires complex collaboration across levels of government and scholarly disciplines, it also creates opportunities to obtain valuable insights and knowledge, especially when evaluating early childhood investments and interventions. In one IPR working paper, Figlio, finance professor and IPR associate Paola Sapienza, and their colleagues use population-level administrative birth and education records from Florida to study the relationship between long-term oriented attitudes and educational attainment in immigrants (see p. 29). A number of IPR scholars are part of the ongoing data network, including economists Jonathan Guryan and Diane Whitmore Schanzenbach, social demographer Quincy Thomas Stewart, psychobiologist Emma Adam, and biological anthropologists Christopher Kuzawa and Thomas McDade. Figlio is Orrington Lunt Professor of Human Development and Social Policy and of Economics, McDade is Carlos Montezuma Professor of Anthropology, and Sapienza is Donald C. Clark/HSBC Chair in Consumer Finance.

Renewed Support for TESS
Since 2002, the NSF-funded project Time-Sharing Experiments in the Social Sciences (TESS) has enabled researchers to conduct survey experiments with nationally representative samples, free of cost. In 2016, TESS, which is co-led by IPR political scientist James Druckman with former IPR sociologist Jeremy Freese, now at Stanford University, began a partnership with AmeriSpeak®, a new online data collection platform at the University of Chicago. AmeriSpeak is a U.S. household panel that combines the speed and cost effectiveness of panel surveys with enhanced representativeness of the U.S. population. Druckman expects that the panel will generate high-quality, representative data, allowing scholars to make strong causal inferences that generalize to large populations. The panel will be used to recruit participants for experiments in TESS’ Short Studies Program, which allows researchers to field brief population-based survey experiments on a general population of at least 2,000 adults. To date, TESS has fielded more than 350 studies in a variety of disciplines. Renewed support from the NSF will allow the platform to continue fielding studies at no cost to researchers. Druckman is Payson S. Wild Professor of Political Science.

Composition of MTurk Samples
Survey researchers are increasingly turning to online convenience samples (nonprobability samples made up of people who respond to online surveys) through Amazon’s Mechanical Turk (MTurk). Yet scholars are debating whether MTurk samples are representative, noting certain social and political differences, such as in age and partisanship, between MTurk and population-based samples. In research published in SAGE Open, Druckman and Freese, along with Northwestern graduate student Kevin Levay, investigate the extent to which MTurk samples differ from population samples. They do so by replicating part of the 2012 American National Election Study (ANES) with an MTurk sample. Though they find that MTurk respondents differ significantly from the 2012 ANES respondents, the researchers determine that most of these differences are easily controllable. Their analysis indicates that once one controls for political and demographic differences, MTurk respondents are not fundamentally different from respondents in population-based samples. The results emphasize the usefulness of MTurk samples for researchers, especially when they can measure and control for political and demographic variables.

Data Use, Quality, and Cost in Policy Research

Reducing Selection Bias
While quasi-experiments can eliminate selection bias and generate the same causal results as randomized experiments, it is not always clear when the necessary theory-specified assumptions are met. Cook is leading a research program to translate the relevant statistical theory into concrete operations that researchers can use to design quasi-experiments that will usually give results similar to those from experiments with the same treatment and measurement details. One line of work involves a meta-analysis of results from 15 studies that deliberately compared experimental and regression discontinuity results at the treatment cutoff. Another involves a meta-analysis of results from five studies that compared experimental and comparative regression discontinuity results away from the treatment cutoff. A third study conducts a meta-analysis of the correspondence between the results of nine experiments and nine comparative interrupted time-series studies that shared the same treatment. A fourth study involves a meta-analysis of 49 studies that deliberately contrasted experimental estimates with those from nonequivalent control group designs. This study explores two things. First, it examines the individual and joint roles played by choosing a local versus a distant comparison group, matching or not matching on pretest measures of the study outcome, and collecting a “rich” set of covariates. Second, it examines which types of variables among the “rich” covariates reduce most of the initial bias found in quasi-experiments. Finally, three original studies assess how local comparison group choice, pretest measures of the study outcome, and various combinations of other “rich” covariates cumulatively affect bias reduction.

Methods for Studying Replication
From medicine to psychology, many research fields are facing a “replicability crisis,” in which researchers have not been able to replicate major findings. While scholars agree on the importance of replicability in research, there is little consensus on how to evaluate how well a series of studies replicates another. Across research fields, different papers use different criteria to measure replication, without any clear standard for what replicability means. In an ongoing research project, Hedges is working to develop a coherent statistical framework for studying replication. Doing so will allow for a more systematic approach to responding to the so-called replicability crisis, as it persists in the “hard” science fields, like medicine, and works its way into the social sciences, too. Hedges also outlined the replicability crisis while speaking at the annual meeting of the Society for Research on Educational Effectiveness (SREE), which he helped to found. In his address, Hedges emphasized one key to beating back the replicability crisis in education research: teaching sound methodologies to up-and-coming researchers. At the 2016 SREE conference, he also accepted the inaugural, eponymous Hedges Award, endowed by the Spencer Foundation, William T. Grant Foundation, and American Institutes for Research, to honor his lifetime achievement in education research.

Improving Generalizability
With support from the NSF and the Spencer Foundation, Hedges is working to improve the generalizability of education research findings. Specifically, Hedges seeks to identify methods for formalizing subjective concepts of generalizability and external validity, which refer to the extent to which research findings apply beyond the sample and setting being studied. The first part of the project involves theoretical work, looking at ways to quantify generalizability concepts in terms of the bias and variance of estimates of average population treatment effects. The project also entails developing methods to improve generalizability from existing experiments, as well as case studies of retrospective generalizability. While improving generalizability is applicable across a variety of research contexts, the findings will have particularly important implications for education research, as Hedges identifies methods for planning education experiments to be generalizable to policy-relevant populations.

Protecting Privacy in State Data
The IES has spent more than $600 million helping states develop longitudinal data systems to better understand and improve the performance of American school systems. Yet concerns about protecting privacy and the Family Educational Rights and Privacy Act (FERPA) have created data-access barriers for researchers. With funding from IES, NSF, and the Spencer Foundation, in addition to the cooperation of a dozen states, Hedges and his research team are investigating methods to make large datasets available while still protecting individuals’ privacy. Their initial objective is to show state education agencies that it is possible to create protected datasets that meet FERPA standards. Using 20 synthetic datasets for fourth and eighth graders in seven states, they show that these data can reproduce statistics such as means, proportions, and standard deviations, while also protecting information at risk for disclosure. As the project moves forward, Hedges and his team will incorporate more complex statistical testing, including capturing longitudinal trends, to show that state data can remain informative while also protecting privacy.

Improvements to Experimental Design and Quality

Charles F. Manski
IPR economist Charles F. Manski has proposed choosing sample size in medical experiments to achieve near-optimal treatment rules rather than statistical power.

Sample Size of One?
First introduced in the 1920s, hypothesis testing has long been medical researchers’ go-to method for determining sample size in randomized clinical trials. However, a study published in the Proceedings of the National Academy of Sciences by IPR economist Charles F. Manski and Aleksey Tetenov, a former IPR graduate research assistant now at the University of Bristol (U.K.), proposes an alternative method to better inform treatment choice in the medical field. The researchers first highlight a shortfall of hypothesis testing: This method relies on statistical power to determine sample size, yet there is no clear connection between statistical power and effective treatment choice. Accordingly, Manski and Tetenov propose an alternative method, rooted in statistical decision theory, that uses near-optimal treatment rules, rather than statistical power, to determine sample size. In the case when a given treatment’s average effects are unknown—which is the standard case in clinical settings—these near-optimal rules allow researchers to select treatments acceptably close to what they would have selected had they known the effects of each treatment. The proposed method also drastically reduces the sample size needed to run an informative clinical trial. A sample size of 145 under near-optimal rules can be as accurate as a hypothesis test based on a sample of almost 8,000. Though the study focuses on the medical setting, the researchers emphasize that the findings hold true across randomized experiments in other fields. Manski is Board of Trustees Professor in Economics.
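The idea of sizing a trial by how close the resulting treatment choice comes to the best one can be sketched numerically. The Python below is a simplified illustration and not the authors' procedure: it assumes two arms with normally distributed outcomes, a known common standard deviation `sigma`, and a rule that simply picks the arm with the higher sample mean. The function names, the grid search, and the tolerance `epsilon` are all hypothetical choices made for this sketch.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def regret(delta, n_per_arm, sigma=1.0):
    """Expected welfare loss when the true effect is delta.

    Assumption for this sketch: the arm with the higher sample mean is
    chosen, so the loss is |delta| times the chance of picking the worse arm.
    """
    se = sigma * math.sqrt(2.0 / n_per_arm)  # std. error of the difference in means
    p_wrong = normal_cdf(-abs(delta) / se)
    return abs(delta) * p_wrong


def max_regret(n_per_arm, sigma=1.0, grid_points=20001, max_delta=5.0):
    """Worst-case regret over a grid of possible true effects (in units of sigma)."""
    step = max_delta * sigma / (grid_points - 1)
    return max(regret(i * step, n_per_arm, sigma) for i in range(grid_points))


def smallest_n(epsilon, sigma=1.0):
    """Smallest per-arm sample size whose worst-case regret is at most epsilon."""
    n = 1
    while max_regret(n, sigma) > epsilon:
        n += 1
    return n


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, round(max_regret(n), 4))
    print("per-arm n for epsilon = 0.05:", smallest_n(0.05))
```

In this toy setting, worst-case regret shrinks in proportion to one over the square root of the per-arm sample size, so the sample size is governed by how much welfare loss the planner is willing to tolerate rather than by a target power level.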

Judgments in Medical Risk Assessments
In an IPR working paper, Manski studies a problem that arises when clinicians seek to personalize patient care: How can doctors use informal clinical judgments about their patients, beyond the predictors of evidence-based risk assessments, to make clinical decisions? Taking the example of risk assessment for breast cancer, Manski examines the National Cancer Institute’s Breast Cancer Risk Assessment (BCRA) Tool, which is widely used in clinical practice. The tool generates a probability that a woman will develop an invasive breast cancer based on eight factors, which include a patient’s current age, genetic BRCA mutations, history of cancer, and race/ethnicity, among others. While the BCRA Tool personalizes a woman’s predicted risk of breast cancer in multiple respects, it does not include other personal attributes that a doctor might observe, such as excessive consumption of alcohol, also associated with a higher cancer risk. The doctor could either choose to ignore such observable traits and just follow the BCRA Tool’s recommendations, or she could incorporate her informal clinical judgments into her assessment. Empirical psychological research has shown that clinical judgments are typically less effective than evidence-based ones. Manski, however, offers a middle ground that would allow doctors to combine evidence with judgment to reach well-grounded predictions. His analysis shows how doctors can make coherent and informative personalized risk assessments in situations where they only have partial knowledge of patient risk or illness outcomes using, for instance, maximin and minimax-regret rules that take account of all observed patient attributes.

Enhancing Survey Methodology
IPR statistician Bruce Spencer is participating in an expert panel on methodological enhancements to the National Intimate Partner and Sexual Violence Survey. The ongoing survey was developed by the U.S. Centers for Disease Control and Prevention to collect national and state data on intimate partner violence, sexual violence, and stalking. Started in 2010, the survey has discovered that about 19 percent of women and almost 2 percent of men have been raped, and an estimated 44 percent of women and 23 percent of men have experienced other forms of sexual violence. Future years of the survey aim to track trends in these types of crimes.

More Data or Better Data?
When an agency designs a new household survey, it faces crucial questions about how much time and effort to invest in the quantity and quality of the data it collects. To study the issue, Manski and consultant Jeff Dominitz apply statistical decision theory, specifically the Wald framework, which considers both sample design and how the resulting data will be used. They use it to examine how to allocate a data budget between random sampling processes that differ in their data collection costs and in the quality of the resulting data. For example, when allocating a budget for data collection in a household survey, allocating more to a low-cost process collects more data, while allocating more to a more expensive process collects better-quality data. Manski and Dominitz then investigate two specific cases. In both, a more expensive sampling process accurately measures each sample member’s outcome, but the low-cost process differs: In one case it suffers from survey nonresponse, while in the other it provides only a low-resolution interval measure of each sample member’s outcome. In both of these cases, the researchers study minimax-regret sample design for predicting a real-valued outcome under square loss, that is, the design that minimizes maximum mean square error. The design proves particularly informative in cases where data quality is a variable, and it improves upon the current practice in which survey planners focus on variance without considering bias. The study is forthcoming in the Review of Economic Studies.
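The trade-off between more data and better data can be illustrated with a toy calculation. The Python below is a sketch under stated assumptions, not the authors' derivation: outcomes are assumed to lie in [0, 1], so the variance of a sample mean is at most 1/(4n), and the low-cost process is assumed to carry a bias of at most `max_abs_bias` in absolute value. The budget, costs, and bias bound are hypothetical numbers.

```python
def worst_case_mse(budget, cost_per_obs, max_abs_bias):
    """Upper bound on mean square error of a sample mean for outcomes in [0, 1].

    Assumptions for this sketch: squared bias is at most max_abs_bias**2 and
    variance is at most 1 / (4 * n), where n is the sample size the budget buys.
    """
    n = int(budget // cost_per_obs)
    if n == 0:
        return float("inf")  # the budget buys no observations at all
    return max_abs_bias ** 2 + 0.25 / n


if __name__ == "__main__":
    budget = 1000  # hypothetical data budget
    # Low-cost process: many observations, but a possible bias of up to 0.05.
    cheap = worst_case_mse(budget, cost_per_obs=1, max_abs_bias=0.05)
    # High-cost process: far fewer observations, each measured accurately.
    costly = worst_case_mse(budget, cost_per_obs=10, max_abs_bias=0.0)
    print("low-cost worst-case MSE: ", cheap)
    print("high-cost worst-case MSE:", costly)
```

With these made-up numbers the high-cost process has the smaller worst-case error even though it yields a tenth as many observations, whereas a comparison based on variance alone would favor the low-cost process.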

Cost-Benefit Analysis for a Census
A growing number of countries, from Canada to Nigeria, are grappling with whether to carry out a quinquennial census, or a census conducted every five years. Since countries rely on census data for a variety of purposes, such as allocating government funds and resources, it is important to understand the costs and benefits of conducting censuses more frequently. In the Journal of Official Statistics, Spencer and his colleagues outline the uses and limitations of cost-benefit analyses by examining South Africa’s debate over whether to conduct a 2016 census. They show that South Africa faced a choice leading up to 2016: Conduct another census or rely on increasingly inaccurate data on births, deaths, and migration from its 2011 census. The benefits of conducting a census would have included more accurate population estimates to inform allocation of government funds, but it would have been expensive to carry out. Using the performance of past estimates, Spencer and his colleagues estimated the expected reduction in errors in fund allocation if the 2016 census were carried out. They then quantified the expected improvement in allocation using a loss function. Drawing from this analysis, South Africa ultimately decided against conducting a 2016 census; instead, it focused on improving its existing data and capacity to produce more accurate postcensal estimates.
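The step of scoring allocation errors with a loss function can be sketched as follows. This Python example is purely illustrative: it assumes funds are split in proportion to estimated population shares and uses total absolute misallocation as the loss, which is one possible choice and not necessarily the one used in the study. The region names and population figures are hypothetical.

```python
def allocate(total_funds, populations):
    """Split total_funds across regions in proportion to population."""
    total_pop = sum(populations.values())
    return {region: total_funds * pop / total_pop
            for region, pop in populations.items()}


def misallocation_loss(total_funds, estimated, actual):
    """Absolute-error loss: total funds sent to the wrong regions' shares.

    Assumption for this sketch: the ideal allocation is the one that the
    actual populations would have produced.
    """
    ideal = allocate(total_funds, actual)
    made = allocate(total_funds, estimated)
    return sum(abs(made[region] - ideal[region]) for region in ideal)


if __name__ == "__main__":
    funds = 100.0
    # Hypothetical populations in millions.
    actual = {"A": 5.0, "B": 3.0, "C": 2.0}
    stale_estimates = {"A": 4.5, "B": 3.5, "C": 2.0}    # projected from an old census
    census_estimates = {"A": 4.9, "B": 3.1, "C": 2.0}   # from a new census
    loss_stale = misallocation_loss(funds, stale_estimates, actual)
    loss_census = misallocation_loss(funds, census_estimates, actual)
    print("loss without a new census:", loss_stale)
    print("loss with a new census:   ", loss_census)
    print("reduction in loss:        ", loss_stale - loss_census)
```

In a cost-benefit comparison of this kind, the reduction in loss would be weighed against the cost of fielding the census.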

Thomas Brock
Thomas Brock (middle), commissioner of the National Center for Education Research (NCER), congratulates IPR statistician Larry Hedges (left) and Spyros Konstantopoulos of Michigan State University on completion of the 10th Summer Research Training Institute on Cluster-Randomized Trials, which is sponsored by NCER and the Institute of Education Sciences.

Forecasting the 2020 Census
Conducting a census is a large-scale operation that involves deciding whether and how to use different operational programs. These decisions might include whether to build address lists using in-office technologies or by canvassing in the field, whether to collect data via paper forms or online, and whether to use administrative records and/or third-party data to follow up with people who do not answer—known as nonresponse follow-up (NRFU). In collaboration with the U.S. Census Bureau, Spencer is examining these alternative census operational programs, considering both the output and accuracy parameters that characterize them. For instance, when looking at output, one might consider what fraction of the housing units designated for NRFU can be classified as vacant based on administrative records versus what fraction really requires a follow-up. In terms of accuracy, one might then ask how many of the houses labeled as vacant in the administrative records are actually occupied and thus only mislabeled as vacant. While the exact parameters cannot be known ahead of time for the 2020 census, Spencer and his collaborators are working to forecast parameters at the national and state level. Doing so will allow for the specification of error distributions for the population figures determined by the Census, which are used for important functions like allocating funds and apportioning congressional seats.

Promoting Methodological Innovation
Over the course of 2016, IPR hosted five speakers from a variety of disciplines as part of the Q-Center Colloquia Series. Organized by Hedges and IPR graduate research assistants Jacob Schauer and Wendy Chan, now at the University of Pennsylvania, the series is designed to showcase and promote discussion of methodological innovation. Two speakers looked at issues related to healthcare: Alan Zaslavsky, professor of biostatistics at Harvard University, presented two novel multivariate modeling applications to healthcare quality data, and Johns Hopkins biostatistician Thomas Louis discussed the benefits and challenges of individuals self-selecting into epidemiological studies and surveys. In education, Li Cai of the University of California, Los Angeles examined the role of flexible item response theory models in evaluation and assessment models, and James Kim of Harvard University detailed a method for scaling evidence-based literacy interventions. In addition, Uri Simonsohn of the University of Pennsylvania proposed a valid test of U-shaped relationships.