Quantitative Methods for Policy Research
Most researchers and academics tend to stick with the research methods they know best, learned mainly in graduate school—even though those methods might not represent current best practices or the most appropriate method. This is why statistician and education researcher Larry Hedges, with the support of a group of distinguished interdisciplinary scholars, launched the Center for Improving Methods for Quantitative Policy Research, or Q-Center, at IPR. The work of Q-Center faculty often overlaps with IPR’s Education Policy program.
• improving designs, analysis, and synthesis in policy research
• designing more robust methods to evaluate education interventions
• developing new data sources and methods of data collection
• sharing best research practices across disciplines and professional networks
• addressing common statistical problems such as selecting covariates, drawing causal inferences, and analyzing missing data
Overview of Activities
IPR-Led Methodological Workshops
Thomas Cook (second-row center) and William Shadish (front-row center) pose with participants from the first of two workshops at IPR in summer 2010 on quasi-experiments in education research.
The Institute of Education Sciences (IES)—the research wing of the U.S. Department of Education—supported two workshop series at IPR in summer 2010 to improve methodological practices in education research, each co-organized by an IPR fellow.
The Workshop on Quasi-Experimental Design and Analysis in Education trained 120 researchers over two weeklong sessions between August 2 and 13. This series seeks to improve the quality of quasi-experiments, which are needed when random assignment is not feasible or breaks down in education settings. It is led by IPR social psychologist Thomas D. Cook and William Shadish of the University of California, Merced. Participants learn three distinct quasi-experimental techniques—regression discontinuity, interrupted time series, and nonequivalent group design using propensity scores and matching—and the advantages and disadvantages of each. In all, more than 450 researchers have received training since the program first got underway in 2006 with funding from the Spencer Foundation. Another series is scheduled for August 2011 with continued support from IES.
Thanks to continued support from IES’ National Center for Education Research, IPR education researcher and statistician Larry Hedges and professors Mark Lipsey and David Cordray of Vanderbilt University led their third summer institute on randomized controlled trials (RCTs) in education, July 25 to August 6, at Northwestern. Thirty researchers from around the country participated in the two-week training, which focuses on the use of cluster randomization, a methodological tool that helps account for the group effects of teachers and classrooms when measuring an intervention’s effects on individual student achievement. Sessions cover a range of specific topics in the design, implementation, and analysis of education RCTs, from conceptual and operational models to sample size and statistical power. Participants also learn to use software such as Stata and HLM to conduct hierarchical data modeling and work in groups to create mock funding applications for an education experiment. The next session will take place at Northwestern in June 2011.
IES Director John Easton visited Northwestern on August 4 and spoke to workshop participants about the importance of engaging school practitioners and answering questions that are most relevant for improving schools.
Multilevel Methods in Education
In a project supported by IES, Hedges is developing improved statistical methods for analyzing and reporting the results of multilevel experiments in education. Many education evaluations employ complex, multilevel designs to account for the effects of clustering—or the fact that students are situated within certain classrooms in certain schools.
In a 2010 paper published by IES’ National Center for Special Education Research, Hedges, who is Board of Trustees Professor of Statistics and Social Policy, and IPR postdoctoral fellow Christopher Rhoads take a closer look at statistical power analysis in such studies. Incorporating the effects of clustering, they show that statistical power depends on a number of factors, such as the values of intraclass correlations and sample sizes at the various levels. Fortunately, statistical power can still be computed from standard power tables using operational effect sizes, or effect sizes multiplied by a design effect that depends on features of the complex experimental design. Hedges and Rhoads apply these concepts to the various research designs most frequently used in education field studies.
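As a rough illustration of the operational effect size idea, the sketch below takes a two-arm design that randomizes whole clusters (schools, for example), deflates the effect size by the familiar design effect 1 + (n − 1)ρ, and then computes power as if the study were a simple two-group comparison. The function names, the balanced two-level design, and the normal approximation are assumptions made for illustration; they are not taken from the Hedges and Rhoads paper, which covers more designs and more exact reference distributions.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(n_per_cluster: int, icc: float) -> float:
    """Variance inflation from clustering: 1 + (n - 1) * icc."""
    return 1.0 + (n_per_cluster - 1) * icc


def operational_effect_size(delta: float, n_per_cluster: int, icc: float) -> float:
    """Effect size rescaled so that simple two-group power formulas apply."""
    return delta / sqrt(design_effect(n_per_cluster, icc))


def approx_power(delta, clusters_per_arm, n_per_cluster, icc, alpha=0.05):
    """Two-sided power under a normal approximation (an assumption; a t
    reference distribution would be more accurate with few clusters)."""
    std_normal = NormalDist()
    d_op = operational_effect_size(delta, n_per_cluster, icc)
    n_per_arm = clusters_per_arm * n_per_cluster
    # Noncentrality of the two-group z test using the operational effect size.
    ncp = d_op * sqrt(n_per_arm / 2.0)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - std_normal.cdf(z_crit - ncp)) + std_normal.cdf(-z_crit - ncp)


if __name__ == "__main__":
    # Hypothetical design: 20 schools per arm, 25 students per school.
    for icc in (0.0, 0.05, 0.20):
        print(icc, round(approx_power(0.25, 20, 25, icc), 3))
```

Under these assumptions, raising the intraclass correlation while holding the sample sizes fixed lowers power, which is the dependence on intraclass correlations and level-specific sample sizes described above.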
Mixed Methods in Randomized Field Studies
Most randomized controlled trials (RCTs) focus on gathering quantitative data, with any qualitative analysis playing a secondary role. However, education researcher and IPR associate James Spillane demonstrates that qualitative approaches need not take a backseat to quantitative analysis, even in randomized field studies. As a case study, he uses his own research on a principal training intervention in a midsized urban U.S. school district.
From the outset, Spillane emphasizes, he and his colleagues mixed quantitative and qualitative approaches in various ways and to various ends. For example, the results of one type of analysis were often used to cross-validate results of another type or to provide context to puzzling or unexpected findings. Moreover, most RCTs evaluate treatment effects on average across large and diverse populations, sometimes overlooking the variation within. Qualitative data can help researchers dig beneath the aggregate-level results and describe not only whether a program does or does not work, but under what conditions it is most or least effective. Given the great cost and emerging press for RCTs, Spillane’s work shows the value of a mixed-methods approach in collecting and analyzing data on education interventions. He is Spencer T. and Ann W. Olin Professor in Learning and Organizational Change.
Alternatives to Randomization
Many researchers believe that randomized experimentation is usually the best methodology for investigating issues in education. However, it is not always feasible. The usually advocated alternative—quasi-experimentation—has come under attack from scholars who contrast the results from a randomized experiment and a quasi-experiment on the same topic, where the quasi-experiment shares the same intervention as the experiment. Thus, the quasi-experiment and the experiment are supposed to vary only in whether the control group is randomly formed.
Cook is critically examining this literature, comprising more than 20 studies, and is adding new studies to it with his colleagues. Using this database, he is comparing the results of quasi-experiments with and without specific design features to assess how well each reproduces the results from a randomized experiment sharing the same treatment group. The point is to evaluate the advantages and limitations of different kinds of quasi-experimental designs and of different ways of analyzing the data from these designs. The project receives support from IES and is part of a larger project examining methods for improving quasi-experimental designs in education.
In recent work with several IPR colleagues and former IPR postdoctoral fellow Peter Steiner, now at the University of Wisconsin–Madison, Cook finds several key ways to reduce bias in quasi-experimental studies, including prior knowledge of close approximations to the unknowable true selection process; the use of pretest measures of the outcome taken at several different time points; the use of matching at school, city, or district levels before matching at the individual child level; and the use of reliably assessed covariates from the multiple different pretest domains, especially when the pretest measures of the outcome are not available. Results also show that the mode of data analysis is much less important than which covariates are chosen and how well they are measured. Cook is Joan and Sarepta Harrison Chair in Ethics and Justice.
Multivariate Regression-Discontinuity Design
IPR postdoctoral fellow Vivian Wong is lead author on a 2010 IPR working paper with Cook and Steiner that introduces the multivariate regression-discontinuity design (MRDD) for use in quasi-experiments. This design adds multiple assignment variables and cutoffs to traditional regression-discontinuity designs. The researchers analyze an MRDD with two assignment variables to show that the effects can be decomposed into a weighted average of two univariate, regression-discontinuity effects. Using simulation data, they also explore the contexts and conditions under which four methods—the frontier, centering, univariate, and instrumental variable approaches—can produce unbiased estimates of treatment effects.
Covariate Selection and Reliability
When conducting an observational study, it is often difficult to choose those covariates that are highly correlated with the real (unknown) selection process and the potential outcomes of selection bias. In a recent article in Psychological Methods, Cook and his colleagues conduct a within-study comparison that contrasts a randomized experiment and a quasi-experiment to test multiple strategies for covariate selection. Various covariate sets are used to adjust for initial group differences in the quasi-experiment, which is characterized by self-selection into treatment. The adjusted effect sizes are then compared to the experimental ones, so as to identify which individual covariates—and which conceptually grouped sets of covariates—are most capable of reducing bias in the quasi-experiment. The results provide strong clues for identifying the “best” covariates for studies in which the true selection process is not known. Cook co-authored the paper with Steiner, Shadish, and M. H. Clark of the University of Central Florida.
In related work, Cook, Steiner, and Shadish now turn their attention to the question of unreliability of covariate measurement and its effects on studies that employ propensity score methods. The researchers simulate different degrees of unreliability in multiple covariates that are used to estimate propensity scores in two previous studies. They find that measurement reliability only promotes bias reduction when the covariates are effective to begin with. Increasing or decreasing the reliability of covariates that do not, on their own, reduce selection bias makes no difference at all. The study was published in the Journal of Educational and Behavioral Statistics.
Correlated Random Coefficient Model
When researchers are unable to conduct a controlled experiment, instrumental variables (IV) offer one method for estimating causal relationships. The recent IV literature describes models in which agents sort into treatment status based on gains from treatment as well as on baseline pretreatment levels. Yet the observing economist might not know the components of the gains known and acted on by the agents. Such models are called correlated random coefficient models. Sorting on unobserved components of gains complicates the interpretation of IV estimates. In work with James Heckman of the University of Chicago, IPR economist Sergio Urzúa explores the implications of the hypothesis that agents do not sort into treatment based on gains. The economists develop new tests to gauge the empirical relevance of the correlated random coefficient model and assess whether the additional complications associated with it are required. They also examine the power of the proposed tests and derive a new representation of the variance of the IV estimator for the correlated random coefficient model.
More recently, with University of Chicago colleague Daniel Schmierer, Urzúa and Heckman have published supplements to their original research that offer additional power calculations and procedural descriptions, as well as a concrete example based on decisions to stop education after high school or continue onto college. Applying their methods to the problem of estimating returns to schooling, they find evidence of sorting into schooling based on unobserved components of gains.
Probabilistic Polls and Random Utility Models
When researchers cannot evaluate public or consumer preferences based on actual choice data, they sometimes poll respondents about the actions they would choose in hypothetical scenarios. Data on stated choices are then used to estimate random utility models, as if they were data on actual choices. However, stated and actual choices may differ, as researchers typically provide survey respondents with less information than they would have in similar, real-life scenarios. One way to overcome this problem is by eliciting choice probabilities, which permits respondents to express uncertainty about their behavior. In a recent article in the International Economic Review, IPR economist Charles F. Manski shows how to use elicited choice probabilities to estimate random utility models. He also reports estimates from a consumer case study in Israel that measured household preferences related to electricity reliability. Manski is Board of Trustees Professor of Economics.
Coping with Ambiguity in Policymaking
Social planners regularly face the difficult task of identifying the best policy with limited knowledge of policy impacts. Throughout 2010, Manski published several articles related to his line of original work on coping with ambiguity in policymaking. In a paper for the Proceedings of the National Academy of Sciences, he considers the problem of a social planner who must choose a vaccination policy. From the perspective of the external-response function, Manski demonstrates that the rate of vaccination will change the rate of illness among unvaccinated persons. In this scenario, a planner can expect the illness rate among unvaccinated persons to decrease at least somewhat as the vaccination rate increases—even without knowing the exact preventive effects of the vaccine. Finally, he explains how use of the minimax or minimax-regret criterion can help such a planner decide on a particular vaccination rate for maximum results.
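The two decision criteria named above can be sketched in a few lines. The example assumes a planner choosing a vaccination rate from a small grid while facing several possible states of nature; the welfare function, the cost parameter, and the grid values are all hypothetical and are not taken from Manski's paper.

```python
def maximin(actions, states, welfare):
    """Pick the action whose worst-case welfare is highest."""
    return max(actions, key=lambda a: min(welfare(a, s) for s in states))


def minimax_regret(actions, states, welfare):
    """Pick the action whose worst-case regret is lowest.

    Regret in a state is the gap between the best achievable welfare in
    that state and the welfare of the chosen action.
    """
    best = {s: max(welfare(a, s) for a in actions) for s in states}

    def max_regret(a):
        return max(best[s] - welfare(a, s) for s in states)

    return min(actions, key=max_regret)


# Hypothetical setup: v is the vaccination rate, s indexes how severe the
# disease would be with no vaccination. Illness among the unvaccinated is
# assumed to fall as v rises; vaccinating costs 0.2 per person.
VACCINE_COST = 0.2


def welfare(v, s):
    illness = s * (1.0 - v) ** 2  # assumed illness burden, not from the paper
    return -(illness + VACCINE_COST * v)


rates = [0.0, 0.25, 0.5, 0.75, 1.0]
severities = [0.1, 0.5, 0.9]

if __name__ == "__main__":
    print("maximin choice:", maximin(rates, severities, welfare))
    print("minimax-regret choice:", minimax_regret(rates, severities, welfare))
```

In this toy setup the two criteria need not agree: maximin guards against the worst state, while minimax regret balances the cost of over- and under-vaccinating across states.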
Manski’s 2010 IPR working paper on “incredible certitude,” forthcoming in the Economic Journal, also addresses the problem of uncertainty in policy debates and develops a typology of poor analytical practices. Although it is well known in research circles that stronger assumptions yield stronger but less credible conclusions, Manski argues that public and professional pressure to produce concrete policy recommendations often leads planners to make stronger assumptions than they can reasonably defend. He suggests that planners might better promote positive policy discussion by characterizing the plausibility of various situations and the relative effectiveness of differing policies to address them.
Collective Decisions in Social Choice
Most economists, operating in a utilitarian framework, assume that individual choice represents the most efficient means of distributing private goods. The few arguments that have been made in favor of consensus action have relied on a hypothetical economy of altruistic agents. But in new research supported by the National Science Foundation, Manski investigates the efficiency of collective decisions in social choice situations with uncertainty. He shows that, based on the mathematics of aggregating individual payoffs, coordinated action can improve outcomes. Specifically, the mean of all actions that agents would choose independently will produce a larger mean payoff for all individuals in all states of nature. A related conclusion, based on Jensen’s inequality, is that in this particular context, it is better for social planners to prescribe the same action to all individuals rather than to diversify treatment across the population. The study was published in Quantitative Economics.
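The Jensen's inequality step can be checked numerically. The sketch below assumes a payoff that is concave in the action for every state of nature; the quadratic payoff and the sample values are hypothetical, chosen only to make the comparison concrete.

```python
from statistics import mean


def payoff(action: float, state: float) -> float:
    """Hypothetical payoff, concave in the action for every state."""
    return -(action - state) ** 2


def compare(actions, state):
    """Return (mean payoff when agents act independently, mean payoff when
    every agent takes the mean of those actions) for one state of nature."""
    independent = mean(payoff(a, state) for a in actions)
    coordinated = payoff(mean(actions), state)
    return independent, coordinated


if __name__ == "__main__":
    chosen = [0.2, 0.5, 0.9]  # actions agents would pick on their own
    for state in (0.0, 0.4, 1.0):
        ind, coord = compare(chosen, state)
        print(state, round(ind, 4), round(coord, 4))
```

With a concave payoff, the coordinated mean payoff is at least as large as the independent one in every state, which mirrors the argument above; with a payoff that is not concave in the action, the comparison can go the other way.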
Measuring Consumer Uncertainty
Survey measures of consumer expectations have important consequences for both government and business. For example, public uncertainty about future inflation matters not only for forecasting inflation and other macroeconomic outcomes, but also for assessing a central bank’s credibility and effectiveness of communication. Manski is currently exploring the feasibility of eliciting individual consumers’ subjective probability distributions of future inflation outcomes in a project with Wändi Bruine de Bruin of Carnegie Mellon University and Giorgio Topa and Wilbert van der Klaauw of the Federal Reserve Bank of New York. The researchers are also looking at how these measures vary over time and their correlations with point forecasts and respondent characteristics.
In another project, Manski and his RAND Corporation colleague Jeff Dominitz are measuring expectations of equity returns among potential investors to gain a better understanding of the determinants of investment in equities and other assets. Analyzing data from the Survey of Economic Expectations in 1999–2001 and the Michigan Survey of Consumers in 2002–2004, they find a population made up of “expectation types,” each forming expectations in a different but stable way. The study is forthcoming in the Journal of Applied Econometrics.
Forecasting for Areas of Human Capital
IPR statistician Bruce Spencer is working on estimates and forecasts for selected areas of human capital, such as those that categorize U.S. workers employed in science and technology jobs according to skill. Past studies of U.S. educational attainment have tended to focus on differences in averages across groups. This is consistent with most demographic research, which has focused on rates rather than totals. Total numbers of people with certain types of human capital are important for U.S. competitiveness, however. Thus, Spencer is developing a new model that allows for aging and retirement, international movement, and potential policy effects of improved incentives for attracting and training students. Having a framework for systematically organizing information about human capital could help U.S. policymakers both in tracking progress and in developing strategies to increase particular kinds of human capital. Spencer also hopes the statistics will be useful in discussions about the future of U.S. higher education, and, by extension, secondary, elementary, and even preschool education.
Statistical Accuracy and the Judiciary
Much of Spencer’s work also focuses on the accuracy of public statistics and the use of statistics to inform and improve social processes and systems. He recently completed the first stage of a project on the accuracy of jury verdicts, which indicated that juries give wrong verdicts in at least one out of eight cases—and judges’ performance was estimated to be not much better. This sample was specialized and not a basis for generalizations. Utilizing statistical models known as latent class models, he also estimated probabilities of erroneous conviction and erroneous acquittal. Spencer is currently developing methods for quantifying the uncertainty for these estimates to assess whether they can be used for large-scale monitoring of the accuracy of verdicts.
Methods and Empirical Legal Scholarship
Law professor and IPR associate Nancy Staudt is advancing the use of causal directed acyclic graphs in empirical legal studies to help researchers identify causation, avoid bias, and interpret empirical results. Although this methodology has become popular in a number of disciplines, including statistics, biostatistics, epidemiology, and computer science, it has yet to appear in the empirical law literature. In a recent paper with Harvard colleague Tyler VanderWeele, she outlines the rules and principles underlying this methodology and shows how it can assist legal empiricists through both hypothetical and real-world examples. Staudt is Class of 1940 Research Professor.
Hypothesis Development and Testing
Widespread misuse of null hypothesis significance testing (NHST) has led many communication researchers to critique this framework and instead emphasize the use of effect sizes and confidence intervals. Communication studies researcher and IPR associate Daniel O’Keefe points out that this new approach is often functionally equivalent to NHST in current studies because communication researchers are currently limited by crude predictive capabilities. His analysis suggests that developing more refined predictive capabilities will require making good use of the additional information provided by effect sizes and confidence intervals—and careful thinking about how such refined hypotheses might be tested. He is Owen L. Coon Professor of Argumentation and Debate.
Accounting for Missing Survey Data
Missing data is prevalent in social science and health studies, both in the form of attrition—in which responses “drop out” of the data set after a certain point—and in nonmonotone patterns of intermittently missing values. Yet even within these patterns, not all missing data can be treated equally; certain trends in missing data might indicate wider trends that should be taken into account when forming conclusions about the data set as a whole.
Marketing professor and IPR associate Yi Qian, in a project with Hui Xie of the University of Illinois at Chicago’s School of Public Health, makes the first attempt to systematically address the problem of not distinguishing between these ignorable and nonignorable patterns of missingness in missing data analysis, especially when the missing data appear to be irrelevant but cannot be confirmed as such. Their approach is to test whether the standard analysis holds under conditions of nonignorable nonresponse. If so, the analysis can be considered credible. If, however, the analysis changes significantly under the condition that missingness is at least somewhat relevant, the researchers conclude that it should be re-examined.
Qian and Xie are also investigating the use of a generalized additive missing data model that, contrary to the existing literature, does not assume a restricted linear relationship between missing data and the potentially missing outcome. Using a Wage Offer data set, they conduct an extensive simulation study. Their simulation shows that the proposed method helps reduce bias that might arise from the misspecification of the functional forms of predictors in the missing data model.
Adjusting Survey Data for Poor Memories
The category adjustment model proposes that estimates of inexactly remembered stimuli are adjusted toward the central value of the category of which the stimuli are members. Adjusting estimates toward the average of all category instances, properly weighted for memory uncertainty, should maximize the average accuracy of estimates, but only in situations with symmetrical category distributions, such that the central stimulus value is also the mean. Hedges and his colleagues report two experiments using asymmetric distributions: one where the frequency distribution is skewed and the other where the frequency distribution shifts over time. In both cases, people adjust their estimates toward the category’s running mean, which is consistent with the model—but not with alternative explanations for adjusting stimuli toward a category’s central value. Published in the Psychonomic Bulletin & Review, the paper was co-authored by Sean Duffy of Rutgers, Janellen Huttenlocher of the University of Chicago, and Elizabeth Crawford of the University of Richmond.
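The adjustment described above can be written as a weighted average of the memory trace and the category's central value. The sketch below uses a precision-weighted form, in which the weight on memory shrinks as memory becomes noisier relative to the spread of the category; that weighting rule, the function names, and the parameters are assumptions for illustration, not the exact specification used in the paper.

```python
def memory_weight(memory_sd: float, category_sd: float) -> float:
    """Weight placed on the memory trace (assumed precision weighting)."""
    return category_sd ** 2 / (category_sd ** 2 + memory_sd ** 2)


def adjusted_estimate(memory_trace, category_mean, memory_sd, category_sd):
    """Shrink an inexact memory toward the category's central value."""
    w = memory_weight(memory_sd, category_sd)
    return w * memory_trace + (1.0 - w) * category_mean


def running_mean(stimuli):
    """Yield the mean of all stimuli seen so far, one value per stimulus."""
    total = 0.0
    for count, value in enumerate(stimuli, start=1):
        total += value
        yield total / count


if __name__ == "__main__":
    # Hypothetical stimulus sizes drawn from a distribution that shifts upward.
    sizes = [2.0, 3.0, 2.5, 6.0, 7.0, 6.5]
    for size, center in zip(sizes, running_mean(sizes)):
        print(size, round(adjusted_estimate(size, center, 1.0, 2.0), 3))
```

Because the central value here is the running mean rather than a fixed midpoint, the target of the adjustment moves as the distribution of stimuli shifts, which is the behavior the two experiments are designed to detect.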
Experiments in Political Science
IPR political scientist James Druckman is co-editor of the first comprehensive overview of how experimental research is transforming political science. Cambridge University Press will publish the “Handbook of Experimental Political Science” in 2011. The volume will offer methodological insights and groundbreaking research from 30 of the discipline’s leading experimentalists, including Druckman, Shanto Iyengar and Paul Sniderman of Stanford, Alan Gerber and Donald Green of Yale, and Diana Mutz of the University of Pennsylvania. The handbook aims to ensure that political science experiments are conducted with the highest level of intellectual rigor, thereby enabling political scientists to provide policymakers with significant data and conclusions. The volume came together following a May 2009 conference at Northwestern University, which Druckman organized with support from IPR and funding from the National Science Foundation.
Time-Sharing Experiments in Social Science Research
IPR sociologist Jeremy Freese and Penny Visser of the University of Chicago continue to expand the research capacity of Time-Sharing Experiments for the Social Sciences (TESS), a website that facilitates original experiments on nationally representative samples at no cost to investigators. Recently, TESS joined forces with the Human Factors/Behavioral Sciences Division of the Department of Homeland Security’s Science and Technology Directorate to encourage survey research related to terrorism and government countermeasures. Specifically, the partnership will allow social and behavioral scientists to investigate the factors contributing to terrorism-related attitudes, beliefs, judgments, and behaviors with a field study larger than normally permitted in a standard TESS proposal.
TESS was launched in 2008 as an infrastructure project of the National Science Foundation. Faculty, graduate students, and postdoctoral researchers can simply submit their proposals for peer review, and if successful, TESS then fields the Internet-based survey or experiment on a random sample of the U.S. population.
Data Center for Census Research
The Chicago Census Research Data Center is a collaborative effort between the U.S. Census Bureau and a consortium of Chicago-area institutions to provide researchers with access to confidential Census Bureau microdata. Launched in 2002 and housed at the Federal Reserve Bank of Chicago, it is one of only 12 such sites in the nation.
The center has recently added new data sets to its broad pool of census resources, including microdata on business establishments and firms, individual and household demographics, longitudinal employment dynamics, and even health data from partnering agencies such as the National Center for Health Statistics. Spencer and Andrea Eisfeldt, associate professor of finance in the Kellogg School of Management, represent Northwestern on the board governing the consortium, whose current members also include the University of Chicago and University of Illinois.
Postdoctoral Training in Education Research
The Q-Center’s postdoctoral training program, led by Hedges and supported by a grant from IES, continues to provide unique, two-year fellowships in applied education research to recent PhDs. The goal of the program is to produce a new generation of education researchers who are equipped to tackle the pressing challenges of the U.S. education system with the best research methods available.
Currently, the center is hosting two postdoctoral fellows. Nathan Jones is studying the role of policy in shaping new teachers’ work environments, and Christopher Rhoads is working to develop better methods for dealing with missing data in experiments, procedures for evaluating measures of implementation fidelity, and ways to integrate measures of implementation fidelity into analyses of experiments.
Promoting a Methodological Network
The Society for Research on Educational Effectiveness (SREE) held its third annual conference in March 2010, bringing 385 researchers to Washington, D.C., for three days of presentations, panels, and workshops around the theme of “Research into Practice.” Speakers included Cecilia Rouse, a member of the Council of Economic Advisers; Thomas McLellan, then deputy director of the White House Office of National Drug Control Policy; John Easton, director of the Institute of Education Sciences; and Anthony Bryk, president of the Carnegie Foundation for the Advancement of Teaching. Several IPR faculty, postdoctoral fellows, and graduate research assistants also presented their work, including Cook, Hedges, Rhoads, and IPR economist David Figlio.
Founded in 2005, SREE is a professional society that brings together scholars from a diversity of disciplines in the social sciences, behavioral sciences, and statistics who seek to advance and disseminate research on the causal effects of education interventions, practices, programs, and policies. IPR houses the society and its director, Rob Greenwald. Hedges also serves as the society’s president and as co-editor of its Journal of Research on Educational Effectiveness.
The Society for Research Synthesis Methodology, co-founded by Hedges and Cook, launched its own flagship journal in 2010, Research Synthesis Methods. In its first two issues, the journal covered a wide range of topics, such as statistical methods, empirical research and simulations, literature retrieval and information science, data extraction, and software. The journal’s scope is intentionally broad so as to appeal to researchers across many disciplines who face common methodological issues. Its first issue included an article by Hedges, IPR graduate research assistant Beth Tipton, and graduate student Matthew Johnson on estimating the covariance matrix of meta-regression coefficients, using methods that apply when there are clusters of internally correlated estimates. Tipton and Hedges have also developed related software for robust variance estimation, which is available for download on IPR’s website.
The journal’s second issue also featured an introduction to fixed-effect and random-effects models for meta-analysis, co-authored by Hedges and three of his longtime colleagues, Michael Borenstein of Biostat, Julian Higgins of the U.K. Medical Research Council, and Hannah Rothstein of Baruch College. These four experts on meta-analysis are currently at work on a companion volume to their 2009 Introduction to Meta-Analysis (Wiley). The new book will focus on computing effect sizes for meta-analysis.
Q-Center Colloquia
The Q-Center kick-started another round of colloquia in 2010 with Arizona State psychology professor Stephen West, who spoke on research design and causal inference, comparing the traditional approaches developed by Donald Campbell and Donald Rubin. Economist Peter Schochet of Mathematica Policy Research also gave a talk on the late pretest problem in randomized controlled trials in education. The series continues in 2011 with an interdisciplinary line-up of methodological experts.