
Data Use, Quality & Cost in Policy Research

How Schools Are Chosen for Research 

IPR statistician Elizabeth Tipton lectures on how to design studies for generalizability.

As education policymakers increasingly rely on evidence, researchers have conducted more cluster-randomized trials. But much work remains to be done in reviewing those trials’ methodology, specifically how researchers choose which schools to include, since what works for one school or district may not work for another. In a recent working paper, IPR statistician Beth Tipton and her co-authors examine 34 such trials, all funded by the Institute of Education Sciences, to determine whether their samples are truly representative of particular populations of schools and students. They compare the sample data from those studies to general population data from the U.S. Department of Education. They find that recruitment depends heavily on pre-existing local relationships between researchers and the schools that are ultimately selected. As a result, the selected schools, like the universities recruiting them, skew larger and more urban than the populations being studied, which poses major challenges to any generalizations drawn from such studies. The researchers recommend three major changes to recruitment: including the sample-collection methodology in the grant proposal, increasing training for sample collection, and establishing best practices for school recruitment.
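To make the comparison concrete, the Python sketch below computes standardized mean differences between a recruited sample of schools and the population it is meant to represent, one common way of quantifying how far a sample skews from its target. The covariates, data, and weighting are invented for illustration and are not taken from the working paper.

```python
import numpy as np
import pandas as pd

# Hypothetical school covariates; the actual comparison uses U.S. Department of
# Education population data, but these names and values are invented.
COVARIATES = ["enrollment", "pct_free_lunch", "pct_urban"]

def standardized_mean_differences(sample: pd.DataFrame,
                                  population: pd.DataFrame) -> pd.Series:
    """For each covariate, compute (sample mean - population mean) divided by
    the population standard deviation. Values far from zero suggest the
    recruited sample is not representative on that covariate."""
    diffs = {}
    for cov in COVARIATES:
        pop_sd = population[cov].std(ddof=1)
        diffs[cov] = (sample[cov].mean() - population[cov].mean()) / pop_sd
    return pd.Series(diffs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated population of schools, plus a recruited sample that skews
    # toward larger schools (mimicking recruitment through local ties).
    population = pd.DataFrame({
        "enrollment": np.clip(rng.normal(500, 200, 10_000), 50, None),
        "pct_free_lunch": rng.uniform(0, 100, 10_000),
        "pct_urban": rng.uniform(0, 100, 10_000),
    })
    sample = population.sample(40, weights=population["enrollment"],
                               random_state=0)
    print(standardized_mean_differences(sample, population))
```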

Economics of Scaling Up

When researchers conduct randomized controlled trials (RCTs) of social programs, the hope is that smaller-scale programs that appear promising in initial RCTs can then be implemented at a larger scale with a larger set of resources. But how can it be known whether a social program will be equally successful at scale without having been tested at scale? IPR economist Jonathan Guryan proposes a way to measure the economics of scaling up in an IPR working paper, along with Jonathan Davis, Kelly Hallberg, and Jens Ludwig of the University of Chicago. Their model focuses on one scale-up challenge in particular: For most social programs, labor is a key input, yet as programs scale up and seek new hires, a search problem can arise. As a program grows, either the average cost of labor must rise or, if costs are held fixed, program quality will decline. For instance, a tutoring program that is being scaled up will eventually face a labor supply problem: It will either need to offer higher pay to attract enough high-quality tutors or accept lower-quality tutors at the same pay. While acknowledging that the exact costs of scale-up cannot necessarily be known, Guryan and his co-authors show that it is possible to create and test a program at a smaller scale while still learning about the input supply curves, such as the labor supply curve, facing the provider at a much larger scale. This can be done by randomly sampling the inputs the provider would have hired if it operated at a larger scale. Highlighting the specific case of a Chicago tutoring program they are evaluating and trying to scale up, the researchers show how scale-up experiments can provide insights and confidence that cannot be derived from simply scaling up without a modest-scale experiment. Guryan is the Lawyer Taylor Professor of Human Development and Social Policy.
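The sketch below illustrates the sampling idea in stylized form, assuming a hypothetical applicant pool and invented quality scores: randomly sampling tutors from the full set the provider would hire at scale, and employing that subset in the small experiment, estimates the average tutor quality the program would face at scale, which a top-of-the-pool pilot would overstate. It is an illustration of the intuition only, not the authors’ model or estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical applicant pool for a tutoring program, ranked the way the
# provider would rank candidates when hiring (best first). Quality scores
# are invented for illustration.
n_applicants = 2_000
ranked_quality = np.sort(rng.normal(0, 1, n_applicants))[::-1]

def sample_would_be_hires(large_scale: int, small_scale: int) -> np.ndarray:
    """Randomly sample `small_scale` tutors from the `large_scale` candidates
    the provider would hire at full scale. Employing this subset in the small
    experiment reveals the average quality the program would face at scale,
    because the subset is representative of the at-scale hiring pool."""
    idx = rng.choice(large_scale, size=small_scale, replace=False)
    return ranked_quality[idx]

if __name__ == "__main__":
    small, large = 50, 1_000
    pilot_quality = ranked_quality[:small].mean()  # hire only the very best
    at_scale_quality = sample_would_be_hires(large, small).mean()
    print(f"Business-as-usual pilot (top {small} hires): {pilot_quality:.2f}")
    print(f"Random sample of would-be hires at scale {large}: {at_scale_quality:.2f}")
```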

Protecting Privacy of Student Data

Over the last decade, the U.S. Department of Education has invested more than $600 million to help states improve their statewide longitudinal data systems (SLDSs), which include student background and assessment data. SLDSs today contain longitudinal data on millions of students, which should make them a rich source of data for education researchers. However, the Family Educational Rights and Privacy Act (FERPA) places restrictions on accessing these data, making it difficult for independent researchers to use them. A research project led by IPR statistician and education researcher Larry Hedges, and supported by the National Science Foundation (NSF) and the Institute of Education Sciences (IES), continues to examine the balance between protecting privacy and allowing for effective education research. A previous study led by Hedges, which used SLDS data from 11 states and 5 million students, found substantially different results for analyses using masked data, which scrambles or hides some original data to protect sensitive information, versus unmasked data. Subsequent analyses have shown that the data-masking procedures used in several states have led to the deletion of a large, non-random portion of the SLDS data, which could have implications for research findings based on these datasets. Hedges’ current project evaluates several approaches to statistical disclosure control that could both make SLDS data available to researchers and comply with standards set out by FERPA. He and his collaborators argue that these approaches must meet three criteria: 1) they must assure confidentiality; 2) they must preserve information about relationships in the data; and 3) the protected data must be open to conventional analyses by social science researchers. As a next step, the researchers will evaluate how well two competing general approaches meet these criteria. Hedges is Board of Trustees Professor of Statistics and Education and Social Policy and the co-director of the STEPP Center.
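As a rough illustration of why masked and unmasked analyses can diverge, the sketch below simulates one common disclosure-control step, suppressing small cells, and compares a simple estimate computed on the full and masked data. The variables, threshold, and masking rule are hypothetical and are not the procedures used by any particular state or evaluated in Hedges’ project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Simulated student records standing in for SLDS data; all variables invented.
n = 50_000
df = pd.DataFrame({
    "school_id": rng.integers(0, 800, n),
    "low_income": rng.integers(0, 2, n),
})
df["test_score"] = 50 - 5 * df["low_income"] + rng.normal(0, 10, n)

def mask_by_cell_suppression(data: pd.DataFrame, min_cell: int = 30) -> pd.DataFrame:
    """One common disclosure-control step: drop students in school-by-income
    cells smaller than `min_cell`. Because small cells are not a random subset
    of students, the deleted records can shift estimates based on masked data."""
    sizes = data.groupby(["school_id", "low_income"])["test_score"].transform("size")
    return data[sizes >= min_cell]

def income_gap(data: pd.DataFrame) -> float:
    """Mean test-score gap between low-income and other students."""
    means = data.groupby("low_income")["test_score"].mean()
    return means.loc[1] - means.loc[0]

if __name__ == "__main__":
    masked = mask_by_cell_suppression(df)
    print(f"Records retained after masking: {len(masked):,} of {len(df):,}")
    print(f"Gap in unmasked data: {income_gap(df):.2f}")
    print(f"Gap in masked data:   {income_gap(masked):.2f}")
```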