Harnessing Big Data for Policy Research
IPR-led initiative to unlock possibilities for new administrative data sets
Big data sets, often talked about in engineering, business, and technology applications, are seen as an opportunity to revolutionize how we extract and use information to build new knowledge. Such massive data sets are set to transform policy research as well, and IPR has a new initiative to help the policy community harness them.
Supported by the National Science Foundation, the project will launch a national network of scholars, policymakers, and administrators to encourage successful cooperation in establishing big data sets for education research. IPR Director and education economist David Figlio is leading the project.
“By documenting best practices and identifying viable population models, we hope to inspire other states, researchers, and practitioners to collaborate on building and using such data sets—and to highlight why they are important,” Figlio said.
Establishing the Network
The project teams IPR scholars with colleagues at Duke University to examine the construction of “next-generation” data sets that link administrative data, such as welfare and school records, to population data, such as births and deaths. These sets could permit quicker evaluations of early childhood investments and interventions. The researchers aim to create a prototype using data from North Carolina and Florida, states that already have large-scale matched data sets.
Creating such a comprehensive data set requires close collaboration among scholars, policymakers, and data administrators at many levels of government. Thus, the network’s advisory board will ensure that researchers understand the issues policymakers face. Its members include three former governors, two state education superintendents, and the first director of the Institute of Education Sciences.
The other critical element in making effective use of such data sets is cross-disciplinary knowledge and expertise. At IPR, the network will involve economists Jonathan Guryan and Diane Whitmore Schanzenbach, social demographer Quincy Thomas Stewart, developmental psychobiologist Emma Adam, and biological anthropologists Christopher Kuzawa and Thomas McDade, in addition to Figlio. All have experience using large national and international data sets, and Adam, Kuzawa, and McDade are leaders in integrating biomarkers with population-level data. At Duke, leading education and policy researchers such as Jacob Vigdor and Kenneth Dodge will be involved.
The network will host a workshop organized by IPR for fall 2013 that will include Duke and Northwestern researchers, scholars from a variety of backgrounds from institutions around the country, and practitioners in state education departments. Its advisory committee of policymakers will also participate. This workshop is designed to provide a springboard for national conversations and increased collaboration between the research and practitioner communities.
Why Use Big Data?
As the architect behind the nation’s first large-scale data set matching birth-and-education records, Figlio spent nine years piecing together birth, education, and early childhood program participation data for more than 2 million children born in Florida. So he intimately understands the challenges of knitting large data sets together, in addition to leveraging government contacts to gain access to them in the first place.
Such data could be invaluable for peering into and understanding the black box of adult outcomes. Much is known about how early childhood interventions have positive effects on children from birth to age 5, effects that then stabilize in early adulthood. But what about after age 6 or the teenage years?
“There are times when disadvantage seems more malleable,” Figlio said, “so we can use these data to pinpoint exactly when interventions are more—and less—likely to work.”
Such big data sets will permit other types of new questions to be answered. Currently, the gold standard regarding many questions about the effects of early life disadvantage involves studies of twin pairs. For example, do twins separated at birth develop the same IQs? But Figlio points to how few pairs exist in even large, nationally representative data sets like the Panel Study of Income Dynamics. In a data set with 20,000 individuals, there are perhaps 100 twin pairs in all. So it is typically impossible to study how issues like biological disadvantage differentially affect children from a variety of backgrounds.
“These next-generation data sets will allow us to focus on the critical impact of policies on a person’s health, wealth, well-being, and employment over his or her lifetime,” Figlio said. “We hope to motivate others to do the same.”
IPR Director David Figlio is Orrington Lunt Professor of Education and Social Policy and of Economics at Northwestern University.