Making every research step count: Introducing the ATAI Data Portal
This piece was originally published by the Center for Effective Global Action (CEGA).
A new open data platform will accelerate robust and comprehensive research in the agricultural sector
Since 2009, the Agricultural Technology Adoption Initiative (ATAI), co-managed by CEGA and J-PAL, has generated robust evidence of the impacts of agricultural technologies, such as stress-tolerant rice or mobile phone-based agricultural extension, on small-scale farmer welfare. Today, ATAI launched a new open data platform to bring together the best evidence from ATAI-funded research in a single portal, making it easily accessible to researchers and policymakers alike. The initiative aims to foster collaboration and evidence-informed decision-making in the agricultural sector, ultimately contributing to the advancement of the most effective agricultural practices and improving farmer welfare.
Why make data open?
Limited access to high-quality data has long been recognized as a significant obstacle in social science research. To address this issue, data repositories like the J-PAL Dataverse have emerged, making it easier for researchers, policymakers, and others to access and use data from completed research studies. More recently, the effectiveness of these repositories has been bolstered by data sharing policies put in place by funders, journals, and research organizations. UC Berkeley's Initiative for Transparency in the Social Sciences (BITSS), incubated at CEGA, champions these and other open data approaches as a standard practice that promotes transparency and reproducibility of evidence, strengthening the scientific ecosystem and bolstering the credibility of research findings.
The ATAI Data Portal goes beyond the principles of open data by incorporating data harmonization. Data harmonization means bringing together data from various sources (in ATAI's case, the studies in its research portfolio) in a consistent form, so that users have a comprehensive and comparable view of the information.
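To make the idea concrete, here is a minimal Python sketch of what harmonizing two studies might look like. It is an illustration only, not the portal's actual pipeline: the study names, column names, shared variable names, and the `harmonize` helper are all hypothetical.

```python
# Hypothetical per-study schemas (assumptions for illustration only).
# Each entry maps a study's own column name to a shared variable name
# and a multiplier that converts the value into the shared unit.
STUDY_SCHEMAS = {
    "study_a": {
        "plot_acres": ("plot_area_ha", 0.404686),  # acres -> hectares
        "yield_kg": ("harvest_kg", 1.0),
    },
    "study_b": {
        "area_hectares": ("plot_area_ha", 1.0),
        "harvest_tonnes": ("harvest_kg", 1000.0),  # tonnes -> kilograms
    },
}


def harmonize(study_id, record):
    """Rename and rescale one study-specific record into the shared schema."""
    schema = STUDY_SCHEMAS[study_id]
    out = {"study_id": study_id}
    for column, value in record.items():
        if column in schema:
            shared_name, factor = schema[column]
            out[shared_name] = value * factor
    return out


rows = [
    harmonize("study_a", {"plot_acres": 2.0, "yield_kg": 850.0}),
    harmonize("study_b", {"area_hectares": 1.5, "harvest_tonnes": 1.2}),
]

# Both records now share the same variable names and units,
# so they can be compared or pooled directly.
for row in rows:
    print(row)
```

Once every study is expressed in the same variables and units, a researcher can stack the records and compare them without first reverse-engineering each study's codebook.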
Harmonized data holds tremendous value for researchers aiming to extract insights from multiple studies. In the past, researchers had to collect datasets from various sources, investing valuable time in cleaning and integrating the data. Often, the unavailability of raw data hindered such comparisons, and the resulting publicly available data lacked sufficient information for meaningful analyses. However, researchers now have a powerful tool at their disposal. With the ATAI Data Portal, they can access harmonized data, enabling them to conduct meta-analyses and explore the external validity and generalizability of research results more efficiently and effectively. This transformative platform opens up new avenues for robust and comprehensive research in the agricultural sector.
The ATAI Data Portal also improves the richness and quality of datasets from ATAI-funded projects in several ways. For instance, a number of ATAI-funded studies are geo-referenced: they contain latitude and longitude coordinates for agricultural fields, households, or study administrative boundaries. When geographic coordinates are available, the ATAI Data Portal overlays the project dataset with environmental variables (such as temperature, precipitation, night lights, and forest cover) to expand the richness and utility of the data. (Many predictive models rely on this kind of information as ground truth data.)
To maintain the anonymity of the surveyed population, the data linkage employs industry-standard geo-masking techniques. By implementing these measures, the ATAI Data Portal ensures that the privacy and confidentiality of the participants are preserved while providing valuable insights into the relationships between agricultural practices and environmental factors.
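For readers unfamiliar with geo-masking, the sketch below shows one common approach: random displacement of each point by a bounded distance. The portal's own method is not specified here, so treat this as an assumption-laden example; the displacement range, the coordinates, and the function names are all hypothetical.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def displace(lat, lon, min_m, max_m, rng):
    """Move a point a random distance in [min_m, max_m] in a random direction.

    Uses a small-offset approximation, which is adequate for shifts of a
    few kilometres away from the poles.
    """
    distance = rng.uniform(min_m, max_m)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_M
    dlon = distance * math.sin(bearing) / (
        EARTH_RADIUS_M * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


rng = random.Random(42)  # fixed seed so the example is repeatable
original = (-1.2921, 36.8219)  # arbitrary illustrative coordinates
masked = displace(*original, min_m=500.0, max_m=2000.0, rng=rng)
shift = haversine_m(*original, *masked)
print(f"masked point is {shift:.0f} m from the original")
```

The minimum distance keeps the published point from landing on the true location, while the maximum keeps it close enough that overlaid variables such as rainfall or temperature remain representative.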
During the data harmonization process, meticulous data cleaning is carried out to ensure data integrity. This includes harmonizing units, eliminating negative values, and removing duplicate records. These measures contribute to the overall reliability and consistency of the data made available through the ATAI Data Portal, fostering more robust and trustworthy research outcomes.
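The three cleaning steps named above can be sketched in a few lines of Python. This is a hedged illustration rather than the portal's code: the field names, the unit table, the duplicate key, and the choice to mark negative values as missing (instead of dropping the record) are all assumptions.

```python
# Assumed conversion table to a common unit (kilograms).
KG_PER_UNIT = {"kg": 1.0, "tonne": 1000.0}


def clean(records):
    """Harmonize units, blank out negative values, and drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Assumption: a household-season pair identifies a unique record.
        key = (rec["household_id"], rec["season"])
        if key in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(key)

        harvest_kg = rec["harvest"] * KG_PER_UNIT[rec["unit"]]
        if harvest_kg < 0:
            harvest_kg = None  # a negative harvest is impossible: mark as missing

        cleaned.append(
            {
                "household_id": rec["household_id"],
                "season": rec["season"],
                "harvest_kg": harvest_kg,
            }
        )
    return cleaned


raw = [
    {"household_id": "H1", "season": "2019A", "harvest": 1.2, "unit": "tonne"},
    {"household_id": "H1", "season": "2019A", "harvest": 1.2, "unit": "tonne"},
    {"household_id": "H2", "season": "2019A", "harvest": -50.0, "unit": "kg"},
    {"household_id": "H3", "season": "2019A", "harvest": 900.0, "unit": "kg"},
]

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Marking an implausible value as missing rather than deleting the whole record keeps the household's other variables available for analysis; a stricter pipeline might drop the record instead.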
Thus, the ATAI Data Portal offers a novel approach in that it features high-quality, harmonized data integrated with environmental variables in an open and accessible format.
“This portal is a first step in an effort to allow datasets from randomized controlled trials to be put to a broader set of uses. By harmonizing core agricultural variables to the fullest extent possible as well as providing broad access to raw data, the portal will allow the research community to aggregate across studies and geographies in a way not possible in any single study.”
- Craig McIntosh, ATAI Co-Chair and Professor of Economics at UCSD
ATAI-data.org launched with seventeen datasets based in Bangladesh, Ghana, Ethiopia, India, Kenya, Mozambique, Uganda, and Zambia. The portal will continue to grow as more research teams complete and submit their datasets to ATAI.
What comes next?
The ATAI Data Portal is a public good that will increase in volume and value over time as more open datasets from ATAI become available and more researchers make use of it. The ATAI Data Portal is open-source and freely available.
ATAI has seized an opportunity to institutionalize harmonized, open data and further standardize data collection for agricultural randomized evaluations—making every research step count. We hope that this model is an encouraging approach and tool for researchers working to evaluate the effectiveness of agricultural development programs.
For more information and for portal documentation, please visit atai-data.org.