Data Discovery, Preservation, Tracking
Where is that data (now)?
Whether intentionally orchestrated by political actors or due to a shortsighted technical mishap, the loss of important research data is unfortunately not a new phenomenon. Recent expunging and “alteration” of U.S. Federal Agency data to accommodate a political viewpoint demonstrate the vulnerability of our most trusted information resources. Such actions erode scholarly provenance, interrupt current research, impede sharing efforts, and stifle future innovation.
Our goals are to raise awareness of the need for public data access and preservation, involve subject experts in identifying at-risk data, encourage personal protection of research data, and support research reproducibility.
While it is unrealistic to expect that all data will be preserved in some form for ongoing research, several efforts are underway to capture important data and online information. We encourage you to review, and potentially participate in, the ongoing community efforts to preserve crucial data.
If you have any questions or encounter lost data sources, please contact the Publishing and Data Services team.
Community Efforts
Looking for data? Want to contribute data for preservation? Below are relevant community efforts to help locate previously public data.
- Data Rescue Project – The Data Rescue Project is a coordinated effort among a group of data organizations to gather, curate, clean, and catalog data, and to provide sustained access to and distribution of data assets.
- Data Rescue Tracker – The tracker provides an overview of what data has been downloaded from which government websites. If you are looking for a specific dataset, use the search or filter features to see if it has been captured.
- Data Liberation Project – The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest.
- DataLumos – DataLumos is an ICPSR archive for valuable government data resources.
- Find Lost Data – Find Lost Data provides a search tool across several data archive/rescue sites, including CDC, Harvard Dataverse, Data Rescue Project, and Harvard LiL Data.Gov mirror.
- Public Environmental Data Partners – A volunteer coalition committed to preserving and providing public access to federal environmental data.
- CAFE Dataverse Collection – This Climate and Health Research Coordinating Center (CAFE) Dataverse sub-collection stores critical climate and health datasets.
- Harvard Law School Library Innovation Lab Data.gov Archive – A complete archive of federal public datasets linked by data.gov, with 311,000 datasets harvested during 2024 and 2025.
Documenting Data Preservation
Remember, finding data and saving a copy is not enough. Use the resources below to ensure the data is in accessible formats and well described with metadata to facilitate long-term access and reusability.
- Curating for Data Rescue – Advice on preserving data from the Data Curation Network, whose curators work to make data more ethical, reusable, and understandable.
- Checklist for USA Federal Data Backups – This checklist from MIT provides steps you can take to ensure the government data you use in your research remains accessible to you and others.
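As one illustration of describing saved data with metadata, the minimal Python sketch below records a checksum and a few descriptive fields next to a downloaded file. It is not taken from either resource above. The field names, the `.metadata.json` sidecar naming, and the helper functions `describe_file` and `write_sidecar` are all assumptions chosen for this example; a repository or curation guide may require a different schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, source_url, description):
    """Build a small metadata record for one saved data file.

    The field names here are illustrative, not a formal metadata standard.
    """
    data = Path(path).read_bytes()
    return {
        "file_name": Path(path).name,
        "size_bytes": len(data),
        # The checksum lets others verify the copy has not changed.
        "sha256": hashlib.sha256(data).hexdigest(),
        "source_url": source_url,
        "description": description,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(path, source_url, description):
    """Write the record next to the data file as <name>.metadata.json."""
    record = describe_file(path, source_url, description)
    sidecar = Path(str(path) + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

For example, calling `write_sidecar("sample.csv", "https://example.org/sample.csv", "Demo table")` on an existing file would create `sample.csv.metadata.json` beside it. The URL and description here are placeholders.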
Repositories for Data Preservation
Data repositories are a centralized place to hold data, share data publicly, and organize data in a logical manner. There are many established repositories for data discovery and preservation. Harvard supports three generalist repositories to ensure long-term access to your research data:
- Harvard Dataverse is Harvard’s open data repository for sharing, preserving, citing, exploring, and analyzing discrete research datasets.
- Open Science Framework (OSF) can be used to collaborate, manage, and share your documents, datasets, and research throughout an entire project.
- Vivli allows for the sharing, request, and secure analysis of individual participant-level data from completed clinical trials.
Additional Guidance for Data Discovery
Use the following resources to locate data collections maintained by Harvard or other entities. Have questions? We are here to help!
- Harvard Library Research Guides: Statistics & Data
- Harvard Library Research Guide: Essential Resources for Locating and Using Numeric Data
- Harvard Library Research Guide: Health Data Resources, United States
- Lamont Library: Data and Government Information Collections
- Krieger Research Group: Public Health Disparities Geocoding Project
Data Discovery and Preservation Events
Presentation slides from previous Countway Library classes on data discovery and preservation.
- April 2025 Community Data Preservation: A Climate & Health Datathon
- December 2024 Research Data Management Webinar: Data Sharing in Repositories
- November 2024 Research Data Management Webinar: Principles of Finding and Citing Data