Rescuing historic datasets at risk
Despite a growing trend in publishing new research outputs in FAIR and open repositories, the body of scientific knowledge remains dominated by products that are not accessible or reusable. Developments in policy and best practices around data management help, but they are not retroactive.
Historical and long-term environmental datasets are imperative to understanding how natural systems respond to our changing world. Although immensely valuable, these data are at risk of being lost unless actively curated and archived in data repositories. The practice of data rescue, which includes identifying, preserving and sharing valuable data and associated metadata at risk of loss, is an important means of ensuring the long-term viability and accessibility of such datasets.
The Living Data Project is a nationwide initiative to preserve and breathe new life into legacy datasets in ecology, evolution and environmental science. It also trains graduate students in best practices in data management, reproducible research, synthesis statistics and scientific collaboration. The project is a national collaboration that brings together academics, graduate students and project partners at a variety of government and non-government organizations across Canada.
While rescuing data is not new, the term lacks a formal definition and the practice lacks general recommendations. The project group outlined seven key considerations for the effective rescue of historically collected and unmanaged datasets, based on experiences organizing data rescue efforts through the CIEE Living Data Project. In practice, the group considers prioritization of datasets to rescue, forming effective data rescue teams, preparing the data and associated metadata, and archiving and sharing the rescued materials.
The application of data rescue is scalable to other organizations, institutions and disciplines. Regardless of the discipline or data type, the need to appropriately curate and archive data is universal in the practice of science. The data rescue process also creates opportunities to decolonize the practice of science and broaden the scope of meaningful relationships between data collectors, researchers and the local or Indigenous communities in the places where the data are collected.
In principle, data rescue is a zero- or very-low-cost practice, aside from the personnel time required to complete the project. In the Living Data Project, graduate students participate through paid internships ($6,000 CAD). By comparison, the scientific, economic and cultural costs of data loss can be immense.
The Living Data Project has conducted nearly 70 data rescue projects, involving dozens of graduate student interns and project partners from a diverse range of local, provincial and national organizations. Successful data rescue means that the targeted data, and associated metadata, are made openly and permanently available on a public data repository.
Contributed by Joseph Burant, former Postdoctoral Teaching & Research Fellow with the Living Data Project, Canadian Institute of Ecology and Evolution.
- Fostering a culture of open science and aligning incentives for open science
- Investing in human resources, training, education, digital literacy and capacity building for open science
- Investing in open science infrastructures and services
- Promoting innovative approaches for open science at different stages of the scientific process
Living Data Project