Archiving open software as human heritage


Access to the source code of research software is essential for open science: as a product of human creativity, software contains a growing part of scientific and technical knowledge. Archiving and referencing research source code is also an essential condition for the reproducibility of research results in all fields of study.

The mission of Software Heritage is to collect, preserve and make accessible the source code of all publicly available software (Abramatic et al. 2018). An international non-profit initiative led by Inria (the French national research institute for digital science and technology) in partnership with UNESCO, Software Heritage provides an infrastructure shared among research, industry and public administrations, which makes it possible to pool costs, avoid duplicated effort and standardize user training.

Software Heritage is funded through a cost-sharing model, at a cost of several million euros per year, supported by the contributions of a network of international actors. For example, France participates through the National Fund for Open Science and the contributions of several research organizations and universities. A national open science prize for research software was launched in France in 2021. With its second national plan for open science, France encourages two things: the distribution of research software source code under an open source licence, which allows unhindered reuse, and the recognition of contributions to the development of quality research software, in all their forms, in the career evaluation of researchers and engineers. The collaboration between Software Heritage and HAL, the French open archive for publications, allows researchers and engineers to contribute with minimal effort to a catalogue of research software production, equipped with quality metadata.

As of August 2023, the Software Heritage archive contains over 16 billion unique source files, drawn from more than 250 million distinct sources. This includes publicly accessible projects on the best-known forges as well as the long tail of platforms maintained and used by research organizations. Software Heritage provides the SWHID, an intrinsic persistent identifier specifically designed for software, for the more than 30 billion software artifacts contained in the archive, at all levels of granularity. The SWHID specification is openly maintained and is a key building block for reproducibility and long-term accessibility (Di Cosmo et al. 2020).
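To make the notion of an intrinsic identifier more concrete, here is a minimal Python sketch of how a SWHID for a single file content can be derived from the bytes of the file alone. It assumes the scheme described in the public SWHID specification for content objects (a Git-compatible SHA-1 over a "blob" header followed by the data, prefixed with swh:1:cnt:), which this text does not itself spell out. The function name content_swhid is hypothetical, and other object types (directories, revisions, releases, snapshots) are computed differently and are not covered.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content from its raw bytes.

    Hypothetical helper for illustration; assumes the content-object
    scheme of the SWHID specification (Git-compatible blob hashing).
    """
    # Content objects are hashed like Git blobs: a header holding the
    # byte length, a NUL separator, then the file data itself.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    # "swh" is the scheme, "1" the scheme version, "cnt" the object type.
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    print(content_swhid(b"hello world\n"))
```

Because the identifier depends only on the content, anyone holding a copy of the file can recompute it and check that it matches the reference, without relying on a central registry.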

Simple, actionable guidelines are available for researchers worldwide to archive and reference the software source code that they use or produce. Researchers can explicitly request the archiving of a single software project (more than 600,000 requests have been made since the service opened in 2019) or of an entire software forge (more than 100 requests have been made since the service opened in 2023).

Area(s) of action covered by the practice as per the UNESCO Recommendation on Open Science
Country
Open Science Pillars
For more information, please contact:

Laurent Romary (Inria) or Roberto Di Cosmo (Software Heritage)