France – at the forefront of open science
The CNRS's sixth Open Science Day, held on November 27th 2024, was an opportunity to take stock of progress and prospects in open science. The day's agenda included an overview of national-level data storage and intensive computing infrastructures, along with discussions of the CNRS's involvement in the European federation EOSC, the international overhaul of research assessment criteria and issues linked to open bibliographic databases.
Following previous such days focusing on scientific publications in 2020, research evaluation in 2021, research data in 2022 and open source software and text mining in 2023, what would the sixth Open Science Day organised on November 27th 2024 by the CNRS's Open Research Data Department (DDOR) have to offer? A little of all of that and more. Five years after the launch of the CNRS’s Roadmap for Open Science, the organisation, which is a forerunner of the movement in France, took stock of progress and prospects in this field on the occasion of this year's event.
Strengthening, pooling and making high-performance computing sustainable
First among such prospects are those offered by intensive computing. The deputy director of the DDOR, Denis Veynante, reiterates that the CNRS is "a major player in the high-performance computing and data landscape thanks to the two national-scale data centres we run" – the IN2P3's Computing Centre [1] and the Institute for Development and Resources in Scientific Computing (Idris) [2] in Orsay, which operates the Jean-Zay supercomputer, part of which is dedicated to artificial intelligence. As the deputy director points out, the CNRS "strongly supports the rationalisation of hardware infrastructures fostered by the Ministry of Higher Education and Research and the Cour des Comptes [3] to avoid scattering our efforts and building individual ad hoc solutions".
In line with this policy, two new large-scale projects should be launched in the coming years. The first is a joint project in which Idris and the research infrastructures Data Terra [4] and France-Grilles [5] will roll out new-generation services for storing, processing and making massive data available. The aim is to interconnect storage infrastructures such as Idris, the mesocentre in Clermont-Ferrand and the Strasbourg Astronomical Data Centre, then extend the network to other mesocentres, thus ensuring seamless integration with the existing ecosystem of cloud stakeholders. The project is aiming high – to be "the first unified sovereign package of computing infrastructures and cloud services", explains the director of Idris, Pierre-François Lavallée. In practical terms, this means Idris and France-Grilles acquiring additional storage capacity in 2025. This will be financed in part by a €2 million seed fund from the Directorate-General for Research and Innovation (DGRI) and by €500,000 saved by the CNRS unsubscribing from Scopus at the start of 2024.
The aim of the second project, FITS, is to "federate the 'savoir faire' and services of Idris and the CC-IN2P3 – while respecting the missions of both – through the implementation of a distributed research data storage, processing, access, distribution and promotion infrastructure hosted in environmentally friendly conditions with a low carbon footprint", explains Pierre-Etienne Macchi, the CC-IN2P3's director. In concrete terms, the two centres will have upgraded their respective hosting capacities by 2026 to be able to cope with "the explosion in the volume of data from research infrastructures".
- [1] The CNRS Nuclei & Particles Computing Centre (CC-IN2P3), in Villeurbanne, is a national research infrastructure that designs and runs a range of services, particularly including a mass storage system and processing facilities for large amounts of data.
- [2] The Idris, in Orsay, is the CNRS's major very high-performance intensive digital computing centre. It operates the Jean-Zay supercomputer, part of which is dedicated to the artificial intelligence research community.
- [3] France's Court of Auditors.
- [4] The Data Terra research infrastructure enables researchers to access, process and combine multi-source data for the observation of the Earth system.
- [5] The France-Grilles infrastructures are a set of machines (hardware) on which scientific data processing services and software are rolled out.
Working towards open bibliometric databases
As well as providing part of the financing for the new Idris service, the CNRS's cancellation of its Scopus subscription will help support its full transition to an open, non-commercial model, a point reiterated by Antoine Petit, the CNRS Chairman and CEO, in his speech – "We will eventually need to stop using commercial databases for bibliometrics and bibliography". In the meantime, the CNRS has maintained its subscription to Clarivate's Web of Science database while free bibliographic databases, such as the open-access, not-for-profit OpenAlex, are being developed. As a complement to OpenAlex, scientists can use the Matilda platform to run bibliographic searches on the full text of publications rather than just their metadata (title, author, keywords, abstract). Matilda's full-text provision can also be used to "identify suspicious traces of the use of generative AI and unexpected citations of these texts", explains Didier Torny, a CNRS research professor with the Centre for the Sociology of Innovation and the DDOR's scientific publications economics officer.
The overhaul of research assessment
This transition to open access is being accompanied by an overhaul of research assessment in France and internationally. For France, Alain Schuhl, the CNRS's Deputy CEO for Science, recalls the four principles put in place when the assessment of researchers at the CNRS was overhauled in 2019. The founding idea is for the evaluation process to focus on scientific results and give greater recognition to the full range of activities involved in researchers' careers, rather than concentrating solely on bibliometric indicators and articles published in prestige journals with ever-increasing publication costs. The aim of overhauling the research assessment process is not limited to scientists' careers, as "guaranteeing high-quality assessment is now necessary to maintain the level of excellence of French research", explains Alain Schuhl.
For this reason, the CNRS is fully involved in the international CoARA coalition, launched in 2022, which now boasts over 700 members, 13 international working groups and 15 national chapters. The CNRS was one of the first signatories and the DDOR is now leading a work package for the analysis of CoARA signatories' action plans. Sylvie Rousset, the director of the DDOR, represents the CNRS on CoARA's executive committee. She sees the coalition as "a place for discussion and exchange which clearly shows that reforming research assessment is an international concern nowadays". Sylvie Rousset reiterates how important collective action is in supporting this process. "The strength of the collective and the involvement of academic players around the world will enable us to rethink the whole long-term assessment system with the most integrity possible".
EOSC gathers its forces within a federation
Another international collaboration of note is on the European scale. The European Open Science Cloud (EOSC) programme offers a catalogue of shared open science services for scientists from all disciplines. Nearly ten years after the 2016 launch of the EOSC by the European Commission, the programme is now gathering its forces within a new, tangible and sustainable environment – the EOSC federation. Volker Beckmann is responsible for implementing the EOSC in France. He explains how this new approach came about. "Until now, the EOSC operated on a project basis with a new call to manage the core of EOSC every three years, all of which limited its durability and sustainability. This way of operating actually weakened the service offer. This new federation of research data and service providers will be based on a solid economic model as regards support and will drive a dynamic that fits with the ambitions of European research. This does not mean the end of calls for projects to continue to develop the EOSC but these will be run in a context of clear and well-identified governance and operations".
The EOSC federation was only announced this year and has already received 121 applications, including 17 from France. The CNRS is fully involved in this process, aiming to contribute to various nodes of the federation, which are composite structures bringing together several services provided for the scientific community. The organisation also hopes to coordinate three of these nodes by involving Data Terra for the Earth system, Escape for astronomy and particle physics and HAL+ for open archives. Suzanne Dumouchel, the CNRS's coordinator for EOSC, explains that the organisation "is heavily involved in the construction of the EOSC federation at several levels", providing support for the nodes and taking part in the work of the EOSC association.
France tops the Leiden Ranking
At the Open Science Day, André Brasil, a researcher at Leiden University's Centre for Science and Technology Studies, shared the progress made in adapting the international Leiden Ranking to incorporate open science principles. The new version of the Leiden Ranking is the first such list to use only open data – notably via the OpenAlex database – and will coexist with the historical ranking list based on proprietary data so that their respective results can be compared. The availability and centralisation of open data and metadata is a recent development that is being organised gradually at the international level.
Six years after the launch of the first National Plan for Open Science, France clearly plays a major international role in open science practices, as André Brasil explains when analysing preprints. "If we look at OpenAlex's list of preprints, clearly France is leading the way ahead of other countries and the CNRS is nothing less than the number 1 institution in the world".