Pooling computing resources to progress further together at Paris-Saclay University

Since 2017, CentraleSupélec, ENS Paris-Saclay and Paris-Saclay University have pooled their efforts to buy scientific computing resources and host them at the CNRS's Institute for Development and Resources in Intensive Scientific Computing (Idris). The twin aim is to reduce the environmental footprint and financial cost of data centres and to foster synergies between laboratories.

The digital sector's environmental impact keeps growing: it currently accounts for 6% to 10% of the world's electricity consumption and nearly 4% of greenhouse gas emissions, figures that are rising by 5% to 7% a year. Data centres play a central role in storing the billions of data points produced every second and consume a large proportion of the sector's energy.

Scientific demand for computing resources, meanwhile, keeps multiplying. This is one of the reasons why research stakeholders on the Saclay plateau implemented a practical solution ten years ago to limit both the operating budgets and the environmental impact of their machines, which are now pooled and managed by the CNRS's Idris.

An alliance of three partners

A combination of events started the ball rolling in the early 2010s on this plateau in the south-west of the Île-de-France region. The Mechanics and Technology Laboratory (LMT)1 in Cachan and the Mechanics of Soils, Structures and Materials Laboratory (MSSMat)2 in Châtenay-Malabry had been working jointly on mesocentres3 installed on their premises and connected by a high-speed network. The two laboratories decided to purchase a new machine together ahead of two upcoming changes: their merger, which created the Paris-Saclay Mechanics Laboratory (LMPS)4, and the ENS's move to Saclay. Meanwhile in nearby Châtenay-Malabry, two École Centrale Paris laboratories, the MSSMat and the Macroscopic Molecular Energy and Combustion Laboratory (EM2C), were considering purchasing their first scientific computing machine when Centrale Paris and Supélec merged. The combined influence of the LMT, the MSSMat and the LMPS led to CentraleSupélec and ENS Paris-Saclay – which had become neighbours – jointly applying for, and being granted, funding for computing resources under the 2015-2020 State-Region Plan Contract. This was used to purchase Fusion, the two institutions' first joint computing machine. Rather than building a dedicated room to host it in Châtenay-Malabry, the two institutions turned to the CNRS through the Idris, which the IT community already knew very well.

Denis Girou, director of the Idris at the time, fondly remembers the institutional enthusiasm that greeted this kind of project when it was presented in 2013. "There was an extremely favourable national context for this kind of partnership because the Ministry had decided to curb the anarchic multiplication of computer rooms in higher education and research. Most of these were in poor technical condition, so the Ministry chose to push pooling operations very strongly". At the same time, the CNRS's management board was urging CNRS units to "integrate the regional ecosystem and, in this case, Paris-Saclay University, which is currently being created". And so, after four years of tripartite negotiations, an agreement was signed in early 2017 by the three institutions, leading to the official inauguration of Fusion on February 23rd of the same year. A new, more powerful machine called Ruche, also supported by the Region, was to replace Fusion four years later. In the meantime, the Saclay AI and Cloud@VirtualData platforms were added to the shared Fusion/Ruche platform, which became the Paris-Saclay mesocentre at the university of the same name.

Pooling resources rapidly proved to bring economic and energy benefits. The Idris is renowned for its eco-responsibility, as recently illustrated by the project to recycle waste heat from its Jean Zay supercomputer to heat the equivalent of a thousand new homes. The current director, Pierre-François Lavallée, is clearly proud of the machine room's "Power Usage Effectiveness5 of 1.28, compared to figures of 1.8 or even 2 in other laboratory machine rooms". In practice, "if this equipment were located anywhere else than at the Idris, its electricity consumption would go up by 40%".

Hosting Fusion and then Ruche at the Idris quickly reduced the associated laboratories' budgets, with the LMPS alone estimated to have "divided the cost of hosting by three", as its director Pierre-Alain Boucard points out. This professor at ENS Paris-Saclay, who is also the mesocentre's scientific director for the ENS, declares himself impressed by the system's "intelligent energy management", particularly the job manager's automatic shutdown of unused resources, which are only reactivated when new computation requests come in. This came in particularly useful when electricity prices soared in the winter of 2022. Above all, laboratories' use of the shared platform is optimised, which is essential given that manufacturing the hardware accounts for three quarters of its carbon footprint. Similar machines in individual laboratories are often underused, whereas Ruche runs at 60% to 70% of its capacity.
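The 40% figure can be checked with a back-of-the-envelope sketch based on the PUE definition in footnote 5, assuming the same IT load \(E_{\text{IT}}\) in both machine rooms and taking 1.8 as the comparison value quoted above:

\[
\mathrm{PUE} = \frac{E_{\text{total}}}{E_{\text{IT}}}
\qquad\Longrightarrow\qquad
\frac{E_{\text{total}}^{\,\text{elsewhere}}}{E_{\text{total}}^{\,\text{Idris}}}
= \frac{1.8 \times E_{\text{IT}}}{1.28 \times E_{\text{IT}}}
= \frac{1.8}{1.28} \approx 1.4
\]

In other words, the same computing work would consume roughly 40% more electricity in a typical laboratory machine room, which matches the director's estimate.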

 

The Ruche computing machine. © Rafael Medeiros / Idris

 

 "A mesocentre isn't just the hardware"

Beyond the economies of scale that have been achieved, pooling computing machines at the Idris has benefited the scientific communities themselves, first and foremost through the positive impact on working conditions. The Idris's effective control of energy consumption is partly due to the robustness of its electrical infrastructure. Its two power lines, uninterruptible power supplies, batteries and generator help prevent power cuts, while chilled water reserves can be drawn on so that equipment can be shut down as smoothly as possible if an incident does occur. Pierre-François Lavallée is delighted that his unit can "pass on the very high resilience level of the national centre to all our partners". This resilience is both material and human, as Anne-Sophie Mouronval knows very well. This research engineer has worked within the structure for twenty years, first at the MSSMat and then at the LMPS. She was present for all the mergers and pooling processes within the laboratory and now devotes a third of her working time to technical support for Ruche users, whatever institution they are affiliated to. She shares this support function with eight other engineers from ENS Paris-Saclay, CentraleSupélec, Paris-Saclay University and the Maison de la Simulation. The latter joint laboratory's own computer was already hosted by the Idris, and when it came to the end of its useful life the Maison's management preferred to contribute to the extension of Ruche rather than build its own new machine. As well as the Idris's "optimal environment for computing resources", Anne-Sophie Mouronval is impressed by the number and diversity of the engineers' profiles. "The wealth of skills in the support team is a real strong point for our pooling system. The team is capable of helping users with simulations that use artificial intelligence and also with high-performance computing (HPC)6."

This diversity of profiles testifies to the synergies pooling has created between a myriad of laboratories. Some laboratories were already well versed in scientific computing, having been immersed in it since Fusion's early days, but other, smaller laboratories less accustomed to working in this way for research purposes (certain humanities and social sciences units, for example) ended up joining the mesocentre too. After just over ten years of existence, Pierre-Alain Boucard admires the centre's "diversity of uses and scientific cultures, which encourages us to exchange ideas to help mentalities and practices evolve, and to keep calculations running to ensure the most effective use of the machine". Thomas Schmitt, a CNRS researcher working at the EM2C and CentraleSupélec's scientific manager for the mesocentre, considers that "a mesocentre is not just the hardware. It also involves bringing people, knowledge and skills together, which isn't possible when you're on your own". The mesocentre's versatility and flexibility for users have helped establish its intermediary position between laboratories and national supercomputers like Jean Zay, or even European ones. CentraleSupélec's scientific manager explains that the mesocentre can serve as a "gateway" or even a "springboard" towards such supercomputers.

 

Pipes for the cooling system of the Jean Zay supercomputer. © Rafael Medeiros / Idris / CNRS Images

 

"Advancing together to advance further"

Nearly ten years after Fusion's birth, what is the future for pooling computational resources on the Saclay plateau? In the short term, the mesocentre is planning to pool its massive data storage systems, but Thomas Schmitt remains cautious as regards future developments: "We're not aiming to grow at all costs, we'd like to maintain a complete global system capable of responding to several demands at the same time". The Idris will soon reach the limits of its power supply and air conditioning infrastructure when it comes to accommodating new machines, but a major project to upgrade this technical infrastructure began in 2022 and should be completed by mid-2026. Pierre-François Lavallée considers the experience of pooling resources to have been so successful that he aims to take it further. His idea is to "develop another national mission alongside our national supercomputer work – hosting machines or services for the entire higher education and research sector". Until now this hosting activity has been of secondary importance behind Jean Zay. However, the new mission is in the process of being made official, with the Idris planning the short-term purchase of a physical structure or 'cube' to provide effective and scalable hosting for user laboratories' computing and storage cabinets.

Whatever direction the pooling of computing equipment in the south of the Île-de-France region takes, Thomas Schmitt remains as enthusiastic as ever, "because advancing together means we'll be able to advance further".

1. CNRS / ENS Cachan.
2. CNRS / École Centrale Paris.
3. A mesocentre is a scientific computing centre that provides computing resources on an intermediate scale between laboratory machines and those of the national data centres.
4. CentraleSupélec / ENS Paris-Saclay / CNRS.
5. The PUE is an indicator of a data centre's energy efficiency, calculated by dividing the total energy consumed by the data centre by the energy used by its IT equipment alone (servers, storage, network). The closer it is to 1, the more efficient the energy consumption. This calculation does not, however, take heat recovery into account.
6. HPC consists of aggregating computing power to obtain greater performance levels than traditional computers and servers.
Key figures
  • 250 unique active Ruche users per month in 2024 and 600 unique active users over the whole year – four times more than at the 2020 launch
  • Around thirty partner laboratories involved – three times more than in 2020
  • More than 10,000 Central Processing Unit (CPU) cores on Ruche and around 15,000 more on Cloud@VirtualData
  • Up to 1.5 TB of RAM for large-memory calculations
  • 23 Graphics Processing Unit (GPU) nodes totalling over 80 Nvidia A100 and V100 graphics cards on Ruche