The Swiss National Supercomputing Centre, also known as CSCS, has built and deployed a new supercomputer in collaboration with Nvidia and HPE.
The machine, called Alps, came online at the end of 2024 and is already listed as the world's seventh most powerful supercomputer – and Europe's second most powerful. Computer Weekly sat down with Thomas Schulthess, director of CSCS and professor of computational physics at ETH [Eidgenössische Technische Hochschule, or Federal Institute of Technology] Zurich, to find out more.
What is the history of Alps, and what architectural decisions did you make along the way?
Thomas Schulthess: I'll start by explaining the difference between CSCS and Alps. CSCS is a centre with people. The main facility is in Lugano, near the football stadium and the ice hockey stadium. It was founded in 1991, long before I arrived, and it's where we deploy and operate supercomputers, the biggest of which is Alps, which came online in 2024.
For example, we had Piz Daint, a hybrid Cray XC40/XC50 machine, which was the first GPU-based supercomputer in Europe. We deployed it around 2012 to 2013, which was around the time of Jaguar at Oak Ridge National Laboratory in the US.
One of the things that makes us special is that we design, build and operate supercomputers for MeteoSwiss, the Swiss meteorological service. Normally, weather services run their own computers, but in our case, we do it for them. As a result, we have had a strong collaboration with MeteoSwiss for decades.
Alps is an effort to bring different computers into one platform – and it was motivated by a peer review of the centre in 2015, where we got the very strong message that we had done well, but now we must face the challenges of data and complex workflows in scientific computing.
That's when we started to look at options for how to evolve supercomputing. What came out was a collaboration with what was then Cray, and is now HPE, which acquired Cray in 2019, on a cloud-native architecture. For us, this was a really good development, but it turned out to be very difficult, much more effort than anybody predicted.
But we decided to go this way around 2018 to 2019. We then considered competing architectures – Nvidia versus AMD – and in the end, we went for both. We did the scale-out with Grace Hopper [from Nvidia], and now we also have a significant partition of MI300A accelerators [from AMD] on Alps.
And how is Alps running today?
Schulthess: The way Alps works today is as a very large Slingshot network, like Frontier and LUMI – and we can partition the network. At the end of every network endpoint is either a storage device or a compute node. The compute nodes are either Grace Hopper (GH200)-based or AMD MI300A-based. We also have Nvidia A100 and AMD MI250X processors, which makes those nodes the same as in LUMI and in Frontier. We have AMD Rome-based nodes as well, so a traditional multicore partition.
Hence, we support a multitude of computer architectures on Alps. The idea is that we can serve different workloads. We also have a big focus on application software development, so we can make all these kinds of architectures available to software developers. And that's where we are today.
How do you offer services on Alps?
Schulthess: You can view Alps as offering a cloud-like experience, with different types of service. We can offer infrastructure as a service (IaaS). Typically, we offer IaaS to other research infrastructures, like the Paul Scherrer Institute, which runs several large user programmes, including access to a synchrotron [the Swiss Light Source], the free electron laser [SwissFEL] and the Swiss spallation neutron facility to study muon sciences. They get a partition on Alps and run their own platforms on it.
In other cases, we might create a platform for AI, traditional HPC, or climate and weather for users. And then we have users or communities that run their own function as a service, and we provide them with a platform as a service. We are also involved with large experiments like the Square Kilometre Array, or the Swiss Tier 2 for LHC data analysis that is part of the Worldwide LHC Computing Grid, which is a partition on Alps.
And probably the most important thing now is that, where we used to have a separate computer for MeteoSwiss, with the new model we run its numerical forecasting system, ICON, in a partition on Alps.
It seems that the fact that ICON is now running in a partition is a good indication of the size of Alps?
Schulthess: Well, it shows you the size, but also the breadth that we can cover. Traditionally, a supercomputer is a unique system. It may be heterogeneous – for example, Piz Daint is heterogeneous in that it has multicore nodes and GPU-accelerated nodes – but it was architected as a uniform system, a one-size-fits-all solution in terms of the programming environment and the like.
Typically, users have to adapt to a particular supercomputer. So, you basically have a hammer and you need to make everything look like a nail. Now, on Alps, we can create partitions, and the software environment in those partitions, to adapt to users.
Who funds CSCS and Alps?
Schulthess: Alps, as a research infrastructure, is funded by the ETH domain. CSCS is a unit of ETH Zurich, where I am also a professor of physics. ETH Zurich and EPFL, the sister school in Lausanne, and four national labs are joined together under what is called the ETH domain.
The whole domain is funded by the State Secretariat for Education, Research and Innovation – that is our main funding source. But the MeteoSwiss part is funded by MeteoSwiss and whatever their funding sources are, so we have to maintain a clear separation there. We also have third-party funding, like most research infrastructures, in the range of around 20%.
Because we are a publicly funded infrastructure, even if we work with third parties and get full cost recovery, we are still subsidised, and subsidies don't scale. We cannot host commercial activities on our infrastructure, though we can engage in research collaborations with commercial companies. And when we do collaborate with companies, they must fund the full recovery costs of that work.
What about your involvement in the OpenCHAMI consortium?
Schulthess: The OpenCHAMI consortium currently includes five partners: Los Alamos National Laboratory, NERSC [the National Energy Research Scientific Computing Center] at Lawrence Berkeley National Laboratory, the University of Bristol, HPE and CSCS.
The consortium is developing the system management infrastructure of the future, and Alps is an essential use case in this development. That's why the system management software will continue to evolve over the next two or three years – here at CSCS, but also in Bristol, Los Alamos and Berkeley.