The HPC4 is the fourth-generation high-performance computing cluster implemented and maintained by ITSO. It was officially rolled out in mid-October 2024 after a pilot testing period. Hosted in the new High Performance Infrastructural Center (HPCIC), the HPC4 equipment is primarily funded by the University and also accepts contributions from faculty members. The HPC4 platform is designed to support scientific computation on Intel-based and AMD-based CPU nodes. It also features Nvidia-based GPU machines for tasks that do not require the AI-specific hardware of a system such as the HKUST SuperPOD.
HPC4 Highlights
The HPCIC adopts Liquid Immersion Cooling Technology, which offers the following advantages:
• Higher density of computing resources, so more hardware fits in less physical space
• High energy efficiency, which supports sustainability
• Lower operating costs through energy savings in cooling the machines
The system environment of HPC4 is based on the latest version of Rocky Linux 9. A secure computing approach is adopted: regular operating system upgrades and security patches are applied to maintain a safe environment for research workloads.
The software environment promotes a Do-It-Yourself installation approach using the Spack tool (https://spack.readthedocs.io/), providing users with the flexibility to customize their software environment to meet their research needs. Additionally, the use of Apptainer (formerly known as Singularity) is supported and encouraged to handle different software packaging and compatibility issues.
HPC4 Hardware Specification
The HPC4 cluster is composed of both CPU and GPU nodes:
CPU Nodes
Processor | No. of Nodes | CPU Cores per Node | Memory per Node | Max Instruction Set |
---|---|---|---|---|
1.9 GHz Intel Emerald Rapids | 10 | 128 | 512 GB | AVX-512 |
2.25 GHz AMD Bergamo | 76 | 256 | 768 GB | AVX-512 |
2.25 GHz AMD Bergamo | 16* | 256 | 1.5 TB | AVX-512 |
* Contributed servers to be available soon
GPU Nodes
Processor | No. of Nodes | CPU Cores per Node | Memory per Node | Max Instruction Set | GPUs per Node |
---|---|---|---|---|---|
2.1 GHz Intel Sapphire Rapids | 15 | 64 | 512 GB | AVX-512 | 4 (A30, 24 GB) |
2.5 GHz Intel Emerald Rapids | 6 | 64 | 512 GB | AVX-512 | 4 (L20, 48 GB) |
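To get a feel for the overall scale of the cluster, the two hardware tables can be tallied with a few lines of Python. This is only an illustrative sketch: the `NodeGroup` class and the helper names are made up for this example, the figures are copied from the tables, and 1.5 TB is assumed to mean 1536 GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One row of the hardware tables (hypothetical helper type)."""
    processor: str
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int
    gpus_per_node: int = 0
    gpu_model: str = ""
    gpu_memory_gb: int = 0


# Rows copied from the CPU Nodes table.
# Assumption: 1.5 TB is counted as 1536 GB.
CPU_NODES = [
    NodeGroup("1.9 GHz Intel Emerald Rapids", 10, 128, 512),
    NodeGroup("2.25 GHz AMD Bergamo", 76, 256, 768),
    NodeGroup("2.25 GHz AMD Bergamo (contributed)", 16, 256, 1536),
]

# Rows copied from the GPU Nodes table.
GPU_NODES = [
    NodeGroup("2.1 GHz Intel Sapphire Rapids", 15, 64, 512, 4, "A30", 24),
    NodeGroup("2.5 GHz Intel Emerald Rapids", 6, 64, 512, 4, "L20", 48),
]


def total_cores(groups):
    """Sum CPU cores over all nodes in the given groups."""
    return sum(g.nodes * g.cores_per_node for g in groups)


def total_memory_gb(groups):
    """Sum host memory (GB) over all nodes in the given groups."""
    return sum(g.nodes * g.memory_gb_per_node for g in groups)


def gpu_totals(groups):
    """Map each GPU model to (number of GPUs, total GPU memory in GB)."""
    totals = {}
    for g in groups:
        if g.gpus_per_node == 0:
            continue
        count = g.nodes * g.gpus_per_node
        prev_count, prev_mem = totals.get(g.gpu_model, (0, 0))
        totals[g.gpu_model] = (prev_count + count,
                               prev_mem + count * g.gpu_memory_gb)
    return totals


if __name__ == "__main__":
    print("CPU-node cores:", total_cores(CPU_NODES))
    print("CPU-node memory (GB):", total_memory_gb(CPU_NODES))
    print("GPU-node cores:", total_cores(GPU_NODES))
    print("GPUs by model:", gpu_totals(GPU_NODES))
```

With the table values as listed, the CPU nodes add up to 24,832 cores (a figure that includes the 16 contributed servers not yet available), and the GPU nodes provide 60 A30 and 24 L20 cards.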
Performance Comparison of HPC4 CPUs and GPUs
To give users an idea of how the CPUs and GPUs in HPC4 perform relative to others, please check here.
HPC4 Software
Modules
Lmod is used to manage installations of most application software. With the module system, users can set up their shell environment to access applications, which makes running and compiling software easier. It also allows multiple versions of the same software to co-exist on the system, abstracting away version differences and dependencies on the OS.
Click here for details of the module system.
Spack - User-managed software installation manager
Spack is a package manager that enables users to build software from source code or to install pre-compiled binary packages directly into their computing environments. With Spack, users can easily manage the installation, configuration, and dependencies of a wide variety of scientific and high-performance computing software packages.
Click here for details of Spack.
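As a rough illustration of a user-managed installation, the sketch below assembles a Spack spec string and the usual sequence of commands for it. The helper names are hypothetical, and the package, version and variant (`fftw@3.3.10+mpi`) are placeholders chosen for the example, not a statement about what is available on HPC4. Only the `spack spec`, `spack install`, `spack load` and `spack find` subcommands are assumed.

```python
import shlex


def spack_spec(name, version=None, variants=()):
    """Build a Spack spec string such as 'fftw@3.3.10+mpi'.

    name, version and variants are illustrative inputs; variants are
    given with their sign, for example '+mpi' or '~shared'.
    """
    spec = name if version is None else f"{name}@{version}"
    return spec + "".join(variants)


def spack_workflow(spec):
    """Return the command lines for a typical install, as argv lists."""
    return [
        ["spack", "spec", spec],     # preview how the spec would be resolved
        ["spack", "install", spec],  # build or fetch the package
        ["spack", "load", spec],     # add it to the current shell environment
        ["spack", "find"],           # list installed packages
    ]


if __name__ == "__main__":
    # Placeholder package; substitute the software you actually need.
    spec = spack_spec("fftw", "3.3.10", ["+mpi"])
    for argv in spack_workflow(spec):
        print(shlex.join(argv))
```

The commands are only printed here. On a login node they would normally be typed into the shell directly, since `spack load` has to modify the calling shell's environment.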
Use of Apptainer (Singularity)
An Apptainer (formerly known as Singularity) container lets users run applications in a Linux environment of their choice. It encapsulates the operating system and the application stack into a single image file. Users can modify, copy and transfer this file to any system that has Apptainer installed and run it as a user application, integrating system-native resources such as the InfiniBand network, GPUs/accelerators, and the resource manager with the container. In effect, Apptainer enables BYOE (Bring-Your-Own-Environment) computing on a multi-tenant, shared HPC cluster.
Click here to view details of Apptainer (Singularity)
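To make the single-image workflow concrete, the following sketch builds an `apptainer exec` command line for running a program inside an image. The function name, the image file `myenv.sif`, the bind path and the script name are all hypothetical. The `--nv` flag (expose Nvidia GPUs to the container) and `--bind` (mount a host directory) are standard Apptainer options, but whether and how they should be used on HPC4 is an assumption to check against the linked details.

```python
import shlex


def apptainer_exec(image, command, use_gpu=False, binds=()):
    """Build the argv for running `command` inside an Apptainer image.

    image   -- path to a .sif image file (hypothetical in this example)
    command -- program and arguments to run inside the container
    use_gpu -- add --nv so the host's Nvidia driver and GPUs are visible
    binds   -- host paths (or 'src:dest' pairs) to mount in the container
    """
    argv = ["apptainer", "exec"]
    if use_gpu:
        argv.append("--nv")
    for path in binds:
        argv += ["--bind", path]
    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Placeholder image, bind path and script names.
    argv = apptainer_exec(
        "myenv.sif",
        ["python3", "train.py"],
        use_gpu=True,
        binds=["/scratch/demo:/data"],
    )
    print(shlex.join(argv))
```

Returning an argv list rather than a single string keeps quoting unambiguous if the list is later passed to `subprocess.run`.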
HPC4 Charging Model
Charging for HPC4 services can be justified for several reasons. It ensures efficient resource allocation by prioritizing projects with significant needs and potential impacts, while also helping to recover the substantial costs associated with hardware, maintenance, and energy consumption. This approach supports the sustainability of the service by funding ongoing maintenance and necessary upgrades to keep the infrastructure current and reliable. Additionally, charging encourages accountability and fair usage among researchers and promotes optimization. It also incentivizes researchers to seek external funding, aligning their projects with available grants and enhancing the overall research output of HKUST.
Click here for details on the charging for use of HPC4.
HPC4 Contribution Model
Adopting the community cluster model for the HPC4 cluster offers several benefits. It enables resource pooling and sharing, which reduces costs and gives individual research teams access to advanced computational resources that might otherwise be too expensive or unattainable.
Subject to HPCIC resource availability, HPC4 adopts a community cluster model similar to that of HPC3 and accepts hardware contributions from faculty members on a regular basis. Details of the HPC4 contribution model are available here.
HPC4 Account Application
All HKUST faculty members are eligible to apply for an HPC4 account. To apply, please complete the HPC4 Account Application Form. Students who wish to use HPC4 should ask their supervisors to support their applications by completing the same application form.
Getting Started
How to log in to the cluster
Click here to view the instructions on how to get access to the HKUST HPC4 cluster
Use of SLURM Job Scheduling System
The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs in the cluster must be run through SLURM.
Click here to learn how to submit your first SLURM job
Click here to view details of using SLURM
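As a minimal sketch of what a batch job looks like, the function below renders a SLURM batch script from a few parameters. The function name, the partition name `example-partition`, the job name and the program `./my_program` are placeholders. The real partition names, accounts and limits for HPC4 are described on the linked pages and are not assumed here. The `#SBATCH` options used (`--job-name`, `--partition`, `--nodes`, `--ntasks-per-node`, `--cpus-per-task`, `--time`, `--gpus-per-node`, `--account`) are standard `sbatch` options.

```python
def render_batch_script(job_name, command, partition, nodes=1,
                        ntasks_per_node=1, cpus_per_task=1,
                        time_limit="01:00:00", gpus_per_node=0,
                        account=None):
    """Return the text of a SLURM batch script.

    All argument values are supplied by the caller; nothing here is
    specific to HPC4.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus_per_node:
        lines.append(f"#SBATCH --gpus-per-node={gpus_per_node}")
    if account:
        lines.append(f"#SBATCH --account={account}")
    # Launch the program through srun so SLURM starts the tasks.
    lines += ["", f"srun {command}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; replace with a real partition and program.
    script = render_batch_script(
        job_name="demo",
        command="./my_program",
        partition="example-partition",
        ntasks_per_node=4,
        time_limit="00:30:00",
    )
    print(script, end="")
```

The rendered text can be saved to a file such as `job.sh` and submitted with `sbatch job.sh`. Job status can then be checked with `squeue`.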
Partition and Resource Quota
Click here to view more information on partition and resource quota.
Storage Types
Click here to view more information on different storage types.
Job Priority and Accounting
Not enforced in the pilot stage. Further information will be available in due course.