The HPC4 is the fourth-generation high-performance computing cluster implemented and maintained by ITSO. It was officially rolled out in mid-October 2024 after a pilot testing period. Hosted in the new High Performance Infrastructural Center (HPCIC), the HPC4 equipment is primarily funded by the University, with additional contributions from faculty members. The HPC4 platform is designed to support scientific computation on Intel-based and AMD-based CPU nodes, and it also features Nvidia-based GPU machines for tasks that do not require hardware dedicated to AI-centric workloads, such as the HKUST SuperPOD.
HPC4 usage is charged according to a charging model. For details, please visit Charging Model of HPC4.
HPC4 Highlights
The HPCIC adopts Liquid Immersion Cooling Technology, which offers the following advantages:
• Allows a higher density of computing resources, fitting more hardware in less physical space
• Offers high energy efficiency for sustainability
• Reduces operating costs through energy savings in machine cooling
The system environment of HPC4 is based on the latest version of Rocky Linux 9. A secure computing approach is adopted, with regular operating system upgrades and security patches applied to maintain a safe environment for research workloads.
The software environment promotes a Do-It-Yourself installation approach using the Spack tool (https://spack.readthedocs.io/), providing users with the flexibility to customize their software environment to meet their research needs. Additionally, the use of Apptainer (formerly known as Singularity) is supported and encouraged to handle different software packaging and compatibility issues.
HPC4 Hardware Specification
The HPC4 cluster consists of multiple node types, each with a distinct hardware configuration to cater to a variety of computational needs. The specifications for each node type are detailed below.
CPU Nodes
Processor | Nodes | CPU Cores / Threads (per Node) | System Memory (per Node) |
---|---|---|---|
Intel Xeon Platinum 8592+ | 10 | 128 Cores / 128 Threads | 512GB DDR5-5600 ECC |
AMD EPYC 9754 | 76 + 1* | 256 Cores / 256 Threads | 768GB DDR5-4800 ECC |
AMD EPYC 9754 | 18* + 12^ | 256 Cores / 256 Threads | 1.5TB DDR5-4800 ECC |
GPU Nodes
GPU Model | Host CPU (per Node) | Nodes | GPUs (per Node) | GPU Memory (per GPU) | System Memory (per Node) |
---|---|---|---|---|---|
NVIDIA A30 | 2× Intel Xeon Gold 6448Y | 15 | 4 | 24GB HBM2 ECC | 512GB DDR5-4800 ECC |
NVIDIA L20 | 2× Intel Xeon Gold 6548Y+ | 5 | 4 | 48GB GDDR6 ECC | 512GB DDR5-4800 ECC |
NVIDIA RTX 4090D | 2× Intel Xeon Gold 6448Y | 10* + 2^ | 6 | 24GB GDDR6X | 512GB DDR5-4800 ECC |
NVIDIA RTX 5880 Ada | 2× Intel Xeon Gold 6448Y | 8 | 6 | 48GB GDDR6 ECC | 512GB DDR5-4800 ECC |
* Contributed servers. Not yet available to public use.
^ To be available soon.
Shared Resources
Performance Comparison of HPC4 CPUs and GPUs
To compare the performance of the CPUs and GPUs in HPC4 against other systems, please check here.
HPC4 Software
Modules
Lmod is used to manage installations for most application software. With the module system, users can set up their shell environment to access applications, making it easier to run and compile software. It also allows multiple versions of the same software to co-exist on the system, abstracting away version details and OS-level dependencies.
Click here for details of the module system.
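A typical Lmod session on a login node might look like the sketch below. The module names and versions are placeholders for illustration; run `module avail` to see what is actually installed on HPC4.

```shell
# Illustrative Lmod session -- module names/versions are assumptions,
# not a list of what ITSO has installed.
module avail               # list all software available through the module system
module load gcc            # load the default version of a module
module load openmpi/4.1    # load a specific version (placeholder name)
module list                # show currently loaded modules
module purge               # unload everything and return to a clean environment
```

`module purge` is useful at the top of job scripts to ensure a reproducible environment regardless of what was loaded interactively.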
Spack - User-managed software installation manager
Spack is a package manager that enables users to build software from source code or to install pre-compiled binary packages directly into their computing environments. With Spack, users can easily manage the installation, configuration, and dependencies of a wide variety of scientific and high-performance computing software packages.
Click here for details of Spack.
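As a sketch of the Do-It-Yourself workflow, the commands below bootstrap Spack in a user's home directory and install a package from source. The package name (`fftw`) is an example only, not an indication of what is pre-configured on HPC4.

```shell
# Illustrative Spack workflow in a user's own directory.
git clone --depth=1 https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh    # add the spack command to the current shell

spack install fftw    # build FFTW and all its dependencies from source (example package)
spack load fftw       # add the installed package to the current environment
spack find            # list everything installed in this Spack instance
```

Because Spack installs into the user's own tree, no administrator privileges are needed, and different users (or projects) can maintain completely independent software stacks.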
Use of Apptainer (Singularity)
An Apptainer (formerly known as Singularity) container lets users run applications in a Linux environment of their choice. It encapsulates the operating system and the application stack into a single image file. One can modify, copy, and transfer this file to any system that has Apptainer installed and run it as a user application, integrating native system resources such as the InfiniBand network, GPUs/accelerators, and the resource manager with the container. Apptainer effectively enables BYOE (Bring-Your-Own-Environment) computing on the multi-tenant, shared HPC cluster.
Click here to view details of Apptainer (Singularity)
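The BYOE workflow described above can be sketched as follows. The image names and the Python script are placeholders for illustration.

```shell
# Illustrative Apptainer workflow -- image and script names are examples.
apptainer pull ubuntu.sif docker://ubuntu:22.04    # build a single SIF image file from Docker Hub
apptainer exec ubuntu.sif cat /etc/os-release      # run a command inside the container's OS
apptainer exec --nv my_app.sif python train.py     # --nv passes the host's NVIDIA GPUs into the container
```

The resulting `.sif` file is an ordinary file: it can be copied to any machine with Apptainer installed and run there unchanged, which is what makes the container environment portable across systems.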
HPC4 Charging Model
Charging for HPC4 services can be justified for several reasons. It ensures efficient resource allocation by prioritizing projects with significant needs and potential impacts, while also helping to recover the substantial costs associated with hardware, maintenance, and energy consumption. This approach supports the sustainability of the service by funding ongoing maintenance and necessary upgrades to keep the infrastructure current and reliable. Additionally, charging encourages accountability and fair usage among researchers and promotes optimization. It also incentivizes researchers to seek external funding, aligning their projects with available grants and enhancing the overall research output of HKUST.
Click here for details on the charging for use of HPC4.
HPC4 Contribution Model
Adopting the community cluster model for the HPC4 cluster offers several benefits. It enables resource pooling and sharing, which reduces costs and gives individual research teams access to advanced computational resources that might otherwise be too expensive or unattainable.
Subject to HPCIC resource availability, HPC4 adopts a community cluster model similar to that of HPC3 and accepts hardware contributions from faculty members on a regular basis. Details of the HPC4 contribution model are available here.
HPC4 Account Application
All HKUST faculty members are eligible to apply for an HPC4 account. To apply, please complete the HPC4 Account Application Form. Students who wish to use HPC4 should ask their supervisors to support their application by completing the above form.
Getting Started
How to login to the cluster
Click here to view the instructions on how to get access to the HKUST HPC4 cluster
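Access is typically via SSH from the campus network or VPN. The hostname below is a placeholder; use the actual address given in the linked instructions.

```shell
# Hypothetical login command -- the real hostname and account format
# are specified in the access instructions linked above.
ssh your_itsc_account@hpc4.example.ust.hk
```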
Use of SLURM Job Scheduling System
The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of the cluster. All jobs on the cluster must be submitted through SLURM.
Click here to learn how to submit your first SLURM job
Click here to view details of using SLURM
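A minimal batch script gives a feel for how jobs are described to SLURM. This is a generic sketch: the partition name and resource limits are placeholders, so substitute the values from the partition and resource quota page.

```shell
#!/bin/bash
# Minimal illustrative SLURM batch script -- partition name and limits
# are placeholders, not HPC4-specific values.
#SBATCH --job-name=hello
#SBATCH --partition=cpu          # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00          # walltime limit (HH:MM:SS)
#SBATCH --output=hello_%j.out    # %j expands to the job ID

srun hostname                    # run the task on the allocated node
```

Save the script as `hello.sbatch`, submit it with `sbatch hello.sbatch`, and monitor it with `squeue -u $USER`; the output appears in `hello_<jobid>.out` when the job completes.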
Partition and Resource Quota
Click here to view more information on partition and resource quota.
Storage Types
Click here to view more information on different storage types.
Job Priority and Accounting
Job priority and accounting are not enforced during the pilot stage. Further information will be available in due course.