Slurm Partition and Resource Quota

Partitions are work queues, each with its own set of rules/policies and a set of compute nodes on which jobs run. The available partitions are normal, preempt, and cpu. You can run sinfo to list the available partitions in discovery.
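For example, a quick check of the partitions and the state of their nodes can be done from the login node; the exact output columns depend on the cluster's Slurm configuration:

  # List all partitions with their time limits and node states
  sinfo

  # Condensed view: one summary line per partition
  sinfo -s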

Resource Request Policy

  • Computational resources in the HKUST SuperPOD are requested in units of H800 (80GB) GPUs. In Slurm, each GPU is associated with the following default CPU cores and system memory:
    • 14 CPU cores (28 threads)
    • 224GB system memory
  • In general, we recommend that users specify only --gpus-per-node (number of GPUs per node) and --nodes (number of nodes) in a job request, and let Slurm allocate the cores and memory among the nodes for optimal resource utilization (see the sketch after this list).
  • The normal partition supports GPU computation jobs requesting from 1 GPU up to 96 GPUs (equivalent to 12 nodes). Jobs requesting multiple GPUs are assigned a higher priority to run.
  • The preempt partition runs jobs on idle resources on reserved nodes. When idle resources are available, jobs submitted to this partition are given a 15-minute execution window; after that window, jobs may be terminated without prior notification if the reserved resources are reclaimed.
  • Please refer to "Charging for Use of HKUST SuperPOD" for detailed information regarding usage charges for these partitions.
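As a minimal sketch of such a request, the batch script below asks the normal partition for 2 GPUs on a single node and leaves the core and memory allocation to Slurm; the job name, output file, wall time and workload command are placeholders for illustration:

  #!/bin/bash
  #SBATCH --job-name=gpu-demo        # placeholder job name
  #SBATCH --partition=normal         # normal, preempt or cpu
  #SBATCH --nodes=1                  # number of nodes
  #SBATCH --gpus-per-node=2          # GPUs per node; default cores/memory follow each GPU
  #SBATCH --time=1-00:00:00          # wall time, within the 3-day limit of the normal partition
  #SBATCH --output=gpu-demo-%j.out   # placeholder output file (%j expands to the job ID)

  srun python train.py               # placeholder workload

Submitted with sbatch, each requested GPU is accompanied by the default 14 CPU cores (28 threads) and 224GB of system memory described above.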

 

Partition Table

| Slurm Partition | normal | preempt | cpu |
| --- | --- | --- | --- |
| No. of nodes | Nodes that are not reserved* | Idle resources on reserved nodes | 2 Intel nodes |
| Purpose | Mainstream GPU computation | Utilization of idle resources on reserved nodes | Data pre-processing for GPU computation |
| Max Wall Time | 3 days | 3 days | 12 hours |
| Min resource requested per job | 1 GPU | 1 GPU | 1 CPU core |
| Max resource requested per account | 96 GPUs (equivalent to 12 nodes) | 96 GPUs (equivalent to 12 nodes) | 8 CPU cores (per job) |
| Concurrent running jobs quota per user | 8 | 8 | 28 |
| Queuing and running jobs limit per user | 10 | 10 | 28 |
| Chargeable | Yes | Yes | No |
| Interactive job | Maximum 8 hours wall time | Maximum 8 hours wall time | Not allowed |
| Remarks | Jobs requesting more GPUs are assigned a higher priority. | Jobs have a 15-minute execution window when idle resources on reserved nodes are available; after this period, jobs may be cancelled when the reserved resources are reclaimed. | No access to the /scratch directory |
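As an illustration of the interactive and CPU-only cases in the table, the commands below use standard Slurm options; the resource values and the preprocess.sh script are placeholders, and the cluster may document a preferred way to start interactive sessions:

  # Interactive GPU session on the normal partition (interactive jobs are capped at 8 hours)
  srun --partition=normal --nodes=1 --gpus-per-node=1 --time=04:00:00 --pty bash

  # CPU-only batch job for data pre-processing on the cpu partition (max 8 cores per job, 12-hour wall time)
  sbatch --partition=cpu --ntasks=1 --cpus-per-task=8 --time=12:00:00 preprocess.sh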

 

Notes:

  • The number of available nodes can be obtained by running sinfo.
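For instance, the idle nodes of a particular partition can be listed with standard sinfo filters (partition names taken from the table above):

  # Nodes currently idle in the normal partition
  sinfo -p normal -t idle

  # Node-oriented view of the cpu partition
  sinfo -N -p cpu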
