The system supports two types of user accounts: project-based accounts and individual student accounts for students with an approved UROP.
- Project-based accounts
  - Access to more computational resources, with the allocation granted during project approval
  - Computational resources are shared among all group members of the project
  - Usage accounting for computational resources is implemented; details to be announced later
  - Shared storage space is provided for the group
- Individual student accounts
  - Computational resources are allocated to each student individually
  - No usage accounting
Resource Request
- Resource requests are counted in GPU Resource Units (GRU). Each GRU is associated with a different maximum number of CPU cores and amount of system memory, depending on the Slurm partition.
- For the project & large-project partitions, 1 GRU corresponds to
  - One H800 GPU with 80GB GPU memory
  - 14 CPU cores with 28 threads
  - 224GB system memory
- For the student partition, each H800 GPU is partitioned into GPU instances of different sizes using NVIDIA MIG technology, with 1 GRU corresponding to a 3g.40gb, 4g.40gb or 7g.80gb MIG device
  - For 3g.40gb, 1 GRU is
    - 3/7 of the computational power of one H800 GPU, with 40GB GPU memory
  - For 4g.40gb, 1 GRU is
    - 4/7 of the computational power of one H800 GPU, with 40GB GPU memory
  - For 7g.80gb, 1 GRU is
    - Equivalent to a whole H800 GPU in computational power and memory
    - 8 CPU cores with 16 threads
    - 160GB system memory
- For the debug partition, 1 GRU corresponds to
  - One H800 GPU with 80GB GPU memory
  - 14 CPU cores with 28 threads
  - 224GB system memory
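
The GRU definitions above can be turned into a small lookup for estimating what a request amounts to. The following is a minimal sketch, not an official tool: the names `GruSpec`, `GRU_SPECS` and `scale` are hypothetical, and it assumes that resources scale linearly with the number of GRUs requested, which the text implies but does not state. It also assumes that the CPU and memory figures listed after the 7g.80gb entry belong to that MIG profile; no CPU or memory figures are given for 3g.40gb and 4g.40gb, so those are left as `None`.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(frozen=True)
class GruSpec:
    """Resources behind one GRU (hypothetical helper type)."""
    gpu_fraction: Fraction            # share of one H800's compute
    gpu_memory_gb: int
    cpu_cores: Optional[int]          # None where the text gives no figure
    cpu_threads: Optional[int]
    system_memory_gb: Optional[int]


# Figures copied from the list above; the key names are illustrative only.
GRU_SPECS = {
    "project": GruSpec(Fraction(1), 80, 14, 28, 224),
    "large-project": GruSpec(Fraction(1), 80, 14, 28, 224),
    "debug": GruSpec(Fraction(1), 80, 14, 28, 224),
    "student-3g.40gb": GruSpec(Fraction(3, 7), 40, None, None, None),
    "student-4g.40gb": GruSpec(Fraction(4, 7), 40, None, None, None),
    "student-7g.80gb": GruSpec(Fraction(1), 80, 8, 16, 160),
}


def scale(spec: GruSpec, n_gru: int) -> dict:
    """Total resources for n_gru GRUs, assuming linear scaling."""
    if n_gru < 1:
        raise ValueError("n_gru must be at least 1")

    def times(value: Optional[int]) -> Optional[int]:
        # Propagate "not stated in the text" instead of guessing a number.
        return None if value is None else value * n_gru

    return {
        "gpus": spec.gpu_fraction * n_gru,
        "gpu_memory_gb": spec.gpu_memory_gb * n_gru,
        "cpu_cores": times(spec.cpu_cores),
        "cpu_threads": times(spec.cpu_threads),
        "system_memory_gb": times(spec.system_memory_gb),
    }


if __name__ == "__main__":
    # Example: a 4-GRU request in the project partition.
    print(scale(GRU_SPECS["project"], 4))
```

Under these assumptions, a 4-GRU request in the project partition corresponds to 4 GPUs, 56 CPU cores and 896GB of system memory. The actual per-project ceiling is set at project approval, so the result is an estimate, not a guarantee.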
Partition Table
Slurm Partition | project & large-project | student | debug | cpu |
---|---|---|---|---|
No. of DGX nodes | 52 | 2 with GPU MIG partitioned | 1 | 2 CPU nodes |
Who can access | Project-based users only | Non-project-based student users only | All | Project-based users only |
Purpose | Computation | Computation | Compile, build container, interactive debug, code profiling | Data pre-processing for GPU computation |
Max Wall Time | 3 days | 1 day | 2 hours | 12 hours |
Max resource requested | Varies with projects | 1 GRU | 1 GRU | 8 CPU cores (per job) |
Concurrent running jobs quota per user | 8 | 1 | 1 | 28 |
Queuing and running jobs limit per user | 10 | 2 | 1 | 28 |
Usage Accounting | Yes | No | No | No |
Job Preemption | In the large-project partition, jobs from approved projects can preempt other jobs; a job can run for at least 2 hours before being preempted | No | No | No |
Remarks | Resource quotas are per-project unless specified | Resource quotas are per-user instead of per-project | Resource quotas are per-user instead of per-project | Resource quotas are per-project unless specified. No access to the /scratch directory |
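
The wall-time and job-count limits in the table can be checked before submitting a job. The following is a minimal sketch with hypothetical names (`PARTITION_LIMITS`, `check_request`); it only encodes the "Max Wall Time" and "Queuing and running jobs limit per user" rows and does not query Slurm. The scheduler's own configuration remains the authority on what is accepted.

```python
from datetime import timedelta

# Limits copied from the partition table; the structure is illustrative only.
PARTITION_LIMITS = {
    "project": {"max_wall_time": timedelta(days=3), "max_queued_and_running": 10},
    "large-project": {"max_wall_time": timedelta(days=3), "max_queued_and_running": 10},
    "student": {"max_wall_time": timedelta(days=1), "max_queued_and_running": 2},
    "debug": {"max_wall_time": timedelta(hours=2), "max_queued_and_running": 1},
    "cpu": {"max_wall_time": timedelta(hours=12), "max_queued_and_running": 28},
}


def check_request(partition: str, wall_time: timedelta, jobs_in_system: int = 0) -> list:
    """Return a list of problems with a planned job; an empty list means none found.

    jobs_in_system is the number of jobs the user already has queued or running
    in the partition (an input the caller must supply).
    """
    if partition not in PARTITION_LIMITS:
        raise ValueError(f"unknown partition: {partition!r}")
    limits = PARTITION_LIMITS[partition]

    problems = []
    if wall_time > limits["max_wall_time"]:
        problems.append(
            f"wall time {wall_time} exceeds the {limits['max_wall_time']} limit"
        )
    # The new job itself counts towards the queued-and-running limit.
    if jobs_in_system + 1 > limits["max_queued_and_running"]:
        problems.append(
            f"already {jobs_in_system} job(s) queued or running; "
            f"limit is {limits['max_queued_and_running']}"
        )
    return problems


if __name__ == "__main__":
    # A 3-hour job in the debug partition exceeds the 2-hour wall-time limit.
    for problem in check_request("debug", timedelta(hours=3)):
        print(problem)
```

The sketch leaves out the concurrent-running quota, because exceeding it would presumably leave a job waiting in the queue instead of having it rejected; that behaviour is an assumption and is not stated in the table.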