Accounting
Cluster accounts are organized around the principal investigator (PI) or group leader of a research team. Each member of the team has an individual user account under the PI's group, which is used to access the cluster and submit jobs to the partitions (queues) with SLURM. With this accounting scheme, the system can impose resource limits (usage quotas) on different partitions for different groups of users.
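To check which account (PI group) and partitions your user is associated with, you can query the SLURM accounting database, for example:
sacctmgr show association user=$USER format=Account,User,Partition,QOS
The exact fields shown depend on the site configuration.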
Resource limits
Each compute node provides processors, memory, swap and local disk as resources. Resource allocation on our cluster is based on CPU cores only; in particular, no core runs more than one job at a time. If a job needs one or more nodes exclusively, the --exclusive option can be specified in the SLURM script. The resource limits on partitions are imposed on the PI group as a whole, which means individual users in the same group share the quota.
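For example, a minimal SLURM script requesting two whole nodes exclusively might look like the following (the job name, partition, wall time and program are placeholders to adapt to your own job):
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=standard
#SBATCH --nodes=2
#SBATCH --exclusive          # reserve the allocated nodes for this job only
#SBATCH --time=1-00:00:00    # 1 day; must stay within the partition's MaxWallTime
srun ./my_program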
Partitions
The ownership of the HPC2 compute nodes is diversified. The partitions and their resource limits are summarized in the tables below.
| Partition | No. of Nodes | CPU | Memory | Coprocessor |
|---|---|---|---|---|
| standard | 20 | 2 x Intel Xeon E5-2670 v3 (12-core) | 64G DDR4-2133 | – |
| himem | 15 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | – |
| gpu | 5 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | 2 x Nvidia Tesla K80 |
| ssci | 15 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | – |
| cbme | 1 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | 2 x Nvidia Tesla K80 |
| ce | 2 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | – |
| ch | 10 | 2 x Intel Xeon E5-2670 v3 (12-core) | 64G DDR4-2133 | – |
| ch1 | 1 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | – |
| cse | 1 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | 2 x Nvidia Tesla K80 |
| ece | 1 | 2 x Intel Xeon E5-2683 v4 (16-core) | 128G DDR4-2400 | – |
| ias | 6 | 2 x Intel Xeon E5-2670 v3 (12-core) | 256G DDR4-2133 | – |
| lifs | 8 | 2 x Intel Xeon E5-2650 v4 (12-core) | 128G DDR4-2400 | – |
| ph | 3 | 2 x Intel Xeon E5-2670 v3 (12-core) | 128G DDR4-2133 | – |
| sbm | 4 | 2 x Intel Xeon E5-2683 v4 (16-core) | 128G DDR4-2400 | – |
| Partition | No. of Nodes | Access (SSCI/SENG) | GrpJobs (Max) | GrpNodes (Max) | GrpSubmitJobs (Max) | MaxWallTime |
|---|---|---|---|---|---|---|
| standard | 20 | Both | 4 | 5 | 4 | 3 days |
| himem | 15 | Both | 3 | 5 | 3 | 3 days |
| gpu | 5 | GPU user | 2 | 2 | 2 | 3 days |
| ssci | 15 | SSCI only | 3 | 5 | 3 | 3 days |
| Partition | No. of Nodes | MaxCPUs/User | MaxJobs/User | MaxSubmitJobs/User | MaxWallTime |
|---|---|---|---|---|---|
| cbme | 1 | 24 | 10 | 10 | 60 days |
| ce | 2 | 48 | 4 | 4 | 20 days |
| ch | 10 | 96 | 3 | 4 | 7 days |
| ch1 | 1 | 24 | 1 | 3 | 7 days |
| cse | 1 | -- | -- | -- | -- |
| ece | 1 | 32 | 10 | 10 | 30 days |
| ias | 6 | 144 | 10 | 50 | 15 days |
| lifs | 8 | 192 | 8 | 10 | 5 days |
| ph | 3 | 72 | 72 | 108 | 30 days |
| sbm | 4 | 128 | 4 | 8 | 15 days |
For the quota terminology, please refer here.
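Besides the tables above, the configured limits of a partition can also be inspected on the cluster itself, for example:
sinfo -p standard
scontrol show partition standard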
Job Scheduling
Currently, SLURM jobs are scheduled with basic priority, i.e. first-in-first-out (FIFO) according to the order of arrival.
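To check the pending jobs in a partition, and the estimated start times of your own pending jobs, you can use, for example:
squeue -p standard --state=PENDING
squeue -u $USER --start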
Community Cluster
In order to maximize the usage of computational resources, ITSO has configured a community cluster strategy so that idle resources on the HPC2 cluster can be used by anybody. The community cluster is accessed via the partition "general". Jobs submitted to this partition are scheduled ONLY when idle resources are available, and the maximum wall time is 12 hours. Usage of the community cluster is open to all users. The usage quota is summarized as follows.
| Partition | GrpJobs (Max) | GrpNodes (Max) | GrpSubmitJobs (Max) | MaxWallTime |
|---|---|---|---|---|
| general | 2 | 6 | 2 | 12 hours |
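A minimal job script for the community cluster might look like the following (the job name, resources and program are placeholders; the requested wall time must not exceed 12 hours):
#!/bin/bash
#SBATCH --job-name=test_general
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --time=12:00:00    # maximum wall time on the general partition
srun ./my_program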
Disk Quota
The disk quota for each SSCI and SENG PI group on HPC2 is 2 TB; for other PI groups it is 500 GB. The quota is shared among all members of the group. Usage exceeding the quota is given a 24-hour grace period to clean up the extra data. The total disk space available on the cluster is 340 TB.
To check the disk usage and quota of your group:
lfs quota -h -g <your_group> /home
Group Share Directory
A share directory is assigned to each PI group. Users from the same group can access, create and modify files in the share directory.
To access the share directory:
cd /home/share/<your_group>
or
cd $PI_HOME
Note that the group disk quota also applies to the share directory.
Backup
There is NO backup service on the cluster; users are required to manage backups of their own data themselves.
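As a simple approach, important data can be copied to storage outside the cluster with rsync, for example (the username, host and paths below are placeholders):
rsync -avz ~/my_results/ username@my_backup_host:/path/to/backup/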
Scratch Files
About 900 GB of space is available under /tmp on each compute node for local scratch files. Users are advised to make use of this space and to clean up their files as soon as their applications finish. Files in /tmp on all nodes are removed automatically by the system if they have not been accessed for more than 10 days.
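A common pattern is to create a per-job scratch directory under /tmp in the job script, run the application there and remove the directory at the end; for example (the program and its output directory are placeholders):
#!/bin/bash
#SBATCH --job-name=scratch_example
#SBATCH --partition=standard
#SBATCH --nodes=1
SCRATCH=/tmp/$USER/$SLURM_JOB_ID     # per-job scratch directory on the local disk
mkdir -p $SCRATCH
cd $SCRATCH
srun $SLURM_SUBMIT_DIR/my_program    # run the program from the submission directory
cp -r output $SLURM_SUBMIT_DIR/      # copy results back to the submission directory
rm -rf $SCRATCH                      # clean up the scratch files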