
High Performance Computing

IMV owns several GPU-equipped workstations for computationally intensive tasks, especially tasks that benefit from GPU parallel computing, such as deep learning and video processing.

[Photo: HPC machines]

Technical specifications of the workstations

Machine | CPU | Memory | GPU | Disk | Max Power | OS
ML1 | Intel Core i9-14900KF, 24 cores | 192 GB DDR5 5200 MHz | GeForce RTX 4090, 24 GB | 2 TB M.2 NVMe SSD | 1000 W | Ubuntu 22.04.4 LTS
ML2 | Intel Core i9-14900KF, 24 cores | 192 GB DDR5 5200 MHz | GeForce RTX 4090, 24 GB | 2 TB M.2 NVMe SSD | 1000 W | Ubuntu 22.04.4 LTS
ML3 | Intel Core i9-14900KF, 24 cores | 192 GB DDR5 5200 MHz | GeForce RTX 4090, 24 GB | 2 TB M.2 NVMe SSD | 1000 W | Ubuntu 22.04.4 LTS
ML4 | AMD Ryzen 9 5900X, 12 cores | 94 GB DDR4 3200 MHz | GeForce RTX 3090, 24 GB | 2 TB M.2 NVMe SSD + 2 TB 7200 RPM HDD | 850 W | Ubuntu 22.04.4 LTS
ML5 | Intel Core i7-11700KF, 8 cores | 64 GB DDR4 3200 MHz | GeForce RTX 3070 Ti, 8 GB | 1 TB M.2 NVMe SSD | 850 W | Ubuntu 22.04.4 LTS


Access

To get access to the machines, send a request by email to the Lab Responsible. Access is granted for a limited period and can be renewed.

The machines can be accessed and used remotely via SSH (terminal command line), and files can be transferred using SFTP.
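
A minimal terminal session sketch; the hostname ml1.uio.example and the username alice below are placeholders, so substitute the actual machine address and your own username:

ssh alice@ml1.uio.example          # log in to a workstation
sftp alice@ml1.uio.example         # open an SFTP session for file transfer
sftp> put train.py                 # upload a local file to the machine
sftp> get results.csv              # download a file from the machine
sftp> bye                          # close the SFTP session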

Users can create dedicated Python environments using venv and install packages using pip. Note that packages can only be installed while working locally on the machine.
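
A minimal sketch of the environment workflow; the environment path and package name are just examples:

python3 -m venv ~/envs/myproject          # create a dedicated environment
source ~/envs/myproject/bin/activate      # activate it in the current shell
pip install tensorflow                    # install packages into the environment
deactivate                                # leave the environment when done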

GPU performance can be monitored using nvtop.
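
For example, while a script is running, open a second SSH session and run:

nvtop    # interactive view of GPU utilization and memory; press q to quit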

Only green data (i.e., open, non-sensitive data) can be used on these workstations.


GPU Memory Management

When multiple users run TensorFlow 2 Python scripts on the same machine, or when a single user runs several scripts at once, GPU memory must be managed explicitly. By default, a TensorFlow script pre-allocates the entire GPU memory, even if it only needs a small fraction of it, and releases it only when the script ends. While it is running, other scripts cannot run (they will fail for lack of resources). In such cases there are two options:

  • Disable GPU memory pre-allocation by placing the following code at the beginning of the script. A script may still fail if the cumulative memory requirement of all running scripts exceeds the total available memory.

import tensorflow as tf

# Grow GPU memory allocation on demand instead of reserving it all up front.
gpu = tf.config.experimental.list_physical_devices('GPU')[0]
tf.config.experimental.set_memory_growth(gpu, True)
  • Set a limit on the GPU memory pre-allocation by placing the following code at the beginning of the script. The limit is expressed in MB. With this approach you need to know the script's GPU memory requirements, which can be profiled by running it with the previous option and monitoring usage with nvtop. As with the first option, a script may still fail if the cumulative memory allocation exceeds the total available memory, but this is unlikely if users coordinate and agree on how much memory each will use.

import tensorflow as tf

# Cap this script's GPU allocation at 5000 MB, leaving the rest for others.
gpu = tf.config.experimental.list_physical_devices('GPU')[0]
tf.config.experimental.set_virtual_device_configuration(
    gpu, [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5000)])


Published May 19, 2024 10:17 AM - Last modified June 14, 2024 4:49 PM