
High Performance Computing

IMV owns several GPU-equipped workstations for computationally intensive tasks, especially tasks that benefit from GPU parallel computing, such as deep learning and video processing.

[Photo: HPC machines]

Technical specifications of the workstations

Machine | CPU | Memory | GPU | Disk | Max Power | OS
ML1 | Intel Core i9-14900KF, 24 cores | 192 GB DDR5 5200 MHz | GeForce RTX 4090, 24 GB | 2 TB M.2 NVMe SSD | 1000 W | Ubuntu 22.04.4 LTS
ML2 | Intel Core i9-14900KF, 24 cores | 192 GB DDR5 5200 MHz | GeForce RTX 4090, 24 GB | 2 TB M.2 NVMe SSD | 1000 W | Ubuntu 22.04.4 LTS
ML3 | Intel Core i9-14900KF, 24 cores | 192 GB DDR5 5200 MHz | GeForce RTX 4090, 24 GB | 2 TB M.2 NVMe SSD | 1000 W | Ubuntu 22.04.4 LTS
ML4 | AMD Ryzen 9 5900X, 12 cores | 94 GB DDR4 3200 MHz | GeForce RTX 3090, 24 GB | 2 TB M.2 NVMe SSD + 2 TB 7200 RPM HDD | 850 W | Ubuntu 22.04.4 LTS
ML5 | Intel Core i7-11700KF, 8 cores | 64 GB DDR4 3200 MHz | GeForce RTX 3070 Ti, 8 GB | 1 TB M.2 NVMe SSD | 850 W | Ubuntu 22.04.4 LTS


Access

To get access to the machines, send a request by email to the Lab Responsible. Access is granted for a limited period and can be renewed.

The machines can be accessed and used remotely via SSH (terminal command line), and files can be transferred using SFTP.
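
A minimal terminal session sketch; the hostname ml1.uio.example and the username alice below are placeholders, so substitute the actual machine address and your own username:

ssh alice@ml1.uio.example          # log in to a workstation
sftp alice@ml1.uio.example         # open an SFTP session for file transfer
sftp> put train.py                 # upload a local file to the machine
sftp> get results.csv              # download a file from the machine
sftp> bye                          # close the SFTP session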

Users can create dedicated Python environments using venv and install packages using pip. Note that packages can only be installed while working locally on the machine.
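
A minimal sketch of the environment workflow; the environment path and package name are just examples:

python3 -m venv ~/envs/myproject          # create a dedicated environment
source ~/envs/myproject/bin/activate      # activate it in the current shell
pip install tensorflow                    # install packages into the environment
deactivate                                # leave the environment when done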

GPU performance can be monitored using nvtop.
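
For example, while a script is running, open a second SSH session and run:

nvtop    # interactive view of GPU utilization and memory; press q to quit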

Only green data (i.e., open, non-sensitive data) can be used on these workstations.


GPU Memory Management

When multiple users run TensorFlow 2 Python scripts on the same machine, or when a single user runs several scripts at once, GPU memory must be managed explicitly. By default, a TensorFlow script pre-allocates the entire GPU memory, even if it only needs a small fraction of it, and releases it only when the script ends. While it is running, other scripts cannot run (they will fail for lack of resources). In such cases there are two options:

  • Disable GPU memory pre-allocation by placing the following code at the beginning of the script. A script may still fail if the cumulative memory requirement of all running scripts exceeds the total available memory.

import tensorflow as tf

# Grow GPU memory allocation on demand instead of reserving it all up front.
gpu = tf.config.experimental.list_physical_devices('GPU')[0]
tf.config.experimental.set_memory_growth(gpu, True)
  • Set a limit on the GPU memory pre-allocation by placing the following code at the beginning of the script. The limit is expressed in MB. With this approach you need to know the script's GPU memory requirements, which can be profiled by running it with the previous option and monitoring usage with nvtop. As with the first option, a script may still fail if the cumulative memory allocation exceeds the total available memory, but this is unlikely if users coordinate and agree on how much memory each will use.

import tensorflow as tf

# Cap this script's GPU allocation at 5000 MB, leaving the rest for others.
gpu = tf.config.experimental.list_physical_devices('GPU')[0]
tf.config.experimental.set_virtual_device_configuration(
    gpu, [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5000)])


Published May 19, 2024 10:17 AM - Last modified June 14, 2024 4:49 PM