Understanding why your GPU-accelerated code is slow using NVIDIA Nsight Systems

Apr 15, 2026 | 1:20 PM - 3:00 PM

Description

Understanding performance bottlenecks in CPU-only, single-threaded codes is already non-trivial. Once you add multiple GPUs and asynchronous execution across multiple processes and/or threads, it becomes nearly impossible to guess where time is actually being spent. In this session, we will use NVIDIA Nsight Systems to profile distributed, GPU-accelerated scripts on MeluXina. You will learn how to:

- Capture traces for your code/application
- Identify CPU, GPU, I/O and communication hotspots with nsys-ui and the nsys command-line tools
- Interpret timelines to spot possible synchronization issues

You’ll see cases where the traces make it clear that the hardware isn’t the problem at all: the poor performance comes from our code, not the GPUs or CPUs we’re tempted to blame.
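As a rough illustration of the first step, capturing traces from a distributed run, the sketch below wraps an application command line in `nsys profile` so that each rank writes its own report. It assumes a Slurm-launched job (hence `SLURM_PROCID`), relies on nsys expanding `%q{ENV_VAR}` in the output name, and uses a hypothetical `train.py` script; the trace list and file prefix are illustrative choices, not the session's prescribed settings.

```python
import shlex
from typing import Sequence


def nsys_command(app: Sequence[str], report_prefix: str = "report",
                 traces: Sequence[str] = ("cuda", "nvtx", "osrt")) -> list[str]:
    """Return an argv list that runs `app` under `nsys profile`.

    Assumption: the job is launched by Slurm, so nsys can expand
    %q{SLURM_PROCID} in the output name and give every rank its own report.
    """
    return [
        "nsys", "profile",
        "--trace=" + ",".join(traces),                      # APIs to trace
        "--output=" + report_prefix + ".%q{SLURM_PROCID}",  # one report per rank
        "--force-overwrite=true",                           # replace reports from earlier runs
        *app,                                               # the application itself
    ]


if __name__ == "__main__":
    # Hypothetical training script; prints a shell-quoted command line
    # that could be placed after `srun` in a batch script.
    cmd = nsys_command(["python", "train.py"])
    print(shlex.join(cmd))
```

Writing one report per rank keeps ranks from overwriting each other's output; the resulting `.nsys-rep` files can then be opened in nsys-ui or summarized on the command line with `nsys stats`.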

Presented by


Organised by

Supercomputing
EuroCC 2 has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101101903. The JU receives support from the European Union’s Digital Europe Programme and Germany, Bulgaria, Austria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, France, Netherlands, Belgium, Luxembourg, Slovakia, Norway, Türkiye, Republic of North Macedonia, Iceland, Montenegro, Serbia.

Luxembourg AI Factory
This project has received funding from the European High Performance Computing Joint Undertaking (JU) under Grant Agreement No. 101234366. The JU receives support from the European Union’s Horizon Europe research and innovation programme and from Luxembourg, Belgium, Croatia, Greece, Hungary, Ireland, Italy, the Netherlands, Poland, Portugal, Slovenia, and Spain.