Updates in 2025.3
General
Added support for CUDA 13.0. See the tool’s CUDA driver system requirements.
Added or improved support for Blackwell chips.
For Green Context launches, launch__waves_per_multiprocessor is now scaled to the number of SMs in the Green Context.
Added support for profiling individual nodes of device-launchable CUDA graphs launched from the host.
Added metric launch__persisting_l2_cache_size to the Memory Workload Analysis section.
Removed metric profiler__pmsampler_dropped_samples.
Added support for not importing SASS cubins into the report.
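To illustrate why the Green Context scaling of launch__waves_per_multiprocessor noted above matters, here is a minimal arithmetic sketch. It assumes the common definition of waves as the grid size divided by the number of blocks that can be resident at once (resident blocks per SM times SM count). The function name, the SM counts, and the block limits are hypothetical illustration values, not taken from the tool.

```python
def waves_per_multiprocessor(grid_blocks, max_blocks_per_sm, sm_count):
    """Estimate how many 'waves' of thread blocks a launch needs.

    Assumed definition (not taken from the tool's source):
        waves = grid_blocks / (max_blocks_per_sm * sm_count)
    """
    if grid_blocks <= 0 or max_blocks_per_sm <= 0 or sm_count <= 0:
        raise ValueError("all arguments must be positive")
    return grid_blocks / (max_blocks_per_sm * sm_count)


# Hypothetical launch: 1024 blocks, at most 4 resident blocks per SM.
full_device = waves_per_multiprocessor(1024, 4, sm_count=128)
green_context = waves_per_multiprocessor(1024, 4, sm_count=32)

print(f"scaled to the full device (128 SMs):  {full_device:.1f} waves")
print(f"scaled to the Green Context (32 SMs): {green_context:.1f} waves")
```

Under these assumptions, the same launch needs four times as many waves when only a quarter of the SMs are available to it, which is the situation the Green Context scaling is meant to reflect.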
NVIDIA Nsight Compute
The Source page now shows the instruction category in SASS and the instruction mix for high-level source.
Added a new instruction mix and scoreboard dependencies table to the Source page.
Added improved tooltips to the memory chart.
Added information on the GPC Constant Cache (GCC) and DSMEM atomics in the memory tables.
The Metric Details tool window now shows the breakdown for throughput metrics.
Added support for searching web forum and rule results.
Multiple results from the same search source are now combined to make the output more readable.
Improved the occupancy calculator.
NVIDIA Nsight Compute CLI
Added the option --forward-signals to transparently forward signals to the profiled application.
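As a generic illustration of what forwarding signals to a child process means, the sketch below shows a small POSIX launcher that relays SIGINT and SIGTERM to the process it started. It is not the tool's implementation, and the helper name run_with_forwarded_signals is hypothetical. It only demonstrates the general pattern a wrapper process can use so that the wrapped application still sees signals sent to the wrapper.

```python
import signal
import subprocess
import sys


def run_with_forwarded_signals(cmd, signals=(signal.SIGINT, signal.SIGTERM)):
    """Run cmd as a child process and relay the given signals to it.

    Must be called from the main thread (a Python requirement for
    installing signal handlers). Returns the child's exit code.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Relay the signal only while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    # Install the forwarding handler and remember the previous handlers.
    previous = {sig: signal.signal(sig, forward) for sig in signals}
    try:
        return child.wait()
    finally:
        # Restore whatever handlers were active before the call.
        for sig, handler in previous.items():
            signal.signal(sig, handler)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python forward.py ./my_app arg1 arg2
    sys.exit(run_with_forwarded_signals(sys.argv[1:]))
```

With this pattern, a signal delivered to the launcher reaches the child, so the child can run its own handlers or terminate as it would when run directly.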
Resolved Issues
Fixed an issue where some ncu console messages were truncated after 1024 characters.
Fixed some display issues related to Green Context tables.
Improved the performance of remote profiling in application replay mode.
Fixed a hang in certain scenarios when profiling dependent kernels with device-mapped host allocations.
Fixed missing correlation between JIT-compiled PTX and SASS in some situations.
Fixed an error when profiling a CUDA graph kernel node doing a cluster launch on driver 580 or newer.