Write Better GPU Applications and Save Time
With TotalView, developers can quickly and easily debug CUDA and OpenACC code to build better-performing GPU applications.
TotalView Supports:
Latest NVIDIA CUDA releases on their latest GPUs for Linux x86-64, Linux PowerLE (Power8/Power9), and ARM64 platforms.
Latest AMD ROCm/HIP releases on their latest GPUs for Linux x86-64.
NVIDIA and Cray OpenACC debugging support.
Embedded code debugging on NVIDIA Jetson AGX Xavier GPUs.
“Arm strives to enable highly integrated, energy-efficiency solutions. With TotalView, customers using ARM platforms have a robust, scalable dynamic analysis solution for their complex HPC clusters and code.”
Eric Van Hensbergen | Director of HPC | Arm
TotalView and CUDA
With built-in capabilities designed specifically for use with CUDA, you can use TotalView to:
Gain visibility into Linux and GPU device threads.
Have full visibility into hierarchical device, block, and thread memory.
Navigate device threads by logical and device coordinates.
Handle CUDA functions inline and on stacks.
Use command line interface commands for CUDA functions.
Debug host and device code in the same session.
Debug applications that use multiple NVIDIA devices at the same time.
Debug MPI applications on CUDA-accelerated clusters.
Take advantage of Unified Virtual Addressing and GPUDirect.
Use CUDA C++ and inline PTX.
Report memory errors and handle CUDA exceptions.
Debug CUDA dynamic mode programs and CUDA core files.
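As a rough illustration of what "logical coordinates" means in the list above: a CUDA thread is identified by its block index within the grid and its thread index within the block. The sketch below, in plain host-side C++, flattens those coordinates into a single global thread number using the x-fastest ordering that CUDA defines for thread IDs. The `Dim3` type and both function names are hypothetical helpers written for this example; they are not part of TotalView or the CUDA toolkit, and the debugger may display coordinates differently. Device coordinates, by contrast, are assigned by the hardware at run time and cannot be computed this way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3/uint3, so the sketch compiles without nvcc.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Flatten a 3-D coordinate inside a 3-D extent, with x varying fastest:
//   id = x + extent.x * (y + extent.y * z)
inline std::uint64_t linearize(Dim3 idx, Dim3 extent) {
    // Each component must lie inside the extent.
    assert(idx.x < extent.x && idx.y < extent.y && idx.z < extent.z);
    return idx.x +
           static_cast<std::uint64_t>(extent.x) *
               (idx.y + static_cast<std::uint64_t>(extent.y) * idx.z);
}

// Combine block and thread coordinates into one grid-wide thread number.
inline std::uint64_t global_thread_id(Dim3 blockIdx, Dim3 threadIdx,
                                      Dim3 gridDim, Dim3 blockDim) {
    const std::uint64_t threadsPerBlock =
        static_cast<std::uint64_t>(blockDim.x) * blockDim.y * blockDim.z;
    return linearize(blockIdx, gridDim) * threadsPerBlock +
           linearize(threadIdx, blockDim);
}
```

For example, with a one-dimensional launch of 4 blocks of 128 threads, the thread at block (2,0,0), thread (5,0,0) maps to 2 * 128 + 5 = 261. Working out this mapping by hand is one way to relate a thread selected in the debugger back to the array element it is processing.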
TotalView and ROCm/HIP
Debug applications built with the ROCm software stack and HIP for AMD GPUs:
- Debug HIP (Heterogeneous Interface for Portability) code running on AMD and NVIDIA GPUs.
- Easily debug CPU and AMD GPU code in one session.
- Control GPU execution at the GPU level with Agent Threads and a more granular wave level with Wave-Thread debugging.
- Fast smart-stepping for efficient debugging of GPU code.
- Navigate device threads by logical and device coordinates.
- Data watchpoints on global memory variables.
- Debug applications that use multiple AMD devices at the same time.
- Debug MPI applications on AMD GPU accelerated clusters.
- Gain visibility into Linux and GPU device threads.
- Have full visibility into hierarchical device, block, and thread memory.
TotalView and ARM
TotalView supports current 64-bit ARMv8-A CPUs, so you can take advantage of the architecture's performance and energy savings.
Debugging CUDA-Accelerated Parallel Applications With TotalView
Learn about CUDA concepts, how those concepts affect troubleshooting CUDA code, and how the TotalView debugger can help.