What is the canonical way to check for errors using the CUDA runtime API?

Question

Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API documentation contains functions like cudaGetLastError, cudaPeekAtLastError, and cudaGetErrorString, but what is the best way to put these together to reliably catch and report errors without requiring lots of extra code?

User · Answer

The solution discussed here worked well for me. It uses only built-in CUDA runtime functions and is very simple to implement.

The relevant code is copied below:

#include <stdio.h>
#include <stdlib.h>

__global__ void foo(int *ptr)
{
  *ptr = 7;
}

int main(void)
{
  foo<<<1,1>>>(0);

  // make the host block until the device is finished with foo
  cudaDeviceSynchronize();

  // check for error
  cudaError_t error = cudaGetLastError();
  if(error != cudaSuccess)
  {
    // print the CUDA error message and exit
    printf("CUDA error: %s\n", cudaGetErrorString(error));
    exit(-1);
  }

  return 0;
}

User · Answer

talonmies' answer is a fine way to abort an application in an assert-style manner.

Occasionally we may wish to report and recover from an error condition in a C++ context as part of a larger application.

Here's a reasonably terse way to do that by throwing a C++ exception derived from std::runtime_error using thrust::system_error:

#include <thrust/system_error.h>
#include <thrust/system/cuda/error.h>
#include <sstream>

void throw_on_cuda_error(cudaError_t code, const char *file, int line)
{
  if(code != cudaSuccess)
  {
    std::stringstream ss;
    ss << file << "(" << line << ")";
    // use ss.str() rather than ss >> ..., which would stop at the first space in a path
    std::string file_and_line = ss.str();
    throw thrust::system_error(code, thrust::cuda_category(), file_and_line);
  }
}

This will incorporate the filename, line number, and an English language description of the cudaError_t into the thrown exception's .what() member:

#include <iostream>

int main()
{
  try
  {
    // do something crazy
    throw_on_cuda_error(cudaSetDevice(-1), __FILE__, __LINE__);
  }
  catch(thrust::system_error &e)
  {
    std::cerr << "CUDA error after cudaSetDevice: " << e.what() << std::endl;

    // oops, recover
    cudaSetDevice(0);
  }

  return 0;
}

The output:

$ nvcc exception.cu -run
CUDA error after cudaSetDevice: exception.cu(23): invalid device ordinal

A client of some_function can distinguish CUDA errors from other kinds of errors if desired:

try
{
  // call some_function which may throw something
  some_function();
}
catch(thrust::system_error &e)
{
  std::cerr << "CUDA error during some_function: " << e.what() << std::endl;
}
catch(std::bad_alloc &e)
{
  std::cerr << "Bad memory allocation during some_function: " << e.what() << std::endl;
}
catch(std::runtime_error &e)
{
  std::cerr << "Runtime error during some_function: " << e.what() << std::endl;
}
catch(...)
{
  std::cerr << "Some other kind of error during some_function" << std::endl;

  // no idea what to do, so just rethrow the exception
  throw;
}

Because thrust::system_error is a std::runtime_error, we can alternatively handle it in the same manner as a broad class of errors if we don't require the precision of the previous example:

try
{
  // call some_function which may throw something
  some_function();
}
catch(std::runtime_error &e)
{
  std::cerr << "Runtime error during some_function: " << e.what() << std::endl;
}
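The order of the catch clauses in this answer matters: a handler for the derived exception type has to come before the handler for std::runtime_error, or the broader handler wins. The sketch below illustrates that with plain standard C++ so it builds without Thrust. It assumes std::system_error as a stand-in for thrust::system_error (both derive from std::runtime_error); classify and the throw_* helpers are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <new>
#include <stdexcept>
#include <string>
#include <system_error>

// Returns a label for whichever handler caught the exception thrown by fn.
// std::system_error is an assumed stand-in for thrust::system_error.
std::string classify(const std::function<void()>& fn)
{
  assert(fn && "classify needs a callable");
  try
  {
    fn();
    return "no error";
  }
  catch (const std::system_error&)   // most specific handler first
  {
    return "system_error";
  }
  catch (const std::bad_alloc&)      // not derived from runtime_error
  {
    return "bad_alloc";
  }
  catch (const std::runtime_error&)  // base class of system_error
  {
    return "runtime_error";
  }
  catch (...)                        // anything that is not handled above
  {
    return "unknown";
  }
}

// Hypothetical helpers that each throw one kind of exception.
inline void throw_system()
{
  throw std::system_error(std::make_error_code(std::errc::invalid_argument),
                          "device call");
}
inline void throw_runtime() { throw std::runtime_error("generic failure"); }
inline void throw_alloc()   { throw std::bad_alloc(); }
```

If the std::runtime_error handler were listed first, the system_error case would be reported as a generic runtime error, which is the behaviour the shorter, single-handler example in the answer relies on deliberately.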

User · Answer

Probably the best way to check for errors in runtime API code is to define an assert style handler function and wrapper macro like this:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

You can then wrap each API call with the gpuErrchk macro, which will process the return status of the API call it wraps, for example:

gpuErrchk( cudaMalloc(&a_d, size*sizeof(int)) );

If there is an error in a call, a textual message describing the error and the file and line in your code where the error occurred will be emitted to stderr and the application will exit. You could conceivably modify gpuAssert to raise an exception rather than call exit() in a more sophisticated application if it were required.

A second related question is how to check for errors in kernel launches, which can't be directly wrapped in a macro call like standard runtime API calls. For kernels, something like this:

kernel<<<1,1>>>(a);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaDeviceSynchronize() );

will firstly check for invalid launch arguments, then force the host to wait until the kernel stops and check for an execution error. The synchronisation can be eliminated if you have a subsequent blocking API call like this:

kernel<<<1,1>>>(a_d);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaMemcpy(a_h, a_d, size * sizeof(int), cudaMemcpyDeviceToHost) );

in which case the cudaMemcpy call can return either errors which occurred during the kernel execution or those from the memory copy itself. This can be confusing for the beginner, and I would recommend using explicit synchronisation after a kernel launch during debugging to make it easier to understand where problems might be arising.

Note that when using CUDA Dynamic Parallelism, a very similar methodology can and should be applied to any usage of the CUDA runtime API in device kernels, as well as after any device kernel launches:

#include <assert.h>
#define cdpErrchk(ans) { cdpAssert((ans), __FILE__, __LINE__); }
__device__ void cdpAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      printf("GPU kernel assert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) assert(0);
   }
}
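Earlier in this answer it is suggested that gpuAssert could raise an exception rather than call exit(). A minimal host-only sketch of that variation follows. It is not the answer's code: to keep it buildable without the CUDA toolkit, Status, describe, checkStatus, status_error and CHECK_STATUS are hypothetical stand-ins for cudaError_t, cudaGetErrorString, gpuAssert, an exception type of your choosing, and gpuErrchk respectively.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudaError_t so the sketch builds without CUDA.
enum class Status { Success = 0, InvalidValue = 1 };

// Hypothetical stand-in for cudaGetErrorString.
inline const char* describe(Status s)
{
  return s == Status::Success ? "no error" : "invalid argument";
}

// Exception that carries the status code alongside the formatted message.
class status_error : public std::runtime_error
{
public:
  status_error(Status code, const std::string& what)
    : std::runtime_error(what), code_(code) {}
  Status code() const noexcept { return code_; }
private:
  Status code_;
};

// Same shape as gpuAssert, but throws instead of calling exit().
inline void checkStatus(Status code, const char* file, int line)
{
  assert(file != nullptr);
  if (code != Status::Success)
  {
    throw status_error(code, std::string(describe(code)) + " " + file + " " +
                                 std::to_string(line));
  }
}

// Wrapper macro that records the call site, mirroring gpuErrchk.
#define CHECK_STATUS(ans) { checkStatus((ans), __FILE__, __LINE__); }
```

With real CUDA types substituted, a caller can catch the exception, inspect code(), and decide whether to recover or rethrow, instead of having the process terminate at the failing call.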

User · Answer

The C++-canonical way: Don't check for errors... use the C++ bindings which throw exceptions.

I used to be irked by this problem, and I used to have a macro-cum-wrapper-function solution just like in talonmies' and Jared's answers, but, honestly? It makes using the CUDA Runtime API even more ugly and C-like.

So I've approached this in a different and more fundamental way. For a sample of the result, here's part of the CUDA vectorAdd sample - with complete error checking of every runtime API call:

// (... prepare host-side buffers here ...)

auto current_device = cuda::device::current::get();
auto d_A = cuda::memory::device::make_unique<float[]>(current_device, numElements);
auto d_B = cuda::memory::device::make_unique<float[]>(current_device, numElements);
auto d_C = cuda::memory::device::make_unique<float[]>(current_device, numElements);

cuda::memory::copy(d_A.get(), h_A.get(), size);
cuda::memory::copy(d_B.get(), h_B.get(), size);

// (... prepare a launch configuration here ...)

cuda::launch(vectorAdd, launch_config,
    d_A.get(), d_B.get(), d_C.get(), numElements
);
cuda::memory::copy(h_C.get(), d_C.get(), size);

// (... verify results here ...)

Again - all potential errors are checked, and an exception is thrown if an error occurred. (Caveat: if the kernel caused some error after launch, it will be caught after the attempt to copy the result, not before; to ensure the kernel was successful you would need to check for errors between the launch and the copy with a cuda::outstanding_error::ensure_none() command.)

The code above uses my Thin Modern-C++ wrappers for the CUDA Runtime API library (Github).

Note that the exceptions carry both a string explanation and the CUDA runtime API status code after the failing call.

A few links to how CUDA errors are automagically checked with these wrappers:

A test program throwing and catching a bunch of exceptions
Documentation for the error-related functionality
