[gpu] Intro to GPU programming

Everyone has a huge, massively parallel supercomputer sitting on their desktop in the form of a graphics card GPU.

  • What is the "hello world" equivalent of the GPU community?
  • What do I do, where do I go, to get started programming the GPU for the major GPU vendors?

-Adam

Tags: gpu

Answers:


OpenCL is an effort to create a cross-platform library for writing code that can run on, among other things, GPUs. It lets you write the code without knowing which GPU it will run on, making it easier to use some of the GPU's power without targeting several types of GPU specifically. I suspect it's not as performant as native GPU code (or as native as the GPU manufacturers will allow), but the trade-off can be worth it for some applications.

It's still in its relatively early stages (version 1.1 as of this answer), but it has gained some traction in the industry - for instance, it is natively supported on OS X 10.6 and above.
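
For a flavour of the "hello world" side of it, here is a minimal sketch of an OpenCL host program (assuming an OpenCL 1.1 SDK with the C headers installed; on OS X the header is <OpenCL/opencl.h> and you link with -framework OpenCL instead). It just fills a buffer on the GPU and reads it back, with error checking omitted for brevity:

/* ocl_hello.c - minimal OpenCL host program: fill a buffer on the GPU
 * and read it back.  Build (Linux): gcc ocl_hello.c -lOpenCL -o ocl_hello */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void fill(__global float *out) {\n"
    "    int i = get_global_id(0);\n"
    "    out[i] = (float)i;\n"
    "}\n";

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    /* the kernel source is compiled at run time for whatever device was found */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "fill", NULL);

    size_t n = 1024;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), NULL, NULL);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);

    float result[1024];
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float), result, 0, NULL, NULL);
    printf("result[42] = %f\n", result[42]);   /* expect 42.0 */

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}

Because the kernel source is compiled at run time by whichever OpenCL implementation is present, the same program is portable across vendors.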


Take a look at the ATI Stream Computing SDK. It is based on BrookGPU, developed at Stanford.

In the future, all GPU work will be standardized using OpenCL. It's an Apple-sponsored initiative that will be graphics-card vendor neutral.


I think the others have answered your second question. As for the first - the "Hello World" of CUDA - I don't think there is a set standard, but personally, I'd recommend a parallel adder (i.e. a programme that sums N integers).

If you look at the "reduction" example in the NVIDIA SDK, this superficially simple task can be extended to demonstrate numerous CUDA considerations such as coalesced reads, memory bank conflicts and loop unrolling.

See this presentation for more info:

http://www.gpgpu.org/sc2007/SC07_CUDA_5_Optimization_Harris.pdf
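
To make that concrete, here is a rough sketch of such a parallel adder in CUDA - a simplified version for illustration, not the SDK's optimized reduction:

// sum.cu - a minimal "parallel adder": sums N integers on the GPU.
// Each block reduces its chunk of the input in shared memory; the
// per-block partial sums are then added up on the host.
// Build with: nvcc sum.cu -o sum
#include <stdio.h>
#include <stdlib.h>

__global__ void blockSum(const int *in, int *out, int n)
{
    extern __shared__ int cache[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    /* tree reduction within the block (blockDim.x must be a power of two) */
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = cache[0];
}

int main(void)
{
    const int N = 1 << 20, threads = 256;
    const int blocks = (N + threads - 1) / threads;

    int *h_in = (int *)malloc(N * sizeof(int));
    int *h_out = (int *)malloc(blocks * sizeof(int));
    for (int i = 0; i < N; ++i) h_in[i] = 1;     /* expected sum: N */

    int *d_in, *d_out;
    cudaMalloc((void **)&d_in, N * sizeof(int));
    cudaMalloc((void **)&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, N);

    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    long long total = 0;
    for (int i = 0; i < blocks; ++i) total += h_out[i];
    printf("sum = %lld (expected %d)\n", total, N);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}

The optimizations covered in that presentation (avoiding divergent branches, sequential addressing, unrolling the last warp, and so on) are all refinements of this same kernel.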


CUDA is an excellent framework to start with. It lets you write GPGPU kernels in C. The compiler produces GPU microcode from your kernel code and sends everything that runs on the CPU to your regular compiler. It is NVIDIA-only, though, and only works on GeForce 8-series cards or newer. You can check out the CUDA Zone to see what can be done with it. There are some great demos in the CUDA SDK. The documentation that comes with the SDK is a pretty good starting point for actually writing code; it will walk you through writing a matrix multiplication kernel, which is a great place to begin.
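
To give an idea of what that first kernel looks like, here is a naive (untiled) matrix multiplication kernel - just a sketch; the SDK walkthrough builds up to a shared-memory tiled version of the same idea:

// Naive matrix multiply: C = A * B for square n x n matrices, row-major.
// One thread computes one element of C.
__global__ void matMul(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;
    }
}

After copying A and B to the device with cudaMemcpy, you would launch it over a 2D grid, e.g. matMul<<<dim3((n + 15) / 16, (n + 15) / 16), dim3(16, 16)>>>(dA, dB, dC, n);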


Another easy way to get into GPU programming, without touching CUDA or OpenCL directly, is to do it via OpenACC.

OpenACC works like OpenMP, with compiler directives (like #pragma acc kernels) to send work to the GPU. For example, if you have a big loop (only larger ones really benefit):

#include <stdio.h>

int main(void)
{
    float a = 2.0f;
    static float b[10000];
    int i;

    #pragma acc kernels   /* each annotated loop becomes a GPU kernel */
    for (i = 0; i < 10000; ++i) b[i] = 1.0f;

    #pragma acc kernels
    for (i = 0; i < 10000; ++i) b[i] = b[i] * a;

    printf("b[0] = %f\n", b[0]);   /* expect 2.0 */
    return 0;
}

Edit: unfortunately, only the PGI compiler really supports OpenACC right now, and only for NVIDIA GPUs.
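
For what it's worth, with the PGI compiler code like the above is typically built with something along the lines of pgcc -acc -Minfo=accel example.c (the file name is just illustrative); -Minfo=accel makes the compiler report which loops it actually offloaded to the GPU.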


  1. You get programmable vertex and pixel shaders that allow execution of code directly on the GPU to manipulate the buffers that are to be drawn. These languages (i.e. OpenGL's GLSL and DirectX's HLSL) have C-style syntax and are really easy to use. Some examples of HLSL can be found here for XNA Game Studio and DirectX. I don't have any decent GLSL references, but I'm sure there are a lot around. These shader languages give an immense amount of power to manipulate what gets drawn at a per-vertex or per-pixel level, directly on the graphics card, making things like shadows, lighting, and bloom really easy to implement.
  2. The second thing that comes to mind is using OpenCL to code for the new lines of general-purpose GPUs. I'm not sure how to use this, but my understanding is that OpenCL gives you the beginnings of being able to access processors on both the graphics card and the normal CPU. This is not mainstream technology yet, and seems to be driven by Apple.
  3. CUDA seems to be a hot topic. CUDA is NVIDIA's way of accessing the GPU's power. Here are some intros


If you use MATLAB, it becomes pretty simple to use GPUs for technical computing (matrix computations and heavy math/number crunching). I find it a useful way to put a GPU card to work outside of gaming. Check out the link below:

http://www.mathworks.com/discovery/matlab-gpu.html


Try GPU++ and libSh.

The libSh link has a good description of how they bound the programming language to the graphics primitives (and, obviously, the primitives themselves), and GPU++ describes what it's all about, both with code examples.

