Computing on the GPU, or GPGPU, is a steadily maturing technology. Many toolkits out in the wild will let you use GPUs for computation, but there’s a catch: the vendors are still vying for the lead. The two market leaders are currently NVidia and AMD/ATI.
That means that NVidia is pushing their GPGPU API, which is named “Compute Unified Device Architecture,” or CUDA. Their rival, AMD/ATI, is pushing Stream. Stream incorporates BrookGPU, a compiler and data-parallel language that was developed at Stanford University and predates CUDA.

NVidia & CUDA, ATI & Stream, or OpenCL
Both of these vendor APIs are proprietary, and each runs only on its own vendor’s hardware. That is fine if a developer controls the hardware the computations will run on. Realistically, a developer rarely has such control. So what are the options? At the current time, there are only a couple: OpenCL and Microsoft’s DirectCompute. DirectCompute is limited to Windows Vista and Windows 7, though, so we are focusing on OpenCL.
OpenCL is the Open Computing Language, a language that extends the C99 standard (a modern dialect of the C programming language) and compiles into device-specific binaries. OpenCL was originally developed by Apple and then handed over to the Khronos Group, which ratified the standard in December of 2008. The Khronos Group consortium includes all the major players in the field, including NVidia, AMD/ATI, Apple, and Intel. The list is much more extensive, but those are the four to be happy about. Intel doesn’t yet support OpenCL on their multicore CPUs, but I’m optimistic that they will release an OpenCL implementation that treats CPU cores, as well as GPU cores, as computing devices.
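To give a feel for what "a language that extends C99" looks like in practice, here is a minimal sketch. The string holds an illustrative OpenCL C kernel (vec_add is a name we made up, and the source is not compiled here). The plain-Python functions below it are our own stand-ins, not part of any OpenCL API: they only mimic the idea that the same kernel body runs once per work-item, each with its own global ID.

```python
# Illustrative OpenCL C source. It is NOT compiled or executed by this sketch;
# the kernel name "vec_add" is hypothetical.
KERNEL_SOURCE = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    int i = get_global_id(0);   /* index of this work-item */
    out[i] = a[i] + b[i];
}
"""


def vec_add_work_item(global_id, a, b, out):
    """Plain-Python stand-in for the kernel body: handles one element."""
    out[global_id] = a[global_id] + b[global_id]


def run_kernel(kernel, global_size, *args):
    """Emulate a launch: call the kernel once per global ID.

    A real device would run many of these work-items in parallel;
    this loop runs them one after another on the CPU.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


def vec_add(a, b):
    """Add two equal-length lists using the emulated kernel launch."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    out = [0.0] * len(a)
    run_kernel(vec_add_work_item, len(a), a, b, out)
    return out
```

Calling vec_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]) walks through three emulated work-items and returns the element-wise sums. The point of the sketch is the shape of the code: the kernel describes the work for one element, and the runtime decides how many copies run at once.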
OpenCL was created to address the need for speed in current desktop systems that contain GPUs. More broadly, the language targets computing on heterogeneous systems, which, when you think about it, can include many other types of computing devices. If OpenCL is adopted by Android, then you could optimize code to run on Android devices, too. While this may not be the fastest approach, it would potentially let you distribute work among devices.
One caveat to heterogeneous systems, though: OpenCL kernels that are written and optimized for one hardware platform probably won’t perform the same on another hardware platform. While OpenCL enables developers to write code that can run on multiple hardware devices, the hardware implementations vary. For example, the number of processor cores, and thus the number of parallel threads, may vary widely.
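A bit of back-of-the-envelope arithmetic shows why the same kernel can behave so differently. The sketch below is a deliberately crude model: it assumes a device can run a fixed number of work-items at once and counts how many sequential passes a job would need. The device names and lane counts are made-up placeholders, not measurements of real hardware, and real performance also depends on things like memory bandwidth and occupancy.

```python
def passes_needed(work_items, parallel_lanes):
    """Sequential passes required if only `parallel_lanes` work-items run at once.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    if work_items < 0:
        raise ValueError("work_items must be non-negative")
    if parallel_lanes <= 0:
        raise ValueError("parallel_lanes must be positive")
    return (work_items + parallel_lanes - 1) // parallel_lanes


# Hypothetical devices: names and lane counts are illustrative assumptions.
HYPOTHETICAL_DEVICES = {
    "gpu_a": 240,
    "gpu_b": 1600,
    "cpu_quad_core": 4,
}


def compare_devices(work_items, devices=HYPOTHETICAL_DEVICES):
    """Return {device name: passes needed} for the same workload."""
    return {
        name: passes_needed(work_items, lanes)
        for name, lanes in devices.items()
    }
```

For a workload of one million work-items, compare_devices(1_000_000) gives 4167 passes on the 240-lane placeholder, 625 on the 1600-lane one, and 250000 on the four-lane one. A kernel whose work-group sizes were tuned around one of these numbers has no reason to be well tuned for the others.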
If you can’t tell already, we are sold on the promise of OpenCL for GPGPU. The language is easy to use (if you already know C), and it supports the two biggest players in the GPU market, NVidia and AMD/ATI. We are hoping that Intel releases their OpenCL drivers for CPUs, too, so that we can squeeze out the last drop of computing power.
Other posts in this series
- GPU Computing for GIS - June 14, 2010
- What the heck is ... GPGPU? - June 16, 2010
- CUDA, Stream, and OpenCL (This post) - June 25, 2010
- GPUs and Parallel Computing Architectures - June 29, 2010
- GPU Memory Bandwidth and Coalescing - July 1, 2010
- GPU Occupancy and Idling - July 7, 2010






