What the heck is … GPGPU?

General Purpose Computing on the Graphics Processing Unit, or GPGPU, is a technology that enables software to use the computing power of the multiple processing units that come with modern graphics cards.  The benefits of using these processing units is that computational work gets done by many workers at once, instead of one CPU or a few threads on a multi-core CPU.

At first, it may sound like multi-threading, and in a way it is.  What’s key, though, is that the GPU doesn’t just multi-thread, it’s synchonized, lock stepped multi-threading.   It’s called Single Instruction, Multiple Data, or SIMD in Flynn’s Taxonomy (NVidia’s hardware uses a variant called Single Instruction, Multiple Threads, or SIMT).  This refers to the control flow in each thread on the GPU — to maximize performance, it’s important to get all threads running through the same logic at the same time.  In practice, some threads will eventually branch, and keeping the branching to a minimum is critical.

When we have processing tasks that can be highly discrete, the SIMT nature of the GPU enables computations to really fly when using GPGPU.  Structuring computations in this fashion will result in operations that are highly parallelized.  Coincidentally, they will also be easily distributed to other computation engines.  But unlike other distributed computing methods, GPGPU is tightly coupled, and has very low latency — you get immediate results.

OpenCL Logo

To facilitate all of above, GPGPU requires a video card that supports programmability.  While the major GPU manufacturers have released proprietary GPGPU languages — for example is NVidia’s CUDA framework — a cross-platform, device-independent alternative, called OpenCL, has recently become sufficiently robust that we can begin to use it for implementing GPGPU processing.  While originally developed by Apple, OpenCL is now overseen by a non-profit organization, the Khronos Group.

If you’ve got a GPU on hand, it’s quite likely that there is an OpenCL API available in your programming language of choice.  Here’s a short list from the Khronos Group’s website:

By the way, Khronos just released the spec for OpenCL 1.1 on June 14th, 2010.

GPGPU for GIS

Our interest in GPGPU is motivated by our desire to speed up complicated raster math calculations, in particular those that would enable us to make more sophisticated geoprocessing available on web and mobile platforms.  Our vision is to make an order of magnitude or better improvements in geospatial data processing and thereby make web-based analysis tools more responsive and compelling.  To support our research we applied for and were awarded a National Science Foundation (NSF) Small Business Innovation Research (SBIR) grant to benchmark some traditional GIS operations against  GPU-accelerated versions of the same operations.  Our objective is to improve these operations by about 20x, and it looks promising.

We’re sampling the wide array of Map Algebra operations that are in standard GIS toolkits, focusing on a couple examples of each of the following types of operations: local, focal, zonal, and global.  Some of these types of operations are easily parallelized, such as local and focal operations.  The other operations require a fair amount of algorithm wrangling before they show any improvement on the GPU’s parallel architecture.

Our research is ongoing, so it’s still a bit too early to tell, but we’re excited to already see some impressive performance improvements.

GPGPU Lessons

Migrating our chosen Map Algebra operations to the GPU was trivial in some cases, but quite challenging in others.  For example, local Map Algebra operations are quite easy to parallelize.  Each raster cell is compared to another raster cell in the same location.  This requires very little extra memory and few intermediate steps.

Focal (or “neighborhood”) operations are not simple, particularly for large neighborhoods, but with some intelligent partitioning, it’s relatively easy to work with different sections of the rasters.  This requires some intermediate buffers, but performs well using local memory on the GPU.

Zonal operations summarize the cells in one raster data set using zones identified in a second one.  These operations require a couple scans across a raster, and a few intermediate stages — that one was complicated.

What’s crazy difficult, though, are the global operations.  These operations are challenging because they operate on the entire dataset, with one cell on one side of the raster potentially affecting a cell on the other side of the raster. Examples of global operations include Euclidean distance, cost-weighted distance, viewshed analysis and others.  These operations are non-trivial to convert to GPU processing in a performant way, but it is definitely possible.

Both comments and trackbacks are currently closed.

2 Comments

  1. avatar Nick Canzoneri
    Posted 17 June 2010 at 8:57 am | Permalink

    Do you see this type of technique being most useful to the desktop GIS environment or migrating server side computation to an array of video cards/cluster of PS3s?

  2. Posted 17 June 2010 at 12:00 pm | Permalink

    Hi Nick. We see lots of potential in both, really: we’re excited by the possibilities for speeding up massive calculations on the desktop as well as for bringing types of operations to web browsers and mobile devices that would have previously fallen outside of people’s performance expectations. Thanks for your interest!

3 Trackbacks

  1. By CUDA, Stream, and OpenCL | Azavea Labs on 25 June 2010 at 8:14 am

    [...] on the GPU, or GPGPU, is a steadily maturing technology.  There are many technologies out in the wild that will enable [...]

  2. [...] it’s because I’m excited about the technology.  General Purpose Computing on the GPU (GPGPU) promises great performance increases in computationally heavy software, which we find immensely [...]

  3. [...] one begins to work with GPGPU, the parallel processing benefits can be incredibly beneficial, if you know how to work with [...]