avatarArticles by David Zwarg

Introducing python-sld and django-sld

python-sld

python-sld is a simple python library that enables some basic manipulation of StyledLayerDescriptor (SLD) documents.

What are SLD documents?  SLD is a standard defined by the Open Geospatial Consortium, or OGC. In their words:

The OpenGIS® Styled Layer Descriptor (SLD) Profile of the OpenGIS® Web Map Service (WMS) Encoding Standard defines an extends the WMS standard to allow user-defined symbolization and coloring of geographic feature and coverage data.

In layman’s terms, SLD is a common way to style your own maps that come from any map server that speaks WMS (another standard by OGC). Of all the GIS tools available, the WMS server ecosystem is exceptionally rich and diverse. There are many proprietary choices, as well as a plethora of open source options.

State of the Art

Recently in the course of developing new features for DistrictBuilder, we arrived at a point where we needed to generate SLDs dynamically. Looking around at the existing python libraries, we examined:

What we were looking for was a pure object model access to components in the SLD, as well as XML validation, with very few dependencies. None of the above projects really fit the bill, so we started working on our own.

Introducing python-sld

python-sld in an open source (Apache 2.0) library for dynamic SLD creation and manipulation. The project is hosted over on github, and the packages are in pypi (including generated inline documentation).

Features

Width python-sld, creating new SLD documents is as easy as creating a new instance of a StyledLayerDescriptor object:

>>> from sld import *
>>> sld_doc = StyledLayerDescriptor()

With this SLD document, all descendants are accessed as properties, and most child objects are created off the parent with “create_xxx()” methods:

>>> sld_doc.NamedLayer is None
True
>>> nl = sld_doc.create_namedlayer('My Layer')
>>> nl.Name
'My Layer'

For most complex types, the parent’s property is an instance of the class. In our example:

>>> isinstance(nl, NamedLayer)
True
>>> us = nl.create_userstyle()
>>> us.Title = 'Style Title'
>>> us.Title
'Style Title'
>>> isinstance(us, UserStyle)
True

A couple pythonic classes break up the monotony, too. For elements that contain collections of items (a FeatureTypeStyle element may contain many Rule elements, and Fill, Stroke, and Font elements may contain many CssParameter elements), they behave as pythonic lists.

>>> fts = us.create_featuretypestyle()
>>> len(fts.Rules)
0
>>> r1 = fts.create_rule('Criteria 1')
>>> len(fts.Rules)
1
>>> fts.Rules[0].Title == r1.Title
True

Another bit of pythonic syntactic sugar is the combination of Filters. By constructing filters (with the Rule as a parent) and combining them with “+” or “|”, they create logical “AND” and “OR” filters, respectively.

>>> f1 = Filter(r1)
>>> f1.PropertyIsGreaterThan = PropertyCriterion(f1, 'PropertyIsGreaterThan')
>>> f1.PropertyIsGreaterThan.PropertyName = 'number'
>>> f1.PropertyIsGreaterThan.Literal = '-10'
>>>
>>> f2 = Filter(r1)
>>> f2.PropertyIsLessThanOrEqualTo = PropertyCriterion(f2, 'PropertyIsLessThanOrEqualTo')
>>> f2.PropertyIsLessThanOrEqualTo.PropertyName = 'number'
>>> f2.PropertyIsLessThanOrEqualTo.Literal = '10'
>>>
>>> r1.Filter = f1 + f2

When the SLD object is serialized, it will render an “ogc:And” element that contains both property comparisons. You may have noticed that both the “PropertyIsGreaterThan” and “PropertyIsLessThanOrEqualTo” properties are assigned an instance of a PropertyCriterion class. This is the common class for all property comparitors. The name of the comparitor determines it’s logical comparison (less than, greater than, equal to, etc.), and the class has a PropertyName and Literal property, to control which property gets compared, and which value it is compared against.

Finally, serialization is performed on the main StyledLayerDescriptor object, with options to ‘prettify’ the output:

>>> content = sld_doc.as_sld(pretty_print=True)

Dependencies

The lxml library is required by python-sld. This is the library that provides the underlying parsing and serializing of the XML document, as well as the validation steps against the canonical SLD schema.

Limitations

At the current time, only a subset of the entire SLD specification is implemented. All SLD elements are parsed and stored, but only the following elements may be manipulated as objects in python-sld:

  • StyledLayerDescriptor
  • NamedLayer
  • Name (of NamedLayer)
  • UserStyle
  • Title (of UserStyle and Rule)
  • Abstract
  • FeatureTypeStyle
  • Rule
  • ogc:Filter (implicit ogc:And and ogc:Or)
  • ogc:PropertyIsNotEqualTo
  • ogc:PropertyIsLessThan
  • ogc:PropertyIsLessThanOrEqualTo
  • ogc:PropertyIsEqualTo
  • ogc:PropertyIsGreaterThanOrEqualTo
  • ogc:PropertyIsGreaterThan
  • ogc:PropertyIsLike
  • ogc:PropertyName
  • ogc:Literal
  • PointSymbolizer
  • LineSymbolizer
  • PolygonSymbolizer
  • TextSymbolizer
  • Mark
  • Graphic
  • Fill
  • Stroke
  • Font
  • CssParameter

All other SLD elements cannot be directly manipulated in python-sld, but are accessible (from a parsed SLD that is perhaps more complex) via the parent object’s _node property. This is the lxml.Element that the python-sld class represents.

django-sld

django-sld builds upon the capabilities in python-sld by enabling quick SLD generation from geographic models. This library is separate from the python-sld library because of the dependencies on django and pysal, the Python Spatial Analysis Library.

Primer on Geographic Models

I gave a quick background to geographic models in django to the Boston django meetup last week, and the slides of my presentation are available online as a presentation in Google Docs. The slides are embedded here for your convenience:

Introducing django-sld

django-sld is an open source (Apache 2.0) library for generating SLD documents from geographic querysets. The project is hosted over on github, and the packages are in pypi (including generated inline documentation).

Features

django-sld enables quick classification of geographic querysets by passing the data distribution of an individual model field into the classification algorithms built into pysal. Not all classification methods in pysal are available, however. At the current version (1.0.3), the following classification algorithms are supported:

  • Equal Interval
  • Fisher Jenks
  • Jenks Caspall
  • Jenks Caspall Forced
  • Jenks Caspall Sampled
  • Max P Classifier
  • Maximum Breaks
  • Natural Breaks
  • Quantiles

To classify a django queryset, use any of the as_xxx() methods in the djsld.generator module.

>>> from djsld import generator
>>> qs = MySpatialModel.objects.all()
>>> sld = generator.as_quantiles(qs, 'population', 10)

The above example assumes that you have a model named “MySpatialModel” in django’s models.py file. The result is a sld.StyledLayerDescriptor object, which may be serialized to a string with “as_sld()”

>>> sld_content = sld.as_sld(pretty_print=True)

The “pretty_print” option is available to format the SLD in a fashion that is more readable by us humans.

In addition to simple models, django’s support for related fields really shines, as it’s possible to classify the distribution on any related field, using the “__” (double underscore) format preferred by django:

>>> sld = generator.as_quantiles(qs, 'city__population', 10)

The one caveat is that the PropertyName in the criteria will be set to this field name (which is not the way most mapping packages refer to related fields). To accommodate this difference, you may use the ‘propertyname’ keyword to control the output PropertyName:

>>> sld = generator.as_quantiles(qs, 'city__population', 10,
... propertyname='population')

Dependencies

django-sld requires python-sld and the pysal library.

Putting the Fun in FOSS

I went to the State of the Map (SotM) and Free and Open Source Software for Geospatial (FOSS4G) Conference in Denver, CO last week, where I was surrounded by geospatial users, developers, and architects. I had the opportunity to attend some workshops and learn about a slew of awesome projects — I’m itching to start incorporating many of these new tools and techniques into our solutions.

Node.js

I was able to attend some of the workshops — “You’ve got Javascript in your backend” with Node.js and Polymaps was a great beginner workshop, introducing lightweight servers and client mapping libraries. I was amazed that a basic web server in node.js is only 5 lines of code. Equally amazing was seeing what capabilities Polymaps had when it weighted in at only 32K (minified) vs. OpenLayers at 1.2M (minified default build).

i2maps + pico

Some exciting visualization tools are coming out of the National Center for Geocomputation at the National University of Ireland, in the form of i2maps. While it’s relatively immature (not much in the form of documentation), most the basic functionality builds off of OpenLayers.  Since I’ve already learned the OpenLayers library, I has a short learning curve, and was able to get up to speed pretty quickly.  Their library incorporates some awesome features like dynamically loading and evaluating rasters via canvas (this only works on modern browsers), and even agent-based modeling. I could have stayed in that workshop for a week.

A byproduct of the i2maps project is pico. Pico is a bridge between Python and Javascript, enabling you to call native Python methods directly from Javascript. It performs all the plumbing for you, allowing you to write a simple callback to handle your method’s return value. It also takes care of converting Python objects into Javascript objects, allowing you to pass all sort of data back and forth (including rasters!).

mod-geocache

Another new project from a contributor to the MapServer project is mod-geocache, a tile caching service as an Apache module. This skips a lot of overhead (no proxying, no interpreters, no CGI), and is very fast. In addition, the C implementation has excellent speed and performance. You can perform on the fly tile merging, quantization, and recompression. I’m excited about this module, and the promise of caching with an Apache server (looks like it has more features than mod_tile).

Geoserver

Geoserver‘s next release is also going to include some great features. The ones that really jumped out at me:

  • Time and elevation filters — e.g. storm tracking, where you can limit the features by a time field.
  • Styling SLDs in data units — e.g. “road is 5m wide”, and changes dynamically with scale. This greatly simplifies scale-dependent renderers.
  • Georeferencing of layers can be done in the admin interface.
  • Layers can be view definitions — you don’t have to roll your own views prior to creating the layer.
  • Virtual Services — partition the data layers by workspace.

These aren’t all the new features; take a look at the laundry list yourself, and prepare to be impressed.

Mapnik 2

I think the reason for calling it Mapnik2 is that it is literally twice as awesome as it was before. I learned about the new features in Mapnik2 in the lightning talks at SotM, and I think this was one of the few talks that made you feel like you were actually struck by lightning. I can’t remember half the slides in the talk, but the supported formats, reprojection, styling, and speed improvements left me with my head spinning.

Building Districts in Web-Time

DistrictBuilder logoMost recently, the Politics, Redistricting and Elections team has been working closely with the Public Mapping Project to build DistrictBuilder, an open source, web-based application that enables regular citizens to use powerful tools to draw their own legislative districts. If you’ve seen how badly the professionals can mangle districts (Exhibit A, Exhibit B, etc), it’s easy to imagine that any given citizen, given the right tools, could do it better.

We spent quite a bit of time making the application easy to use and responsive in modern desktop web browsers.  The “easy to use” part was tackled by our excellent UI/UX design team. The “responsive” part was the domain of  our engineers.  That’s where the fun began for me.

DistrictBuilder is designed to use any polygon shapefile, transform it into an internal data model, then make that accessible via map tiles and geometric features.  When serving map tiles, we use GeoServer and GeoWebCache to generate the tiles and cache them, respectfully. This performance is great — pre-generated map tiles are the best we can aim for with respect to the base map tiles. Serving geometric features at full resolution, however, introduces a slew of problems. A few that stood out right away:

  • Web Browser Limitations — 9 out of 10 experts agree: too many map features has a significant performance impact on web browsers, with the greatest impact on the Microsoft Internet Explorer browser.
  • Excessive Coordinates — delivering lots of polygon coordinate pairs that the user may never see consumes valuable bandwidth and rendering time.
  • Server Processing Time — recalculating state-wide geometric features consumes valuable CPU time.

Web Browser Limitations

First, we tackled the browser performance issues. A sluggish browser is the kiss of death in the web world, and we had to make the application experience as fast as possible before looking at the server processing time.

We originally gave users the power to create highly detailed districts at the statewide level, but realized that no modern web browser could handle the volume of polygon features that would need to be served to represent an entire state.  In order to mitigate this limitation, we limited the size and number of features sent to the browser. With some scale-dependent logic, a user zoomed in to a detail of a district can finely tune the boundary by moving smaller geographic features (e.g. census blocks), and a user zoomed out to the state-wide level can manipulate the districts by moving large geographic features only (e.g. counties). In addition, when editing the finest details, we limit the number of features a user can move in a single edit.

Excessive Coordinates

The next thing to go was the set of full resolution geometries. In DistrictBuilder, users never actually see the full geometries, but an adaptively simplified (sometimes called generalized) geometry; depending on the scale of the map view, the server will deliver geometries with appropriate coordinate resolutions. Simply put: as you zoom in on the map, you get more detail in the geometries.

By simplifying counties, the geometries are reduced from 166,958 points to 4,821. When a user is zoomed out, there is no noticeable difference between these geometries!  However, as the user is interacting with higher resolution maps, DistrictBuilder loads in higher-resolution geometries on demand. The following images demonstrate the difference in the geometry detail:

Low Resolution Transition

The zoomed in County layer, with a low resolution district overlay (orange line). There are currently 1,414 coordinates in this view of the district overlay.

High Resolution Transition

The zoomed in VTD layer, with a high resolution district overlay (orange line). There are currently 3,253 coordinates in this view of the district overlay.

You can notice the differences in the district detail if you look closely at the orange district boundary. This transition happens seamlessly in the application, loading in the higher resolution geometries as web users zoom in to areas of interest.

We also eliminated coordinates that you never see.  It made no sense to serve  coordinates that were located in the opposite side of the state where a user was editing, just like you wouldn’t expect to get an encyclopedia in the mail when releasing an RFI. With the OpenLayers library, Strategies came in handy here, particularly BBOX.

Server Processing Time

After we had optimized the performance of the user interface, we shifted our focus to the server-side processing.  One of the features that makes DistrictBuilder such a powerful tool is the accuracy of the underlying data and constant feedback of important district statistics. In order to calculate all these statistics on the fly, it is necessary to leverage some tricks already mentioned with respect to map tiles: caching and generalizing.

Computation of the district statistics must happen every time a district boundary is changed. A naive solution to this problem would be to aggregate the values within the boundary every time a change is made.  This approach results in horrible performance. Instead, we just determine what has changed — which areas were added, which areas were removed — and recompute the delta, or change, on the previous district value.

Another trick to optimizing performance is in the way we determine the changing boundaries.  I’ll describe the problem using the census geographies of counties, tracts, and blocks. The structure and detail of the underlying data yielded computationally expensive queries against the block geometries.  We came up with a method of searching for the geographies in a hierarchical fashion — searching the counties first, then continuing to the next smallest-scale geography only if there was any remaining geometry left in the query.  We did the same for the tracts, and took a shortcut at the block level to exclude the block geometries.  This increased server side performance considerably.

King William County

King William county is comprised of 22 Voter Tabulation Districts and 1,527 Census Blocks.

Consider the following scenario: a user wants to move King William County (highlighted in yellow) from District 1, which is over populated, to District 3, which is under populated. Changing the boundaries with all the blocks in King William County would require testing at least 4,000 blocks for spatial intersections, then aggregating 1,527 data values, and recomputing the spatial aggregate (union) of those 1,527 geometries. With our hierarchical approach, we can change the boundary of the district with the county boundary, and change the population totals by the county’s population. A few orders of magnitude fewer operations to perform, and much faster from the user’s perspective.

Lessons Learned

Throughout the DistrictBuilder development process, the same core performance challenge has arisen: the volume of data must be reduced. This applies to all aspects of the application:

  • Map Tiles: pre-render tiles to keep the number of rendered tiles to a minimum at runtime.
  • Map Features: deliver to the browser only as much information as you can see (perhaps even less).
  • Database Queries: do anything possible to ensure that geometric operations are performed on simplified geometries.
  • Aggregating Statistics: cache whatever you can, and only compute the difference from the last cache state.

The above steps reduced the sheer number of operations and volume of processing that both the server and browser need to complete when creating new districts. These are lessons that translate well to any “big data” problem, and are crucial in bringing sophisticated GIS operations to the web.

OpenStreetMap, Cambridge Style

This past weekend I assisted in the organizing of an OpenStreetMap Mapping Party here in Cambridge. It was a diverse group of people, from all sorts of background, with various experience with mapping tools and GPS devices.

The Free Wiki World Map

CC-SA 2.0 OSM

Why a mapping party in Cambridge, MA? One may say that the area is pretty well covered. Well, yes. Most of the data in Massachusetts was imported from MassGIS data. Why? MassGIS states that the data collection was funded by taxpayers, and therefore lies in the public domain.

This is a great start, but as we discovered throughout the day, that data was out of date, and in some cases, incorrect. Not like that was a surprise, though. Buildings are built, roads are relocated, and sidewalks are shifted. It’s just that it’s now up to us to keep it up to date.

In addition, we found out first-hand just how bad the ‘urban canyon’ effect is. Relying only on our GPS devices would have been disasterous, as the tracks bounce all over the place when you are walking between large buildings.  It was impossible to map areas within these urban canyons relying on GPS reception alone. Fortunately, we were introduced to walking papers by Lars Ahlzen.

Walking Papers is a product of Stamen Design‘s Michal Migurski, and it’s a great low-tech solution for people who want to contribute to OpenStreetMap, but who don’t have a GPS, or don’t have the knack for gadgets (like many technophiles I know). These are paper maps for hand-annotating with pen or pencil. When you’re done, you can scan and upload your map back to walking papers for adding to OpenStreetMap (I’m almost there with my set in Cambridge, I’ll let you know how it goes).

Altogether, it was a fun afternoon, with warm beverages and snacks provided by Azavea. I hope we got some folks excited about OpenStreetMap, and how easy it is to improve our maps.  Next time, instead of the city, I think I’ll  map some trails in the woods around New England.

Update:

I printed my walking papers of the Kendall Square area, near our Cambridge office, and did some quick surveying on my way home last night. This morning, I tried scanning and uploading the image again.  I found that the uploader was finicky, and it seemed to work when:

  1. File format is JPG (TIFF was a no-go)
  2. Scanned images were scanned as full-color photos, and not black & white photos or text
  3. Scanned image resolution was 300dpi

When I figured all that out, I was able to get my scanned print into walking-papers, and even added the walking-papers plugin into JOSM. I found that the tiles of my scanned print sometimes didn’t load in properly, but that may be due to operator error (impatience).

All in all, I think walking-papers is a really great addition to the OpenStreetMap ecosystem. Now you have one less reason to not be mapping your own town!

http://www.openstreetmap.org/?lat=42.36209&lon=-71.08763&zoom=16&layers=M

GPU Occupancy and Idling

As our ongoing research into raster processing for GIS on the GPU progresses, we have gone through various stages in the development of each Map Algebra operation.  Having converted a given operation to the GPU, we are finding that there are many potential ways to optimize, and this optimization process brings with it a host of issues that highlight the differences between sequential CPU programming and GPGPU parallel programming.

During the optimization process, we’ve found (and been told) that the single most important optimization is to ensure memory coalescence.  I blogged about that before, so if you haven’t seen it yet, it might be worth reading before you continue on.

After maximum memory coalescence has been achieved, it is possible to focus on 2 additional metrics: occupancy and idling.

Occupancy

The occupancy metric is defined as the number of active thread groups per processor divided by the maximum number of thread groups per processor.  It’s a value in the range of 0-100%.

Occupancy is the number of thread groups (NVidia calls them ‘warps’, ATI calls them ‘wavefronts’) that are active at one time.  At any one time, some thread groups may be processing data, and some thread groups may be accessing global memory.  When some thread groups are accessing global memory, these threads are effectively stalled for hundreds of instructions, while the other thread groups continue on.

Internally, the GPU has a thread group scheduler which controls when thread groups are executed. This is extremely useful, since highly parallel operations will utilize many thread groups to perform calculations. The GPU is highly parallel, but even it has its limits. This is where the thread group scheduler comes in — it can execute some of the thread groups, while other thread groups are idle, either completed or queued. This scheduling enables some thread groups to perform memory access, while other thread groups perform calculations.

Understanding the scheduler makes it possible to ‘hide’ these global memory accesses by performing ~100 arithmetic instructions between each global memory access.  Hypothetically, if the GPU executed a kernel that accessed global memory, performed a heavy-duty calculation, then saved that result, the occupancy would probably be pretty high. The thread group scheduler would schedule a set of thread groups for accessing global memory while scheduling another set of thread groups for heavy-duty calculation. This is effectively ‘hiding’ the memory access, since the GPU can perform computation instructions while accessing memory. Interestingly, there will be a point when increases to occupancy won’t improve your performance. It is at this point when all global memory accesses are ‘hidden’ by the computation, and it becomes time to look other places for optimization.

Idling

The idling metric is defined as the amount of time the GPU is idle divided by the overall execution time of the computation.  It’s a value in the range of 0-100%.

Idling is something that we have discovered to be critical to the performance of a calculation.  The reference and training documentation instructs GPGPU developers to keep the GPU as busy as possible for as long as possible, and stops there.  By creating this metric, we were able to measure just how much this idling was affecting our computation.

As it turns out, our initial experiments showed that our GPU was idle during periods of memory transfer to and from the CPU.  This idling of the GPU was extending the overall time for computation.  Minimizing this idling through asynchronous kernel execution and memory transfer resulted in a significant and immediate performance improvement.

Coalescence, Occupancy, Idling

To summarize, the best way to optimize your GPU computations is to investigate and optimize these three steps (and in this order):

  1. Memory coalescence
  2. Thread group occupancy
  3. GPU Idling

There are a number of smaller optimization that can be done as well, but we’ve found these to be the big 3.  Of course, you can continue this process forever, and demonstrate to your boss the law of diminishing returns.

GPU Memory Bandwidth and Coalescing

When one begins to work with GPGPU, the parallel processing benefits can be incredibly beneficial, if you know how to work with coalesced memory. This fits in with a parallel algorithm approach, incorporating the following:

  1. thinking about your computation in a data-parallel fashion.
  2. transferring working data into a local memory cache.
  3. considering scrutinizing how your code performs global memory accesses.

The first item almost goes without saying.  If you are hoping to leverage a massively parallel computing device, you obviously have to break your problem or computation down into discrete units that can be operated on in parallel.

It’s the second and third point that I am going to focus on in this post, since they are the most important factors when optimizing your GPGPU code.  The reason these are the most important factors are that local memory is so much faster at reading and writing than global memory, and the memory module in modern GPUs can perform concurrent reads to sequential global memory positions for an entire thread group.

Local Memory Caching

Use of a local memory cache may seem counter-intuitive to a programmer coming from CPU land.  The best analogy would be: storing your working data in RAM instead of on disk.  While not a perfect analogy, a CPU programmer understands perfectly the ramifications of such a design decision — any data accessed from disk will be retrieved more slowly than data accessed from RAM.  Likewise for local and global memory.  Local memory is on-chip memory that is exceptionally fast.  Global memory is off-chip memory that is often used to transfer data to/from the host (often the CPU).  I’m talking about a 100x speed difference when using local memory instead of global memory.

In addition to the differences in global and local memory, the memory bandwidth to/from the graphics card (which contains its own memory and processors) and the motherboard (which contains RAM and one or more CPUs) is another bottleneck.  Data transfer rates across the PCI Express 2.0 bus are about 8 GB/s.  Data transfer rates in the graphics card are around 141 GB/s.  So not only is the place in which you store your working data important, but also when and how you transfer that data to/from the GPU device itself.

Sequential Global Memory a.k.a. Coalescence

And “sequential global memory positions”? What is that?  Inside a GPGPU kernel, when accessing a portion of global memory, all threads in that group (NVidia calls them ‘warps’, and ATI calls them ‘wavefronts’) access a bank of memory at one time.  For example, if there are 16 threads executing with the same kernel, 16 sequential positions in global memory (1 position per thread) can be accessed in the same time that it would take 1 thread to read 1 position in memory.  If all memory accesses are performed this way, performance can speed up by a factor of 16 (in the memory access code).

That’s a wonderful way to speed up data-intensive operations, especially when one is working with raster data, and a given block of cells is accessed multiple times.  It is in this scenario that our research has recently landed us.

Another thing worth noting is that coalescence concept applies to global memory on the GPU only — local memory does not suffer the same performance hit, so does not need to take advantage of this technique.  But global memory access on the GPU takes about 100x as many instructions as local memory access.  This means that if you have coalesced global memory access, you are saving hundreds of instructions per thread.  This starts to add up when you consider that processing a raster may require hundreds or thousands of threads.

Armed with this knowledge, parallel algorithm implementations begin to have similar structures with regards to memory access.  The resulting code can be highly complex, though, and it’s not trivial to debug, but some new tools from NVidia and ATI are enabling developers to profile and visualize the work performed by the GPU. In my next post, I’ll discuss latency and occupancy, two metrics that one can use to help optimize GPU kernels.

GPUs and Parallel Computing Architectures

I’ve been blogging about GPUs recently, and I think you can tell it’s because I’m excited about the technology.  General Purpose Computing on the GPU (GPGPU) promises great performance increases in computationally heavy software, which we find immensely useful.  In the past, we’ve managed to engineer web-based applications (see: SmartConservation) that could run complex models by implementing a process queuing architecture, but in these systems, while they will run on the web, processing may still take several minutes and they therefore can neither provide a responsive user experience nor support many users.  We’ve also engineered a system that can perform fast, distributed raster calculations (see: Walkshed, powered by DecisionTree).

One of the reasons that GPGPU is so promising is the increasing number of processing cores available on affordable graphics cards.  This increases the computation capacity by leveraging many processors running in parallel. What’s interesting is that this technique is not new.  Timothy Mattson, blogging at Intel, has been doing this since the mid 80′s.  The Library of Congress contains a book on parallel computing structures and algorithms, dating back to 1969.

As we delve deeper into our work improving Map Algebra operations, important differences in algorithm approaches and implementations become apparent: not all parallel architectures are the same.  One might be tempted to think that when switching from the single-threaded CPU logic to multithreaded/parallel logic that there would be one model of parallel computing that is universal.  This is definitely not the case.

Three of the most popular types of parallel computing today are:

  • Shared-memory Multi-Processors (SMP)
  • Distributed-memory Massive Parallel Processors (MPP)
  • Cluster computing

Each type of parallel computing has its benefits and drawbacks.  It really just depends what kind of computing you need to do. I’ll describe these common computing types in detail, starting with the ‘traditional’ CPU model.

» Continue Reading