<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Azavea Labs &#187; David Zwarg</title>
	<atom:link href="http://www.azavea.com/blogs/labs/author/dzwarg/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.azavea.com/blogs/labs</link>
	<description>Insight on what our engineers are doing</description>
	<lastBuildDate>Mon, 06 Feb 2012 22:32:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Introducing python-sld and django-sld</title>
		<link>http://www.azavea.com/blogs/labs/2011/12/introducing-python-sld-and-django-sld/</link>
		<comments>http://www.azavea.com/blogs/labs/2011/12/introducing-python-sld-and-django-sld/#comments</comments>
		<pubDate>Wed, 21 Dec 2011 18:23:04 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=1830</guid>
		<description><![CDATA[python-sld python-sld is a simple python library that enables some basic manipulation of StyledLayerDescriptor (SLD) documents. What are SLD documents?  SLD is a standard defined by the Open Geospatial Consortium, or OGC. In their words: The OpenGIS® Styled Layer Descriptor (SLD) Profile of the OpenGIS® Web Map Service (WMS) Encoding Standard defines an encoding that extends the [...]]]></description>
			<content:encoded><![CDATA[<h1>python-sld</h1>
<p><a title="python-sld on pypi.python.org" href="http://pypi.python.org/pypi/python-sld/">python-sld</a> is a simple python library that enables some basic manipulation of StyledLayerDescriptor (SLD) documents.</p>
<p>What are SLD documents?  <a title="SLD standard" href="http://www.opengeospatial.org/standards/sld">SLD</a> is a standard defined by the <a href="http://www.opengeospatial.org/">Open Geospatial Consortium</a>, or OGC. In their words:</p>
<blockquote><p>The OpenGIS® Styled Layer Descriptor (SLD) Profile of the OpenGIS® Web Map Service (WMS) Encoding Standard defines an encoding that extends the WMS standard to allow user-defined symbolization and coloring of geographic feature and coverage data.</p></blockquote>
<p>In layman&#8217;s terms, SLD is a common way to style your own maps that come from any map server that speaks <a title="WMS standard" href="http://www.opengeospatial.org/standards/wms">WMS</a> (another standard by OGC). Of all the GIS tools available, the WMS server ecosystem is exceptionally rich and diverse. There are <a href="http://www.intergraph.com/sgi/products/productFamily.aspx?family=10">many</a> <a href="http://resources.arcgis.com/content/arcims/10.0/about">proprietary</a> <a href="http://www.pbinsight.com/products/location-intelligence/developer-tools/desktop-mobile-and-internet-offering/mapxtreme-2008/">choices</a>, as <a href="http://www.qgis.org/">well</a> <a href="http://goworldwind.org/server/">as a </a>plethora <a href="http://www.resc.rdg.ac.uk/trac/ncWMS/">of</a> <a href="http://www.mapserver.org/">open</a> <a href="http://mapguide.osgeo.org/">source</a> <a href="http://mapnik.org/">options</a>.</p>
<h2>State of the Art</h2>
<p>Recently in the course of developing new features for <a href="http://www.districtbuilder.org/">DistrictBuilder</a>, we arrived at a point where we needed to generate SLDs dynamically. Looking around at the existing python libraries, we examined:</p>
<ul>
<li><a href="https://github.com/opengeogroep/pySLD">pySLD</a></li>
<li><a href="http://www.webrian.ch/2011/10/save-as-sld-030-released.html">qGIS plugin &#8220;Save as SLD&#8221;</a></li>
<li><a href="http://geoscript.org/py/">Geoscript</a></li>
</ul>
<p>What we were looking for was pure object-model access to the components of an SLD, along with XML validation and very few dependencies. None of the above projects really fit the bill, so we started working on our own.</p>
<h2>Introducing python-sld</h2>
<p>python-sld is an open source (<a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2.0</a>) library for dynamic SLD creation and manipulation. The project is hosted over on <a href="https://github.com/azavea/python-sld/">github</a>, and the packages are in <a href="http://pypi.python.org/pypi/python-sld">pypi</a> (including generated inline <a href="http://packages.python.org/python-sld/sld.CssParameter-class.html">documentation</a>).</p>
<h3><strong>Features</strong></h3>
<p>With python-sld, creating new SLD documents is as easy as creating a new instance of a <a href="http://packages.python.org/python-sld/sld.StyledLayerDescriptor-class.html">StyledLayerDescriptor</a> object:</p>
<pre>&gt;&gt;&gt; from sld import *
&gt;&gt;&gt; sld_doc = StyledLayerDescriptor()</pre>
<p>With this SLD document, all descendants are accessed as properties, and most child objects are created off the parent with &#8220;create_xxx()&#8221; methods:</p>
<pre>&gt;&gt;&gt; sld_doc.NamedLayer is None
True
&gt;&gt;&gt; nl = sld_doc.create_namedlayer('My Layer')
&gt;&gt;&gt; nl.Name
'My Layer'</pre>
<p>For most complex types, the parent&#8217;s property is an instance of the class. In our example:</p>
<pre>&gt;&gt;&gt; isinstance(nl, NamedLayer)
True
&gt;&gt;&gt; us = nl.create_userstyle()
&gt;&gt;&gt; us.Title = 'Style Title'
&gt;&gt;&gt; us.Title
'Style Title'
&gt;&gt;&gt; isinstance(us, UserStyle)
True</pre>
<p>A couple of pythonic classes break up the monotony, too. Elements that contain collections of items (a <a href="http://packages.python.org/python-sld/sld.FeatureTypeStyle-class.html">FeatureTypeStyle</a> element may contain many <a href="http://packages.python.org/python-sld/sld.Rule-class.html">Rule</a> elements, and <a href="http://packages.python.org/python-sld/sld.Fill-class.html">Fill</a>, <a href="http://packages.python.org/python-sld/sld.Stroke-class.html">Stroke</a>, and <a href="http://packages.python.org/python-sld/sld.Font-class.html">Font</a> elements may contain many <a href="http://packages.python.org/python-sld/sld.CssParameter-class.html">CssParameter</a> elements) behave as pythonic lists.</p>
<pre>&gt;&gt;&gt; fts = us.create_featuretypestyle()
&gt;&gt;&gt; len(fts.Rules)
0
&gt;&gt;&gt; r1 = fts.create_rule('Criteria 1')
&gt;&gt;&gt; len(fts.Rules)
1
&gt;&gt;&gt; fts.Rules[0].Title == r1.Title
True</pre>
<p>Another bit of pythonic syntactic sugar is the combination of <a href="http://packages.python.org/python-sld/sld.Filter-class.html">Filter</a>s. Construct filters (with the <a href="http://packages.python.org/python-sld/sld.Rule-class.html">Rule</a> as a parent) and combine them with &#8220;+&#8221; or &#8220;|&#8221; to create logical &#8220;AND&#8221; and &#8220;OR&#8221; filters, respectively.</p>
<pre>&gt;&gt;&gt; f1 = Filter(r1)
&gt;&gt;&gt; f1.PropertyIsGreaterThan = PropertyCriterion(f1, 'PropertyIsGreaterThan')
&gt;&gt;&gt; f1.PropertyIsGreaterThan.PropertyName = 'number'
&gt;&gt;&gt; f1.PropertyIsGreaterThan.Literal = '-10'
&gt;&gt;&gt;
&gt;&gt;&gt; f2 = Filter(r1)
&gt;&gt;&gt; f2.PropertyIsLessThanOrEqualTo = PropertyCriterion(f2, 'PropertyIsLessThanOrEqualTo')
&gt;&gt;&gt; f2.PropertyIsLessThanOrEqualTo.PropertyName = 'number'
&gt;&gt;&gt; f2.PropertyIsLessThanOrEqualTo.Literal = '10'
&gt;&gt;&gt;
&gt;&gt;&gt; r1.Filter = f1 + f2</pre>
<p>When the SLD object is serialized, it will render an &#8220;ogc:And&#8221; element that contains both property comparisons. You may have noticed that both the &#8220;PropertyIsGreaterThan&#8221; and &#8220;PropertyIsLessThanOrEqualTo&#8221; properties are assigned an instance of a <a href="http://packages.python.org/python-sld/sld.PropertyCriterion-class.html">PropertyCriterion</a> class. This is the common class for all property comparators. The name of the comparator determines its logical comparison (less than, greater than, equal to, etc.), and the class has PropertyName and Literal properties to control which property gets compared and which value it is compared against.</p>
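<p>To illustrate how &#8220;+&#8221; and &#8220;|&#8221; can build logical filters, here is a minimal, self-contained sketch of operator overloading in plain Python. The Comparison and Logical classes and the render() method are hypothetical names invented for this illustration; this is not how python-sld is implemented internally, only the general technique.</p>

```python
# Hypothetical sketch of the operator-overloading idea; NOT python-sld's code.
class Comparison:
    """A single property comparison, e.g. number PropertyIsGreaterThan -10."""

    def __init__(self, operator, property_name, literal):
        self.operator = operator
        self.property_name = property_name
        self.literal = literal

    def __add__(self, other):
        # "+" combines two filters with a logical AND.
        return Logical('And', self, other)

    def __or__(self, other):
        # "|" combines two filters with a logical OR.
        return Logical('Or', self, other)

    def render(self):
        return '%s(%s, %s)' % (self.operator, self.property_name, self.literal)


class Logical(Comparison):
    """A logical node that wraps two child filters."""

    def __init__(self, operator, left, right):
        self.operator = operator
        self.left = left
        self.right = right

    def render(self):
        return '%s(%s, %s)' % (self.operator, self.left.render(), self.right.render())


f1 = Comparison('PropertyIsGreaterThan', 'number', '-10')
f2 = Comparison('PropertyIsLessThanOrEqualTo', 'number', '10')
both = f1 + f2    # logical AND
either = f1 | f2  # logical OR
```

<p>Because Logical inherits the same operators, combined filters can be combined again, for example (f1 + f2) | f1.</p>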
<p>Finally, serialization is performed on the main StyledLayerDescriptor object, with options to &#8216;prettify&#8217; the output:</p>
<pre>&gt;&gt;&gt; content = sld_doc.as_sld(pretty_print=True)</pre>
<h3><strong>Dependencies</strong></h3>
<p>The <a href="http://lxml.de/">lxml</a> library is required by python-sld. This is the library that provides the underlying parsing and serializing of the XML document, as well as the validation steps against the canonical SLD schema.</p>
<h3><strong>Limitations</strong></h3>
<p>At the current time, only a subset of the entire SLD specification is implemented. All SLD elements are parsed and stored, but only the following elements may be manipulated as objects in python-sld:</p>
<ul>
<li>StyledLayerDescriptor</li>
<li>NamedLayer</li>
<li>Name (of NamedLayer)</li>
<li>UserStyle</li>
<li>Title (of UserStyle and Rule)</li>
<li>Abstract</li>
<li>FeatureTypeStyle</li>
<li>Rule</li>
<li>ogc:Filter (implicit ogc:And and ogc:Or)</li>
<li>ogc:PropertyIsNotEqualTo</li>
<li>ogc:PropertyIsLessThan</li>
<li>ogc:PropertyIsLessThanOrEqualTo</li>
<li>ogc:PropertyIsEqualTo</li>
<li>ogc:PropertyIsGreaterThanOrEqualTo</li>
<li>ogc:PropertyIsGreaterThan</li>
<li>ogc:PropertyIsLike</li>
<li>ogc:PropertyName</li>
<li>ogc:Literal</li>
<li>PointSymbolizer</li>
<li>LineSymbolizer</li>
<li>PolygonSymbolizer</li>
<li>TextSymbolizer</li>
<li>Mark</li>
<li>Graphic</li>
<li>Fill</li>
<li>Stroke</li>
<li>Font</li>
<li>CssParameter</li>
</ul>
<p>All other SLD elements cannot be directly manipulated in python-sld, but are accessible (from a parsed SLD that is perhaps more complex) via the parent object&#8217;s _node property. This is the lxml.Element that the python-sld class represents.</p>
<h1>django-sld</h1>
<p>django-sld builds upon the capabilities in python-sld by enabling quick SLD generation from geographic models. This library is separate from the python-sld library because of the dependencies on <a href="https://www.djangoproject.com/">django</a> and <a href="http://code.google.com/p/pysal/">pysal</a>, the Python Spatial Analysis Library.</p>
<h2>Primer on Geographic Models</h2>
<p>I gave a quick background to geographic models in django to the <a href="http://www.meetup.com/djangoboston/events/43145722/">Boston django meetup</a> last week, and the slides of my presentation are <a href="https://docs.google.com/present/view?id=ddpq33ft_104cdq773cs">available online</a> as a presentation in Google Docs. The slides are embedded here for your convenience:</p>
<p><iframe src="https://docs.google.com/present/embed?id=ddpq33ft_104cdq773cs" frameborder="0" width="410" height="342"></iframe></p>
<h2>Introducing django-sld</h2>
<p>django-sld is an open source (<a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2.0</a>) library for generating SLD documents from geographic querysets. The project is hosted over on <a href="https://github.com/azavea/django-sld/">github</a>, and the packages are in <a href="http://pypi.python.org/pypi/django-sld">pypi</a> (including generated inline <a href="http://packages.python.org/django-sld/">documentation</a>).</p>
<h3><strong>Features</strong></h3>
<p>django-sld enables quick classification of geographic querysets by passing the data distribution of an individual model field into the classification algorithms built into pysal. Not all classification methods in pysal are available, however. At the current version (1.0.3), the following classification algorithms are supported:</p>
<ul>
<li>Equal Interval</li>
<li>Fisher Jenks</li>
<li>Jenks Caspall</li>
<li>Jenks Caspall Forced</li>
<li>Jenks Caspall Sampled</li>
<li>Max P Classifier</li>
<li>Maximum Breaks</li>
<li>Natural Breaks</li>
<li>Quantiles</li>
</ul>
<p>To classify a django queryset, use any of the as_xxx() methods in the djsld.generator module.</p>
<pre>&gt;&gt;&gt; from djsld import generator
&gt;&gt;&gt; qs = MySpatialModel.objects.all()
&gt;&gt;&gt; sld = generator.as_quantiles(qs, 'population', 10)</pre>
<p>The above example assumes that you have a model named &#8220;MySpatialModel&#8221; in django&#8217;s models.py file. The result is a sld.<a href="http://packages.python.org/python-sld/sld.StyledLayerDescriptor-class.html">StyledLayerDescriptor</a> object, which may be serialized to a string with &#8220;as_sld()&#8221;:</p>
<pre>&gt;&gt;&gt; sld_content = sld.as_sld(pretty_print=True)</pre>
<p>The &#8220;pretty_print&#8221; option is available to format the SLD in a fashion that is more readable by us humans.</p>
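<p>For readers unfamiliar with quantile classification, the rough idea is to pick class breaks so that each class holds about the same number of values. The sketch below shows that idea in plain Python. The function names are made up for this illustration, and pysal&#8217;s own Quantiles classifier may choose its break values differently (for example, by interpolating between values).</p>

```python
def quantile_breaks(values, classes):
    """Return the upper bound of each class so classes hold roughly equal counts."""
    ordered = sorted(values)
    count = len(ordered)
    breaks = []
    for i in range(1, classes + 1):
        # Index of the last value that belongs to class i.
        last = (i * count) // classes - 1
        breaks.append(ordered[max(last, 0)])
    return breaks


def class_ranges(values, classes):
    """Pair each break with the previous one to get (lower, upper) ranges."""
    lower = min(values)
    ranges = []
    for upper in quantile_breaks(values, classes):
        ranges.append((lower, upper))
        lower = upper
    return ranges


# Made-up population values for eight features.
populations = [120, 450, 80, 990, 300, 610, 275, 840]
breaks = quantile_breaks(populations, 4)  # [120, 300, 610, 990]
ranges = class_ranges(populations, 4)
```

<p>Each (lower, upper) range corresponds conceptually to one Rule in the generated SLD, with a filter on the classified field.</p>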
<p>In addition to simple models, django&#8217;s support for related fields really shines, as it&#8217;s possible to classify the distribution on any related field, using the &#8220;__&#8221; (double underscore) format preferred by django:</p>
<pre>&gt;&gt;&gt; sld = generator.as_quantiles(qs, 'city__population', 10)</pre>
<p>The one caveat is that the PropertyName in the criteria will be set to this field name (which is not the way most mapping packages refer to related fields). To accommodate this difference, you may use the &#8216;propertyname&#8217; keyword to control the output PropertyName:</p>
<pre>&gt;&gt;&gt; sld = generator.as_quantiles(qs, 'city__population', 10,
... propertyname='population')</pre>
<h3><strong>Dependencies</strong></h3>
<p>django-sld requires python-sld and the pysal library.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2011/12/introducing-python-sld-and-django-sld/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Putting the Fun in FOSS</title>
		<link>http://www.azavea.com/blogs/labs/2011/09/putting-the-fun-in-foss/</link>
		<comments>http://www.azavea.com/blogs/labs/2011/09/putting-the-fun-in-foss/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 15:38:42 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[foss4g]]></category>
		<category><![CDATA[GeoServer]]></category>
		<category><![CDATA[i2maps]]></category>
		<category><![CDATA[mapnik]]></category>
		<category><![CDATA[nodejs]]></category>
		<category><![CDATA[opensource]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=1814</guid>
		<description><![CDATA[I went to the State of the Map (SotM) and Free and Open Source Software for Geospatial (FOSS4G) Conference in Denver, CO last week, where I was surrounded by geospatial users, developers, and architects. I had the opportunity to attend some workshops and learn about a slew of awesome projects &#8212; I&#8217;m itching to start [...]]]></description>
			<content:encoded><![CDATA[<p>I went to the State of the Map (<a href="http://stateofthemap.org/">SotM</a>) and Free and Open Source Software for Geospatial (<a href="http://2011.foss4g.org/">FOSS4G</a>) Conference in Denver, CO last week, where I was surrounded by geospatial users, developers, and architects. I had the opportunity to attend some workshops and learn about a slew of awesome projects &#8212; I&#8217;m itching to start incorporating many of these new tools and techniques into our solutions.</p>
<h2>Node.js</h2>
<p>I was able to attend some of the workshops &#8212; &#8220;You&#8217;ve got Javascript in your backend&#8221; with <a href="http://nodejs.org/">Node.js</a> and <a href="http://polymaps.org/">Polymaps</a> was a great beginner workshop, introducing lightweight servers and client mapping libraries. I was amazed that a basic web server in node.js is only 5 lines of code. Equally amazing was seeing what capabilities Polymaps had when it weighed in at only 32K (minified) vs. <a href="http://www.openlayers.org/">OpenLayers</a> at 1.2M (minified default build).</p>
<h2>i2maps + pico</h2>
<p>Some exciting visualization tools are coming out of the <a href="http://ncg.nuim.ie/">National Center for Geocomputation</a> at the <a href="http://www.nuim.ie/">National University of Ireland</a>, in the form of <a href="http://ncg.nuim.ie/i2maps/docs/">i2maps</a>. While it&#8217;s relatively immature (not much in the form of documentation), most of the basic functionality builds off of OpenLayers.  Since I&#8217;ve already learned the OpenLayers library, I had a short learning curve, and was able to get up to speed pretty quickly.  Their library incorporates some awesome features like dynamically loading and evaluating rasters via canvas (this only works on modern browsers), and even agent-based modeling. I could have stayed in that workshop for a week.</p>
<p>A byproduct of the i2maps project is <a href="https://github.com/fergalwalsh/pico">pico</a>. Pico is a bridge between <a href="http://www.python.org/">Python</a> and Javascript, enabling you to call native Python methods directly from Javascript. It performs all the plumbing for you, allowing you to write a simple callback to handle your method&#8217;s return value. It also takes care of converting Python objects into Javascript objects, allowing you to pass all sorts of data back and forth (including rasters!).</p>
<h2>mod-geocache</h2>
<p>Another new project from a contributor to the MapServer project is <a href="http://code.google.com/p/mod-geocache/">mod-geocache</a>, a tile caching service as an <a href="http://httpd.apache.org/">Apache</a> module. Because it runs inside Apache, it skips a lot of overhead (no proxying, no interpreters, no CGI), and the C implementation is very fast. It can perform on-the-fly tile merging, quantization, and recompression. I&#8217;m excited about this module, and the promise of caching with an Apache server (looks like it has more features than <a href="http://wiki.openstreetmap.org/wiki/Mod_tile">mod_tile</a>).</p>
<h2>Geoserver</h2>
<p><a href="http://geoserver.org/display/GEOS/Welcome">Geoserver</a>&#8217;s next release is also going to include some great features. The ones that really jumped out at me:</p>
<ul>
<li>Time and elevation filters &#8212; e.g. storm tracking, where you can limit the features by a time field.</li>
<li>Styling SLDs in data units &#8212; e.g. &#8220;road is 5m wide&#8221;, and changes dynamically with scale. This greatly simplifies scale-dependent renderers.</li>
<li>Georeferencing of layers can be done in the admin interface.</li>
<li>Layers can be view definitions &#8212; you don&#8217;t have to roll your own views prior to creating the layer.</li>
<li>Virtual Services &#8212; partition the data layers by workspace.</li>
</ul>
<p>These aren&#8217;t all the new features; take a look at the <a href="http://geoserver.org/display/GEOS/State+of+GeoServer+2011">laundry list</a> yourself, and prepare to be impressed.</p>
<h2>Mapnik 2</h2>
<p>I think the reason for calling it <a href="http://trac.mapnik.org/">Mapnik2</a> is that it is literally twice as awesome as it was before. I learned about the new features in Mapnik2 in the lightning talks at SotM, and I think this was one of the few talks that made you feel like you were actually struck by lightning. I can&#8217;t remember half the slides in the talk, but the supported formats, reprojection, styling, and speed improvements left me with my head spinning.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2011/09/putting-the-fun-in-foss/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Building Districts in Web-Time</title>
		<link>http://www.azavea.com/blogs/labs/2011/08/building-districts-in-web-time/</link>
		<comments>http://www.azavea.com/blogs/labs/2011/08/building-districts-in-web-time/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 18:07:18 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[DistrictBuilder]]></category>
		<category><![CDATA[GeoServer]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[openlayers]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=1581</guid>
		<description><![CDATA[Most recently, the Politics, Redistricting and Elections team has been working closely with the Public Mapping Project to build DistrictBuilder, an open source, web-based application that enables regular citizens to use powerful tools to draw their own legislative districts. If you&#8217;ve seen how badly the professionals can mangle districts (Exhibit A, Exhibit B, etc), it&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.azavea.com/districtbuilder"><img class="alignleft size-full wp-image-1783" style="border: 0px;" title="DistrictBuilder_logo" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2011/07/DistrictBuilder_logo.png" alt="DistrictBuilder logo" width="206" height="66" align="left" hspace="4" vspace="4" /></a>Most recently, the Politics, Redistricting and Elections team has been working closely with the <a title="The Public Mapping Project" href="http://www.publicmapping.org/">Public Mapping Project</a> to build <a title="District Builder" href="https://sourceforge.net/projects/publicmapping/">DistrictBuilder</a>, an open source, web-based application that enables regular citizens to use powerful tools to draw their own legislative districts. If you&#8217;ve seen how badly the professionals can mangle districts (<a title="Illinois Congressional District #4" href="http://www.redistrictingthenation.com/search.aspx?type=NATIONAL_LOWER&amp;id=4&amp;state=IL">Exhibit A</a>, <a title="Pennsylvania Congressional District #12" href="http://www.redistrictingthenation.com/search.aspx?type=NATIONAL_LOWER&amp;id=12&amp;state=PA">Exhibit B</a>, <a title="Top 10 Congressional Districts" href="http://www.redistrictingthenation.com/top10.aspx">etc</a>), it&#8217;s easy to imagine that any given citizen, given the right tools, could do it better.</p>
<p>We spent quite a bit of time making the application easy to use and responsive in modern desktop web browsers.  The &#8220;easy to use&#8221; part was tackled by our excellent UI/UX design team. The &#8220;responsive&#8221; part was the domain of  our engineers.  That&#8217;s where the fun began for me.</p>
<p>DistrictBuilder is designed to use any polygon shapefile, transform it into an internal data model, then make that accessible via map tiles and geometric features.  When serving map tiles, we use <a title="Geoserver" href="http://geoserver.org/display/GEOS/Welcome">GeoServer</a> and <a title="GeoWebCache" href="http://geowebcache.org/">GeoWebCache</a> to generate the tiles and cache them, respectively. This performance is great &#8212; pre-generated map tiles are the best we can aim for with respect to the base map tiles. Serving geometric features at full resolution, however, introduces a slew of problems. A few that stood out right away:</p>
<ul>
<li>Web Browser Limitations &#8212; 9 out of 10 experts agree: too many map features have a significant performance impact on web browsers, with the greatest impact on the Microsoft Internet Explorer browser.</li>
<li>Excessive Coordinates &#8212; delivering lots of polygon coordinate pairs that the user may never see consumes valuable bandwidth and rendering time.</li>
<li>Server Processing Time &#8212; recalculating state-wide geometric features consumes valuable CPU time.</li>
</ul>
<h1>Web Browser Limitations</h1>
<p>First, we tackled the browser performance issues. A sluggish browser is the kiss of death in the web world, and we had to make the application experience as fast as possible before looking at the server processing time.</p>
<p>We originally gave users the power to create highly detailed districts at the statewide level, but realized that no modern web browser could handle the volume of polygon features that would need to be served to represent an entire state.  In order to mitigate this limitation, we limited the size and number of features sent to the browser. With some scale-dependent logic, a user zoomed in to a detail of a district can finely tune the boundary by moving smaller geographic features (e.g. census blocks), and a user zoomed out to the state-wide level can manipulate the districts by moving large geographic features only (e.g. counties). In addition, when editing the finest details, we limit the number of features a user can move in a single edit.</p>
<h1>Excessive Coordinates</h1>
<p>The next thing to go was the set of full resolution geometries. In DistrictBuilder, users never actually see the full geometries, but an <a href="http://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm">adaptively simplified</a> (sometimes called generalized) geometry; depending on the scale of the map view, the server will deliver geometries with appropriate coordinate resolutions. Simply put: as you zoom in on the map, you get more detail in the geometries.</p>
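<p>As a rough illustration of the simplification technique linked above, here is a minimal pure-Python sketch of the Ramer-Douglas-Peucker algorithm. It is not DistrictBuilder&#8217;s code (the post does not show how or where the simplification is performed); the function names and the sample coordinates are assumptions made for this example. A larger tolerance, as you would use when zoomed out, keeps fewer points.</p>

```python
import math


def point_line_distance(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate chord: fall back to the distance to the single point.
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: drop points closer than tolerance to the chord."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves around it.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Made-up boundary coordinates.
boundary = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
zoomed_in = simplify(boundary, 0.1)   # fine tolerance keeps the spike at (3, 2)
zoomed_out = simplify(boundary, 3.0)  # coarse tolerance keeps only the endpoints
```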
<p>By simplifying counties, the geometries are reduced from 166,958 points to 4,821. When a user is zoomed out, there is no noticeable difference between these geometries!  However, as the user is interacting with higher resolution maps, DistrictBuilder loads in higher-resolution geometries on demand. The following images demonstrate the difference in the geometry detail:</p>
<div id="attachment_1689" class="wp-caption aligncenter" style="width: 510px"><img class="size-full wp-image-1689 " title="Low Resolution Transition" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2011/08/county_vtd_transition-low1.png" alt="Low Resolution Transition" width="500" height="298" /><p class="wp-caption-text">The zoomed in County layer, with a low resolution district overlay (orange line). There are currently 1,414 coordinates in this view of the district overlay.</p></div>
<div id="attachment_1686" class="wp-caption aligncenter" style="width: 510px"><img class="size-full wp-image-1686  " title="High Resolution Transition" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2011/08/county_vtd_transition-high.png" alt="High Resolution Transition" width="500" height="298" /><p class="wp-caption-text">The zoomed in VTD layer, with a high resolution district overlay (orange line). There are currently 3,253 coordinates in this view of the district overlay.</p></div>
<p>You can notice the differences in the district detail if you look closely at the orange district boundary. This transition happens seamlessly in the application, loading in the higher resolution geometries as web users zoom in to areas of interest.</p>
<p>We also eliminated coordinates that you never see.  It made no sense to serve coordinates that were located on the opposite side of the state from where a user was editing, just like you wouldn&#8217;t expect to get an <a title="World Book (tm) Encyclopedia" href="http://www.worldbook.com/encyclopedias/201072_basic_research_package_2010--classic.html" class="broken_link" rel="nofollow">encyclopedia</a> in the mail when releasing an <a title="Request for Information" href="http://en.wikipedia.org/wiki/Request_for_information">RFI</a>. With the <a title="OpenLayers" href="http://www.openlayers.org/">OpenLayers</a> library, Strategies came in handy here, particularly <a href="http://dev.openlayers.org/apidocs/files/OpenLayers/Strategy/BBOX-js.html">BBOX</a>.</p>
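<p>The idea behind a bounding-box request can be sketched in a few lines of Python: only features whose extent overlaps the current map view are returned. This is a conceptual stand-in, not the OpenLayers BBOX strategy itself (which is Javascript) and not DistrictBuilder&#8217;s server code; the feature names and coordinates are invented.</p>

```python
def bounds(coords):
    """Return the (minx, miny, maxx, maxy) extent of a list of (x, y) pairs."""
    xs = [x for x, y in coords]
    ys = [y for x, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True when two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def features_in_view(features, view):
    """Keep only the features whose extent overlaps the current map view."""
    return [name for name, coords in features.items()
            if intersects(bounds(coords), view)]


# Two made-up features at opposite ends of a "state".
features = {
    'north': [(0, 8), (2, 8), (2, 10), (0, 10)],
    'south': [(0, 0), (2, 0), (2, 2), (0, 2)],
}
visible = features_in_view(features, (1, 1, 3, 3))
```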
<h1>Server Processing Time</h1>
<p>After we had optimized the performance of the user interface, we shifted our focus to the server-side processing.  One of the features that makes DistrictBuilder such a powerful tool is the accuracy of the underlying data and constant feedback of important district statistics. In order to calculate all these statistics on the fly, it is necessary to leverage some tricks already mentioned with respect to map tiles: caching and generalizing.</p>
<p>Computation of the district statistics must happen every time a district boundary is changed. A naive solution to this problem would be to aggregate the values within the boundary every time a change is made.  This approach results in horrible performance. Instead, we just determine what has changed &#8212; which areas were added, which areas were removed &#8212; and recompute the delta, or change, on the previous district value.</p>
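<p>A minimal sketch of the delta approach, using plain dictionaries and sets rather than DistrictBuilder&#8217;s actual data model (the names and numbers below are assumptions for illustration): instead of summing every unit in the district again, only the added and removed units are touched.</p>

```python
def full_total(populations, members):
    """Naive approach: aggregate every unit inside the district."""
    return sum(populations[unit] for unit in members)


def apply_delta(total, populations, added, removed):
    """Delta approach: adjust the cached total by what changed."""
    gained = sum(populations[unit] for unit in added)
    lost = sum(populations[unit] for unit in removed)
    return total + gained - lost


# Made-up unit populations.
populations = {'a': 100, 'b': 250, 'c': 75, 'd': 400}
district = {'a', 'b'}
cached = full_total(populations, district)  # computed once, then cached

# The user adds unit 'c' to the district and removes unit 'a'.
updated = apply_delta(cached, populations, added={'c'}, removed={'a'})
```

<p>The updated value matches what a full recomputation over the new membership would give, but the work is proportional to the size of the change rather than the size of the district.</p>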
<p>Another trick to optimizing performance is in the way we determine the changing boundaries.  I&#8217;ll describe the problem using the census geographies of counties, tracts, and blocks. The structure and detail of the underlying data yielded computationally expensive queries against the block geometries.  We came up with a method of searching for the geographies in a hierarchical fashion &#8212; searching the counties first, then continuing to the next smaller geography only if any geometry remained in the query.  We did the same for the tracts, and took a shortcut at the block level to exclude the block geometries.  This increased server side performance considerably.</p>
<div id="attachment_1690" class="wp-caption aligncenter" style="width: 374px"><a href="http://www.azavea.com/blogs/labs/wp-content/uploads/2011/08/heirarchy-lg.png"><img class="size-full wp-image-1690" title="King William County" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2011/08/heirarchy-sm.png" alt="King William County" width="364" height="258" /></a><p class="wp-caption-text">King William County is composed of 22 Voter Tabulation Districts and 1,527 Census Blocks.</p></div>
<p>Consider the following scenario: a user wants to move King William County (highlighted in yellow) from District 1, which is overpopulated, to District 3, which is underpopulated. Changing the boundaries with all the blocks in King William County would require testing at least 4,000 blocks for spatial intersections, then aggregating 1,527 data values, and recomputing the spatial aggregate (union) of those 1,527 geometries. With our hierarchical approach, we can change the boundary of the district with the county boundary, and change the population totals by the county&#8217;s population. That is a few orders of magnitude fewer operations to perform, and it is much faster from the user&#8217;s perspective.</p>
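<p>The hierarchical search can be sketched with sets of block ids standing in for geometries. This is a simplified model under stated assumptions: the real application runs spatial queries in a database, and the function name, unit names, and ids below are invented. The point is only that large units wholly inside the selection are consumed first, so smaller units are examined only for what remains.</p>

```python
def hierarchical_cover(selection, levels):
    """Cover a set of block ids using the largest units that fit inside it.

    levels is ordered from largest to smallest geography; each level maps a
    unit name to the set of block ids that unit contains.
    """
    remaining = set(selection)
    chosen = []
    for units in levels:
        for name, blocks in units.items():
            # Take a unit only if it lies entirely within what is left.
            if blocks and blocks <= remaining:
                chosen.append(name)
                remaining -= blocks
        if not remaining:
            break  # nothing left, so smaller geographies are never searched
    return chosen, remaining


# Made-up geography: two counties, each split into two tracts.
counties = {'King William': {1, 2, 3, 4}, 'Other': {5, 6, 7, 8}}
tracts = {'T1': {1, 2}, 'T2': {3, 4}, 'T3': {5, 6}, 'T4': {7, 8}}

whole_county = hierarchical_cover({1, 2, 3, 4}, [counties, tracts])
partial = hierarchical_cover({1, 2, 3, 4, 5, 6, 7}, [counties, tracts])
```

<p>In the first call the county alone covers the selection, so the tract level is never touched. In the second, one block is left over and would be handled individually at the block level.</p>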
<h1>Lessons Learned</h1>
<p>Throughout the DistrictBuilder development process, the same core performance challenge has arisen: the volume of data must be reduced. This applies to all aspects of the application:</p>
<ul>
<li>Map Tiles: pre-render tiles to keep the number of rendered tiles to a minimum at runtime.</li>
<li>Map Features: deliver to the browser only as much information as you can see (perhaps even less).</li>
<li>Database Queries: do anything possible to ensure that geometric operations are performed on simplified geometries.</li>
<li>Aggregating Statistics: cache whatever you can, and only compute the difference from the last cache state.</li>
</ul>
<p>The above steps reduced the sheer number of operations and volume of processing that both the server and browser need to complete when creating new districts. These are lessons that translate well to <em>any</em> &#8220;big data&#8221; problem, and are crucial in bringing sophisticated GIS operations to the web.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2011/08/building-districts-in-web-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>OpenStreetMap, Cambridge Style</title>
		<link>http://www.azavea.com/blogs/labs/2010/11/openstreetmap-cambridge-style/</link>
		<comments>http://www.azavea.com/blogs/labs/2010/11/openstreetmap-cambridge-style/#comments</comments>
		<pubDate>Thu, 11 Nov 2010 23:38:58 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[cambridge]]></category>
		<category><![CDATA[gps]]></category>
		<category><![CDATA[openstreetmap]]></category>
		<category><![CDATA[walking-papers]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=848</guid>
		<description><![CDATA[This past weekend I helped organize an OpenStreetMap Mapping Party here in Cambridge. It was a diverse group of people, from all sorts of backgrounds, with varying experience with mapping tools and GPS devices. Why a mapping party in Cambridge, MA? One may say that the area is pretty well covered. Well, [...]]]></description>
			<content:encoded><![CDATA[<p>This past weekend I helped organize an OpenStreetMap Mapping Party here in Cambridge. It was a diverse group of people, from all sorts of backgrounds, with varying experience with mapping tools and GPS devices.</p>
<div id="attachment_849" class="wp-caption aligncenter" style="width: 130px"><a href="http://www.openstreetmap.org/"><img class="size-full wp-image-849  " title="OpenStreetMap" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/11/Mag_map-120x120.png" alt="The Free Wiki World Map" width="120" height="120" /></a><p class="wp-caption-text">CC-SA 2.0 OSM</p></div>
<p>Why a mapping party in Cambridge, MA?  One may say that the area is <a href="http://www.openstreetmap.org/?lat=42.36209&amp;lon=-71.08763&amp;zoom=16&amp;layers=M">pretty well covered</a>. Well, yes. Most of the data in Massachusetts was imported from <a href="http://wiki.openstreetmap.org/wiki/MassGIS">MassGIS</a> data. Why? MassGIS states that the data collection was funded by taxpayers, and therefore lies in the <a href="http://en.wikipedia.org/wiki/Public_domain">public domain</a>.</p>
<p>This is a great start, but as we discovered throughout the day, that data was out of date, and in some cases, incorrect. Not like that was a surprise, though. <a href="http://www.openstreetmap.org/?lat=42.36209&amp;lon=-71.08763&amp;zoom=16&amp;layers=M">Buildings are built</a>, <a href="http://www.openstreetmap.org/?lat=42.361595&amp;lon=-71.082416&amp;zoom=18&amp;layers=M">roads are relocated</a>, and <a href="http://www.openstreetmap.org/?lat=42.36608&amp;lon=-71.094674&amp;zoom=18&amp;layers=M">sidewalks are shifted</a>. It&#8217;s just that it&#8217;s now up to us to keep it up to date.</p>
<p>In addition, we found out first-hand just how bad the &#8216;urban canyon&#8217; effect is. Relying only on our GPS devices would have been disastrous, as the tracks bounce all over the place when you are walking between large buildings. It was impossible to map areas within these urban canyons relying on GPS reception alone. Fortunately, we were introduced to <a href="http://walking-papers.org/">walking papers</a> by Lars Ahlzen.</p>
<p>Walking Papers is a product of <a href="http://stamen.com/">Stamen Design</a>&#8217;s <a href="http://mike.teczno.com/">Michal Migurski</a>, and it&#8217;s a great low-tech solution for people who want to contribute to OpenStreetMap, but who don&#8217;t have a GPS, or don&#8217;t have the knack for gadgets (like many technophiles I know). These are paper maps for hand-annotating with pen or pencil. When you&#8217;re done, you can scan and upload your map back to walking papers for adding to OpenStreetMap (I&#8217;m almost there with my set in Cambridge, I&#8217;ll let you know how it goes).</p>
<p>Altogether, it was a fun afternoon, with warm beverages and snacks provided by Azavea. I hope we got some folks excited about OpenStreetMap, and how easy it is to improve our maps.  Next time, instead of the city, I think I&#8217;ll  map some trails in the woods around New England.</p>
<p><strong>Update</strong>:</p>
<p>I <a href="http://www.walking-papers.org/print.php?id=87dfdm5x">printed</a> my walking papers of the Kendall Square area, near our Cambridge office, and did some quick surveying on my way home last night. This morning, I tried scanning and uploading the image again.  I found that the uploader was finicky, and it seemed to work when:</p>
<ol>
<li>File format is JPG (TIFF was a no-go)</li>
<li>Scanned images were scanned as full-color photos, and not black &amp; white photos or text</li>
<li>Scanned image resolution was 300dpi</li>
</ol>
<p>When I figured all that out, I was able to get my <a href="http://www.walking-papers.org/scan.php?id=th5nvwpb">scanned print</a> into walking-papers, and even added the walking-papers <a href="http://wiki.openstreetmap.org/wiki/JOSM/Plugins/WalkingPapers">plugin</a> into <a href="http://josm.openstreetmap.de/">JOSM</a>. I found that the tiles of my scanned print sometimes didn&#8217;t load in properly, but that may be due to operator error (impatience).</p>
<p>All in all, I think walking-papers is a really great addition to the OpenStreetMap ecosystem. Now you have one less reason to not be mapping your own town!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2010/11/openstreetmap-cambridge-style/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>GPU Occupancy and Idling</title>
		<link>http://www.azavea.com/blogs/labs/2010/07/gpu-occupancy-and-idling/</link>
		<comments>http://www.azavea.com/blogs/labs/2010/07/gpu-occupancy-and-idling/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 14:05:03 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[raster]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=590</guid>
		<description><![CDATA[As our ongoing research into raster processing for GIS on the GPU progresses, we have gone through various stages in the development of each Map Algebra operation.  Having converted a given operation to the GPU, we are finding that there are many potential ways to optimize, and this optimization process brings with it a host [...]]]></description>
			<content:encoded><![CDATA[<p>As our ongoing research into raster processing for GIS on the GPU progresses, we have gone through various stages in the development of each Map Algebra operation.  Having converted a given operation to the GPU, we are finding that there are many potential ways to optimize, and this optimization process brings with it a host of issues that highlight the differences between sequential CPU programming and GPGPU parallel programming.</p>
<p>During the optimization process, we&#8217;ve found (and been told) that the single most important optimization is to ensure memory coalescence.  I blogged about that <a href="http://www.azavea.com/blogs/labs/2010/06/gpu-memory-bandwidth/">before</a>, so if you haven&#8217;t seen it yet, it might be worth reading before you continue on.</p>
<p>After maximum memory coalescence has been achieved, it is possible to focus on 2 additional metrics: occupancy and idling.</p>
<h2>Occupancy</h2>
<p>The occupancy metric is defined as the number of active thread groups per processor divided by the maximum number of thread groups per processor.  It&#8217;s a value in the range of 0-100%.</p>
<p>In other words, occupancy measures how many thread groups (NVidia calls them &#8216;warps&#8217;, ATI calls them &#8216;wavefronts&#8217;) are active at one time, relative to the maximum the processor supports. At any one time, some thread groups may be processing data, and some thread groups may be accessing global memory. When some thread groups are accessing global memory, these threads are effectively stalled for hundreds of instructions, while the other thread groups continue on.</p>
<p>Internally, the GPU has a thread group scheduler which controls when thread groups are executed. This is extremely useful, since highly parallel operations will utilize many thread groups to perform calculations. The GPU is highly parallel, but even it has its limits. This is where the thread group scheduler comes in &#8212; it can execute some of the thread groups, while other thread groups are idle, either completed or queued. This scheduling enables some thread groups to perform memory access, while other thread groups perform calculations.</p>
<p>Understanding the scheduler makes it possible to &#8216;hide&#8217; these global memory accesses by performing ~100 arithmetic instructions between each global memory access. Hypothetically, if the GPU executed a kernel that accessed global memory, performed a heavy-duty calculation, then saved that result, the occupancy would probably be pretty high. The thread group scheduler would schedule a set of thread groups for accessing global memory while scheduling another set of thread groups for heavy-duty calculation. This is effectively &#8216;hiding&#8217; the memory access, since the GPU can perform computation instructions while accessing memory. Interestingly, there will be a point when increases to occupancy won&#8217;t improve your performance. At that point, all global memory accesses are &#8216;hidden&#8217; by the computation, and it becomes time to look elsewhere for optimization.</p>
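<p>As a rough illustration of the metric and of latency hiding, here is a small Python sketch. It is a simplified model under stated assumptions, not a vendor formula: the function names and the numbers (32 thread-group slots, a 400-slot memory stall) are hypothetical, and real schedulers are more involved.</p>

```python
def occupancy(active_groups, max_groups):
    """Fraction of a processor's thread-group slots that are in use."""
    if max_groups == 0:
        raise ValueError("max_groups must be positive")
    return active_groups / max_groups


def groups_needed_to_hide(memory_latency, arithmetic_per_access):
    """Thread groups needed so arithmetic in other groups covers one stall.

    While one group waits memory_latency instruction slots, each other
    group contributes arithmetic_per_access slots of useful work.
    """
    covering = -(-memory_latency // arithmetic_per_access)  # ceiling division
    return covering + 1  # plus the group that is waiting


# Hypothetical numbers: 24 of 32 thread-group slots in use, a 400-slot
# memory stall, and 100 arithmetic instructions between accesses.
current = occupancy(24, 32)
needed = groups_needed_to_hide(400, 100)
print(current)  # 0.75
print(needed)   # 5
```

<p>In this model, once enough groups are resident to cover the stall, adding more does not help, which matches the plateau described above.</p>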
<h2>Idling</h2>
<p>The idling metric is defined as the amount of time the GPU is idle divided by the overall execution time of the computation.  It&#8217;s a value in the range of 0-100%.</p>
<p>Idling is something that we have discovered to be critical to the performance of a calculation.  The reference and training documentation instructs GPGPU developers to keep the GPU as busy as possible for as long as possible, and stops there.  By creating this metric, we were able to measure just how much this idling was affecting our computation.</p>
<p>As it turns out, our initial experiments showed that our GPU was idle during periods of memory transfer to and from the CPU.  This idling of the GPU was extending the overall time for computation.  Minimizing this idling through asynchronous kernel execution and memory transfer resulted in a significant and immediate performance improvement.</p>
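<p>The effect of overlapping transfers with computation can be sketched with a simple timing model in Python. Everything here is an assumption for illustration: the work is split into equal chunks, uploads, kernels, and downloads can each run concurrently with the others, and the timings are made up rather than measured.</p>

```python
def idle_fraction(gpu_busy_time, total_time):
    """Share of the overall run during which the GPU is not computing."""
    return 1.0 - gpu_busy_time / total_time


def synchronous_total(chunks, upload, compute, download):
    """Every chunk is uploaded, processed, and downloaded in turn."""
    return chunks * (upload + compute + download)


def pipelined_total(chunks, upload, compute, download):
    """Transfers for one chunk overlap with the kernel for another.

    Classic three-stage pipeline: fill it once, then the slowest
    stage sets the pace for the remaining chunks.
    """
    slowest = max(upload, compute, download)
    return upload + compute + download + (chunks - 1) * slowest


# Hypothetical timings in milliseconds per chunk.
chunks, upload, compute, download = 8, 2.0, 4.0, 2.0
busy = chunks * compute

sync_total = synchronous_total(chunks, upload, compute, download)
async_total = pipelined_total(chunks, upload, compute, download)

print(sync_total, idle_fraction(busy, sync_total))    # 64.0 0.5
print(async_total, idle_fraction(busy, async_total))  # 36.0, about 0.11 idle
```

<p>With these made-up numbers the GPU goes from idling half the time to idling roughly a ninth of the time, and the whole run shortens accordingly.</p>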
<h2>Coalescence, Occupancy, Idling</h2>
<p>To summarize, the best way to optimize your GPU computations is to investigate and optimize these three steps (and in this order):</p>
<ol>
<li>Memory coalescence</li>
<li>Thread group occupancy</li>
<li>GPU Idling</li>
</ol>
<p>There are a number of smaller optimizations that can be done as well, but we&#8217;ve found these to be the big 3. Of course, you can continue this process forever, and demonstrate to your boss the law of <a href="http://www.google.com/images?hl=en&amp;q=diminishing+returns+graph">diminishing returns</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2010/07/gpu-occupancy-and-idling/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GPU Memory Bandwidth and Coalescing</title>
		<link>http://www.azavea.com/blogs/labs/2010/07/gpu-memory-bandwidth/</link>
		<comments>http://www.azavea.com/blogs/labs/2010/07/gpu-memory-bandwidth/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 09:40:54 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[gpgpu]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=579</guid>
		<description><![CDATA[When one begins to work with GPGPU, the parallel processing gains can be substantial, if you know how to work with coalesced memory. This fits in with a parallel algorithm approach, incorporating the following: thinking about your computation in a data-parallel fashion. transferring working data into a local memory cache. scrutinizing how your [...]]]></description>
			<content:encoded><![CDATA[<p>When one begins to work with <a href="http://www.azavea.com/blogs/labs/2010/06/what-the-heck-is-gpgpu/">GPGPU</a>, the parallel processing gains can be substantial, if you know how to work with coalesced memory. This fits in with a parallel algorithm approach, incorporating the following:</p>
<ol>
<li>thinking about your computation in a data-parallel fashion.</li>
<li>transferring working data into a local memory cache.</li>
<li><span style="text-decoration: line-through;">considering</span> scrutinizing how your code performs global memory accesses.</li>
</ol>
<p>The first item almost goes without saying.  If you are hoping to leverage a massively parallel computing device, you obviously have to break your problem or computation down into discrete units that can be operated on in parallel.</p>
<p>It&#8217;s the second and third points that I am going to focus on in this post, since they are the most important factors when optimizing your GPGPU code. They matter most because local memory is so much faster at reading and writing than global memory, and the memory module in modern GPUs can perform concurrent reads to sequential global memory positions for an entire thread group.</p>
<h2>Local Memory Caching</h2>
<p>Use of a local memory cache may seem counter-intuitive to a programmer coming from CPU land.  The best analogy would be: storing your working data in RAM instead of on disk.  While not a perfect analogy, a CPU programmer understands perfectly the ramifications of such a design decision &#8212; any data accessed from disk will be retrieved more slowly than data accessed from RAM.  Likewise for local and global memory.  Local memory is on-chip memory that is exceptionally fast.  Global memory is off-chip memory that is often used to transfer data to/from the host (often the CPU).  I&#8217;m talking about a 100x speed difference when using local memory instead of global memory.</p>
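<p>To put a number on that, here is a back-of-the-envelope cost model in Python. It assumes only the rough 100x figure above; the tile size, the number of reads per cell, and the function names are hypothetical, and it ignores coalescing and synchronization costs.</p>

```python
GLOBAL_COST = 100  # relative cost of one global memory access
LOCAL_COST = 1     # relative cost of one local memory access


def cost_without_cache(cells, reads_per_cell):
    """Every read of every cell goes to global memory."""
    return cells * reads_per_cell * GLOBAL_COST


def cost_with_cache(cells, reads_per_cell):
    """Each cell is copied to local memory once, then reused from there."""
    load = cells * GLOBAL_COST
    reuse = cells * reads_per_cell * LOCAL_COST
    return load + reuse


# Hypothetical workload: a 16x16 tile where each cell is read 9 times,
# as in a 3x3 neighborhood operation.
cells, reads = 16 * 16, 9
uncached = cost_without_cache(cells, reads)
cached = cost_with_cache(cells, reads)
print(uncached, cached, uncached / cached)
```

<p>The more often each cell is reused, the closer the saving gets to the full 100x; with a single read per cell, the cache buys nothing.</p>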
<p>In addition to the differences in global and local memory, the memory bandwidth to/from the graphics card (which contains its own memory and processors) and the motherboard (which contains RAM and one or more CPUs) is another bottleneck.  Data transfer rates across the <a href="http://www.pcisig.com/specifications/pciexpress/specifications">PCI Express 2.0</a> bus are about 8 GB/s.  Data transfer rates in the graphics card are around 141 GB/s.  So not only is the place in which you store your working data important, but also when and how you transfer that data to/from the GPU device itself.</p>
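<p>As rough arithmetic on those two figures, the Python sketch below estimates how long one raster takes to move at each rate. The raster dimensions and cell size are hypothetical, and the rates are treated as sustained throughput, which real transfers rarely reach.</p>

```python
PCIE_GB_PER_S = 8.0       # approximate PCI Express 2.0 rate quoted above
ONBOARD_GB_PER_S = 141.0  # approximate on-card rate quoted above


def transfer_seconds(num_bytes, gb_per_second):
    """Time to move num_bytes at a sustained rate given in GB/s."""
    return num_bytes / (gb_per_second * 1e9)


# Hypothetical raster: 8192 x 8192 cells, 4 bytes per cell.
raster_bytes = 8192 * 8192 * 4

over_bus = transfer_seconds(raster_bytes, PCIE_GB_PER_S)
on_card = transfer_seconds(raster_bytes, ONBOARD_GB_PER_S)
print(over_bus, on_card, over_bus / on_card)
```

<p>Whatever the raster size, the bus transfer is about 17.6 times slower in this model, so data that can stay on the card between operations should stay there.</p>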
<h2>Sequential Global Memory a.k.a. Coalescence</h2>
<p>And &#8220;sequential global memory positions&#8221;? What is that?  Inside a GPGPU kernel, when accessing a portion of global memory, all threads in that group (NVidia calls them &#8216;warps&#8217;, and ATI calls them &#8216;wavefronts&#8217;) access a bank of memory at one time.  For example, if there are 16 threads executing with the same kernel, 16 sequential positions in global memory (1 position per thread) can be accessed in the same time that it would take 1 thread to read 1 position in memory.  If all memory accesses are performed this way, performance can speed up by a factor of 16 (in the memory access code).</p>
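<p>A toy model makes the difference visible. The Python sketch below counts how many memory transactions a 16-thread group needs, assuming (as a simplification) that all positions within one aligned 16-position segment are served together. The function name and the segment model are illustrative, not a description of any particular GPU.</p>

```python
GROUP_SIZE = 16  # threads per group, as in the example above


def transactions(addresses, segment_size=GROUP_SIZE):
    """Count the memory transactions needed to serve one thread group.

    Simplified model: positions that fall in the same aligned segment
    of segment_size positions are served by a single transaction.
    """
    return len({address // segment_size for address in addresses})


# Coalesced: thread i reads position i.
sequential = [thread for thread in range(GROUP_SIZE)]
# Uncoalesced: thread i reads position i * 16 (e.g. walking a column).
strided = [thread * GROUP_SIZE for thread in range(GROUP_SIZE)]

coalesced_count = transactions(sequential)
strided_count = transactions(strided)
print(coalesced_count)  # 1
print(strided_count)    # 16
```

<p>The strided pattern touches 16 separate segments, which is where the factor of 16 comes from.</p>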
<p>That&#8217;s a wonderful way to speed up data-intensive operations, especially when one is working with raster data, and a given block of cells is accessed multiple times.  It is in this scenario that our research has recently landed us.</p>
<p>Another thing worth noting is that the coalescence concept applies to global memory on the GPU only &#8212; local memory does not suffer the same performance hit, so does not need to take advantage of this technique. But global memory access on the GPU takes about 100x as many instructions as local memory access. This means that if you have coalesced global memory access, you are saving hundreds of instructions per thread. This starts to add up when you consider that processing a raster may require hundreds or thousands of threads.</p>
<p>Armed with this knowledge, parallel algorithm implementations begin to have similar structures with regards to memory access.  The resulting code can be highly complex, though, and it&#8217;s not trivial to debug, but some new tools from <a href="http://developer.nvidia.com/object/visual-profiler.html">NVidia</a> and <a href="http://developer.amd.com/gpu/StreamProfiler/Pages/default.aspx" class="broken_link" rel="nofollow">ATI</a> are enabling developers to profile and visualize the work performed by the GPU. In my next post, I&#8217;ll discuss latency and occupancy, two metrics that one can use to help optimize GPU kernels.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2010/07/gpu-memory-bandwidth/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GPUs and Parallel Computing Architectures</title>
		<link>http://www.azavea.com/blogs/labs/2010/06/parallel-computing-architectures/</link>
		<comments>http://www.azavea.com/blogs/labs/2010/06/parallel-computing-architectures/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 14:45:14 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=565</guid>
		<description><![CDATA[I&#8217;ve been blogging about GPUs recently, and I think you can tell it&#8217;s because I&#8217;m excited about the technology.  General Purpose Computing on the GPU (GPGPU) promises great performance increases in computationally heavy software, which we find immensely useful.  In the past, we&#8217;ve managed to engineer web-based applications (see: SmartConservation) that could run complex models [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been blogging about GPUs recently, and I think you can tell it&#8217;s because I&#8217;m excited about the technology.  <strong>G</strong>eneral <strong>P</strong>urpose Computing on the <strong>GPU</strong> (<a href="http://www.azavea.com/blogs/labs/2010/06/what-the-heck-is-gpgpu/">GPGPU</a>) promises great performance increases in computationally heavy software, which we find immensely useful.  In the past, we&#8217;ve managed to engineer web-based applications (see: <a href="http://www.azavea.com/clients/smartconservationgreenplan.aspx">SmartConservation</a>) that could run complex models by implementing a process queuing architecture, but in these systems, while they will run on the web, processing may still take several minutes and they therefore can neither provide a responsive user experience nor support many users.  We&#8217;ve also engineered a system that can perform fast, distributed raster calculations (see: <a href="http://www.walkshed.org/">Walkshed</a>, powered by <a href="http://www.azavea.com/Products/DecisionTree/Home.aspx">DecisionTree</a>).</p>
<p>One of the reasons that GPGPU is so promising is the increasing number of processing cores available on affordable graphics cards. This increases the computation capacity by leveraging many processors running in parallel. What&#8217;s interesting is that this technique is not new. <a href="http://blogs.intel.com/research/authors#timothy_mattson">Timothy Mattson</a>, blogging at <a href="http://blogs.intel.com/research/">Intel</a>, has been doing this since the mid-80s. The Library of Congress contains a book on <a href="http://lccn.loc.gov/72133318">parallel computing structures and algorithms</a>, dating back to 1969.</p>
<p>As we delve deeper into our work improving Map Algebra operations, important differences in algorithm approaches and implementations become apparent: not all parallel architectures are the same.  One might be tempted to think that when switching from the single-threaded CPU logic to multithreaded/parallel logic that there would be one model of parallel computing that is universal.  This is definitely not the case.</p>
<p>Three of the most popular types of parallel computing today are:</p>
<ul>
<li><strong>S</strong>hared-memory <strong>M</strong>ulti<strong>-P</strong>rocessors (SMP)</li>
<li>Distributed-memory <strong>M</strong>assively <strong>P</strong>arallel <strong>P</strong>rocessors (MPP)</li>
<li>Cluster computing</li>
</ul>
<p>Each type of parallel computing has its benefits and drawbacks.  It really just depends what kind of computing you need to do. I&#8217;ll describe these common computing types in detail, starting with the &#8216;traditional&#8217; CPU model.</p>
<p><span id="more-565"></span></p>
<h2>General Purpose</h2>
<p>The &#8216;traditional&#8217; processors used in many computers up until a few years ago were single core processors, called the <strong>C</strong>entral <strong>P</strong>rocessing <strong>U</strong>nit (CPU).  The CPU was able to access a large, general-purpose memory bank, called <strong>R</strong>andom <strong>A</strong>ccess <strong>M</strong>emory (RAM).</p>
<div id="attachment_790" class="wp-caption aligncenter" style="width: 485px"><img class="size-full wp-image-790" title="CPU Architecture" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/06/cpu-drawing.png" alt="Simplified CPU Architecture" width="475" height="54" /><p class="wp-caption-text">Simplified CPU Architecture</p></div>
<p>In contemporary computers, the CPU often contains more than one core, making CPUs capable of executing more than one instruction at a time. In addition to superscalar instruction processing, this makes modern CPUs much faster than their single core, scalar predecessors.</p>
<h2>Shared-memory Multiprocessors</h2>
<p>An SMP architecture is probably the one parallel computing architecture that is most like the general purpose architecture with which we are familiar. An SMP system is a set of processors that all have their own local memory. These memory banks are shared within a thread group, but not between more than one thread group. However, each processor also has access to a global memory bank, which is shared between all processors.</p>
<div id="attachment_791" class="wp-caption aligncenter" style="width: 485px"><img class="size-full wp-image-791" title="SMP" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/06/simd-drawing.png" alt="Simplified Shared-memory Multiprocessor Architecture" width="475" height="256" /><p class="wp-caption-text">Simplified Shared-memory Multiprocessor Architecture</p></div>
<p>This is the parallel architecture that NVidia and AMD/ATI both use in their GPUs.  Likewise, it&#8217;s also the model enforced in the OpenCL specification.</p>
<h2>Distributed-memory Massively Parallel Processors</h2>
<p>The most complicated and flexible architecture type is MPP.  MPP systems isolate memory and processors together, and as such, have no common or shared memory.  Each processor has a dedicated block of local memory, and communicates with other processors via a bus or network.  By varying the number of processors each processor is connected to, different types of MPP systems can be created.  For example:</p>
<ul>
<li>Linear array: if the processors were arranged in a line, each processor is connected to 2 neighboring processors
<p><div id="attachment_793" class="wp-caption aligncenter" style="width: 385px"><img class="size-full wp-image-793" title="Linear Array Architecture" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/06/linear-array.png" alt="Simplified Linear Array Architecture" width="375" height="53" /><p class="wp-caption-text">Simplified Linear Array Architecture</p></div></li>
<li>Linear ring: if the processors were arranged in a circle, each processor is connected to 2 neighboring processors (a linear array, with the ends connected)</li>
<li>Mesh: if the processors were arranged in a grid, each processor is connected to up to 4 neighbors (3 on the edges, and 2 in the corners)
<p><div id="attachment_794" class="wp-caption aligncenter" style="width: 385px"><img class="size-full wp-image-794" title="Mesh Architecture" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/06/mesh.png" alt="Simplified Mesh Architecture" width="375" height="130" /><p class="wp-caption-text">Simplified Mesh Architecture</p></div></li>
<li>Tree: if the processors were arranged in a hierarchical manner, with each processor connected to the processor above it, and two processors below it.</li>
<li>Pyramid: if the processors were arranged similar to a tree, but in three dimensions, with each processor connecting to four processors below it.
<p><div id="attachment_795" class="wp-caption aligncenter" style="width: 385px"><img class="size-full wp-image-795" title="Pyramid Architecture" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/06/pyramid.png" alt="Simplified Pyramid Architecture" width="375" height="202" /><p class="wp-caption-text">Simplified Pyramid Architecture</p></div></li>
<li>Cube: if the processors were arranged similar to a mesh, but in three dimensions, with each processor connected to up to 6 neighbors.</li>
<li>Hypercube: if the processors were arranged similar to a cube, but in four dimensions, with each processor connected to up to 8 neighbors.</li>
</ul>
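<p>The neighbor counts for the grid-like topologies in this list (linear array, ring, mesh, cube, hypercube) can be checked with a short Python sketch. It treats each topology as a regular grid in 1 to 4 dimensions, which is an assumption of this illustration; the function names are hypothetical, and the tree and pyramid layouts are not covered.</p>

```python
from itertools import product


def neighbor_count(position, shape, wrap=False):
    """Number of direct neighbors of one processor in a grid topology.

    shape gives the grid size along each dimension; wrap=True joins
    the ends of each dimension (turning a linear array into a ring).
    """
    count = 0
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            moved = position[axis] + step
            if wrap or moved in range(size):
                count += 1
    return count


def max_neighbors(shape):
    """Largest neighbor count over every processor in the grid."""
    positions = product(*(range(size) for size in shape))
    return max(neighbor_count(position, shape) for position in positions)


print(neighbor_count((0,), (5,)))             # end of a linear array: 1
print(neighbor_count((0,), (5,), wrap=True))  # anywhere on a ring: 2
print(neighbor_count((0, 0), (3, 3)))         # mesh corner: 2
print(neighbor_count((0, 1), (3, 3)))         # mesh edge: 3
print(max_neighbors((3, 3)))                  # mesh interior: 4
print(max_neighbors((3, 3, 3)))               # cube: 6
print(max_neighbors((3, 3, 3, 3)))            # hypercube: 8
```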
<p>As you can see, the processors in MPP systems can proliferate quite rapidly with more complex processor network topologies.   We haven&#8217;t worked with any MPP systems for our GPU research, so I&#8217;ll let you ponder that while I return to the GPU architecture.</p>
<h2>GPU Memory &#8211; Not Your Father&#8217;s RAM</h2>
<p>As I mentioned above, GPUs and OpenCL implementations are based on the SMP architecture.  As such, GPUs have multiple types of memory, with different implications for each type.</p>
<ul>
<li><strong>Global memory: </strong>this is often the big number on the graphics card packaging.  512 MB DDR, etc.  This is the amount of global memory that is available to the GPU processors.  This memory is essentially used as a fast cache to the motherboard RAM, since it&#8217;s used to transfer raw data to the GPU for processing, and storing computation results prior to reading out back to motherboard RAM.</li>
<li><strong>Local shared memory:</strong> this is a much smaller bank of memory that is extremely fast.  On the hardware that we&#8217;re using, it&#8217;s limited to 16KB. With some smart memory management, this local memory can really speed up computations, since the instruction cost of accessing this memory is 1% of that required to access global memory.  Also, this memory is shared between all threads in a work-group.</li>
<li><strong>Private thread memory: </strong>this is an extremely small bank of memory that can be used within each thread for variables and temporary storage during your computation. Interestingly, in the NVidia implementation, this uses registers for a certain amount, then starts using global memory when registers are exhausted.</li>
</ul>
<p>The differences in memory types are probably the first thing a general purpose GPU programmer will run into. Another thing to keep in mind is the method by which the GPU achieves such high throughput, and that&#8217;s thread parallelism.</p>
<h2>Single Instruction Multiple Threads</h2>
<p>In OpenCL, each parallel code path (thread) executes one kernel. The best possible outcome (with regard to thread synchronicity) is when each thread executes the exact same instructions as all other threads. With each thread managing a different nugget of data, this results in extremely fast execution. However, if the kernel code diverges or branches, there is a performance penalty: that section of your code will execute serially (think 16x to 32x slower).</p>
<p>NVidia implements this architecture, and has called it <strong>S</strong>ingle <strong>I</strong>nstruction <strong>M</strong>ultiple <strong>T</strong>hreads (SIMT). It&#8217;s kind of like <a href="http://en.wikipedia.org/wiki/Line_dance">line-dancing</a> for threads.  All threads that execute the same instructions can perform together.  If a thread diverges or branches, then the line-dance is broken, and each thread processes a divergent section one after another.  What&#8217;s kind of cool, though, is that the threads will join back up after diverging, and continue on together.</p>
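<p>The cost of breaking the line-dance can be sketched with a small Python model. It assumes that threads on the same path run in lock-step and that distinct paths within a thread group run one after another; the function name, path labels, and instruction counts are hypothetical.</p>

```python
def group_cost(paths_taken, path_costs):
    """Instruction slots a thread group spends on one branching section.

    Threads on the same path run in lock-step; distinct paths are run
    one after another, so the cost is the sum over the distinct paths.
    """
    return sum(path_costs[path] for path in set(paths_taken))


# Hypothetical section with two possible paths of 10 instructions each.
path_costs = {"a": 10, "b": 10}
uniform = group_cost(["a"] * 16, path_costs)
two_way = group_cost(["a"] * 8 + ["b"] * 8, path_costs)

# Worst case: each of 16 threads takes its own 10-instruction path.
own_path_costs = {thread: 10 for thread in range(16)}
worst = group_cost(list(range(16)), own_path_costs)

print(uniform)  # 10
print(two_way)  # 20
print(worst)    # 160
```

<p>The worst case is 16 times the uniform case, which is the low end of the 16x to 32x slowdown mentioned above.</p>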
<h2>Wrapping it up</h2>
<p>With a solid understanding of how the GPU operates in addition to the limitations of memory and threading, it&#8217;s relatively easy to start computing on the GPU.  Many common operations are easily parallelizable, such as sorting and basic mathematical operations.  When you start performing serious number crunching, or if you are porting a beefy algorithm from serial CPU code, that&#8217;s when the real fun begins.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2010/06/parallel-computing-architectures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CUDA, Stream, and OpenCL</title>
		<link>http://www.azavea.com/blogs/labs/2010/06/cuda-stream-and-opencl/</link>
		<comments>http://www.azavea.com/blogs/labs/2010/06/cuda-stream-and-opencl/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 14:30:27 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[gpgpu]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[opencl]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=558</guid>
		<description><![CDATA[Computing on the GPU, or GPGPU, is a steadily maturing technology.  There are many technologies out in the wild that will enable you to use GPUs for computation, but there&#8217;s a catch: the vendors are still vying for the lead.  The two market leaders are currently NVidia and AMD/ATI. That means that NVidia is pushing [...]]]></description>
			<content:encoded><![CDATA[<p>Computing on the GPU, or <a href="http://www.azavea.com/blogs/labs/2010/06/what-the-heck-is-gpgpu/">GPGPU</a>, is a steadily maturing technology.  There are many technologies out in the wild that will enable you to use GPUs for computation, but there&#8217;s a catch: the vendors are still vying for the lead.  The two market leaders are currently <a href="http://www.nvidia.com/">NVidia</a> and <a href="http://ati.amd.com/">AMD/ATI</a>.</p>
<p>That means that NVidia is pushing their GPGPU API, which is named <strong>C</strong>ompute <strong>U</strong>nified <strong>D</strong>evice <strong>A</strong>rchitecture, or <a href="http://www.nvidia.com/object/cuda_what_is_new.html">CUDA</a>.  Their rival, AMD/ATI, is pushing <a href="http://www.amd.com/US/PRODUCTS/TECHNOLOGIES/STREAM-TECHNOLOGY/Pages/stream-technology.aspx">Stream</a>. Stream incorporates <a href="http://graphics.stanford.edu/projects/brookgpu/">BrookGPU</a>, a compiler and data-parallel language developed at Stanford University, which predates CUDA.</p>
<div id="attachment_705" class="wp-caption aligncenter" style="width: 335px"><img class="size-full wp-image-705   " style="border: 5px solid white;" title="GPGPU APIs" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/06/apis.png" alt="NVidia &amp; CUDA, ATI &amp; Stream, or OpenCL" width="325" height="100" /><p class="wp-caption-text">NVidia &amp; CUDA, ATI &amp; Stream, or OpenCL</p></div>
<p>Both of these vendor APIs are proprietary, and run on each vendor&#8217;s specific hardware.  This makes sense if a developer can control what hardware computations will be using. Realistically, a developer rarely has such control. So what are the options? At the current time, there are only a couple:  <a href="http://www.khronos.org/opencl/">OpenCL</a> and Microsoft’s <a href="http://en.wikipedia.org/wiki/DirectCompute">DirectCompute</a> technology.  Microsoft&#8217;s technology is limited to Windows Vista and Windows 7, though, so we are focusing on OpenCL.</p>
<p>OpenCL is the <strong>Open</strong> <strong>C</strong>omputing <strong>L</strong>anguage, a language that extends the <a href="http://en.wikipedia.org/wiki/C99">C99</a> standard (a modern dialect of the C programming language) and compiles into device-specific binaries. OpenCL was originally developed by <a href="http://developer.apple.com/technologies/mac/snowleopard/opencl.html">Apple</a>, and handed over to the <a href="http://www.khronos.org/">Khronos Group</a>. The OpenCL standard was ratified by the consortium in December of 2008.  The Khronos Group consortium includes all the major players in the field, including NVidia, AMD/ATI, Apple, and <a href="http://www.intel.com/">Intel</a>.  The list is much more extensive, but those are the four to be happy about.  Intel doesn&#8217;t support OpenCL in their multicore CPUs, but I&#8217;m optimistic that they will release an OpenCL API to leverage CPU cores as well as GPU cores as computing devices.</p>
<p>OpenCL was created to address the <a href="http://www.youtube.com/watch?v=ZwNWviK5z0Q">need for speed</a> in current desktop systems that contain GPU processors.  The language was created to address computing on heterogeneous systems, which, when you think about it, can include many other types of computing devices.  If OpenCL is adopted by <a href="http://developer.android.com/index.html">Android</a>, then you could optimize code to run on Android devices, too. While this may not be the fastest approach, it would potentially let you distribute work among devices.</p>
<p>One caveat to heterogeneous systems, though: OpenCL kernels that are written and optimized for one hardware platform probably won&#8217;t perform the same on another hardware platform.  While OpenCL enables developers to write code that can run on multiple hardware devices, the hardware implementations may vary.  For example, the number of processor cores, and thus the number of parallel threads, may vary widely.</p>
<p>If you can&#8217;t tell already, we are sold on the promise of OpenCL for GPGPU.  The language is easy to use (if you already know C), and it supports the two biggest players in the GPU market, NVidia and AMD/ATI.  We are hoping that Intel releases their OpenCL drivers for CPUs, too, so that we can squeeze out the last drip of computing power for our computations.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2010/06/cuda-stream-and-opencl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What the heck is &#8230; GPGPU?</title>
		<link>http://www.azavea.com/blogs/labs/2010/06/what-the-heck-is-gpgpu/</link>
		<comments>http://www.azavea.com/blogs/labs/2010/06/what-the-heck-is-gpgpu/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 16:00:48 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[ati]]></category>
		<category><![CDATA[gpgpu]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[nvidia]]></category>
		<category><![CDATA[parallel]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=553</guid>
		<description><![CDATA[General Purpose Computing on the Graphics Processing Unit, or GPGPU, is a technology that enables software to use the computing power of the multiple processing units that come with modern graphics cards.  The benefit of using these processing units is that computational work gets done by many workers at once, instead of one CPU or [...]]]></description>
			<content:encoded><![CDATA[<p>General Purpose Computing on the Graphics Processing Unit, or GPGPU, is a technology that enables software to use the computing power of the multiple processing units that come with modern graphics cards.  The benefit of using these processing units is that computational work gets done by many workers at once, instead of one CPU or a few threads on a multi-core CPU.</p>
<p>At first, it may sound like multi-threading, and in a way it is.  What&#8217;s key, though, is that the GPU doesn&#8217;t just multi-thread; it runs synchronized, lock-stepped multi-threading.   It&#8217;s called Single Instruction, Multiple Data, or <a href="http://en.wikipedia.org/wiki/SIMD">SIMD</a> in <a href="http://en.wikipedia.org/wiki/Flynn%27s_taxonomy">Flynn&#8217;s Taxonomy</a> (NVidia&#8217;s hardware uses a variant called Single Instruction, Multiple Threads, or SIMT).  This refers to the control flow in each thread on the GPU &#8212; to maximize performance, it&#8217;s important to get all threads running through the same logic at the same time.  In practice, some threads will eventually branch, and keeping the branching to a minimum is critical.</p>
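<p>To see why branching matters, here is a deliberately simplified cost model, written in plain Python rather than OpenCL.  It assumes that a lock-stepped group of threads has to step through every branch that at least one of its threads takes; the function name, the group size of 32, and the instruction costs are illustrative assumptions, not measurements from real hardware.</p>

```python
def lockstep_cost(takes_branch, cost_true, cost_false):
    """Instruction slots one lock-stepped group spends on a single if/else.

    takes_branch holds one boolean per thread in the group. In this toy
    model the whole group walks through a branch whenever any of its
    threads needs that branch.
    """
    cost = 0
    if any(takes_branch):
        cost += cost_true   # at least one thread runs the "if" side
    if not all(takes_branch):
        cost += cost_false  # at least one thread runs the "else" side
    return cost


# Hypothetical group of 32 threads, 10 instructions on each side of the branch.
uniform = lockstep_cost([True] * 32, cost_true=10, cost_false=10)
divergent = lockstep_cost([True] * 16 + [False] * 16, cost_true=10, cost_false=10)
print(uniform, divergent)
```

<p>In this model the divergent group pays for both sides of the branch, while the uniform group pays for only one, which is the intuition behind keeping branching to a minimum.</p>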
<p>When we have processing tasks that can be highly discrete, the SIMT nature of the GPU enables computations to really fly when using GPGPU.  Structuring computations in this fashion will result in operations that are highly parallelized.  Coincidentally, they will also be easily distributed to other computation engines.  But unlike other distributed computing methods, GPGPU is tightly coupled, and has very low latency &#8212; you get immediate results.</p>
<p><img class="size-full wp-image-606 alignnone" style="margin-left: 150px; margin-right: 150px;" title="OpenCL Logo" src="http://www.azavea.com/blogs/labs/wp-content/uploads/2010/06/OpenCL_Logo_RGB_172_square.png" alt="OpenCL Logo" width="172" height="172" /></p>
<p>To facilitate all of the above, GPGPU requires a video card that supports programmability.  While the major GPU manufacturers have released proprietary GPGPU languages &#8212; for example, NVidia’s CUDA framework &#8212; a cross-platform, device-independent alternative, called <a href="http://www.khronos.org/opencl/">OpenCL</a>, has recently become sufficiently robust that we can begin to use it for implementing GPGPU processing.  While originally developed by Apple, OpenCL is now overseen by a non-profit organization, the <a href="http://www.khronos.org/">Khronos Group</a>.</p>
<p>If you&#8217;ve got a GPU on hand, it&#8217;s quite likely that there is an OpenCL API available in your programming language of choice.  Here&#8217;s a short list from the <a href="http://www.khronos.org/developers/resources/opencl/">Khronos Group&#8217;s</a> website:</p>
<ul>
<li><a href="http://mathema.tician.de/software/pyopencl">Python</a></li>
<li><a href="http://ruby-opencl.rubyforge.org/">Ruby</a></li>
<li><a href="http://www.opentk.com/project/opentk">C</a></li>
<li><a href="http://www.opentk.com/project/opentk">C++</a></li>
<li><a href="http://www.opentk.com/project/opentk">C#</a></li>
<li><a href="http://www.opentk.com/project/opentk">VB.Net</a></li>
<li><a href="http://planet.plt-scheme.org/display.ss?package=opencl.plt&amp;owner=jaymccarthy">Scheme</a></li>
</ul>
<p>By the way, Khronos just released the spec for <a href="http://www.khronos.org/news/press/releases/khronos-group-releases-opencl-1-1-parallel-computing-standard">OpenCL 1.1</a> on June 14th, 2010.</p>
<h2>GPGPU for GIS</h2>
<p>Our interest in GPGPU is motivated by our desire to speed up complicated raster math calculations, in particular those that would enable us to make more sophisticated geoprocessing available on web and mobile platforms.  Our vision is to achieve improvements of an order of magnitude or better in geospatial data processing and thereby make web-based analysis tools more responsive and compelling.  To support our research we applied for and were awarded a <a href="http://www.nsf.gov/">National Science Foundation</a> (NSF) <a href="http://www.nsf.gov/eng/iip/sbir/">Small Business Innovation Research</a> (SBIR) grant to benchmark some traditional GIS operations against GPU-accelerated versions of the same operations.  Our objective is to improve these operations by about 20x, and it looks promising.</p>
<p>We&#8217;re sampling the wide array of <a href="http://www.azavea.com/blogs/newsletter/v3i1/what-the-heck-ismap-algebra/">Map Algebra</a> operations that are in standard GIS toolkits, focusing on a couple examples of each of the following types of operations: <a href="http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Operators_and_functions_of_Spatial_Analyst">local, focal, zonal, and global</a>.  Some of these types of operations are easily parallelized, such as local and focal operations.  The other operations require a fair amount of algorithm wrangling before they show any improvement on the GPU&#8217;s parallel architecture.</p>
<p>Our research is ongoing, so it&#8217;s still a bit too early to tell, but we&#8217;re excited to already see some impressive performance improvements.</p>
<h2>GPGPU Lessons</h2>
<p>Migrating our chosen Map Algebra operations to the GPU was trivial in some cases, but quite challenging in others.  For example, local Map Algebra operations are quite easy to parallelize.  Each raster cell is compared to another raster cell in the same location.  This requires very little extra memory and few intermediate steps.</p>
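<p>As a rough illustration of why local operations parallelize so easily, here is a minimal sketch in plain Python (not our GPU code).  Rasters are assumed to be lists of rows, and the names <code>local_op</code>, <code>elevation</code>, and <code>offset</code> are made up for this example.</p>

```python
def local_op(raster_a, raster_b, fn):
    """Apply fn to each pair of cells that share a row and column."""
    return [
        [fn(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(raster_a, raster_b)
    ]


# Two tiny hypothetical rasters with the same shape.
elevation = [[10, 12], [14, 16]]
offset = [[1, 1], [2, 2]]

# Each output cell depends only on the two input cells at the same
# position, so every cell could be handed to its own GPU thread.
total = local_op(elevation, offset, lambda a, b: a + b)
print(total)  # [[11, 13], [16, 18]]
```

<p>No output cell reads any other output cell, so there is nothing to synchronize and no intermediate buffer to manage.</p>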
<p>Focal (or &#8220;neighborhood&#8221;) operations are not simple, particularly for large neighborhoods, but with some intelligent partitioning, it&#8217;s relatively easy to work with different sections of the rasters.  This requires some intermediate buffers, but performs well using local memory on the GPU.</p>
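<p>A focal operation can be sketched the same way.  The example below is a plain Python illustration, assuming a square window that is clipped at the raster edges; <code>focal_mean</code> and the <code>radius</code> parameter are names invented for this sketch, and it makes no attempt at the partitioning or local-memory tricks a GPU version would need.</p>

```python
def focal_mean(raster, radius=1):
    """Mean of the square window around each cell, clipped at the edges."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Gather every neighbor within `radius` cells that lies inside the raster.
            window = [
                raster[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
smoothed = focal_mean(grid)
print(smoothed[1][1])  # 5.0, the mean of all nine cells
```

<p>Each output cell reads a small block of nearby input cells, and neighboring output cells read overlapping blocks, which is why staging a tile of the raster in fast local memory pays off.</p>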
<p>Zonal operations summarize the cells in one raster data set using zones identified in a second one.  These operations require a couple scans across a raster, and a few intermediate stages &#8212; that one was complicated.</p>
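<p>The two scans mentioned above can be sketched in plain Python as follows.  This is only an illustration of the idea, assuming a zonal sum; the function names and the sample rasters are hypothetical.</p>

```python
from collections import defaultdict


def zonal_sum(values, zones):
    """First scan: accumulate one total per zone identifier."""
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            totals[zone] += value
    return dict(totals)


def zonal_sum_raster(values, zones):
    """Second scan: write each zone's total back to every cell in that zone."""
    totals = zonal_sum(values, zones)
    return [[totals[zone] for zone in zone_row] for zone_row in zones]


values = [[1, 2], [3, 4]]
zones = [["a", "a"], ["b", "a"]]
print(zonal_sum(values, zones))         # {'a': 7.0, 'b': 3.0}
print(zonal_sum_raster(values, zones))  # [[7.0, 7.0], [3.0, 7.0]]
```

<p>The first scan is the awkward part on a GPU: many threads want to add to the same zone total at once, so the accumulation has to be restructured as a parallel reduction.</p>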
<p>What&#8217;s crazy difficult, though, are the global operations.  These operations are challenging because they operate on the entire dataset, with one cell on one side of the raster potentially affecting a cell on the other side of the raster. Examples of global operations include Euclidean distance, cost-weighted distance, viewshed analysis and others.  These operations are non-trivial to convert to GPU processing in a performant way, but it is definitely possible.</p>
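<p>To make the &#8220;every cell can affect every other cell&#8221; point concrete, here is a brute-force Euclidean distance sketch in plain Python.  It is the naive approach, not the algorithm we use on the GPU; distances are measured in cell units, and the function name and sample raster are assumptions made for this example.</p>

```python
import math


def euclidean_distance(sources):
    """Distance from every cell to the nearest source cell, in cell units.

    `sources` is a raster of 0/1 flags and must contain at least one 1.
    """
    source_cells = [
        (r, c)
        for r, row in enumerate(sources)
        for c, flag in enumerate(row)
        if flag
    ]
    # Every output cell has to consider every source cell in the raster.
    return [
        [
            min(math.hypot(r - sr, c - sc) for sr, sc in source_cells)
            for c in range(len(row))
        ]
        for r, row in enumerate(sources)
    ]


sources = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
distance = euclidean_distance(sources)
print(distance[0][2])  # 2.0
```

<p>The work grows with the number of cells times the number of source cells, which is why a performant version needs a smarter algorithm than simply giving each cell its own thread.</p>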
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2010/06/what-the-heck-is-gpgpu/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>GPU Computing for GIS</title>
		<link>http://www.azavea.com/blogs/labs/2010/06/gpu-computing-for-gis/</link>
		<comments>http://www.azavea.com/blogs/labs/2010/06/gpu-computing-for-gis/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 15:01:17 +0000</pubDate>
		<dc:creator>David Zwarg</dc:creator>
				<category><![CDATA[Posts]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[gpgpu]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.azavea.com/blogs/labs/?p=543</guid>
		<description><![CDATA[We live in exciting times. Computing power continues to grow at an exponential rate, and is well characterized by Moore&#8217;s Law (if you are looking for a graph more recent than 1965, try Wikipedia).  This means that computing power is moving in many directions.  The rise of laptops, notebooks, tablets, and smartphones is a testament [...]]]></description>
			<content:encoded><![CDATA[<p>We live in exciting times.</p>
<p>Computing power continues to grow at an exponential rate, and is well characterized by <a href="http://www.intel.com/pressroom/kits/events/moores_law_40th/index.htm?iid=tech_mooreslaw+body_presskit">Moore&#8217;s Law</a> (if you are looking for a graph more recent than 1965, try <a href="http://en.wikipedia.org/wiki/Moore%27s_law">Wikipedia</a>).  This means that computing power is moving in many directions.  The rise of <a href="http://tuxmobil.org/">laptops</a>, <a href="http://www.datamancer.net/steampunklaptop/steampunklaptop.htm">notebooks</a>, <a href="http://thefastertimes.com/nottruenews/2010/01/26/apple-fans-report-disappointment-with-new-tablet-computer/">tablets</a>, and <a href="http://www.openmoko.com/freerunner.html">smartphones</a> is a testament to the increasing computing power of microprocessors.  They are getting faster, smaller, lighter, more power efficient, and sprouting more cores.</p>
<p>Despite this accelerating computing power, we&#8217;ve seen on some of our projects how many <a href="http://www.azavea.com/Clients/smartconservationgreenplan.aspx">heavy-duty</a> analytical computing tasks remain too costly (in terms of computing time) to be run on the web with more than a small number of users.  However, by distributing the computation across multiple processors and machines, we have found it is possible to improve both the scalability and speed of some geographic data processing tasks.  For one such task, a weighted raster overlay operation, we have been able to accelerate the process enough to make a scalable web application possible.  The result is Azavea’s <a title="DecisionTree" href="http://www.azavea.com/decisiontree/">DecisionTree</a> framework, developed with support from an <a href="http://www.azavea.com/blogs/newsletter/v2i6/what-the-heck-isan-sbir/">SBIR</a> grant from the <a href="http://www.csrees.usda.gov/funding/sbir/sbir.html">US Department of Agriculture</a>.</p>
<p>With this experience developing distributed geoprocessing algorithms, we have recently been taking a look at technologies that will enable us to make similar types of performance and scalability improvements.  One technology that we believe has great promise for bringing these processes to the web is General Purpose Computing on the Graphics Processing Unit (<a href="http://en.wikipedia.org/wiki/GPGPU">GPGPU</a>).</p>
<p>GPGPU leverages the microprocessors that power many modern graphics cards.  <a href="http://www.nvidia.com/object/what_is_cuda_new.html">NVidia</a> and <a href="http://www.amd.com/US/PRODUCTS/TECHNOLOGIES/STREAM-TECHNOLOGY/Pages/stream-technology.aspx">ATI</a> are the largest players in the high performance video adapter field, and they both have GPU computing libraries that run on their video adapter hardware.</p>
<h2>GPUs are accelerating everything.</h2>
<p>GPUs are powerful for general purpose computing not just because of their clock speed, but because there are so many processing cores on today&#8217;s graphics cards.  While a quad-core CPU is a high-end processor for most servers, today&#8217;s high-end graphics cards have 100, 200, or even 500 or more cores and are capable of hundreds of <a href="http://en.wikipedia.org/wiki/FLOPS" target="_blank">gigaFLOPS</a> of double-precision processing power (see these <a href="http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html" target="_blank">NVidia</a> and <a href="http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5970/Pages/ati-radeon-hd-5970-overview.aspx" target="_blank">ATI</a> cards).  And these numbers are doing nothing but going up.</p>
<p>A few ways of comparing just what that means:</p>
<ul>
<li>a handheld calculator runs at about 10 FLOPS (not giga-, just plain FLOPS; one FLOPS is one billionth of a gigaFLOPS).</li>
<li>by the time you blink your eye, about 154 billion floating-point operations have occurred on the NVidia Tesla C2070.</li>
<li>by the time a hummingbird flaps its wings, 10.3 billion floating-point operations have occurred on the same card.</li>
<li>by the time one floating-point operation has completed on the same card, your voice has only traveled through about 0.64 nm of air (a human hair ranges from 17 to 181 μm thick)</li>
</ul>
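<p>For anyone who wants to check comparisons like these, the arithmetic is short enough to sketch in Python.  The inputs here are assumptions: a peak double-precision rate of 515 gigaFLOPS for the Tesla C2070, a 0.3 second blink, a 0.02 second wingbeat, and sound traveling at roughly 330 m/s.</p>

```python
PEAK_FLOPS = 515e9        # assumed peak double-precision rate, operations per second
BLINK_SECONDS = 0.3       # assumed duration of an eye blink
WINGBEAT_SECONDS = 0.02   # assumed duration of one hummingbird wingbeat
SPEED_OF_SOUND = 330.0    # assumed speed of sound in air, meters per second


def operations_during(seconds, flops=PEAK_FLOPS):
    """Floating-point operations completed in `seconds` at the peak rate."""
    return flops * seconds


def sound_travel_per_operation(flops=PEAK_FLOPS, speed=SPEED_OF_SOUND):
    """Meters that sound travels while one operation completes."""
    return speed / flops


print(operations_during(BLINK_SECONDS) / 1e9)      # billions of operations per blink
print(operations_during(WINGBEAT_SECONDS) / 1e9)   # billions of operations per wingbeat
print(sound_travel_per_operation() * 1e9)          # nanometers of air per operation
```

<p>These are peak figures, so real workloads will land below them, but the orders of magnitude are the point.</p>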
<p>In addition to processors and processing speed, GPU cards have fast, specialized memory access.  They have a limited amount of local memory, but if you can figure out a way to use it efficiently, your memory access is on the order of 100x faster than conventional memory.</p>
<p>The combination of more processors and faster memory means that if you can discretize or parallelize the type of work that you want to perform, you can get radical speed improvements.</p>
<h2><strong>GIS on the GPU.</strong></h2>
<p>That&#8217;s all well and good, but how can GPGPU be used for GIS?  We are <a href="http://www.directionsmag.com/article.php?article_id=3418" target="_blank">not the only ones thinking about this</a>, but the answer depends on what kind of analysis you want to do.  We have been focusing our research on a few types of Map Algebra operations, and our preliminary investigations have shown that all types of Map Algebra operations can benefit from processing on the GPU.  In addition, we believe substantial improvements can be made in some types of vector processing.  A few likely candidates would be:</p>
<ul>
<li>Vector-to-raster and raster-to-vector conversion</li>
<li>Network analysis</li>
<li>Network routing</li>
<li>Transformations of geometric collections</li>
</ul>
<p>All of these optimizations have the potential to reduce the computing time for heavy-duty GIS operations from hours to minutes, and from minutes to seconds.  With that kind of speedup, the <a href="http://www.websiteoptimization.com/speed/1/">&#8220;attention threshold&#8221;</a> of the web can be achieved.  It now becomes possible to run more complex GIS tasks in a web environment, bringing more computing power to the masses.</p>
<p>These changes won&#8217;t change the world right away, but they will make GIS analysis more interactive, responsive, and efficient.  Just imagine if you could complete any given task in your day in 1/10th the time (think <a href="http://www.imdb.com/name/nm1714016/">Dash</a>, from the <a href="http://www.imdb.com/title/tt0317705/">Incredibles</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.azavea.com/blogs/labs/2010/06/gpu-computing-for-gis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

