Truncating Floats in OpenLayers and SQLServer

A perfectly valid question when dealing with map coordinates is “How accurate do we need to be?” For some applications, a tenth of a degree is more than accurate enough, while others need several more decimal places. Sometimes the question is answered for you: if your data source only stores four decimal places, that’s all the precision you’re going to have. If you’re in the lucky (unlucky?) position of generating your own coordinates, one common answer to the “how accurate” debate is “store it all”. This is the path Sajara chose, mostly because we had no good reason to prefer a less precise solution over a more precise one. As it happens, the precision limit of SQLServer’s float data type is tied not to the number of decimal places but to the number of numeric digits stored. It allows up to 16 numeric digits, plus a decimal point and a minus sign as needed. Sajara works with coordinate systems in both meters and degrees, so depending on which system a given implementation uses, we could be storing a value far more precise than anything visible to the naked eye.

Fast-forward a few years and bring OpenLayers into the mix. We rewrote the asset editing portion of the software so that data managers could move asset coordinates on an OpenLayers map. These coordinates were still saved with considerably more precision than we needed, but remember, we’re storing whatever precision we get. So far, so good.

Now back to the present: we’re working on a comparison tool for our data managers. Suddenly, values in the database don’t match the values coming out of our OpenLayers map. Almost, but not quite: only the last few digits differ. After a bit of digging, we discovered that OpenLayers was returning numbers with 1 to 3 fewer decimal places than our stored coordinates. We’re talking about distance differences smaller than a crack in the sidewalk, but programming languages don’t know anything about “close enough”. Either two numbers are the same or they aren’t, and -39.6827663878 is not the same as -39.682766387, no matter how small the physical difference is. So we went looking for the cause.

OpenLayers has a value tucked away in its utility files that sets the default precision of floating-point numbers to 14 characters. This limit was added after a user noticed that the edges of certain coordinate systems were misbehaving because of floating-point precision errors. While the OpenLayers community recognizes that most systems allow floats to have 16 digits, “14 significant digits are sufficient to represent sub-millimeter accuracy in any coordinate system that anyone is likely to use with OpenLayers”. So OpenLayers’ answer to the accuracy question is to keep everything that fits in a standard float, with a few digits pared off just in case.
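The effect is easy to reproduce outside the library. The TypeScript sketch below assumes the rounding works by formatting a value with Number.prototype.toPrecision and parsing the string back with parseFloat; roundLikeOpenLayers is a hypothetical name, and this illustrates the behavior described above rather than reproducing the OpenLayers source. The coordinate value is made up.

```typescript
// Sketch of the precision rounding described above (not OpenLayers source).
// Assumption: a value is rounded by formatting it with toPrecision() and
// parsing the resulting string back into a number.
const DEFAULT_PRECISION = 14;

// Hypothetical helper; a precision of 0 means "leave the value alone".
function roundLikeOpenLayers(value: number, precision: number = DEFAULT_PRECISION): number {
  if (precision === 0) {
    return value;
  }
  // toPrecision keeps `precision` significant digits and returns a string.
  return parseFloat(value.toPrecision(precision));
}

// Hypothetical stored coordinate with 16 significant digits.
const storedLongitude = -39.68276638781234;
const mapLongitude = roundLikeOpenLayers(storedLongitude);

console.log(mapLongitude); // -39.682766387812
console.log(storedLongitude === mapLongitude); // false
```

The physical difference is a fraction of a nanodegree, but strict equality still reports a mismatch.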

So the next question is: “So what?” The difference between 14 and 16 digits in a meter-based coordinate system is microscopic, and in a degree-based one it’s not much bigger. As far as storing a saved coordinate in Sajara goes, we didn’t really care whether we had 16 digits or 14; the result wouldn’t look any different to our audience. However, since our original coordinates had 16 digits and OpenLayers preserved only 14 of them, any programmatic comparison failed! No one likes dealing with false positives, but a 100% false positive rate was unacceptable.

We had a few choices here. First, we could set the default precision value in OpenLayers to zero, which tells the library never to truncate anything. That’s a fairly simple change, but we couldn’t rule out unforeseen effects on our data. There’s also a somewhat vague warning about problems with the Web Mercator projection when this value is zero, and Web Mercator is one of the projections Sajara can use. So that option was out.

Second, we could have changed the precision of coordinate values in SQLServer to 14, which is a fairly major change. We ruled this option out because SQLServer and OpenLayers define “precision” differently. As mentioned earlier, SQLServer stores a maximum of 16 numeric digits plus a decimal point and a minus sign, for a total of 18 characters. OpenLayers, however, treats its default precision of 14 as 14 characters rather than 14 numeric digits. So if a number has a decimal point and a minus sign, we’re down to 12 numeric digits. That difference reintroduces the possibility of false positives, so it isn’t really a change for the better.
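To make the character-versus-digit arithmetic concrete, the hypothetical helper below counts both for a coordinate written out as text. It only illustrates the counting in the paragraph above; it says nothing about how SQLServer or OpenLayers implements precision internally.

```typescript
// Hypothetical helper: report the length of a coordinate string and how many
// of its characters are numeric digits.
function describeCoordinate(text: string): { characters: number; digits: number } {
  const matches = text.match(/[0-9]/g);
  const digits = matches === null ? 0 : matches.length;
  return { characters: text.length, digits };
}

// 16 digits plus a minus sign and a decimal point: 18 characters.
console.log(describeCoordinate("-39.68276638781234")); // { characters: 18, digits: 16 }

// The same kind of value limited to 14 characters keeps only 12 digits.
console.log(describeCoordinate("-39.6827663878")); // { characters: 14, digits: 12 }
```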

The solution we finally settled on was to change the OpenLayers default precision value to 18. Why 18? That’s the maximum number of characters SQLServer will store for a float, so OpenLayers can always handle any stored coordinate without truncating it. Now, when we compare our stored coordinates with OpenLayers coordinates, we only get a change notice when an asset has actually been moved, which is exactly what we wanted.
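Here is a minimal sketch of the resulting change check, assuming (as above) that map coordinates are rounded with toPrecision and compared to stored values with strict equality. The roundTo and hasMoved helpers and the sample point are hypothetical, not Sajara’s code. Running the same unmoved point through both precision settings shows the false positive at 14 disappearing at 18; 17 significant digits are already enough to reproduce any double exactly, so 18 never loses information.

```typescript
// Sketch of the stored-vs-map comparison (hypothetical helpers, not Sajara's code).
// Assumption: map values are rounded with toPrecision(), as discussed above.
function roundTo(value: number, precision: number): number {
  return precision === 0 ? value : parseFloat(value.toPrecision(precision));
}

interface AssetPoint {
  x: number;
  y: number;
}

// True when the map position differs from the stored position at all.
function hasMoved(stored: AssetPoint, fromMap: AssetPoint): boolean {
  return stored.x !== fromMap.x || stored.y !== fromMap.y;
}

// Hypothetical stored asset location, 16 significant digits per value.
const storedPoint: AssetPoint = { x: -39.68276638781234, y: 12.34567890123456 };

// The asset has NOT moved; only the rounding precision changes.
for (const precision of [14, 18]) {
  const fromMap: AssetPoint = {
    x: roundTo(storedPoint.x, precision),
    y: roundTo(storedPoint.y, precision),
  };
  console.log(precision, hasMoved(storedPoint, fromMap));
}
// 14 true   (false positive)
// 18 false  (no change reported)
```

In a page that loads OpenLayers 2, the change itself would be the single assignment OpenLayers.Util.DEFAULT_PRECISION = 18; placed right after the library is loaded.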

Here are some technical details for those interested:

The full variable name is OpenLayers.Util.DEFAULT_PRECISION, and it can be found in the Util.js file. A few helpful comments precede the variable in the code, and more background can be found in OpenLayers ticket #1951. SQLServer information can be found on MSDN. Note that if you do change the OpenLayers precision value, you should do it as soon as possible after loading the library, so that no code ends up running with a different precision value.

Getting an ArcGIS Server Map Cache in S3

When deciding how best to handle the air photos in the new Philadelphia Water Department Stormwater Map Viewer, we kicked around a few ideas. We decided to put the cache in Amazon’s Simple Storage Service (S3) to offload some of the local disk requirements and take advantage of Amazon’s fast data storage and delivery infrastructure. Along the way, we learned a few things:

Tune Your Cache

Make sure you spend time planning the cache. Not only will the cache look better in the final application, but it will also load to S3 faster and cost less in the long run.

  • Set the extents in the MXD or MSD before publishing to a map service. Transferring the 254-byte empty tiles put a lot of unnecessary load on the upload process, and you also pay to store them in the cloud. If it doesn’t need to be there, don’t build it.
  • Choose the right image format for the cache. If you are caching a base map and do not need transparency, use JPEG. If you need background transparency, use PNG. ESRI’s suggestions for planning a map cache can be found here.

Get a Good Tool to Transfer the Files

I started out using the free version of CloudBerry Lab’s S3 Explorer, but I had to move over 90 GB of data to my S3 bucket. CloudBerry S3 Explorer Pro supports multithreading, allowing up to 5 threads to enumerate folders, copy files, or apply ACLs. It is a low-cost application that more than pays for itself when you are moving a lot of files up to a bucket.

When transferring the files, I worked in blocks of directories rather than a whole scale level at a time. It was quicker to work through 20 to 30 subdirectories at a time than to grab an entire scale level. It required a bit more management on my end, but progress was steadier.
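The batching approach can be sketched in TypeScript. This is not how CloudBerry works internally; it assumes a hypothetical upload function that copies one cache folder to the bucket, splits the folders into batches of 25 (inside the 20 to 30 range above), and keeps up to 5 uploads in flight per batch to mirror the Pro version’s thread limit.

```typescript
// Sketch of batched, multithreaded uploading (not CloudBerry's implementation).
// Uploader is a hypothetical stand-in for the call that copies one folder to S3.
type Uploader = (dir: string) => Promise<void>;

// Split the cache folders into batches of `size` directories.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Upload one batch with at most `threads` folders in flight at a time.
async function uploadBatch(dirs: string[], threads: number, upload: Uploader): Promise<void> {
  let next = 0;
  const worker = async (): Promise<void> => {
    // Each worker keeps claiming the next unclaimed folder until none remain.
    while (next < dirs.length) {
      const dir = dirs[next];
      next += 1;
      await upload(dir);
    }
  };
  const workers: Promise<void>[] = [];
  for (let t = 0; t < Math.min(threads, dirs.length); t++) {
    workers.push(worker());
  }
  await Promise.all(workers);
}

// Work through the batches one after another so progress is easy to track.
async function uploadCache(dirs: string[], upload: Uploader, batchSize = 25, threads = 5): Promise<number> {
  const batches = toBatches(dirs, batchSize);
  for (let b = 0; b < batches.length; b++) {
    await uploadBatch(batches[b], threads, upload);
    console.log(`batch ${b + 1} of ${batches.length} done (${batches[b].length} folders)`);
  }
  return batches.length;
}

// Example run with a fake uploader that only records folder names.
const uploaded: string[] = [];
const folderNames: string[] = [];
for (let r = 0; r < 60; r++) {
  folderNames.push(`L05/folder-${r}`);
}
uploadCache(folderNames, async (dir) => {
  uploaded.push(dir);
}).then((count) => console.log(`${count} batches, ${uploaded.length} folders uploaded`));
```

Finishing each batch before starting the next trades a little throughput for the kind of steady, easy-to-track progress described above.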

Accessing the Tiles

ArcGIS Server does not support cloud-hosted caches as of the 9.3.1 release. The ESRI JavaScript API and Flex API can be extended to use caches hosted in the cloud (see the Flex example from Mansour Raad), but you’ll have to roll your own. For the Philly Stormwater project we were using OpenLayers, and someone had already rolled one for us: there is a patch that lets the client-side library access the cache directly, without communicating through ArcGIS Server. The one thing to note is that the tile origin is pretty touchy; we had to adjust the origin values to make everything line up correctly.
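For anyone rolling their own client, the core job is turning a map coordinate into a tile URL. The sketch below assumes the ArcGIS Server “exploded” cache layout (L<level>/R<row>/C<column>, with the row and column written as 8-digit hexadecimal numbers) copied to a bucket under a hypothetical base URL, and an upper-left tile origin. It is not the OpenLayers patch mentioned above, and the tiling scheme values are made up for illustration.

```typescript
// Sketch of addressing an ArcGIS Server "exploded" cache copied to S3.
// Assumptions: the layout is L<level>/R<row>/C<col>.<ext> with a two-digit
// decimal level and 8-digit hexadecimal row/column, the tile origin is the
// upper-left corner, and the base URL used below is hypothetical.
interface TilingScheme {
  originX: number; // tile origin x (upper-left corner), in map units
  originY: number; // tile origin y (upper-left corner), in map units
  tileSize: number; // tile width and height, in pixels
  resolutions: number[]; // map units per pixel for each cache level
}

// Zero-pad a non-negative integer to 8 hexadecimal digits.
function hex8(n: number): string {
  return ("00000000" + n.toString(16)).slice(-8);
}

// Build the URL of the tile containing (x, y) at the given level.
// Assumes the point lies to the right of and below the origin.
function tileUrl(baseUrl: string, scheme: TilingScheme, level: number, x: number, y: number, ext: string): string {
  const span = scheme.resolutions[level] * scheme.tileSize; // map units per tile
  const col = Math.floor((x - scheme.originX) / span);
  const row = Math.floor((scheme.originY - y) / span); // rows count downward
  const levelDir = "L" + ("0" + level).slice(-2);
  return `${baseUrl}/${levelDir}/R${hex8(row)}/C${hex8(col)}.${ext}`;
}

// Made-up tiling scheme and point, just to show the arithmetic.
const demoScheme: TilingScheme = { originX: -100, originY: 100, tileSize: 256, resolutions: [1, 0.5] };
console.log(tileUrl("https://example-bucket.s3.amazonaws.com/cache", demoScheme, 1, 412, -300, "jpg"));
// https://example-bucket.s3.amazonaws.com/cache/L01/R00000003/C00000004.jpg
```

Because the row and column are computed relative to the origin, an origin that is off by even part of a tile span pushes points into neighboring rows or columns, which is one way the misalignment mentioned above can show up.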

Summary

Now that the site is up and starting to get some traffic, it’s clear that putting the tiles in S3 was the right decision. There is no reason for ArcGIS Server to waste cycles moving tiles around; let it do the heavy lifting with the vector layers and queries. Hopefully the rumors are true and the ArcGIS Server 10 release will be more aligned with cloud computing. Until then, there are still plenty of ways to take advantage of the cloud’s benefits.

Envisioning Development

This is so simple, it’s cool: http://envisioningdevelopment.net/map

I especially like the hourglass-like way the columns fill up. It gives you the feeling of really counting things, especially when you switch between East Harlem and the Upper East Side.

I would like to be able to see the distribution over the whole city, or the gradients between neighborhoods, but that’s just me. I think the design is neat and clean, and tells a very compelling story.

Philadelphia Civic Hackathon creates a Gang Survey App

Sunlight Labs recently held its Great American Hackathon, an event that encourages groups in each region of the United States to gather on the same weekend and create software that makes government more open. Two Azavea employees, David Middlecamp and yours truly, participated in the Philadelphia edition, and Azavea also hosted the event in our offices. Josh Tauberer, a PhD candidate at U-Penn and the developer of GovTrack.us, organized the event.


Seven of us came together to create a web-based visualization and display tool based on data from the New Jersey Gang Survey 2007. The NJ State Police have been conducting these surveys every three years since 2001. Using Django, MySQL, OpenLayers, OpenStreetMap and ArcGIS Desktop, we put together a full-blown app in two days. Two analysts from the New Jersey State Police joined us on Saturday, explained the background on the data set, wrote up the text and other content for the site and answered questions on how the data was structured.

New Jersey Gang Survey Viewer


The result is The New Jersey Gang Survey Viewer. Check it out. I was amazed by how much a small group could accomplish in such a short time frame, particularly when most of the participants neither knew each other nor knew many of the technology tools when they started. The players were:

Scaling Walkshed.org with Varnish and Amazon Web Services

We’re excited that voting opens today for our entry in the NYC Big Apps Contest: Walkshed NYC.

Walkshed is very CPU-intensive, since we generate heatmaps for users’ custom walkability factors on the fly. Building on our work using Amazon’s content delivery network for RedistrictingTheNation.com, we decided to expand our use of Amazon Web Services (AWS) for Walkshed and to incorporate technology from the open source Varnish project.

Varnish for hardening (and an easier life)

Varnish is an HTTP accelerator that runs on Linux (and other Unix-style OSes). We experimented with Varnish to meet a few goals:

  • Caching frequently requested files and heatmap tiles (e.g. the default walkability heatmap tiles)
  • Scaling by letting Varnish load balance between multiple servers
  • Improving reliability by allowing Varnish to resubmit failed requests and monitor server health

By pointing Walkshed.org directly at Varnish, we can adjust server configurations on the fly without bringing down the application. Currently, Varnish load-balances across 4 server instances that generate tiles using Walkshed’s DecisionTree engine. About 50% of the HTTP requests passing through Varnish are cache hits, which keeps unnecessary traffic from clogging up our application servers.

One instance is hosted on our private server and is often able to meet demand on its own, but adding 3 High-CPU Extra Large Instances from Amazon improves fault tolerance and lets us handle larger bursts of traffic. Varnish also monitors the health of our servers and removes any that become unresponsive from the cluster.
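The load balancing and health checks described above can be illustrated with a small round-robin pool. This is not Varnish or VCL, and round-robin is only one possible balancing strategy; the backend names are hypothetical, and setHealth stands in for the result of a health probe.

```typescript
// Illustration only: a round-robin backend pool that skips unhealthy servers.
// This is not Varnish/VCL; names are hypothetical.
interface Backend {
  name: string;
  healthy: boolean;
}

class RoundRobinPool {
  private backends: Backend[];
  private nextIndex = 0;

  constructor(backends: Backend[]) {
    this.backends = backends;
  }

  // Record the result of a health probe for one backend.
  setHealth(name: string, healthy: boolean): void {
    for (const backend of this.backends) {
      if (backend.name === name) {
        backend.healthy = healthy;
      }
    }
  }

  // Return the next healthy backend in rotation, or undefined if none are up.
  pick(): Backend | undefined {
    for (let tries = 0; tries < this.backends.length; tries++) {
      const candidate = this.backends[this.nextIndex];
      this.nextIndex = (this.nextIndex + 1) % this.backends.length;
      if (candidate.healthy) {
        return candidate;
      }
    }
    return undefined;
  }
}

const pool = new RoundRobinPool([
  { name: "private-vm", healthy: true },
  { name: "ec2-a", healthy: true },
  { name: "ec2-b", healthy: true },
  { name: "ec2-c", healthy: true },
]);

pool.setHealth("ec2-b", false); // a failed probe takes ec2-b out of rotation
const picks: string[] = [];
for (let i = 0; i < 6; i++) {
  const backend = pool.pick();
  picks.push(backend ? backend.name : "none");
}
console.log(picks.join(", "));
// private-vm, ec2-a, ec2-c, private-vm, ec2-a, ec2-c
```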

Amazon EC2 Instances (bigger is better)

Our Amazon instances use the new EBS-based images to improve boot speed. We’ve found that it takes about 7 minutes from launching an instance until it is successfully added to our Varnish pool, which certainly isn’t bad. By combining Varnish with Amazon’s on-demand resources, we should theoretically be able to scale as far as necessary. For this demo application scaling is a manual process, but we are looking toward a future where the cluster scales automatically based on demand.

We also experimented with a few EC2 instance sizes. Since our application is CPU-intensive, we found we really had to go with the High-CPU Extra Large Instance to get decent performance. The instances still don’t match the performance of our private VMware-based server, but our hunch is that layers of virtualization are making memory allocation slow.

Technologies Used:

Ignite: Spatial, Boston

I had the opportunity to present at Ignite: Spatial, Boston a couple of weeks ago. I was fortunate to present Sourcemap.org in the company of other Boston-area techies doing cool work in laser scanning, CityML, social media and more.

All the videos are on YouTube. The presentation summaries are also online in this Google Doc.

Enjoy your spatial ignition this morning.

Echoes of the Browser Wars

I caught this link in my feeds today: http://radar.oreilly.com/2009/12/google-android-on-inevitabilit.html

A good read on where mobile devices are today, and why gaining share in the mobile market is non-trivial. Specifically, the article discusses the hurdles Google is trying to clear with its investment in Android, and how Apple is setting the bar high with its i* products.

One thing that jumps out at me is that the technical challenges of mobile development are nearly the same as those of web application development. Mark Sigal points out that development in the mobile realm is essentially heterogeneous. A few weeks ago I talked with a team lead at uLocate who explained the matrix that characterizes this heterogeneity. It’s nuts: a four-dimensional matrix whose dimensions are Device, Carrier, Platform, and OS.

I’m comparing it to the browser wars because when I test KIF (Kaleidocade Indicators Framework), I look at the application across a three-dimensional matrix whose dimensions are Browser, Version, and OS.
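One way to see why these matrices get out of hand is that the number of configurations to test is the product of the sizes of the dimensions. The TypeScript sketch below enumerates every combination; the browser list and the mobile dimension sizes are hypothetical examples, not figures from the article.

```typescript
// Illustration: every test configuration is one combination of values, so the
// total is the product of the dimension sizes. All values are hypothetical.
type Matrix = Record<string, string[]>;

// Enumerate every combination (the cartesian product) of the dimensions.
function combinations(matrix: Matrix): string[][] {
  let result: string[][] = [[]];
  for (const key of Object.keys(matrix)) {
    const expanded: string[][] = [];
    for (const partial of result) {
      for (const value of matrix[key]) {
        expanded.push(partial.concat([value]));
      }
    }
    result = expanded;
  }
  return result;
}

// Generate placeholder labels such as "device1", "device2", ...
function labels(prefix: string, count: number): string[] {
  const out: string[] = [];
  for (let i = 1; i <= count; i++) {
    out.push(prefix + i);
  }
  return out;
}

const browserMatrix: Matrix = {
  browser: ["Firefox", "Internet Explorer", "Safari", "Chrome"],
  version: ["previous", "current"],
  os: ["Windows", "Mac OS X"],
};

const mobileMatrix: Matrix = {
  device: labels("device", 10),
  carrier: labels("carrier", 4),
  platform: labels("platform", 3),
  os: labels("os", 3),
};

console.log(combinations(browserMatrix).length); // 16
console.log(combinations(mobileMatrix).length); // 360
```

With these hypothetical numbers, each extra handset adds 36 configurations (4 carriers × 3 platforms × 3 OS versions), while the browser matrix stays small.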

I can see how that similarity might make it easy for a developer to switch between mobile and web application development, since the testing strategy would be very similar. As a web developer with a recreational interest in building tools and toys for mobile devices, I would like that transition to be a smooth one, so that warms my heart.

The dangerous element I see, however, is how big that matrix can get, and how fast. In the browser market, each dimension has only a handful of items. In the mobile market, the number of handsets keeps growing, so fast that developers have a hard time keeping up. Russell Beattie (a Nokia employee) puts it this way (full article):

Multiply the number of models [Nokia puts out] per year (10-20) by the number of years Symbian’s been around by the various custom carrier modifications, and you get complete developer and consumer confusion.

From the chatter I’ve seen, it looks like there will be a teething period for Google, then all-out mobile platform wars after that. The end result? Probably the same as where we are today with browsers: supporting about 4 major browsers with minor differences between them, which covers about 97% (as of 12/4/2009) of all browsers out there. Not bad, but it’ll take mobile a while to get there, and I suspect there will be some corporate bloodletting before it’s all over.