Scaling Walkshed.org with Varnish and Amazon Web Services

We’re excited that voting opens today for our entry in the NYC Big Apps Contest, Walkshed NYC.

Walkshed is very CPU intensive because we generate heatmaps for users’ custom walkability factors on the fly. Building on our earlier work with Amazon’s content delivery network for RedistrictingTheNation.com, we decided to expand our use of Amazon Web Services (AWS) for Walkshed and to incorporate technology from the open source Varnish project.

Varnish for hardening (and an easier life)

Varnish is an HTTP accelerator that runs on Linux (and other Unix-like OSes). We experimented with Varnish to meet a few goals:

  • Caching frequently requested files and heatmap tiles (such as the default walkability heatmap tiles)
  • Scaling by letting Varnish load balance between multiple servers
  • Improving reliability by allowing Varnish to resubmit failed requests and monitor server health

By pointing Walkshed.org directly at Varnish, we can adjust server configurations on the fly without bringing down our application. Currently, Varnish load balances across 4 server instances, which generate tiles using Walkshed’s DecisionTree engine. About 50% of the HTTP requests passing through Varnish are cache hits, which keeps unnecessary traffic from clogging up our application servers.
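To make the effect of that hit rate concrete, here is a minimal Python sketch of a cache sitting in front of an expensive tile renderer. It is not Varnish and not our configuration; the class name `TileCache`, the `fake_render` function, and the tile URLs are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict


class TileCache:
    """Toy in-memory cache placed in front of a tile renderer (illustrative only)."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render              # the expensive backend call
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        # Serve from the cache when possible; otherwise render and remember.
        if url in self._store:
            self.hits += 1
            return self._store[url]
        self.misses += 1
        tile = self._render(url)
        self._store[url] = tile
        return tile

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_render(url: str) -> bytes:
    # Hypothetical stand-in for CPU-heavy heatmap generation.
    return ("tile for " + url).encode("utf-8")


cache = TileCache(fake_render)
for url in [
    "/tiles/default/1/2/3.png",   # miss: first request for the default tile
    "/tiles/default/1/2/3.png",   # hit
    "/tiles/custom-a/1/2/3.png",  # miss: a custom heatmap is unique to one user
    "/tiles/default/1/2/3.png",   # hit
]:
    cache.get(url)

print(cache.hit_rate())  # 0.5
```

In this toy run, two of the four requests never reach the renderer, which is the same kind of saving a 50% hit rate gives the application servers.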

One instance is hosted on our private server and is often able to meet demand, but adding 3 High-CPU Extra Large Instances from Amazon lets us improve fault tolerance and handle larger bursts in traffic.  Varnish also monitors the health of our servers and removes them from the cluster if they become unresponsive.
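The load balancing, retry, and health behavior described above can be sketched in a few lines of Python. This is a simplified illustration of the idea, not how Varnish implements it and not our actual setup: the `Pool` class, the backend names, and `fake_send` are assumptions made up for the example, and a backend is marked unhealthy on a failed request rather than by a separate periodic probe.

```python
from typing import Callable, List, Set


class Pool:
    """Round-robin pool that skips unhealthy backends and retries failures."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)
        self.healthy: Set[str] = set(backends)
        self._next = 0

    def pick(self) -> str:
        # Walk the ring at most once, looking for the next healthy backend.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")

    def fetch(self, url: str, send: Callable[[str, str], bytes],
              max_attempts: int = 2) -> bytes:
        last_error = None
        for _ in range(max_attempts):
            backend = self.pick()
            try:
                return send(backend, url)
            except OSError as exc:
                # Take the failing backend out of rotation and resubmit.
                self.healthy.discard(backend)
                last_error = exc
        raise RuntimeError("all attempts failed") from last_error


# Hypothetical names: one private server plus three EC2 instances.
pool = Pool(["private-1", "ec2-a", "ec2-b", "ec2-c"])


def fake_send(backend: str, url: str) -> bytes:
    # Pretend one EC2 instance has become unresponsive.
    if backend == "ec2-a":
        raise ConnectionError("backend down")
    return (backend + " served " + url).encode("utf-8")


results = [pool.fetch("/tile.png", fake_send) for _ in range(4)]
print(sorted(pool.healthy))  # ['ec2-b', 'ec2-c', 'private-1']
```

The second request lands on the failing instance, is resubmitted to the next one, and the caller never sees the error. A real deployment would also re-add a backend once it passes health checks again, which this sketch leaves out.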

Amazon EC2 Instances (bigger is better)

Our Amazon instances are using the new EBS-based images to improve boot speed.   We’ve found that it takes about 7 minutes from when we launch an instance until it is successfully added to our Varnish pool, which certainly isn’t bad.   By combining Varnish with Amazon’s on-demand resources, we should theoretically be able to scale as much as necessary.  For this demo application, scaling is a manual process, but we are looking toward a future where the cluster would scale automatically based on demand.
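As a rough idea of what an automatic scaling rule might look like, here is a small Python sketch. It is purely hypothetical: the function name, its parameters, the instance limits, and the traffic and throughput numbers in the usage line are invented for illustration and are not measurements from Walkshed.

```python
import math


def desired_instances(requests_per_sec: float,
                      cache_hit_rate: float,
                      tiles_per_sec_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Estimate how many tile servers are needed behind the cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if tiles_per_sec_per_instance <= 0:
        raise ValueError("tiles_per_sec_per_instance must be positive")
    # Only cache misses reach the application servers.
    backend_load = requests_per_sec * (1.0 - cache_hit_rate)
    needed = math.ceil(backend_load / tiles_per_sec_per_instance)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))


# Made-up inputs: 200 req/s, 50% cache hits, 30 tiles/s per instance.
print(desired_instances(200, 0.5, 30))  # 4
```

Because a new instance takes about 7 minutes to join the pool, a rule like this would need to act on a forecast or a trend rather than on the current load alone.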

We also experimented with a few EC2 instance sizes. Because our application is CPU intensive, we found we had to use the High-CPU Extra Large Instance to get decent performance. The instances still don’t match the performance of our private VMware-based server; our hunch is that layers of virtualization slow down memory allocation.
