Azavea Labs

Where software engineering meets GIS.

GeoTrellis 0.10.0 is released

The long awaited GeoTrellis 0.10 release is here!

It’s been a while since the 0.9 release of GeoTrellis, and there are many significant changes and improvements in this release. GeoTrellis has become an expansive suite of modular components that aid users in building geospatial applications in Scala, and as always we’ve focused specifically on high performance and distributed computing. This is the first official release that supports working with Apache Spark, and we are very pleased with the results that have come out of the decision to support Spark as our main distributed processing engine. Those of you who have been tuned in for a while know we started with a custom-built processing engine based on Akka actors; this original execution engine still exists in 0.10, but in a deprecated state, as the geotrellis-engine subproject. Along with upgrading GeoTrellis to support Spark and handle arbitrarily sized raster data sets, we’ve been making improvements and additions to core functionality, including adding vector and projection support.

Enough has changed since 0.9 that full release notes would be quite unwieldy. Instead, I put together a list of features that GeoTrellis 0.10 supports. This list is included in the README on the GeoTrellis GitHub repository, but I will include it here as well. It is organized by subproject, with the more basic and core subprojects higher in the list and the subprojects that rely on that core functionality later in the list, along with a high-level description of each subproject.


geotrellis-proj4

This subproject is a wrapper around proj4j, which handles Coordinate Reference Systems (CRS) and transforming points between projections.

  • Represent a CRS based on Ellipsoid, Datum, and Projection.
  • Translate CRSs to and from proj4 string representations.
  • Look up CRSs by EPSG and other codes.
  • Transform `(x, y)` coordinates from one CRS to another.
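To make the coordinate-transform bullet concrete, here is the math behind one such transform: the spherical Web Mercator (EPSG:3857) forward projection from WGS84 longitude/latitude. This is a standalone sketch of the underlying formula, not the GeoTrellis API:

```python
import math

EARTH_RADIUS = 6378137.0  # WGS84 semi-major axis, in meters

def lonlat_to_web_mercator(lon, lat):
    """Project a WGS84 (lon, lat) pair in degrees to Web Mercator meters."""
    x = math.radians(lon) * EARTH_RADIUS
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS
    return x, y

# The equator/prime-meridian intersection lands at (essentially) the origin;
# Philadelphia lands far to the west and north of it.
print(lonlat_to_web_mercator(0.0, 0.0))
print(lonlat_to_web_mercator(-75.1638, 39.9526))
```

In GeoTrellis the same idea is expressed as a transform function between a source and destination CRS, applied to `(x, y)` pairs.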


geotrellis-vector

This subproject is mostly a wrapper around JTS, but also adds features such as vector reprojection and GeoJSON reading and writing.

  • Provides a Scala idiomatic wrapper around JTS types: Point, Line (LineString in JTS), Polygon, MultiPoint, MultiLine (MultiLineString in JTS), MultiPolygon, GeometryCollection
  • Methods for the geometric operations supported in JTS, with result types that provide a type-safe way to match over the possible geometry outcomes.
  • Provides a Feature type that is the composition of a geometry and a generic data type.
  • Read and write geometries and features to and from GeoJSON.
  • Read and write geometries to and from WKT and WKB.
  • Reproject geometries between two CRSs.
  • Geometric operations: Convex Hull, Densification, Simplification.
  • Perform Kriging interpolation on point values.
  • Perform affine transformations of geometries.
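To unpack the last bullet: an affine transformation is just a matrix operation applied to each coordinate of a geometry. A minimal sketch of the math in plain Python (the `affine` helper is hypothetical and illustrative only, not the GeoTrellis API):

```python
import math

def affine(points, angle_deg=0.0, scale=1.0, dx=0.0, dy=0.0):
    """Rotate by angle_deg, scale uniformly, then translate by (dx, dy)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (scale * (x * cos_t - y * sin_t) + dx,
         scale * (x * sin_t + y * cos_t) + dy)
        for x, y in points
    ]

# Rotate the unit square 90 degrees and shift it by (5, 5):
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(affine(square, angle_deg=90, dx=5, dy=5))
```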


geotrellis-vector-testkit

This subproject provides utilities for testing against vector data.

  • GeometryBuilder for building test geometries
  • GeometryMatcher for scalatest unit tests, which aids in testing equality of geometries with an optional threshold.


geotrellis-raster

This subproject deals with raster data; it contains the core data model and single-threaded operations carried over from 0.9, with many improvements and additions.

  • Provides types to represent single- and multi-band rasters, supporting Bit, Byte, UByte, Short, UShort, Int, Float, and Double data, with either a constant NoData value (which improves performance) or a user-defined NoData value.
  • Treat a tile as a collection of values by calling “map” and “foreach”, along with floating-point versions of those methods (separated out for performance).
  • Combine raster data in generic ways.
  • Render rasters via color ramps and color maps to PNG and JPG images.
  • Read GeoTiffs with DEFLATE, LZW, and PackBits compression, including horizontal and floating point prediction for LZW and DEFLATE.
  • Write GeoTiffs with DEFLATE or no compression.
  • Reproject rasters from one CRS to another.
  • Resample raster data.
  • Mask and Crop rasters.
  • Split rasters into smaller tiles, and stitch tiles into larger rasters.
  • Derive histograms from rasters in order to represent the distribution of values and create quantile breaks.
  • Local Map Algebra operations: Abs, Acos, Add, And, Asin, Atan, Atan2, Ceil, Cos, Cosh, Defined, Divide, Equal, Floor, Greater, GreaterOrEqual, InverseMask, Less, LessOrEqual, Log, Majority, Mask, Max, MaxN, Mean, Min, MinN, Minority, Multiply, Negate, Not, Or, Pow, Round, Sin, Sinh, Sqrt, Subtract, Tan, Tanh, Undefined, Unequal, Variance, Variety, Xor, If
  • Focal Map Algebra operations: Hillshade, Aspect, Slope, Convolve, Conway’s Game of Life, Max, Mean, Median, Mode, Min, MoransI, StandardDeviation, Sum
  • Zonal Map Algebra operations: ZonalHistogram, ZonalPercentage
  • Operations that summarize raster data intersecting polygons: Min, Mean, Max, Sum.
  • Cost distance operation based on a set of starting points and a friction raster.
  • Hydrology operations: Accumulation, Fill, and FlowDirection.
  • Rasterization of geometries and the ability to iterate over cell values covered by geometries.
  • Vectorization of raster data.
  • Kriging Interpolation of point data into rasters.
  • Viewshed operation.
  • RegionGroup operation.
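One item in the list above is worth unpacking: quantile breaks split a raster's value distribution into classes containing roughly equal numbers of cells, which is useful when choosing color map boundaries. A minimal sketch of the idea over a value-count histogram (illustrative only, not the GeoTrellis implementation):

```python
def quantile_breaks(histogram, num_breaks):
    """histogram: {cell_value: count}. Returns num_breaks upper bounds
    that split the distribution into roughly equal-count classes."""
    total = sum(histogram.values())
    targets = [total * (i + 1) / num_breaks for i in range(num_breaks)]
    breaks, seen, t = [], 0, 0
    for value in sorted(histogram):           # walk values in ascending order
        seen += histogram[value]              # cumulative cell count
        while t < num_breaks and seen >= targets[t]:
            breaks.append(value)              # this value closes a class
            t += 1
    return breaks

hist = {10: 4, 20: 4, 30: 4, 40: 4}   # 16 cells over four distinct values
print(quantile_breaks(hist, 4))        # → [10, 20, 30, 40]
```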


geotrellis-raster-testkit

This subproject provides utilities for testing against raster data.

  • Build test raster data.
  • Assert raster data matches Array data or other rasters in scalatest.


geotrellis-spark

This is the subproject that enables Apache Spark to work with GeoTrellis types.

  • RDDs (Resilient Distributed Datasets) of core GeoTrellis types, coupled with spatial or spatiotemporal keys, allow users to work with very large data sets with the speed and resilience that Spark users expect.
  • Generic way to represent key value RDDs as layers, where the key represents a coordinate in space based on some uniform grid layout, optionally with a temporal component.
  • Represent spatial or spatiotemporal raster data as an RDD of raster tiles.
  • Generic architecture for saving/loading layer RDD data and metadata to/from various backends, using Spark’s IO API with space-filling-curve indexing to optimize storage retrieval (support for Hilbert curve and Z-order curve SFCs). HDFS and the local file system are supported backends by default; S3 and Accumulo are supported backends through the `geotrellis-s3` and `geotrellis-accumulo` projects, respectively.
  • Query architecture that allows for simple querying of layer data by spatial or spatiotemporal bounds.
  • Perform map algebra operations on layers of raster data, including all supported Map Algebra operations mentioned in the geotrellis-raster feature list.
  • Perform seamless reprojection on raster layers, using neighboring tile information in the reprojection to avoid unwanted NoData cells.
  • Pyramid up layers through zoom levels using various resampling methods.
  • Types to reason about tiled raster layouts in various CRSs and schemes.
  • Perform operations on raster RDD layers: crop, filter, join, mask, merge, partition, pyramid, render, resample, split, stitch, and tile.
  • Polygonal summary over raster layers: Min, Mean, Max, Sum.
  • Save spatially keyed RDDs of byte arrays to z/x/y files into HDFS or the local file system. Useful for saving PNGs off for use as map layers in web maps or for accessing GeoTiffs through z/x/y tile coordinates.
  • Utilities around creating spark contexts for applications using GeoTrellis, including a Kryo registrator that registers most types.
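The Z-order curve indexing mentioned in the IO bullet maps a 2D grid key to a single integer by interleaving the bits of the column and row, so tiles that are near each other in space tend to be near each other in storage. A minimal sketch of the bit interleaving (GeoTrellis's actual implementation differs):

```python
def z_index(col, row, bits=16):
    """Morton/Z-order index: interleave the bits of col and row."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)       # even bits: column
        index |= ((row >> i) & 1) << (2 * i + 1)   # odd bits: row
    return index

# Neighboring grid keys yield nearby indices:
print(z_index(0, 0), z_index(1, 0), z_index(0, 1), z_index(1, 1))
# → 0 1 2 3
```

A range of Z-order indices then translates into a small number of contiguous scans against the backend, which is what makes spatial queries over key-value stores efficient.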


geotrellis-spark-testkit

This subproject aids in creating unit tests for geotrellis-spark work.

  • Utility code to create test RDDs of raster data.
  • Matching methods to test equality of RDDs of raster data in scalatest unit tests.


geotrellis-accumulo

This subproject enables Accumulo as a backend for geotrellis-spark IO.

  • Save and load layers to and from Apache Accumulo. Query large layers efficiently using the layer query API.


geotrellis-s3

This subproject enables S3 as a backend for geotrellis-spark IO.

  • Save/load raster layers to/from Amazon Web Services Simple Storage Service (S3).
  • Save spatially keyed RDDs of byte arrays to z/x/y files in S3. Useful for saving PNGs off for use as map layers in web maps.


geotrellis-spark-etl

This subproject serves as a command-line client builder for creating ETL (Extract, Transform, and Load) applications using GeoTrellis and Spark.

  • Parse command line options for input and output of ETL applications.
  • Utility methods that make ETL applications easier for the user to build.
  • Work with input rasters from the local file system, HDFS, or S3.
  • Reproject input rasters using a per-tile reproject or a seamless reprojection that takes into account neighboring tiles.
  • Transform input rasters into layers based on a ZXY layout scheme.
  • Save layers into Accumulo, S3, HDFS or the local file system.


geotrellis-shapefile

This subproject simply contains a GeoTools-based reader for shapefile feature data.

  • Read geometry and feature data from shapefiles into GeoTrellis types using GeoTools.


geotrellis-slick

This subproject allows GeoTrellis vector data to work with PostGIS.

  • Save and load geometry and feature data to and from PostGIS using the Slick Scala database library.
  • Perform PostGIS `ST_` operations through Scala.

Thank you to everyone involved in making this release happen, especially to our helpful users who reported problems and made suggestions as 0.10 was being developed, and who put up with a tumultuous API as we nailed it down. Here’s to our users, our contributors, core developers, and to the future of the GeoTrellis project.


Converting Mapbox Studio Vector Tiles to Rasters

Screenshot of pirate-styled map (Pirate map by AJ Ashton)

If you’ve tried to make your own custom map styles before, you’ve probably used Mapbox Studio or its predecessor, Tilemill. Mapbox is doing a huge amount of work around custom maps and map data. As part of this, they’ve developed many open source tools and some file specifications as well. It’s hard not to be impressed by the quality and usefulness of products like Mapbox Studio.

Recently, I had a tricky task that involved working with map tiles generated by Mapbox Studio. We developed an application for use on tablets that needed to use a custom map AND work offline. This ruled out many common options for tile serving. Still, we developed our custom map style in Mapbox Studio because it is an excellent tool. The challenge was taking the tiles rendered on the fly from Mapbox Studio vectors and getting them into a format that could be packaged with our application.

Some Background

Customizing the presentation of map data isn’t a simple task. To make this task a little more pleasant and intuitive for developers accustomed to style languages such as CSS, Mapbox developed CartoCSS, which looks much like CSS. To create a fast feedback cycle that would give a similar experience to developing HTML/CSS, the Mapbox developers created Tilemill, which allows users to set up a map, write some CartoCSS, and see the changes nearly immediately. It’s great software, and I think it has created a proliferation of smart and beautiful cartography. Philly recently celebrated a blueprint-styled map created by Lauren Ancona.

Screenshot of Lauren Ancona’s Blue Print map

Tilemill could upload your tiles to Mapbox, which would act as a tile server for your projects. This service is part of the economic model for Mapbox. For most users, offloading tile serving to a specialized and highly performant cluster of Mapbox servers is an obvious choice and well worth the cost. However, being a very open platform from a company with roots in open source software, Tilemill also allowed users to save the tiles directly out to disk as an mbtiles file. This is a special file format, also developed by Mapbox; it’s actually just a SQLite database with a known schema. Some users opted to self-host map tiles, though probably only a small fraction.

Mapbox Studio is the next iteration of Tilemill. It has many of the same features and many improvements. One of the biggest changes between Tilemill and Mapbox Studio is that Tilemill relied on raster images while Mapbox Studio relies on vector tiles. This is a big change in how map images are generated. In the raster model, the server either pre-renders images for each tile of the map and saves them on disk, or it generates images on the fly as they are requested and sends those back in the response. In either case, the data transferred to the users who see the map is images. Images are large and somewhat clunky from a data perspective. If you want to change the map style, you have to regenerate all those images. It takes a lot of computing power, and all that load is on the server.

Mapbox has been moving to a vector model for tile serving, and they’ve developed specifications on how to do this. The idea in the vector tile model is that the server sends the data that goes on a map tile to the user. This means that the names of roads, the shapes of buildings, and the positions of rivers are accounted for, but not what color to make those shapes and not the fonts to use in them. In most cases, sending back only the raw data and not the rendered tile is much faster and puts less load on the server. The vector data is then combined with the style data on the client, where it is rendered. This is similar to how CSS and HTML work: the server sends the style information along with the content data, and the browser creates the visual presentation. The work of generating the presentation has been offloaded to the user. Distributed computing at its best!

Because Mapbox is moving to a vector-based model for tiles, the ability to export the rendered tiles to an mbtiles file was removed in Mapbox Studio. Under most circumstances this is fine; if you really need raster functionality, you can still use Tilemill, and if you really want to serve your own tiles, you can actually serve your own vector tiles. One confusing aspect of this change from Tilemill to Mapbox Studio and raster to vector tiles is that Mapbox Studio can export an mbtiles file, but it stores vector data rather than raster. Same filename, different (and incompatible) data.

Sometimes you just need rasters

Screenshot of Mapbox’s Pencil Map

Since our application needed to work offline, and since our map didn’t cover a large area, packaging the tiles with the application was the most reasonable option. Our Data Analytics team worked on a nice map style in Mapbox Studio based on the pencil style developed by Mapbox, and backporting it to Tilemill wasn’t an option. I started looking for ways to convert Mapbox Studio vector tiles to raster tiles. There’s surprisingly little information out there about how to do this. Luckily, being open source, many of the features of Mapbox Studio are separated into their own libraries, and there are node.js modules that can use these or add extra features to them. I found a small module called “tl”, which I presume is an abbreviation of “tilelive”, the library it provides features for. This is a command line utility that will grab the raster tiles delivered to the client by the rendering server in Mapbox Studio and stream them to an old-school raster-based mbtiles file. It takes a long time to do this, so it’s really only good for a small area, but that’s all we really needed for our application.

I also found a tile server called tessera (from the same developer as tl), which can serve tiles in just about any format you can think of. I decided to use tessera to start debugging. I figured if I could get tessera to serve my tiles, then at least I would know they work and I could figure out how to get tl to save them.

This is where you can start following along if you are looking for a tutorial, however if you want the spoilers, you can skip to the end where I link to an Ansible role that provisions a Vagrant/Virtualbox server and gets everything ready for you.

Try it yourself

These projects are all node.js programs, so if you don’t have it installed already you need to get it. I recommend grabbing the latest from the v0.10 series. I did all my development and testing on v0.10.28. I should also mention that the commands listed are for Linux/Unix/OSX. If you are a Windows user, you’ll have to modify the commands slightly or use a Linux virtual machine (which I recommend anyway).

Mapbox Studio stores all your projects in folders that end in .tm2 (for Tilemill 2). They’re all stored in your user directory. Getting tessera to serve your Mapbox Studio tm2 project is supposed to be as easy as installing tessera and the tilelive providers (APIs for different kinds of tile and style data) and then running a command to kick off the server with your tm2 project.

npm install -g tessera tl mbtiles mapnik tilelive tilelive-file tilelive-http tilelive-mapbox tilelive-mapnik tilelive-s3 tilelive-tmsource tilelive-tmstyle tilelive-utfgrid tilelive-vector tilejson

tessera tmstyle:///path/to/your/mapproject.tm2

You might need to run the npm command as root or prefix it with sudo (sudo npm…), but that is supposed to work. For me it did not.

First, I got complaints that the network request couldn’t be completed. The reason for this is that Mapbox Studio streams data from and uses an API token. The reason you need an account with Mapbox just to use the software is that they associate that API token with your program. To make the requests from the command line, you need to provide the token as a variable that is passed with each request. It took some digging but I found that this variable is named MAPBOX_ACCESS_TOKEN. Log on to and visit the projects page. You should see your token at the top of the page. You can supply this as an environment variable for your current session by issuing the following command:

export MAPBOX_ACCESS_TOKEN=mytokenhere

Make sure it worked by having it echo back to you:

echo $MAPBOX_ACCESS_TOKEN
This should show you your token. At this point I tried running the tessera command again, but it still failed. This time I got an error message about missing fonts. Mapnik, the real-time map tile renderer, needs to know the location of all the fonts being used in the project. Again, it took some digging, but I found that this can also be supplied via an environment variable, MAPNIK_FONT_PATH. I moved all the fonts into a known directory and then issued the following command:

export MAPNIK_FONT_PATH=/path/to/font/directory

In my case I just used the global font directory on Ubuntu (/usr/share/fonts), which gave Mapnik access to all my fonts.

After this, tessera worked.

tessera tmstyle:///path/to/your/mapproject.tm2

This started a server on port 8080. Visiting localhost:8080 gave me a leaflet map that I could explore.

Next I tried using tl to export the tiles to an mbtiles file. This program has a copy command. You’ll need to supply information about what part of the map to copy.

tl copy -z 17 -Z 18 -b "-75.171375 39.945049 -75.15554 39.956991" tmstyle:///absolute/path/to/project.tm2 mbtiles:///path/to/save/tiles.mbtiles

This will grab tiles for zoom levels 17 and 18 for the area of Philadelphia around City Hall and save it to a classic raster mbtiles file. Lowercase “z” is the starting zoom, uppercase “Z” is the ending zoom and “b” is the bounding box for the map constraints.
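For reference, the z/x/y tile coordinates that a bounding box covers follow the standard slippy-map formula. This standalone sketch (not part of tl) shows which tile column and row a (lat, lon) falls in at a given zoom:

```python
import math

def deg2num(lat, lon, zoom):
    """Standard slippy-map math: (lat, lon) in degrees -> (xtile, ytile)."""
    n = 2 ** zoom                                   # tiles per axis at this zoom
    xtile = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile

# Corners of the City Hall bounding box above, at zoom 17:
print(deg2num(39.945049, -75.171375, 17))
print(deg2num(39.956991, -75.15554, 17))
```

Running a formula like this over the two corners of the `-b` box tells you how many tiles a copy will fetch, which is a quick way to estimate how long `tl copy` will take.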

Finally, with an mbtiles file in hand, I used mbutil, another of Mapbox’s great libraries, to extract the tiles and embed them in our application.

I packaged all this up into an Ansible role and Vagrantfile. The result is a tile-converter virtual machine. To use it you’ll need Ansible, VirtualBox, and Vagrant installed. Once that is done, follow the instructions in the readme and you should be good to go.

Selecting a NAT Instance Size on EC2

We’ve been using the Amazon Web Services (AWS) Virtual Private Cloud (VPC) functionality to create an isolated and secure hosting environment for our SaaS product, HunchLab.  When EC2 servers in a VPC with only private IP addresses need access to S3 (or to the Internet) the network traffic must be routed through a NAT instance.  This architecture provides increased security by reducing the external surface area of the application.

There are many resources about setting up a NAT instance in AWS.  Many examples set up NAT instances using the m1.small or t2.micro instance sizes.  Both instance sizes are low-cost and so are a natural starting point for experimentation.

The m1.small is a prior generation EC2 instance type, with Amazon recommending an upgrade path to the m3 instance family.  The m3 family does not, however, have a small instance for cases where only a limited amount of memory is required.  The t2 instances seem like a natural fit from a cost perspective, but Amazon lists their network performance as ‘low to moderate’, which wasn’t very reassuring given that the primary purpose of a NAT instance is to provide network connectivity to the rest of the servers within the application.

Given that EC2 does not provide a network focused instance family like they do with compute, memory, and storage optimized families, my question was:

Which NAT instance size should we use in production?

I decided to answer this question by benchmarking several instance sizes.  I tested the m1.small instance size and its closest replacement, the m3.medium. I also tested all three t2 instances (t2.micro, t2.small, t2.medium) because they are low cost and a new instance family that likely benefits from the latest back-end EC2 architecture improvements.

AWS rates the network performance of each instance type as low, moderate, high, or 10 Gigabit. To cover instances with “enhanced networking” enabled, I also included the c3.large and c3.2xlarge instance sizes.  Enhanced networking is designed to improve packets per second and reduce latency through better virtualization. The c3.2xlarge is also rated as high network performance.  For all instance types I used the latest stock NAT AMI provided by AWS for my testing.

One component of our application generates large files that we store within S3.  To benchmark the throughput of the different NAT instances I stored the Ubuntu 14.04 Server ISO file within a bucket in S3 in the same region as our servers. For each instance size, I downloaded the ISO file 10 times using wget from a server behind the NAT instance and recorded the throughput in MBps for each sample.  I then calculated the median bandwidth and the TP80 metric (the top 80% of the samples).

I also recorded the price per hour to run each instance type in our region using reservation pricing for instances that are part of current generations.   Finally, I calculated the bandwidth per unit of cost to determine the sweet spot along the performance-cost curve.  Here are the results.
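The derived columns can be computed from the raw wget samples along these lines (a sketch: I am assuming "TP80" means the throughput met or exceeded by the fastest 80% of samples, which is my reading of the metric, and the sample values below are hypothetical):

```python
import statistics

def summarize(samples_mbps, cents_per_hour):
    """Median, TP80, and bandwidth-per-cent figures for one instance size."""
    median = statistics.median(samples_mbps)
    ranked = sorted(samples_mbps, reverse=True)
    tp80 = ranked[int(len(ranked) * 0.8) - 1]  # slowest of the top 80%
    return {
        "median": median,
        "tp80": tp80,
        "median_per_cent": median / cents_per_hour,
        "tp80_per_cent": tp80 / cents_per_hour,
    }

# Ten hypothetical wget throughput samples (MBps) for one instance size:
samples = [13.2, 14.1, 13.9, 12.8, 14.0, 13.7, 9.8, 13.9, 14.2, 13.6]
print(summarize(samples, cents_per_hour=1.72))
```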


NAT Instance   Median Bandwidth   TP80 Bandwidth   Cents / Hour   Median Bandwidth / Cost   TP80 Bandwidth / Cost
m1.small       8.3 MBps           3.5 MBps         4.40 cents     1.88 MBps / cent          0.80 MBps / cent
t2.micro       2.7 MBps           1.7 MBps         0.86 cents     3.14 MBps / cent          1.98 MBps / cent
t2.small       13.9 MBps          10.2 MBps        1.72 cents     8.08 MBps / cent          5.92 MBps / cent
t2.medium      20.7 MBps          19.14 MBps       3.45 cents     6.00 MBps / cent          5.55 MBps / cent
m3.medium      20.4 MBps          16.6 MBps        4.25 cents     4.79 MBps / cent          3.91 MBps / cent
c3.large       43.2 MBps          32.76 MBps       6.19 cents     6.98 MBps / cent          5.29 MBps / cent
c3.2xlarge     43.3 MBps          39.02 MBps       24.77 cents    1.75 MBps / cent          1.58 MBps / cent


The m1.small instance, which most examples utilize, offers quite limited bandwidth and is not a good choice for a production environment.   The t2.micro instance is even worse. The t2.small and t2.medium instances seem like good fits for production environments where cost is a concern. The c3 instances with enhanced networking clearly realize a performance boost compared to the other instances but come at a higher cost.   For a single simultaneous transfer from S3 the c3.2xlarge instance does not realize much of an improvement over the c3.large, but I imagine that more concurrent transfers would realize a higher overall throughput.

This benchmark is of course subject to the particular hosts that I landed on during my testing.  If I repeated the test, I would expect variability in the benchmarks for the t2 family due to their burstable design.  For our use case, the t2.medium seems like a good choice.


Running Vagrant with Ansible Provisioning on Windows

At Azavea we use Ansible and custom Ansible roles quite a bit.

We’ve also been using Vagrant for quite some time to create project-specific development environments.  Adding Ansible as a provisioner makes setting up a development environment wonderfully smooth.

Unfortunately, Ansible is not officially supported with Windows as the control machine.

It is possible to get Ansible running in a Cygwin environment.  With a bit of work, you can get it running from Vagrant too!

Installing Cygwin

The first step to getting Ansible running is installing Cygwin.  You can follow the normal installation instructions for Cygwin if you’d like to, or if you already have a Cygwin environment set up that’s great too!

We’re using babun instead of Cygwin’s normal installer for a simpler installation and package installation process.  If you’re new to using Cygwin or having trouble with the standard installer I’d recommend this.

Setting up Ansible

Once you’ve got Cygwin installed, you’ll want to open up a terminal. You’ll need to use a Cygwin terminal, and not cmd.exe, whenever you want to run ansible-playbook or vagrant.

You’ll need to install pip to be able to install Ansible. You’ll also need some packages Ansible needs to run that can’t be installed by pip. If you’re using the standard Cygwin installer, run it again and make sure python, python-paramiko, python-crypto, gcc-g++, wget, openssh, and python-setuptools are all installed. We need gcc-g++ to compile source code when installing PyYAML from PyPI.

If you’re using babun, this is:

pact install python python-paramiko python-crypto gcc-g++ wget openssh python-setuptools

You might get the following error if you try to run python: ImportError: No module named site.
If you see that error add the following to your ~/.bashrc or ~/.zshrc (in your Cygwin home folder) and source it:

export PYTHONHOME=/usr
export PYTHONPATH=/usr/lib/python2.7

Next, let’s get pip installed, and then install Ansible itself.

python /usr/lib/python2.7/site-packages/ pip
pip install ansible

Making Ansible Run From Vagrant

Once that is done, you should be able to run ansible-playbook from bash or zsh.

However, that isn’t enough to use Ansible as a Vagrant provisioner. Even if you call vagrant from bash or zsh, vagrant won’t be able to find ansible-playbook, because it isn’t on the Windows PATH. But even if we put ansible-playbook on the Windows PATH, it won’t run, because it needs to use the Cygwin Python.

To ensure we’re using the Python in our Cygwin environment, we need a way to run ansible-playbook through bash. The solution we came up with was to create a small Windows batch file and place it somewhere on the Windows PATH as ansible-playbook.bat:

@echo off

REM If you used the standard Cygwin installer this will be C:\cygwin
set CYGWIN=%USERPROFILE%\.babun\cygwin

REM You can switch this to work with bash with %CYGWIN%\bin\bash.exe
set SH=%CYGWIN%\bin\zsh.exe

"%SH%" -c "/bin/ansible-playbook %*"

This is enough to let Vagrant find ansible-playbook and run the Ansible provisioner.

You’ll likely run into the following error when you try to provision your first Vagrant VM:

GATHERING FACTS ***************************************************************
fatal: [app] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue

To get around this, we had to create a ~/.ansible.cfg in our Cygwin home directory (this can also go in your project directory as ansible.cfg) changing what the ssh ControlPath was set to:

[ssh_connection]
control_path = /tmp

And with that you should be ready to provision using Ansible!

If you want to run other Cygwin programs from your Vagrantfile, such as ansible-galaxy, you’ll have to make another batch file. For an example of how to easily make a bunch of wrapper batch files, check out this gist.

Creating Ansible Roles from Scratch: Part 2

In part one of this series, we created the outline of an Ansible role to install Packer with ansible-galaxy, and then filled it in. In this post, we’ll apply the role against a virtual machine, and ultimately, install Packer!

A Playbook for Applying the Role

After all of the modifications from the previous post, the directory structure for our role should look like:

├── defaults
│   └── main.yml
├── meta
│   └── main.yml
└── tasks
    └── main.yml

Now, let’s alter the directory structure a bit to make room for a top level playbook and virtual machine definition to test the role. For the virtual machine definition, we’ll use Vagrant.

To accommodate the top level playbook, let’s move the azavea.packer directory into a roles directory. At the same level as roles, let’s also create a site.yml playbook and a Vagrantfile. After those changes are made, the directory structure should look like:

├── Vagrantfile
├── roles
│   └── azavea.packer
│       ├──
│       ├── defaults
│       │   └── main.yml
│       ├── meta
│       │   └── main.yml
│       └── tasks
│           └── main.yml
└── site.yml

The contents of the site.yml should contain something like:

- hosts: all
  sudo: yes
  roles:
    - { role: "azavea.packer" }

This instructs Ansible to apply the azavea.packer role to all hosts using sudo.

And the contents of the Vagrantfile should look like:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "site.yml"
  end
end

Here we’re making use of the ubuntu/trusty64 box on Vagrant Cloud, along with the Ansible provisioner for Vagrant.

Running vagrant up from the same directory that contains the Vagrantfile should bring up an Ubuntu 14.04 virtual machine, and then attempt to use ansible-playbook to apply site.yml. Unfortunately, that attempt will fail, and we’ll be met with the following error:

ERROR: cannot find role in /Users/hector/Projects/blog/roles/azavea.unzip or
/Users/hector/Projects/blog/azavea.unzip or /etc/ansible/roles/azavea.unzip

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

Where is this reference to azavea.unzip coming from? Oh, that’s right, we had it listed as a dependency in the Packer role metadata…

Role Dependencies

Role dependencies are references to other Ansible roles needed for a role to function properly. In this case, we need unzip installed in order to extract the Packer binaries from

To resolve the dependency, azavea.unzip needs to exist in the same roles directory that currently houses azavea.packer. We could create that role the same way we did azavea.packer, but azavea.unzip already exists within Ansible Galaxy (actually, so does azavea.packer).

In order to install azavea.unzip into the roles directory, we can use the ansible-galaxy command again:

$ ansible-galaxy install azavea.unzip -p roles
 downloading role 'unzip', owned by azavea
 no version specified, installing 0.1.0
 - downloading role from
 - extracting azavea.unzip to roles/azavea.unzip
azavea.unzip was installed successfully

Now, if we try to reprovision the virtual machine, the Ansible run should complete successfully:

$ vagrant provision
==> default: Running provisioner: ansible...

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
ok: [default]

TASK: [azavea.unzip | Install unzip] ******************************************
changed: [default]

TASK: [azavea.packer | Download Packer] ***************************************
changed: [default]

TASK: [azavea.packer | Extract and install Packer] ****************************
changed: [default]

PLAY RECAP ********************************************************************
default                    : ok=4    changed=3    unreachable=0    failed=0

Before we celebrate, let’s connect to the virtual machine and ensure that Packer was installed properly:

$ vagrant ssh
vagrant@vagrant-ubuntu-trusty-64:~$ packer
usage: packer [--version] [--help] <command> [<args>]

Available commands are:
    build       build image(s) from template
    fix         fixes templates from old versions of packer
    inspect     see components of a template
    validate    check that a template is valid

Globally recognized options:
    -machine-readable    Machine-readable output format.

Excellent! The Packer role we created has successfully installed Packer!