Azavea Labs

Where software engineering meets GIS.

Running Vagrant with Ansible Provisioning on Windows

At Azavea we use Ansible and custom Ansible roles quite a bit.

We’ve also been using Vagrant for quite some time to create project-specific development environments.  Adding Ansible as a provisioner makes setting up a development environment wonderfully smooth.

Unfortunately, Ansible is not officially supported with Windows as the control machine.

It is possible to get Ansible running in a Cygwin environment.  With a bit of work, you can get it running from Vagrant too!

Installing Cygwin

The first step to getting Ansible running is installing Cygwin.  You can follow the normal installation instructions for Cygwin, or if you already have a Cygwin environment set up, that’s great too!

We’re using babun instead of Cygwin’s normal installer for a simpler installation and package management process.  If you’re new to Cygwin or having trouble with the standard installer, I’d recommend it.

Setting up Ansible

Once you’ve got Cygwin installed, you’ll want to open up a terminal. You’ll need to use a Cygwin terminal, and not cmd.exe, whenever you want to run ansible-playbook or vagrant.

You’ll need to install pip to be able to install Ansible. You’ll also need some packages Ansible needs to run that can’t be installed by pip. If you’re using the standard Cygwin installer, run it again and make sure python, python-paramiko, python-crypto, gcc-g++, wget, openssh, and python-setuptools are all installed. We need gcc-g++ to compile source code when installing PyYAML from PyPI.

If you’re using babun, this is:

pact install python python-paramiko python-crypto gcc-g++ wget openssh python-setuptools

You might get the following error if you try to run python: ImportError: No module named site.
If you see that error, add the following to your ~/.bashrc or ~/.zshrc (in your Cygwin home folder) and source it:

export PYTHONHOME=/usr
export PYTHONPATH=/usr/lib/python2.7

Next, let’s get pip installed, and then install Ansible itself.

python /usr/lib/python2.7/site-packages/easy_install.py pip
pip install ansible

Making Ansible Run From Vagrant

Once that is done, you should be able to run ansible-playbook from bash or zsh.

However, that isn’t enough to use Ansible as a Vagrant provisioner. Even if you call vagrant from bash or zsh, vagrant won’t be able to find ansible-playbook, because it isn’t on the Windows PATH. But even if we put ansible-playbook on the Windows PATH, it won’t run, because it needs to use the Cygwin Python.

To ensure we’re using the Python in our Cygwin environment, we need a way to run ansible-playbook through bash. The solution we came up with was to create a small Windows batch file and place it somewhere on the Windows PATH as ansible-playbook.bat:

@echo off

REM If you used the standard Cygwin installer this will be C:\cygwin
set CYGWIN=%USERPROFILE%\.babun\cygwin

REM To use bash instead, set this to %CYGWIN%\bin\bash.exe
set SH=%CYGWIN%\bin\zsh.exe

"%SH%" -c "/bin/ansible-playbook %*"

This is enough to let Vagrant find ansible-playbook and run the Ansible provisioner.

You’ll likely run into the following error when you try to provision your first Vagrant VM:

GATHERING FACTS ***************************************************************
fatal: [app] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue

To get around this, we had to create a ~/.ansible.cfg (this can also go in your project directory as ansible.cfg) that changes the SSH ControlPath setting:

[ssh_connection]
control_path = /tmp

And with that you should be ready to provision using Ansible!

If you want to run other Cygwin programs from your Vagrantfile, such as ansible-galaxy, you’ll have to make another batch file. For an example of how to easily create a bunch of wrapper batch files, check out this gist.
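
In the spirit of that gist, here’s a minimal sketch of generating several wrapper batch files at once from a Cygwin shell. The command list and the CYGWIN path are assumptions; adjust both for your own setup.

```shell
# Generate a wrapper .bat for each Cygwin command that Vagrant
# should be able to find on the Windows PATH.
# Assumption: a babun install; standard Cygwin would be C:\cygwin.
CYGWIN='%USERPROFILE%\.babun\cygwin'

for cmd in ansible-galaxy ansible-vault; do
  # Emit CRLF line endings so cmd.exe parses the file correctly.
  printf '@echo off\r\nset CYGWIN=%s\r\n"%%CYGWIN%%\\bin\\zsh.exe" -c "/bin/%s %%*"\r\n' \
    "$CYGWIN" "$cmd" > "$cmd.bat"
done
```

Drop the resulting .bat files somewhere on the Windows PATH, just like ansible-playbook.bat above.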

Creating Ansible Roles from Scratch: Part 2

In part one of this series, we created the outline of an Ansible role to install Packer with ansible-galaxy, and then filled it in. In this post, we’ll apply the role against a virtual machine, and ultimately, install Packer!

A Playbook for Applying the Role

After all of the modifications from the previous post, the directory structure for our role should look like:

├── README.md
├── defaults
│   └── main.yml
├── meta
│   └── main.yml
└── tasks
    └── main.yml

Now, let’s alter the directory structure a bit to make room for a top level playbook and virtual machine definition to test the role. For the virtual machine definition, we’ll use Vagrant.

To accommodate the top level playbook, let’s move the azavea.packer directory into a roles directory. At the same level as roles, let’s also create a site.yml playbook and a Vagrantfile. After those changes are made, the directory structure should look like:

├── Vagrantfile
├── roles
│   └── azavea.packer
│       ├── README.md
│       ├── defaults
│       │   └── main.yml
│       ├── meta
│       │   └── main.yml
│       └── tasks
│           └── main.yml
└── site.yml

The site.yml should contain something like:

---
- hosts: all
  sudo: yes
  roles:
    - { role: "azavea.packer" }

This instructs Ansible to apply the azavea.packer role to all hosts using sudo.

And the contents of the Vagrantfile should look like:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "site.yml"
  end
end

Here we’re making use of the ubuntu/trusty64 box on Vagrant Cloud, along with the Ansible provisioner for Vagrant.

Running vagrant up from the same directory that contains the Vagrantfile should bring up an Ubuntu 14.04 virtual machine, and then attempt to use ansible-playbook to apply site.yml. Unfortunately, that attempt will fail, and we’ll be met with the following error:

ERROR: cannot find role in /Users/hector/Projects/blog/roles/azavea.unzip or
/Users/hector/Projects/blog/azavea.unzip or /etc/ansible/roles/azavea.unzip

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

Where is this reference to azavea.unzip coming from? Oh, that’s right, we had it listed as a dependency in the Packer role metadata…

Role Dependencies

Role dependencies are references to other Ansible roles needed for a role to function properly. In this case, we need unzip installed in order to extract the Packer binaries from packer_0.7.1_linux_amd64.zip.

To resolve the dependency, azavea.unzip needs to exist in the same roles directory that currently houses azavea.packer. We could create that role the same way we did azavea.packer, but azavea.unzip already exists within Ansible Galaxy (actually, so does azavea.packer).

In order to install azavea.unzip into the roles directory, we can use the ansible-galaxy command again:

$ ansible-galaxy install azavea.unzip -p roles
 downloading role 'unzip', owned by azavea
 no version specified, installing 0.1.0
 - downloading role from https://github.com/azavea/ansible-unzip/archive/0.1.0.tar.gz
 - extracting azavea.unzip to roles/azavea.unzip
azavea.unzip was installed successfully

Now, if we try to reprovision the virtual machine, the Ansible run should complete successfully:

$ vagrant provision
==> default: Running provisioner: ansible...

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************
ok: [default]

TASK: [azavea.unzip | Install unzip] ******************************************
changed: [default]

TASK: [azavea.packer | Download Packer] ***************************************
changed: [default]

TASK: [azavea.packer | Extract and install Packer] ****************************
changed: [default]

PLAY RECAP ********************************************************************
default                    : ok=4    changed=3    unreachable=0    failed=0

Before we celebrate, let’s connect to the virtual machine and ensure that Packer was installed properly:

$ vagrant ssh
vagrant@vagrant-ubuntu-trusty-64:~$ packer
usage: packer [--version] [--help] <command> [<args>]

Available commands are:
    build       build image(s) from template
    fix         fixes templates from old versions of packer
    inspect     see components of a template
    validate    check that a template is valid

Globally recognized options:
    -machine-readable    Machine-readable output format.

Excellent! The Packer role we created has successfully installed Packer!

Creating Ansible Roles from Scratch: Part 1

Within Ansible there are two techniques for reusing a set of configuration management tasks: includes and roles. Although both techniques function in similar ways, roles appear to be the official way forward. Ansible Galaxy was built as a repository for roles, and as we’ll see in this post, ansible-galaxy exists to aid in installing and creating them.
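
To make the distinction concrete, here is a rough sketch of both techniques in a playbook (the task file name is hypothetical):

```yaml
# Reuse via an include: pull a task file directly into a play.
- hosts: all
  tasks:
    - include: packer_tasks.yml

# Reuse via a role: reference a self-contained bundle of tasks,
# defaults, handlers, and metadata by name.
- hosts: all
  roles:
    - azavea.packer
```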

Creating a New Role

Let’s start off by creating a role for Packer.

Packer is a useful tool for producing different machine image types with the same set of configuration management tasks. For example, Packer can be used to take a set of Ansible instructions, funnel them through itself, and produce both an AMI and a Docker image.
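
As a rough illustration (not part of the role we’re building), a Packer template from this era pairs a builder with the ansible-local provisioner. The region, AMI ID, and playbook path below are placeholders:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-XXXXXXXX",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "ami_name": "app-server {{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "ansible-local",
      "playbook_file": "site.yml"
    }
  ]
}
```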

Enough about Packer though, let’s get back to creating an Ansible role for installing Packer.

The first step in creating a role is creating its directory structure. In order to create the base directory structure, we’re going to use a tool bundled with Ansible (since 1.4.2) called ansible-galaxy:

$ ansible-galaxy init azavea.packer
azavea.packer was created successfully

That command will create an azavea.packer directory with the following structure:

├── README.md
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
└── vars
    └── main.yml

Explaining the Role Directory Structure

A role’s directory structure consists of defaults, vars, files, handlers, meta, tasks, and templates. Let’s take a closer look at each:

defaults

Within defaults, there is a main.yml file with the default variables used by a role. For the Packer role, there is only a packer_version default variable. As of this post, the most recent version of Packer is 0.7.1, so we’ll set it to that:

---
packer_version: "0.7.1"

vars

vars and defaults house variables, but variables in vars have a higher priority, which means that they are more difficult to override. Variables in defaults have the lowest priority of any variables available, which means they’re easy to override. Placing packer_version in defaults instead of vars is desirable because now it is easier to override when you want to install an older or newer version of Packer:

---
- hosts: all
  sudo: yes
  roles:
    - { role: "azavea.packer", packer_version: "0.7.0" }

All of that said, we’re set with packer_version in defaults, so the vars directory is not needed.

files

files is where you put files that need to be added to the machine being provisioned, without modification. Most of the time, files in files are referenced by copy tasks.
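
If the role did need one, a task referencing something in files would look roughly like this (the file name is hypothetical):

```yaml
- name: Copy Packer template
  copy: src=example.json dest=/usr/local/etc/example.json
```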

The Packer role has no need for files, so we’ll delete that directory.

handlers

handlers usually contain targets for notify directives, and are almost always associated with services. For example, if you were creating a role for NTP, you might have an entry in handlers/main.yml for restarting NTP after a task finishes altering the NTP configuration file.
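
That NTP example might be sketched like this (the file contents are hypothetical, but the notify/handler wiring is standard Ansible):

```yaml
# handlers/main.yml
---
- name: Restart NTP
  service: name=ntp state=restarted

# tasks/main.yml -- a task that triggers the handler when the
# configuration file actually changes
- name: Configure NTP
  template: src=ntp.conf.j2 dest=/etc/ntp.conf
  notify:
    - Restart NTP
```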

Packer isn’t a service, so there is no need for the handlers directory.

meta

meta/main.yml houses one of the biggest differences between includes and roles: metadata. The metadata of an Ansible role consists of attributes such as author, supported platforms, and dependencies. Most of this file is commented out by default, so I usually go through and fill in or uncomment relevant attributes, then delete anything else.

For the Packer role, I trimmed things down to:

---
galaxy_info:
  author: Hector Castro
  description: An Ansible role for installing Packer.
  company: Azavea Inc.
  license: Apache
  min_ansible_version: 1.2
  platforms:
  - name: Ubuntu
    versions:
    - trusty
  categories:
  - cloud
  - system
dependencies:
  - { role: "azavea.unzip" }

Ignore the dependencies bit for right now. We’ll come back to it later.

tasks

tasks houses the series of Ansible tasks that install, configure, and run software. For Packer, we need to download a specific version, and since it’s packaged as a compiled binary in a ZIP archive, extract it. Accomplishing that with Ansible’s built-in get_url and unarchive modules looks like this:

---
- name: Download Packer
  get_url: >
   url=https://dl.bintray.com/mitchellh/packer/packer_{{ packer_version }}_linux_amd64.zip
   dest=/usr/local/src/packer_{{ packer_version }}_linux_amd64.zip

- name: Extract and install Packer
  unarchive: src=/usr/local/src/packer_{{ packer_version }}_linux_amd64.zip
             dest=/usr/local/bin
             copy=no

templates

templates is similar to files except that templates support modification as they’re added to the machine being provisioned. Modifications are achieved through the Jinja2 templating language. Most software configuration files become templates.
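
For illustration, a task rendering a template, and a line of the corresponding Jinja2 file, might look like this (the names are hypothetical):

```yaml
- name: Render application config
  template: src=app.conf.j2 dest=/etc/app.conf

# templates/app.conf.j2 might contain lines like:
#   listen_port = {{ app_port }}
```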

Packer takes most of its configuration parameters via command-line arguments, so the templates directory is not needed.

Conclusion

Congratulations! You now have all of the components necessary for an Ansible role. In part two of this series, we’ll take a look at creating a small playbook to apply the role against a local virtual machine. We’ll also take a closer look at the dependencies listed in the role metadata.

Google Summer of Code – A GeoTiff reader for GeoTrellis

Applying to Google Summer of Code

I first heard of Google Summer of Code (from here on GSOC) when a
former student at my university in Stockholm told our class how he
landed a job at Google. He said that he had performed very well in a
competitive programming tournament and that he had also done GSOC
for the Python Software Foundation. I was already trying out
competitive programming and had zero experience working with open
source.

When the GSOC 2014 season started and the accepted organizations
were announced, I decided to apply where few other people would
(i.e. not to the Twitter Open Source Organization) to increase my
own chances of joining the program. I submitted 3 proposals: one
was a GUI test library for OWASP ZAP, a desktop program written in
Java for testing attacks on a web server; one was a graph format
exporter for Bio4j; and the last one was the GeoTiff reader for
GeoTrellis.

The first one, the OWASP ZAP GUI test library, seemed the most
boring and was in Java, but the people maintaining it were very
friendly. The second one was supposed to be in Scala but was
changed to Java. I really wanted to learn more about Scala and
functional programming in general, so when I was accepted for all
3 proposals I talked to Rob Emanuele, who later became my mentor
during GSOC, and he instantly told me that if I wanted to do Scala
this summer, GeoTrellis was the way to go!

Preparations

It was both exciting and a little bit frightening to work with
Scala when I hadn’t written anything in Scala before, and truth be
told, it isn’t as easy as Java at all. Rob recommended the Coursera
course on Scala and I did the whole thing; it was great. I had a
lot going on at school, so I didn’t prepare too much for the actual
project beyond exploring the GeoTrellis source code for a bit. I
also found some specifications for Tiff and GeoTiff and tried to
read those, but I didn’t understand too much. Azavea also sent me a
book about rasters and map algebra, which was a very good read for
this project.

Start to Midterms

GeoTiffs are essentially Tiff files with a few add-ons; GeoTiffs
are a superset of Tiffs. I started by reading the Tiff 6.0
specification, and since it was written in 1992 it felt a bit
outdated and hard to interpret. But I worked hard and tried to
read in all the tags (Tiff = Tagged Image File Format) and all the
extras that GeoTiff brings into the picture. Progress was slow
because I was both learning Scala and getting familiar with
working in a larger group of developers, with a rather big
codebase. I got a lot of help from my mentor Rob, who read and
commented on my code on GitHub, making things a million times
easier.

Midterms to End

After the midterms I had implemented some of the decompressions,
and I also made a pull request to fix a locale bug (the dreaded
comma vs. dot) across the whole of GeoTrellis. From here on things
got easier, and with Rob’s help I really started to get things
done. Today the reader supports all of the decompressions in the
Tiff 6.0 specification except JPEG, and it also works fine with
ZLib. The reader is now used in other parts of GeoTrellis, and it
is really nice to see that something I have created is used by
others.

Summary

After the GSOC 2014 season is over, I will continue to work on
GeoTrellis, further improving the reader and also creating a
GeoTiff writer. I very much look forward to doing this, and I’m
very grateful to the program, the people at Azavea, and my mentor.

Batch District Matching Using the Cicero API with OpenRefine

OpenRefine (formerly Google Refine) is an awesome open source tool for working with data. If you haven’t heard of it before, in the words of Christopher Groskopf, “Once you’ve clustered and reconciled your crufty public dataset into a glistening gem of normality you won’t know how you lived without it.”

Even if you have a dataset that’s usable already, you might want to add more data to it. This is often why clients come to us for Cicero batch processing and district stamping. Clients give us a spreadsheet of data with street addresses, often a list of supporters or members exported from their CRM system. Then, we use the expansive database of elected officials and political districts that underpins our Cicero API to process these large batch jobs, geocoding each record and providing official and district information for it.

However, one of the cool things about OpenRefine is that you can use it yourself to perform similar batch processing tasks with external APIs, like Cicero! In this blog post, we’ll use OpenRefine to add Philadelphia City Council district information to an open government dataset of all charter school locations in the city. Why charter school data? Whether you’re for or against them, there’s no question that charter schools are a tough local political issue being debated by communities across the country. Using OpenRefine and Cicero to determine the council district of each charter school in Philadelphia would let us see how many charter schools are in each councilmember’s district. That would be useful information to share with councilmembers if we were conducting local advocacy work on the merits or drawbacks of this educational approach. With 84 charters in the city, this would be a laborious task without OpenRefine!

We’ll start by downloading the zipped CSV file from the School District of Philadelphia’s Open Data Initiative site, which can be found through OpenDataPhilly. We see that the file has a few key fields we’ll be using to interact with Cicero: address, zip code, city, and state.
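
Before firing up OpenRefine, it helps to see the shape of the request it will build for each row. The Python sketch below constructs a Cicero district-lookup URL from those fields; the endpoint path and parameter names (search_loc, key) are assumptions based on our reading of the Cicero API, so double-check them against the current API documentation.

```python
from urllib.parse import urlencode

# Assumed Cicero endpoint -- verify against the current API docs.
CICERO_BASE = "https://cicero.azavea.com/v3.1/official"

def build_cicero_url(address, city, state, zip_code, api_key):
    """Build a district-lookup URL for one spreadsheet row.

    The parameter names (search_loc, key) are assumptions; check
    the Cicero API documentation for the authoritative names.
    """
    params = {
        "search_loc": f"{address}, {city}, {state} {zip_code}",
        "key": api_key,
    }
    return f"{CICERO_BASE}?{urlencode(params)}"
```

In OpenRefine itself, the equivalent would be a GREL expression in the “Add column by fetching URLs” dialog; the Python is just to make the URL structure explicit.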

Mmmmm, tabular data.
