Articles by
Josh Marcus

What the Heck is… Scala?

When we decided it was time to build a next generation version of DecisionTree, I started a research project  (with my 10% R&D time) to carefully evaluate the current state of the art in concurrent programming.  When I say “concurrent programming”, I am talking about two different but related concepts.  One way to make a computational task complete more quickly is to chop up the work that needs to be done into smaller parts and then to divide the work across multiple CPUs in a single computer (“parallel programming”), or to divide up the work across different computers (“distributed programming”).  During this research, I spent some time learning a new and very exciting programming language called Scala.  It was ideal for the cloud and multicore programming challenges we were facing, fulfilled our stringent criteria for a programming language, and was fun to learn and use while enabling us to be very productive.  All of these reasons led us to decide to use Scala as the core programming language for our next generation DecisionTree framework.  So what exactly is this programming language, and why did we choose it?  Why does the programming language matter, anyway?

Scala was created by Martin Odersky, who began designing it in 2001.  Odersky also wrote the modern Java compiler.  Java is an extremely widely used programming language in the enterprise, popular especially because it enforces “type safety” and so catches many kinds of programming errors before a program runs.  Java was designed to be state of the art in 1995 and to help programmers solve the problems they faced at the time.  When he began work on Scala, Odersky wanted to take a few steps back and think about what kind of language could help programmers tackle the new challenges they were beginning to face: for example, high-level domain modeling, rapid development, and concurrent programming.  With these goals in mind, he built Scala on the JVM (the Java Virtual Machine), which means that organizations can use existing software libraries written in Java.  Now Scala is being used to solve those problems, and it has a quickly growing user base with some significant adopters who have needed its power, most visibly companies like Twitter, Foursquare, LinkedIn, the Guardian, and Novell, as well as companies in the UK and US financial services industries.

One of his core intentions when creating Scala was to make programmers happy — by making their work easier and more productive.  It’s very concise and eliminates the boilerplate code that you see in languages like Java and C#.  This means that programmers can focus on the logic of their problems — it’s like when you can think of the perfect phrase or metaphor that exactly captures the problem.  Some languages feel very heavyweight and verbose, but they offer safety assurances and  the high performance that you need.  While Scala has all of the same safety assurances and performance characteristics, it feels like a lightweight and elegant “dynamic” language.

Here’s a simple example, comparing Scala to Java.  Say we want to create a dictionary where we can use the English word for a number to find the actual number.  For example, we could use the word “one” to look up the number 1.  For numbers 1 to 3, this would look like the following in Java:


Map<String, Integer> numberMap = new HashMap<String, Integer>();
numberMap.put("one", 1);
numberMap.put("two", 2);
numberMap.put("three", 3);

In Scala, it looks like this:

val numberMap = Map("one" -> 1, "two" -> 2, "three" -> 3)

Scala is also very expressive, as it combines two different programming paradigms: object-oriented programming and functional programming.  While I can’t fully explain the two paradigms here, let me just say that most programming these days is in the object-oriented paradigm, but functional programming is having a powerful resurgence.  In functional programming, you “compose” your program with functions: the basic idea is that you are building something complex out of simpler parts, and everything is the same kind of thing (technically, everything is an expression or function).  But these parts need to be entirely self-contained: ideally they have no side effects, so each one can be understood and reused on its own.
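To make “composing” a little more concrete, here is a minimal sketch that builds one function out of three simpler ones. The names (`ComposeSketch`, `trim`, `lower`, `lookup`, `parseNumberWord`) are made up for illustration, and the map is the same number dictionary used in the example above.

```scala
object ComposeSketch {
  // Two small pure functions: each result depends only on its input.
  val trim: String => String = _.trim
  val lower: String => String = _.toLowerCase

  // The word-to-number dictionary from the earlier example.
  val numberMap = Map("one" -> 1, "two" -> 2, "three" -> 3)

  // Looking a word up returns Some(number) if it is known, None otherwise.
  val lookup: String => Option[Int] = word => numberMap.get(word)

  // Compose the simple parts into a more complex function.
  val parseNumberWord: String => Option[Int] = trim andThen lower andThen lookup
}
```

Because none of the parts has side effects, `ComposeSketch.parseNumberWord("  Two ")` can be reasoned about by following the pieces in order: trim, lowercase, then look up.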

The name “Scala” is a blend of the words “scalable” and “language”, because Scala is designed to be extensible: it is a language that can grow and change as the needs of programmers change.  One reflection of this is the support for the Actor Model that the developers baked into the language, which simplifies the development of parallel and distributed programs.  Another advantage of this flexibility is that when other developers weren’t satisfied with Scala’s implementation of the Actor Model, they wrote their own, and other folks could use it as if it had been baked into the language from the start.  This became the “Akka” project, whose developers have now joined with the core Scala team in a company called Typesafe to provide enterprise support and tooling for Scala and Akka.
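As a small illustration of what “as if it was baked into the language” can look like, here is a sketch of a library-defined control structure. It uses by-name parameters, one of the Scala features that makes this style possible. The `repeat` method is hypothetical, and this is not how Scala’s actor library or Akka is implemented; it only shows the general flavor.

```scala
object RepeatSketch {
  // `body` is a by-name parameter: it is re-evaluated every time it is used,
  // so the caller can pass a block of code rather than a value.
  def repeat(times: Int)(body: => Unit): Unit =
    (1 to times).foreach(_ => body)

  def demo(): Int = {
    var count = 0
    // Reads like a built-in loop, but `repeat` is an ordinary method.
    repeat(3) {
      count += 1
    }
    count
  }
}
```

Calling `RepeatSketch.demo()` runs the block three times, even though `repeat` is defined in ordinary user code rather than in the compiler.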

This is just the tip of the iceberg in terms of Scala and why we chose it, but it’s worth mentioning that it is a very practical language.   We can use a wide selection of GIS libraries available in Java.  And Scala gives us enough control to optimize our code to run extremely fast.  (See Erik’s technical blog about his R&D work on high performance Scala.)

If you’re a programmer and want to try Scala, I highly recommend the Scala website, the Typesafe website, and the Typesafe blog.  If you’re especially interested in concurrent programming with Scala and Akka, I recommend the Typesafe ‘getting started’ tutorial, which walks you through putting together a parallel implementation of an algorithm to compute the digits of Pi.
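To give a taste of the “chop up the work and divide it across CPUs” idea, here is a minimal sketch that approximates Pi in parallel. It is not the tutorial’s code: the tutorial builds on Akka, while this sketch uses the standard library’s `scala.concurrent.Future` so that it needs no extra dependencies. It uses the well-known Leibniz series, and the names `PiSketch`, `chunkSum`, and `approximatePi` are made up for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PiSketch {
  // Sum the Leibniz series terms with indices in [start, start + count).
  // The full series is: pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...)
  def chunkSum(start: Int, count: Int): Double =
    (start until start + count).map { k =>
      val sign = if (k % 2 == 0) 1.0 else -1.0
      4.0 * sign / (2 * k + 1)
    }.sum

  // Split the series into `chunks` pieces, sum each piece in its own Future
  // (the global execution context spreads them across cores), then add up the parts.
  def approximatePi(chunks: Int, termsPerChunk: Int): Double = {
    val partials: Seq[Future[Double]] =
      (0 until chunks).map(c => Future(chunkSum(c * termsPerChunk, termsPerChunk)))
    Await.result(Future.sequence(partials), 1.minute).sum
  }
}
```

Each chunk is independent of the others, which is what makes the work easy to divide. The Leibniz series converges slowly, so this is a way to see the structure of a parallel computation rather than a practical way to get many digits.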

DecisionTree Unveils a Redesigned Interface

We’re thrilled to announce the launch of the new version of our DecisionTree product. Over this past year, the DecisionTree team has made significant advances both in user interface design and in the architecture of our calculation engine, and it’s exciting to be ready to show them off.  If you want to check out what it looks like, we have both an Elections and Advocacy demo and an Economic Development demo.  Otherwise, read on to find out what we’ve changed.

If you’re not familiar with DecisionTree, take a look at our December 2008 newsletter to see an example of how the City of Asheville, NC has used it, or head over to the DecisionTree home page.  DecisionTree  is a set of innovative web-based planning and prioritization tools that can be used to help make geographic decisions.  In DecisionTree, users select and weight decision factors to find the areas that best meet the objectives of a project, be it siting a business, making real estate investments, improving service delivery, or optimizing direct-mail, political campaigns or fundraising efforts. And best of all, DecisionTree can be customized to leverage existing data and it’s simple and fast enough to run on the web.
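For the programmers in the audience, the idea of selecting and weighting factors can be sketched as a weighted average over map cells. This is only an illustration under simple assumptions (every factor is a grid of scores of the same size, and a higher score is better); it is not DecisionTree’s actual engine, and the names `Factor`, `composite`, and `bestCell` are hypothetical.

```scala
object WeightedOverlaySketch {
  // A factor is a grid of scores, one per map cell, plus a user-chosen weight.
  case class Factor(name: String, weight: Double, scores: Vector[Double])

  // Composite score per cell = weighted average of that cell's factor scores.
  def composite(factors: Seq[Factor]): Vector[Double] = {
    require(factors.nonEmpty, "need at least one factor")
    val totalWeight = factors.map(_.weight).sum
    require(totalWeight > 0, "weights must sum to a positive value")
    val cells = factors.head.scores.length
    require(factors.forall(_.scores.length == cells), "grids must have the same size")
    Vector.tabulate(cells) { i =>
      factors.map(f => f.weight * f.scores(i)).sum / totalWeight
    }
  }

  // Index of the cell that best meets the weighted objectives.
  def bestCell(factors: Seq[Factor]): Int = {
    val scores = composite(factors)
    scores.indexOf(scores.max)
  }
}
```

Changing a weight and recomputing `composite` is the programmatic equivalent of a user adjusting how much a factor matters and seeing the priority map update.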

So what’s new? The interface has had a top-to-bottom makeover to make it easier to use both for first-time and expert users.

  • It now looks and feels more like a desktop application, with a ribbon-style interface along the top of the page that groups tools together with easy-to-identify icons.
  • We’ve added a splash screen that introduces the basic concept of choosing factors to create a priority map as well as a tour that walks users through the basic functionality of the site. The workflow has changed to a simple step-by-step process in a single window.
  • We’ve updated the styling and graphics to be more appealing as well as extremely customizable, enabling individual installations of DecisionTree to use colors, themes, and graphics that integrate well with organizations’ existing websites.

We’ve added several other features:

In terms of analysis, users can now limit the calculation to only a part of the map—such as a county or a tax incentive area—using a mask. They can also look at the individual priority map of each factor they’ve chosen, giving a better sense of how the composite map was generated.
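A mask can be thought of as a second grid that says which cells are in play. Here is a minimal sketch, assuming scores and mask are flat grids of the same size; the names `applyMask` and `bestMaskedCell` are hypothetical and this is not DecisionTree’s implementation.

```scala
object MaskSketch {
  // Keep scores only where the mask is true; cells outside the mask become None.
  def applyMask(scores: Vector[Double], mask: Vector[Boolean]): Vector[Option[Double]] = {
    require(scores.length == mask.length, "mask must match the grid size")
    scores.zip(mask).map { case (s, inside) => if (inside) Some(s) else None }
  }

  // Index of the best-scoring cell among those inside the mask, if there is one.
  def bestMaskedCell(scores: Vector[Double], mask: Vector[Boolean]): Option[Int] = {
    val inside = applyMask(scores, mask).zipWithIndex.collect { case (Some(s), i) => (s, i) }
    if (inside.isEmpty) None else Some(inside.maxBy(_._1)._2)
  }
}
```

With a mask in place, a high-scoring cell outside the county or incentive area no longer affects the answer.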

Oh, and fellow geeks out there: you’ll be glad to know that there’s a lot of new magic behind the scenes.  As software developers, we find DecisionTree a fascinating project to work on.  It’s a distributed calculation engine that can split up individual requests across machines and processor cores to speed up each map calculation.  We’re continually improving the engine and making it easier to integrate into web applications.  Forgive my jargon here for a minute…  We used the Ruby on Rails framework to build a REST API to make it straightforward for other developers to build new user interfaces on top of the DecisionTree engine.  This interface is what Aaron Ogle, another Azavea developer, used to build the recently launched Walkshed application (see above).  Definitely check it out if you haven’t yet.

We have two DecisionTree samples, one focused on elections in Philadelphia and another on economic development in the five-county Philadelphia region.  Take a look and let us know what you think!

AfricaMap: Azavea and Harvard Created a Web-based Search Tool for Exploration of a Historically Significant Collection of Maps of Africa

"Harvard University Geospatial Infrastructure (HUG) [was designed to] bring many... unconnected Africa datasets together in a single, easy-to-access web application that would promote collaboration and enable researchers to learn from other areas of study."

The most powerful tools often begin with a desire to solve simple, everyday problems. At Harvard University, faculty, students, and researchers often found it extremely difficult to locate maps and spatial data related to their studies of Africa. The issue was not that the data didn’t exist. In fact, the Harvard Map Collection has an impressive collection of historical maps of Africa. Many researchers also develop detailed Africa datasets in the course of their work, while other important spatial data is scattered across other organizations. But these maps and datasets had to be tracked down individually, assuming the researcher was even aware of them at all.

View of AfricaMap’s 1959 Ethnographic Map Layer Including Airfields

But at Harvard University’s Center for Geographic Analysis, Professors Suzanne Blier and Peter Bol with Senior Analyst Ben Lewis saw beyond simply creating a common repository for these maps. They envisioned a solution that would bring many of these currently unconnected Africa datasets together in a single, easy-to-access web application that would promote collaboration and enable researchers to learn from other areas of study. By layering the maps on top of each other, a researcher could explore all of the data or knowledge captured in maps from various disciplines. Knowledge of an area of interest could be deepened by maps describing historical, environmental, social, linguistic, or economic data. And by creating a map of scholarly projects focused on Africa, users could discover the work of others with interest in common geographical areas, despite differences in their fields of study. The vision for the Harvard University Geospatial Infrastructure (HUG) platform was born.

Ben Lewis developed an innovative, highly scalable, spatial search and display architecture to address these ambitious goals. By utilizing open standards and protocols, the framework would interoperate in the future with other technical systems used by scholars in various disciplines. By committing to an open source toolset and codebase, the framework could be applied to other areas of the world and be shared with other organizations who could, if they wanted, use and extend the framework. With the vision in place, he approached Azavea and MetaCarta to build the application.

MetaCarta built the map tiling system, while Azavea was asked to flesh out the framework with searching capabilities and some advanced features that presented fascinating technical challenges for us. One of our goals was to enable researchers to search through millions of places (ranging from populated places to physical features) using a straightforward text search, like a Google search, and have the results highlighted on the map. That work began by building a gazetteer, which is a geographic dictionary or directory, like a yellow pages for geographic place names. The initial data source for the gazetteer was the GeoNames database, a free geographic database of over eight million geographical names. Users can add or edit place names online, as if it were Wikipedia for place names. GeoNames doesn’t just include populated places like cities or villages; it also includes features such as farms, streams, wells, and schools.

Detail from a historic map of Africa by cartographer Jodocus Hondius, circa 1612.
“We were particularly struck by the beautiful juxtaposition of the old and new: seeing an image of a sea creature on the edge of a historical map, layered on top of a modern web map.” – Josh Marcus & Reed Lauber

After processing and filtering GeoNames into a geographical database, we added two ways to view the places on a map. We added a ‘Places’ tab where one can view places by type. Here a researcher can turn on any combination of hundreds of ‘place types’ and zoom to any area of the map, clicking on ‘place features’ for information about these places. In addition, we made it possible to turn on all ‘place types’ and view them along with the many other layers in the system. We also made it possible to combine place name searches with ‘place type’ queries.
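The combination of a place name search with a ‘place type’ filter can be sketched in a few lines. This is an in-memory illustration only, written in Scala for brevity; the real gazetteer lives in a spatial database, and the `Place` record and `search` function below are hypothetical names, not AfricaMap’s code.

```scala
object GazetteerSketch {
  // A simplified gazetteer entry: a name, a place type, and a location.
  case class Place(name: String, placeType: String, lat: Double, lon: Double)

  // Case-insensitive substring match on the name, optionally restricted to
  // a set of place types. An empty set means "all types".
  def search(places: Seq[Place], text: String, types: Set[String] = Set.empty): Seq[Place] = {
    val needle = text.trim.toLowerCase
    places.filter { p =>
      p.name.toLowerCase.contains(needle) &&
        (types.isEmpty || types.contains(p.placeType))
    }
  }
}
```

Calling `search` with only a text argument behaves like a plain name search, while passing a set such as `Set("stream")` narrows the results to that place type.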

But first we had to tackle a technical challenge: Africa is a huge continent and there are a lot of places to show. The gazetteer has over three hundred thousand populated places, and we needed to figure out how to efficiently display all of them on the map. We profiled two open source mapping servers, one called GeoServer and another called MapServer, for the speed of image generation, and ended up with a conundrum. We preferred GeoServer’s cartography, especially for showing overlapping areas of scholarly study. But we were able to cajole MapServer into generating a map layer of hundreds of thousands of points very quickly. In the end, while we knew it would be somewhat unorthodox to use two mapping servers in the same application, we decided to use both systems for what they were good at: GeoServer for general cartography and MapServer to show the points from the gazetteer. The technology tools were rounded out with PostGIS for storing and querying features, ExtJS for user interface components, and Python to tie it all together.

Once the gazetteer and visualization were in place, we were able to leverage the power of Web Map Service (WMS), an open standard for requesting maps over the web, to transform the text search into a search for the geographic points in all of the map layers. The result of a text search could include features from the gazetteer as well as from other searchable layers, such as the layer of scholarly projects. We added functionality so that a user could search many layers at once and click on the map to “drill down” and return results about different types of features in a single interface. We also added a range of other tools, including a “permalink” that enables users to share a particular view of the maps and search results with other users or students. The database also includes the geographic extent of research projects that target the African continent.
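For readers who have not seen WMS before, a map request is just a URL with a standard set of parameters. The sketch below builds a WMS 1.1.1 GetMap URL; the parameter names come from the WMS standard, but the server address and layer names in the usage note are placeholders, and `WmsSketch`, `BBox`, and `getMapUrl` are hypothetical names rather than AfricaMap’s code.

```scala
import java.net.URLEncoder

object WmsSketch {
  // A bounding box in longitude/latitude order, as WMS 1.1.1 expects for EPSG:4326.
  case class BBox(minLon: Double, minLat: Double, maxLon: Double, maxLat: Double)

  // Build a WMS 1.1.1 GetMap request URL for the given layers and view.
  def getMapUrl(baseUrl: String, layers: Seq[String], bbox: BBox,
                width: Int, height: Int): String = {
    val params = Seq(
      "SERVICE" -> "WMS",
      "VERSION" -> "1.1.1",
      "REQUEST" -> "GetMap",
      "LAYERS"  -> layers.mkString(","),
      "STYLES"  -> "",                    // empty means the server's default styles
      "SRS"     -> "EPSG:4326",
      "BBOX"    -> Seq(bbox.minLon, bbox.minLat, bbox.maxLon, bbox.maxLat).mkString(","),
      "WIDTH"   -> width.toString,
      "HEIGHT"  -> height.toString,
      "FORMAT"  -> "image/png"
    )
    // URL-encode each value and join the pairs into a query string.
    params
      .map { case (k, v) => k + "=" + URLEncoder.encode(v, "UTF-8") }
      .mkString(baseUrl + "?", "&", "")
  }
}
```

For example, `getMapUrl("http://example.org/wms", Seq("places", "projects"), BBox(-20, -35, 55, 40), 800, 600)` yields a single URL that fully describes one view of two layers. Because the whole view is captured in the URL, the same idea underlies a shareable link.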

Now that the first phase of the project is complete, an initial public release of AfricaMap is available online. Azavea will be working with Harvard’s Center for Geographic Analysis to expand the project by applying the same framework to the Boston metropolitan area. Take a look at the application for yourself. We especially recommend the historical maps. We were particularly struck by the beautiful juxtaposition of the old and new: seeing an image of a sea creature on the edge of a historical map, layered on top of a modern web map. Click here to see an early 17th century map from Jodocus Hondius, and view other historical maps in the ‘Map Layers’ tab.

Another interesting way to explore the application is to put yourself in the mindset of a researcher. Imagine you are interested in the economic development of Freetown, in Sierra Leone. The ‘About’ tab has some useful documentation on searching and on turning map layers on and off. The very detailed basemap (Freetown 2.5k) provides a strong basis for all of your work. By comparing other basemaps (the American Sierra Leone 50k map from the 70s and the Russian 500k transportation map from the 80s), you can observe new development over time. From there, you can bring in whatever other maps are relevant to your work, such as soils, population, language areas, or ethnographic regions, or turn on the projects layer and explore other scholarly projects that have focused on your area. Enjoy!