When we decided it was time to build a next-generation version of DecisionTree, I started a research project (with my 10% R&D time) to carefully evaluate the current state of the art in concurrent programming. When I say “concurrent programming”, I mean two different but related ideas. Both make a computational task complete more quickly by chopping the work into smaller parts. The parts are then divided either across multiple CPUs in a single computer (“parallel programming”) or across different computers (“distributed programming”).
During this research, I spent some time learning a new and very exciting programming language called Scala. It was ideal for the cloud and multicore programming challenges we were facing, it fulfilled our stringent criteria for a programming language, and it was fun to learn and use while letting us be very productive. For all of these reasons we decided to use Scala as the core programming language for our next-generation DecisionTree framework. So what exactly is this programming language, and why did we choose it? Why does the programming language matter, anyway?
Martin Odersky began work on Scala in 2001. Odersky also wrote the modern Java compiler. Java is an extremely widely used programming language in the enterprise, especially popular because its “type safety” catches many programming errors before a program ever runs. Java was designed to be state of the art in 1995 and to help programmers solve the problems they were facing at the time. When he began the work on Scala, Odersky wanted to take a few steps back and think about what kind of language could help programmers tackle the new types of challenges they were beginning to face: for example, high-level domain modeling, rapid development and concurrent programming. With these goals in mind, he built Scala on the JVM, which means that organizations can keep using existing software libraries written in Java. Now Scala is being used to solve those problems, and it has a quickly growing user base with some significant adopters who have needed its power, most visibly Twitter, Foursquare, LinkedIn, the Guardian, Novell, and companies in the UK and US financial services.
One of his core intentions when creating Scala was to make programmers happy by making their work easier and more productive. Scala is very concise and eliminates much of the boilerplate code that you see in languages like Java and C#. This means that programmers can focus on the logic of their problems. It’s like finding the perfect phrase or metaphor that exactly captures what you want to say. Some languages feel very heavyweight and verbose, but they offer the safety assurances and the high performance that you need. Scala has the same safety assurances and performance characteristics, yet it feels like a lightweight and elegant “dynamic” language.
Here’s a simple example, comparing Scala to Java. Say we want to create a dictionary where we can use the English word for a number to find the actual number. For example, we could use the word “one” to look up the number 1. For numbers 1 to 3, this would look like the following in Java:
Map<String, Integer> numberMap = new HashMap<String, Integer>();
numberMap.put("one", 1);
numberMap.put("two", 2);
numberMap.put("three", 3);
In Scala, it looks like this:
val numberMap = Map("one" -> 1, "two" -> 2, "three" -> 3)
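To round out the comparison, here is a minimal sketch of how the lookup itself might look in Scala. The object name NumberLookup is just a wrapper I made up so the snippet can run on its own; the map is the same one as above.

```scala
object NumberLookup {
  // The same dictionary as above: English word -> number.
  val numberMap = Map("one" -> 1, "two" -> 2, "three" -> 3)

  def main(args: Array[String]): Unit = {
    // Direct lookup of a key that exists.
    println(numberMap("one"))               // 1

    // A safe lookup returns an Option instead of throwing for a missing key.
    println(numberMap.get("four"))          // None

    // Or supply a default for missing keys.
    println(numberMap.getOrElse("four", 0)) // 0
  }
}
```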
Scala is also very expressive, as it combines two different programming paradigms: object-oriented programming and functional programming. While I can’t fully explain the two paradigms here, let me just say that most programming these days is in the object-oriented paradigm, but functional programming is having a powerful resurgence. In functional programming, you “compose” your program out of functions. The basic idea is that you are building something complex with simpler parts, and everything is the same kind of thing (technically, everything is an expression or function). But these parts need to be entirely self-contained: they can’t have side effects, so calling one never changes anything outside of it.
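As a small illustration of composing a program from simpler parts, here is a sketch in Scala. The names Compose, double, increment and doubleThenIncrement are made up for this example; andThen and map come from the Scala standard library.

```scala
object Compose {
  // Two simple, self-contained functions: the result depends only on the input.
  val double: Int => Int = x => x * 2
  val increment: Int => Int = x => x + 1

  // Compose the simple parts into a more complex one: double first, then add one.
  val doubleThenIncrement: Int => Int = double andThen increment

  def main(args: Array[String]): Unit = {
    println(doubleThenIncrement(5))                 // 11
    // Functions are values, so they can be passed to other functions.
    println(List(1, 2, 3).map(doubleThenIncrement)) // List(3, 5, 7)
  }
}
```

Because neither function touches anything outside itself, the composed function is just as predictable as its parts.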
The name “Scala” itself is a combination of the words “scalable” and “language”, because Scala is a language that’s designed to be extensible: it can grow and change as the needs of programmers change. One reflection of this is the support for the Actor Model that the developers baked into the language, which simplifies the development of parallel and distributed programs. Another advantage of this flexibility is that when other developers weren’t satisfied with the implementation of the Actor Model in Scala, they just wrote their own, and other folks could use it as if it were baked into the language from the start. This started the “Akka” project, which has now joined together with the core Scala team in a company called Typesafe to provide enterprise support and tooling for Scala and Akka.
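To give a feel for how library code can read like part of the language, here is a minimal sketch. It has nothing to do with actors or Akka specifically; repeat is a hypothetical helper I invented for this illustration, defined in ordinary Scala code, yet it is called as if it were a built-in control structure.

```scala
object Extensible {
  // An ordinary method, not a language keyword. The second parameter is
  // passed "by name", so the block is re-evaluated each time it is used.
  def repeat(times: Int)(body: => Unit): Unit = {
    var i = 0
    while (i < times) {
      body
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    var count = 0
    // Reads like built-in syntax, but it is just a call to the method above.
    repeat(3) {
      count += 1
    }
    println(count) // 3
  }
}
```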
This is just the tip of the iceberg in terms of Scala and why we chose it, but it’s worth mentioning that it is a very practical language. We can use a wide selection of GIS libraries available in Java. And Scala gives us enough control to optimize our code to run extremely fast. (See Erik’s technical blog about his R&D work on high performance Scala.)
If you’re a programmer and want to try Scala, I highly recommend the Scala website, the Typesafe website, and the Typesafe blog. If you’re especially interested in concurrent programming with Scala and Akka, I recommend the Typesafe ‘getting started’ tutorial, which walks you through a parallel implementation of an algorithm to compute the digits of Pi.
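For a rough flavor of what a parallel Pi computation can look like, here is a minimal sketch. It is not the tutorial's code: I am assuming the Leibniz series as the algorithm, and I use Futures from the Scala standard library rather than Akka actors so that the snippet runs without extra dependencies. The names ParallelPi, chunk and estimate are made up for this example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelPi {
  // Sum the Leibniz series terms with index in [start, start + count).
  // Term i is 4 * (-1)^i / (2i + 1).
  def chunk(start: Int, count: Int): Double = {
    var acc = 0.0
    var i = start
    while (i < start + count) {
      acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)
      i += 1
    }
    acc
  }

  // Chop the series into chunks, compute each chunk as a separate Future
  // (scheduled on the global thread pool), then add up the partial sums.
  def estimate(chunks: Int, chunkSize: Int): Double = {
    val parts = (0 until chunks).map(c => Future(chunk(c * chunkSize, chunkSize)))
    Await.result(Future.sequence(parts), 1.minute).sum
  }

  def main(args: Array[String]): Unit =
    println(estimate(chunks = 100, chunkSize = 10000))
}
```

Each chunk is independent of the others, which is what makes the work easy to divide across CPUs.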