In my last blog post I introduced the idea of type classes (and scala.math.Numeric in particular). I demonstrated basic usage but also alluded to some problems. In this blog post I will explain the problems, talk about specialization, and introduce a version of Numeric that solves many of the problems (com.azavea.math.Numeric). I will also present some benchmarking numbers.
Icky Syntax
The most noticeable problem when using Numeric (at least in 2.8) is the syntax. As Kevin Wright alluded to in a comment on the last post, you can include the Numeric type class in two ways: via a context bound on your type parameter (A) or as an implicit argument.
In my earlier examples I used the implicit argument because I thought it was easier to follow (especially for those who are new to implicits) and it bound the instance of the type class to a named variable (n). But you can also use the type bound syntax, which makes the function declaration a bit cleaner looking.
// this makes it obvious that n is an instance of Numeric[A] // and gives us a reference to use. def square[A](a:A)(implicit n:Numeric[A]) = { n.times(a, a) } // this definition is a bit cleaner looking. // implicitly[] searches for an implicit Numeric[A] instance. // unfortunately the body of the function is less clean. def square[A:Numeric](a:A) = { implicitly[Numeric[A]].times(a, a) }
Either way, you’re still writing things like n.times(a, a) rather than a * a. Fortunately, Scala 2.9 introduced some nice new implicit parameters that allow us to use infix operators:
import Numeric.Implicits._ def square[A:Numeric](a:A) = a * a
Confusing Conversions
The implicits providing infix operators are definitely a big improvement! But while they are nice for simple cases like the previous one they can end up confusing the issue a little bit:
def twice[A:Numeric](a:A) = a * 2 //error: could not find implicit value for parameter num: // scala.math.Numeric[Any]
What’s going on here? The problem is that * is mapped to Numeric.times(a:A, b:A) but the second argument we provided isn’t an A, it’s an Int. In Java (and Scala) we are used to being able to mix numeric types like double and int without worrying: the result will be promoted to double (the type with the widest range). Unfortunately, as written Numeric isn’t able to do these kinds of things and instead types must be converted explicitly. Numeric does provide two special methods, zero and one which allow you to get those values for all its types.
// use Numeric.one to get the correct type def once[A:Numeric](a:A):A = a * implicitly[Numeric[A]].one // convert 2 to be the correct type def twice[A:Numeric](a:A):A = a * implicitly[Numeric[A]].fromInt(2) // convert a to be an Int def thrice[A:Numeric](a:A):Int = a.toInt * 3
There’s also no good way to combine two different generic Numeric types, other than converting to a common concrete type (e.g. Double). You could imagine Numeric providing something like fromType[B] (as well as things like fromDouble) but currently it doesn’t.
def combine[A:Numeric, B:Numeric](a:A, b:B) = a + b //error: could not find implicit value for parameter num: // scala.math.Numeric[Any] // this works def combine[A:Numeric, B:Numeric](a:A, b:B) = a.toDouble + b.toDouble // this would be nice (for some definition of nice) // but it doesn't work def combine[A:Numeric, B:Numeric](a:A, b:B):A = { a + implicitly[Numeric[A]].fromType[B](b) }
Performance
The most disappointing thing (in my opinion) about using scala.math.Numeric is its performance. In some cases it performs adequately, but in most cases it is very slow: actual algorithms written with Numeric tend to be 6-30 times slower than their direct counterparts (depending on what exactly they do). Some rough observations are that the comparison operators (e.g. >) seem to be slower than the math operators (e.g. *), and performance does depend on which type you end up using: Numeric[Double] does relatively well (compared to the same code written for Double). Numeric[Int] does very badly compared to code written for Int.
Specialization to the Rescue!
Fortunately, Scala itself contains the tool to fix Numeric’s performance problems: specialization. Specialization was first described in the paper Compiling Generics Through User-Directed Type Specialization by Iulian Dragos and Martin Odersky and was introduced in Scala 2.8 via the @specialized annotation. Other articles have been written explaining specialization, but I will give another brief one here.
When writing generic functions in Scala (as well as Java), the compiler generates one implementation that supports all possible argument types (e.g. java.lang.Object). This fine for Java [1] since all the valid types you could use with your generic function are guaranteed to be subclasses of java.lang.Object. If you remember my first post, you’ll recall how we had to create boxed instances of java.lang.Integer to hold int values when using generics.
Unfortunately for Scala it means that using generic functions with value-types (e.g.Int) is much slower than functions implemented directly in terms of the value-type itself, since there is a single implementation which requires its arguments to be objects (e.g. boxed).
The @specialized annotation allows the Scala compiler to generate separate implementations of the generic function: one for all the reference types (inheriting fromAnyRef) and one for each of the value types we choose to specialize over (e.g. Char, Double, Int). So if we have a generic function foo[A](a:A): A the compiler will generate other functions on primitive types, for instance fooInt(a:Int): Int, fooDouble(a:Double): Double, etc. And when we write foo(33) the compiler will translate that code into fooInt(33).
Since Scala hides boxing/unboxing, specialization doesn’t make the code look nicer. But it makes a huge difference in performance: Dragos and Odersky report a 22-40x speed up in micro-benchmarks!
Specializing Numeric
Soon after specialization was added to Scala 2.8, people began discussing specializing Numeric. Searching Google I find names like Jason Zaugg, Iulian Dragos, and others talking about it. Andreas Flierl emailed the scala-language mailing list on January 3, 2011 with some early tests which got me interested in working on this. So I don’t want to claim any credit for the amazing work done on implementing @specialized or even the idea of specializing Numeric.
Starting with Andreas’ code I began working on building a specialized Numeric, with the goal of making all its operations as fast as direct implementations. There were a bunch of ways that the code needed to be restructured (which I may discuss in a later article) and I learned a ton about implicits, typeclasses, and profiling Scala code. The code is available at: https://github.com/azavea/numeric along with the tests I’m running. The results follow, but the short story is that with the exception of implicit infix operators all com.azavea.math.Numeric code runs just as fast as the direct implementations.
The Numbers
| test | direct (ms) | new (ms) | old (ms) | ||
| from-int-to-int | 22.3 | 22.0 | 0.99x | 169.1 | 7.60x |
| from-int-to-long | 38.0 | 37.3 | 0.98x | 244.5 | 6.43x |
| from-int-to-float | 23.1 | 23.0 | 0.99x | 202.5 | 8.76x |
| from-int-to-double | 40.0 | 37.9 | 0.95x | 271.4 | 6.78x |
| adder-int | 12.6 | 13.5 | 1.07x | 145.3 | 11.50x |
| adder-long | 21.1 | 21.4 | 1.01x | 153.8 | 7.28x |
| adder-float | 27.5 | 27.8 | 1.01x | 27.5 | 1.00x |
| adder-double | 27.3 | 26.0 | 0.95x | 27.5 | 1.01x |
| array-total-int | 8.3 | 8.3 | 1.00x | 246.4 | 29.86x |
| array-total-long | 11.5 | 11.1 | 0.97x | 310.8 | 27.02x |
| array-total-float | 16.1 | 16.6 | 1.03x | 217.1 | 13.47x |
| array-total-double | 17.3 | 16.8 | 0.97x | 216.8 | 12.57x |
| array-rescale-int | 27.1 | 26.8 | 0.99x | 357.8 | 13.19x |
| array-rescale-long | 24.6 | 24.6 | 1.00x | 540.6 | 21.95x |
| array-rescale-float | 59.3 | 59.6 | 1.01x | 366.9 | 6.19x |
| array-rescale-double | 91.4 | 91.1 | 1.00x | 394.3 | 4.31x |
| find-max-int | 16.6 | 16.9 | 1.02x | 166.0 | 9.98x |
| find-max-long | 12.1 | 12.1 | 1.00x | 199.6 | 16.46x |
| find-max-float | 22.6 | 22.4 | 0.99x | 187.8 | 8.30x |
| find-max-double | 23.9 | 23.5 | 0.98x | 195.6 | 8.19x |
| quicksort-int | 155.9 | 820.1 | 5.26x | 929.0 | 5.96x |
| quicksort-long | 1105.0 | 1326.4 | 1.20x | 1238.6 | 1.12x |
| quicksort-float | 228.6 | 1493.0 | 6.53x | 1364.5 | 5.97x |
| quicksort-double | 251.0 | 1470.3 | 5.86x | 1306.3 | 5.20x |
| array-allocator-int | 68.4 | 73.3 | 1.07x | 42.6 | 0.62x |
| array-allocator-long | 101.9 | 102.4 | 1.00x | 143.3 | 1.41x |
| array-allocator-float | 87.4 | 82.5 | 0.94x | 146.8 | 1.68x |
| array-allocator-double | 82.9 | 83.0 | 1.00x | 162.4 | 1.96x |
| insertion-sort-int | 47.5 | 48.6 | 1.02x | 1463.0 | 30.80x |
| insertion-sort-long | 49.9 | 50.3 | 1.01x | 1440.6 | 28.88x |
| insertion-sort-float | 50.0 | 50.9 | 1.02x | 1658.6 | 33.17x |
| insertion-sort-double | 49.3 | 49.9 | 1.01x | 1850.5 | 37.57x |
| merge-sort-int | 275.6 | 293.0 | 1.06x | 870.9 | 3.16x |
| merge-sort-long | 288.3 | 215.0 | 0.75x | 842.9 | 2.92x |
| merge-sort-float | 186.8 | 206.5 | 1.11x | 901.3 | 4.83x |
| merge-sort-double | 197.6 | 218.8 | 1.11x | 895.5 | 4.53x |
| infix-adder-int | 20.5 | 57.1 | 2.79x | 125.9 | 6.14x |
| infix-adder-long | 20.8 | 62.3 | 3.00x | 250.6 | 12.08x |
| infix-adder-float | 20.9 | 33.9 | 1.62x | 176.4 | 8.45x |
| infix-adder-double | 28.8 | 32.9 | 1.14x | 189.0 | 6.57x |
Discussion
Profiling this was a little bit annoying–I had to write 4 direct implementations (one for each of the types I was interested in) plus two generic implementations (one using the old scala.math.Numeric and one using the new com.azavea.math.Numeric). I used scala.testing.Benchmark although Yuvi Masory recently suggested using https://github.com/sirthias/scala-benchmarking-template (which I still need to go check out).
Some things to note:
- Old Numeric does much better on floating-point numbers than on integral ones. Why?
- For some reason old Numeric is able to beat the direct implementation of the array-allocator test.
- Since scala.util.Sorting[T] uses scala.math.Ordering[T] (which isn’t specialized) it’s not possible to make Numeric any faster in this test.
- scala.util.Sorting[Long] is over 4x slower than Int, Float and Double in the direct implementation.
- Infix operators are still slow enough that I still have mixed feelings endorsing them.
One additional gotcha which I haven’t discussed at length is what writing your own specialized numeric code (currently) looks like. Unfortunately, there’s a lot of boiler-plate. While users of your library will just say doSomethingComplicated(Array(0, 1, 2), 18, 99) you will be stuck writing:
def doSomethingComplicated[@specialized A:Numeric:Manifest] (ns:Array[A], x:A, y:A) = { // implementation here }
And that’s only one type parameter; God forbid you want A and B to both be Numeric!
I don’t know of any easy way around this. You have to say @specialized if you want your code to be specialized. If you fail to do this anywhere in the call chain you will use generic implementations from that point down, losing any benefit to specialization. You need to specify Numeric because that’s the point, right? And when writing code that needs to use the type parameter A to allocate arrays (among other things) you have to provide a Manifest. This gets ugly really fast.
Given my ignorance of the Scala compiler, I can imagine various magical things that might fix this. For instance, I could imagine some kind of synthetic type bound that combines others, e.g. FastNumeric = @specialized Numeric:Manifest or something. Is this possible (or even a good idea)? I don’t know.
Changes
This section is a bit rushed–I may expand it into another post later, but I just wanted to get the information out there.
One of the biggest changes I made to Numeric was tearing out scala.math.Ordering–I didn’t want to try to specialize “everything at once” but scala.math.Numeric inherits all of its (incredibly slow) comparison methods from Ordering. Specializing Ordering would solve this problem.
I also created a specialized typeclass with the humorous name “Convertable” which applies to all the value types which represent numbers (Byte, Short, Int, Long, Float and Double). The great thing about this is that you can avoid doing boxing/unboxing during conversion between types. It also allows you to implement the fromType[T] method I discussed earlier.
I left out Short and Byte after Paul Phillips pointed out that neither classes operations behave as the direct implementations do. In Scala (as in Java) Byte + Byte -> Int, whereas in Numeric A + A -> A (so Byte + Byte -> Byte). This causes all kinds of bugs. Since Short and Byte (and Char) are converted to Int when you actually do things with them, I decided it was unlikely users wanted to be able to use them in generic numeric functions.
A (potentially) controversial change I made was not including scala.math.Integral and scala.math.Fractional, and unifying support for the division and mod operators. Coming from dynamically-typed languages, I expected to be able to do the following:
// this does not work with the built-in Numeric import Numeric.Implicits._ def scale[A:Numeric](n:A, num:A, denom:A) = (n * num) / denom // this *does* work import scala.math.Integral.Implicits._ def scale[A:Integral](n:A, num:A, denom:A) = (n * num) / denom
I understand that for someone who’s excited about creating user-defined numeric types this split isn’t a big deal. But from my perspective I want a generic numeric type that abstracts across all the useful value-types in Scala. I’m not opposed to having Integral and Fractional, but I would rather that Numeric not be defined in terms of them.
Finally I did some restructuring to avoid using inner traits and classes, since I had heard that those don’t specialize well.
Conclusion
While I still think there’s some room to make Numeric better, at this point it’s possible to write fast generic Numeric code in Scala. Stepping back, I think it’s incredibly exciting that I can avoid code duplication *and* maintain the speed that I’m used to from direct implementations.
I’m not sure what the best path is to add this capability to scala.math.Numeric. I think the trait lacks some important features (more type conversions, division, etc) and some of the restructuring is currently necessary to get these numbers. Since binary compatibility is important moving forward for Scala and the Typesafe team it’s not entirely clear what the best plan is. Hopefully some discussion and work at Scalathon can clarify things a bit.
Please respond with any questions or comments. Do let me know if you find problems in the profiling, or try it yourself and get dramatically different numbers. I spent some time trying not to fall into obvious micro-benchmark pitfalls, but there’s still a lot I don’t know about the JVM. Thanks for reading!




