Weighing Scalaz vs Cats Scala Libraries for GeoTrellis

By Colin Woodbury on December 27th, 2017

Azavea’s GeoTrellis library has been developed using the Scala language. Scala provides some elements of both functional and object-oriented approaches to programming. We selected it because it provided support for the functional approach, but as a hybrid between the two approaches, the core language is sometimes frustrating, particularly for people that have experience with “pure” functional languages. Over the past few years, two libraries, Scalaz and Cats, have been developed to provide more purely functional abstractions to the language. The GeoTrellis team recently considered the question: Should we use Scalaz or Cats?

Scalaz versus Cats Scala Libraries

Scalaz and Cats are libraries which provide Functional Programming constructs for Scala.
The move to adopt one or the other stems from a desire to reduce boilerplate and simplify
our API using community-understood Functional Programming concepts.

After a thorough research period that compared the two libraries in depth (something that
had apparently not been done before in the community), GeoTrellis has decided
to use the Cats library.

Below I’ll describe the reasons for our decision, and lay out some recommendations
for usage should other Scala teams wish to tread a similar path.

1 The Decision

As the one who did the research, I got to know both libraries and both communities
fairly well. When it came down to my own vote, I was conflicted on which to choose.
The two libraries have similar APIs and comparable performance, and I found the
contributors to both libraries to be welcoming, hard-working, and intelligent.

I’ve seen where both libraries came from, where they are today, and where they’re going.
I have my own inner thoughts about the future of Functional Programming in Scala,
but as of today GeoTrellis is going with Cats for one major reason: Discoverability.

We could likely keep usage of Cats hidden in our internals, but more than likely
some of it will trickle up to the user-facing API. For instance, here is a new Layer
typeclass that we’re considering for GeoTrellis:

@typeclass trait Layer[F[_]] extends Functor[F] {
  ...
}

Should a user investigate Layer in our Scaladocs, they would see Functor.
As the GeoTrellis authors, it’s then our responsibility to make sure that curious users
have immediate access to supplementary resources, should they want
to learn more. While both Scalaz and Cats have a wealth of learning materials,
we found that Cats has more approachable documentation “up front”.

Fortunately, even with things like Functor visible on top-level symbols, we’re
confident that the introduction of Cats and Simulacrum typeclasses will
greatly simplify GeoTrellis for both users and developers.

2 Usage Recommendations

2.1 Eq and Show

The typeclasses Eq and Show can supply immediate type safety guarantees
to Scala code. Eq exposes the type-safe equality operator ===:

scala> import cats.implicits._

scala> 1 === 1
res0: Boolean = true

scala> List(1,2,3) === List(2,3,4)
res1: Boolean = false

Unlike vanilla Scala’s ==, which can compare any two types for equality (even when
doing so is meaningless), Eq.=== will only compile if used on two values of the same
type. Enforcing this has two benefits:

Show exposes .show, a type-safe variant of .toString. While .toString
can be used on any type (even when doing so is meaningless), only types
for which stringification is meaningful have an instance of Show.
The benefit of this is primarily in avoiding subtle bugs.

Code quality analysis tools like Codacy consider usage of == and .toString
to be bad practice, and can potentially fail your CI if they catch you using them.
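To make the compile-time guarantee concrete, here is a minimal sketch of how === and .show can be restricted to types that opt in. The Eq and Show traits and the EqOps and ShowOps wrappers below are hand-rolled stand-ins written for this illustration, not the Cats source, so treat the names as assumptions:

```scala
object EqShowSketch {
  // Hand-rolled stand-ins for the ideas behind Eq and Show.
  // Illustrative assumptions only; Cats defines its own versions.
  trait Eq[A]   { def eqv(x: A, y: A): Boolean }
  trait Show[A] { def show(a: A): String }

  implicit val intEq: Eq[Int] = new Eq[Int] {
    def eqv(x: Int, y: Int): Boolean = x == y
  }
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = a.toString
  }

  // Both sides of === must share the type A, and A needs an Eq instance.
  implicit class EqOps[A](x: A) {
    def ===(y: A)(implicit ev: Eq[A]): Boolean = ev.eqv(x, y)
  }

  // .show is only available for types that have a Show instance.
  implicit class ShowOps[A](a: A) {
    def show(implicit ev: Show[A]): String = ev.show(a)
  }

  val same: Boolean      = 1 === 1
  val different: Boolean = 1 === 2
  val shown: String      = 42.show

  // Neither of these lines compiles, which is the point:
  //   1 === "one"         // type mismatch: an Int was expected
  //   new Object().show   // no implicit Show[Object] in scope
}
```

With the real library, import cats.implicits._ supplies the equivalent syntax and instances for the standard types.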

2.2 Semigroup and Monoid

These are things that are “fundamentally combinable”. Like Int under addition,
if you have a type that satisfies:

/* Arithmetic */
1 + (5 + 7) == (1 + 5) + 7

/* Your type; <> stands for its combining operation (Cats spells it |+|) */
a <> (b <> c) == (a <> b) <> c

then your type is a Semigroup. If your type also has some analogue
to 0 under addition:

/* Arithmetic */
x + 0 == x

/* Your type */
x <> zeroishThing == x

then your type is also a Monoid! By defining instances of Semigroup and
Monoid for your type, you can take advantage of a number of “free” operations
that are mathematically guaranteed to behave sanely.
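As a rough sketch of what such an instance and a “free” operation look like, the block below hand-rolls Semigroup and Monoid and gives an instance to a small Pair type. The traits and the combineAll helper are simplified stand-ins for this illustration, not the Cats definitions:

```scala
object MonoidSketch {
  // Simplified stand-ins for the two typeclasses (assumed shapes).
  trait Semigroup[A] { def combine(x: A, y: A): A }
  trait Monoid[A] extends Semigroup[A] { def empty: A }

  final case class Pair(a: Int, b: Int)

  // Component-wise addition is associative, and Pair(0, 0) is its identity.
  implicit val pairMonoid: Monoid[Pair] = new Monoid[Pair] {
    def combine(x: Pair, y: Pair): Pair = Pair(x.a + y.a, x.b + y.b)
    val empty: Pair = Pair(0, 0)
  }

  // A "free" operation: written once, usable for every type with a Monoid.
  def combineAll[A](as: List[A])(implicit m: Monoid[A]): A =
    as.foldLeft(m.empty)((acc, a) => m.combine(acc, a))

  private val (x, y, z) = (Pair(1, 2), Pair(3, 4), Pair(5, 6))

  // The two laws from the text, checked on sample values.
  val associative: Boolean =
    pairMonoid.combine(x, pairMonoid.combine(y, z)) ==
      pairMonoid.combine(pairMonoid.combine(x, y), z)
  val hasIdentity: Boolean = pairMonoid.combine(x, pairMonoid.empty) == x

  val total: Pair   = combineAll(List(x, y, z))
  val nothing: Pair = combineAll(List.empty[Pair])
}
```

Checking the laws on a few values proves nothing in general, but it is a cheap sanity test before trusting an instance.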

2.3 Functor

Many Scala types have a .map method. If you’ve ever done:

val foo: Option[Int] = Some(1)

foo.map(_ + 1)  // Some(2)

val bar: List[Int] = List(1, 2, 3)

bar.map(_ + 1) // List(2, 3, 4)

then you’ve taken advantage of the fact that Option and List are both Functors.
Most “mappable” things are a Functor. By being honest about this behaviour and giving
our own types instances of Functor too, we can write clean, generic code, and also
utilize more interesting typeclasses that rely on Functor.
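Here is a small sketch of what “clean, generic code” can mean in practice. The Functor trait, the Box type, and the increment function are hypothetical names invented for this example; Cats ships its own Functor with instances for the standard types:

```scala
object FunctorSketch {
  // Simplified stand-in for the Functor typeclass (assumed shape).
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }
  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }

  // A type of our own (hypothetical) that we are "honest" about.
  final case class Box[A](value: A)
  implicit val boxFunctor: Functor[Box] = new Functor[Box] {
    def map[A, B](fa: Box[A])(f: A => B): Box[B] = Box(f(fa.value))
  }

  // Written once, works for anything with a Functor instance.
  def increment[F[_]](fa: F[Int])(implicit F: Functor[F]): F[Int] =
    F.map(fa)(_ + 1)

  val opt: Option[Int] = increment(Option(1))
  val lst: List[Int]   = increment(List(1, 2, 3))
  val box: Box[Int]    = increment(Box(41))
}
```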

2.4 Foldable

If your type is a Functor, it’s almost certainly a Foldable too. Foldable
generalizes the idea of foldLeft and foldRight by using Monoid. It says
“if you give me a container full of Monoid things, I can crush them down
sanely into a single value”. My favourite operation is .fold (also aliased
as .combineAll):

val foo: List[Int] = List(1, 2, 3)

foo.combineAll  // 6

val bar: List[String] = List("My", "cat", "is", "named", "Jack")

bar.combineAll  // "MycatisnamedJack"

val baz: List[Option[Int]] = List(Some(1), Some(2))

baz.combineAll  // Some(3)

val boof: List[Option[Int]] = List(Some(1), Some(2), None, Some(4))

boof.combineAll  // Some(7), since None is the identity of Option's Monoid
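The relationship between the container and its contents can be sketched by hand. In the block below, Foldable only knows how to walk a container, while Monoid knows how to combine the elements. Both traits and the crush helper are simplified assumptions for illustration, not the Cats definitions:

```scala
object FoldableSketch {
  // Simplified stand-ins (assumed shapes).
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }
  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    val empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }
  implicit val stringConcat: Monoid[String] = new Monoid[String] {
    val empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  trait Foldable[F[_]] {
    def foldLeft[A, B](fa: F[A], z: B)(f: (B, A) => B): B

    // Derived for free: start from the identity and combine as we go.
    def combineAll[A](fa: F[A])(implicit m: Monoid[A]): A =
      foldLeft(fa, m.empty)((acc, a) => m.combine(acc, a))
  }
  implicit val listFoldable: Foldable[List] = new Foldable[List] {
    def foldLeft[A, B](fa: List[A], z: B)(f: (B, A) => B): B = fa.foldLeft(z)(f)
  }
  implicit val optionFoldable: Foldable[Option] = new Foldable[Option] {
    def foldLeft[A, B](fa: Option[A], z: B)(f: (B, A) => B): B = fa.foldLeft(z)(f)
  }

  def crush[F[_], A](fa: F[A])(implicit F: Foldable[F], m: Monoid[A]): A =
    F.combineAll(fa)

  val total: Int       = crush(List(1, 2, 3))
  val sentence: String = crush(List("My", "cat"))
  val present: Int     = crush(Option(5))
  val absent: Int      = crush(Option.empty[Int])  // empty container gives the identity
}
```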

2.5 Traverse

If your type is both a Functor and a Foldable it’s almost certainly a
Traverse too (the typeclass Haskell calls Traversable). Traverse exposes
.traverse and .sequence, two invaluable methods for handling “effects”.

.sequence “flips” nested effects:

val foo: List[Option[Int]] = List(Some(1), Some(2), Some(3))

foo.sequence  // Some(List(1, 2, 3))

val bar: List[Option[Int]] = List(Some(1), Some(2), None, Some(3))

bar.sequence  // None

.traverse accomplishes something similar, but is .map -like:

val foo: List[Int] = List(2, 4, 6)
val bar: List[Int] = List(2, 5, 6)

val f: Int => Option[String] =
  { n => if (n % 2 == 0) Some(n.show) else None }

foo.traverse(f)  // Some(List("2", "4", "6"))

bar.traverse(f)  // None

Note: If you ever see foo.map(f).sequence in code, it can always be replaced
with foo.traverse(f), which does the same work in one pass instead of two.

All of these examples used List and Option, but of course there are many other
combinations. Most of the vanilla Scala collections can be used and combined
in this way.
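For intuition, here is one way the List and Option combination could be written by hand using only the standard library. This is a sketch specialised to those two types, not how Cats implements the general version, and evenToString is a hypothetical helper mirroring f above:

```scala
object TraverseSketch {
  // Apply f to every element; the result is Some only if every call
  // returned Some, otherwise the whole thing is None.
  def traverse[A, B](as: List[A])(f: A => Option[B]): Option[List[B]] =
    as.foldRight(Option(List.empty[B])) { (a, acc) =>
      for {
        b  <- f(a)
        bs <- acc
      } yield b :: bs
    }

  // sequence is traverse with a function that changes nothing.
  def sequence[A](as: List[Option[A]]): Option[List[A]] =
    traverse(as)(oa => oa)

  val evenToString: Int => Option[String] =
    n => if (n % 2 == 0) Some(n.toString) else None
}
```

Defining sequence in terms of traverse also shows why foo.map(f).sequence and foo.traverse(f) agree on their result.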

2.6 IO

IO is not a typeclass; it’s a normal data type. Its power comes from
segregation of side-effects, which are usually allowed anywhere in Scala.

One of Scala’s weaknesses is that it’s not referentially transparent. Any
function/method in Scala can perform input/output or mutate global state.
This means that all uses of vals/vars are assignments and not declarations
of mathematical equality:

def foo: Int

val x: Int = foo

Here x and foo are not referentially transparent (they are not equivalent in
the mathematical sense: one cannot be replaced with the other at use-sites).
This means the following two lines are not the same:

val a: Int = x + x

val b: Int = foo + foo

Even if a == b! Why? Scala allows side-effects to be performed anywhere, so
foo could be defined as:

def foo: Int = { println("hi!"); 1 }

Looking at the exposed API and not the code, the user has no idea what lurks
under the covers of foo. If we now draw back the curtains on our a-b example
above, we see:

// also prints "hi!", but only once, at the point where `x` is defined.
val a: Int = 1 + 1

val b: Int = { println("hi!"); 1 } + { println("hi!"); 1 }
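The difference can be made observable without reading console output. In this sketch the side effect is a counter increment instead of a println. The calls variable and the viaVal and viaDef names are made up for the illustration:

```scala
object SubstitutionSketch {
  // A mutable counter standing in for any side effect.
  var calls: Int = 0

  def foo: Int = { calls += 1; 1 }

  // Evaluates foo once, then reuses the result.
  def viaVal: Int = { val x = foo; x + x }

  // Evaluates foo twice, so the side effect happens twice.
  def viaDef: Int = foo + foo
}
```

Both methods return 2, yet they leave calls in different states, so one cannot be swapped for the other.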

The real-world effects of this are two-fold:

  1. Optimization/inlining becomes hard for the compiler, since you can’t guarantee
    behaviour of any functions ahead of time.
  2. Users and devs can’t trust APIs – there’s no way to know what a function really does
    until you look right at the code, which is a huge failure of abstraction and generally
    a waste of people’s time.

The IO type from cats-effect helps with this. It asks us to be honest about which
parts of our code are effectful and which aren’t:

/* Read some runtime configuration. Application secret keys, maybe? */
def readConf(path: String): IO[Conf] = { ... }

/* Activate your database */
def initDB(conf: Conf): IO[DBHandle] = { ... }

/* Perform some query */
def lookup(h: DBHandle, query: Query): IO[Foo] = { ... }

/* Some pure transformation. No IO! */
def transform(foo: Foo): Foo = { ... }

def work(args: Array[String]): IO[Foo] = {
  val path: String = ??? // from args somehow
  val query: Query = ???

  /* (>>=) is the canonical alias for `flatMap` */
  readConf(path) >>= initDB >>= { lookup(_, query).map(transform) }

  /* Equivalent:
   for {
     conf <- readConf(path)
     hand <- initDB(conf)
     foo  <- lookup(hand, query)
   } yield transform(foo)
   */
}

def main(args: Array[String]): Unit = {
  work(args).attempt.unsafeRunSync() match {
    case Left(err)  => ... // handle the error safely
    case Right(foo) => println(s"Success: ${foo.show}")
  }
}

If all side-effects are confined to things marked with IO, then we know
that strange runtime errors could never come from pure functions like
transform. Luckily, IO also catches exceptions for us similar to Try,
and lets us handle them gracefully as seen in main.
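To see why an IO value is a description of work and not the work itself, here is a deliberately tiny toy version. It is an assumption-laden sketch and not the cats-effect implementation, which is far more capable. Only the method names map, flatMap, attempt and unsafeRunSync are borrowed from it:

```scala
object IOSketch {
  import scala.collection.mutable.ListBuffer
  import scala.util.control.NonFatal

  // A toy IO: it wraps a thunk and does nothing until explicitly run.
  final class IO[A](thunk: () => A) {
    def unsafeRunSync(): A = thunk()

    def map[B](f: A => B): IO[B] =
      new IO[B](() => f(thunk()))

    def flatMap[B](f: A => IO[B]): IO[B] =
      new IO[B](() => f(thunk()).unsafeRunSync())

    // Turn a thrown exception into a value, similar in spirit to Try.
    def attempt: IO[Either[Throwable, A]] =
      new IO[Either[Throwable, A]](() =>
        try Right(thunk()) catch { case NonFatal(e) => Left(e) }
      )
  }

  object IO {
    def apply[A](a: => A): IO[A] = new IO[A](() => a)
  }

  // Records which effects have actually run.
  val log: ListBuffer[String] = ListBuffer.empty[String]

  // Building the program runs nothing; it only describes the steps.
  val program: IO[Int] = for {
    a <- IO { log += "read"; 20 }
    b <- IO { log += "query"; 22 }
  } yield a + b

  val failing: IO[Either[Throwable, Int]] =
    IO[Int](throw new RuntimeException("boom")).attempt
}
```

Nothing is appended to log until program.unsafeRunSync() is called, and running failing yields a Left instead of throwing.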

Unlike Haskell, Scala does not force usage of IO in applications. So,
its usage would have to be a “best practice” on the team. To help with this,
a future version of the scalafix linter will ban usage of side-effectful
functions like println in methods that do not return an IO type.

Allow me to be frank: if we find ourselves having thoughts like “escape hatches
are unavoidable in real code” or “oh, it couldn’t hurt to just slip some
innocent file reading in here…” we must stop ourselves. We are being lazy and
our design is almost certainly incorrect. For the sake of the future sanity of
both us and our colleagues, we must take a step back and rework things to use
the IO type. The result will be much cleaner, I promise you. Likewise, if we
hear colleagues utter the sentiments above, we should douse them in Holy Water
and show them this article.

2.7 Defining Typeclass Instances

When defining a “typeclass instance” for your type, please do so in that type’s
companion object:

import cats._

case class Pair(a: Int, b: Int)

object Pair {
  /* via the "kittens" library */
  implicit val pairEq: Eq[Pair] = derive.eq[Pair]

  implicit val pairSemi: Semigroup[Pair] = new Semigroup[Pair] {
    def combine(x: Pair, y: Pair): Pair = Pair(x.a + y.a, x.b + y.b)
  }
}

Not doing so is called writing “Orphan Instances”, which are a source of great
import confusion. Languages that have first-class typeclass support throw
compiler warnings when you write orphans, so please believe me that it’s an
anti-pattern.

The kittens library can be used to automatically derive instances for the standard
typeclasses.
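The practical payoff of the companion-object rule is that callers never need an import to find the instance. The sketch below uses a hand-rolled Semigroup trait and a hypothetical Elsewhere object to stand in for unrelated user code:

```scala
object InstanceSketch {
  // Simplified stand-in for the typeclass (assumed shape).
  trait Semigroup[A] { def combine(x: A, y: A): A }

  final case class Pair(a: Int, b: Int)

  object Pair {
    // Lives in the companion, so it is always in implicit scope for Pair.
    implicit val pairSemi: Semigroup[Pair] = new Semigroup[Pair] {
      def combine(x: Pair, y: Pair): Pair = Pair(x.a + y.a, x.b + y.b)
    }
  }

  object Elsewhere {
    def merge[A](x: A, y: A)(implicit s: Semigroup[A]): A = s.combine(x, y)

    // No import of Pair.pairSemi here: the compiler searches the
    // companion object of Pair on its own.
    val merged: Pair = merge(Pair(1, 2), Pair(3, 4))
  }
}
```

Had pairSemi been defined in some unrelated object, every call site would need the right import, which is exactly the confusion orphan instances cause.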

2.8 Libraries for Developer “Standard of Living”

The following libraries can do wonders for a Scala developer’s
“standard of living”:

  • Kittens: Automatic derivation of typeclass instances
  • Decline: Easy command-line options
  • Circe: JSON Encoding/Decoding
  • Atto: Dead-simple string parsing

3 Conclusion

The bounds of Functional Programming go beyond what I’ve described, but exploration
of just what’s here is enough to bring more order and simplicity to your Scala
projects. Developer happiness will go up, and the costs of debugging will go down.
Why not try?

Happy lambdas!