The GeoTrellis project just published the official 3.0.0 release. This release contains significant feature additions and API changes. We focused on features that specifically enable and support Cloud Optimized GeoTiff (COG) workflows.
This is a major release under SemVer because it contains breaking API changes. However, GeoTrellis 3.0 is not a library overhaul. For many users, no changes to existing codebases will be required to continue using GeoTrellis as before.
The most significant change is the introduction of a new interface, RasterSource, which makes it easier to read raster data from a variety of formats and sources. It adds significant utility and performance to most GeoTrellis workflows. Think of it as enabling GDAL-like VRT features.
GeoTrellis 2.x workflows revolve around executing a distributed read of the raster tiles and using Spark operations to convert them to the desired resolution and projection. This is a flexible but expensive pattern because it relies on shuffling large amounts of raster data over the cluster network to get it into the right shape.
The RasterSource API allows the user to create a view of a remote raster resource while setting the projection, resolution, and pixel grid alignment. The program may later send this view across the network boundary and perform window reads into it, at which point the underlying mechanism performs the required conversion from the source to the requested window. We strove to make this interface as simple and direct as possible.
Critically, the RasterSource interface unifies various ways in which rasters are stored and read in GeoTrellis projects:
- GeoTiffRasterSource uses our native JVM GeoTiff reader
- GeoTrellisRasterSource reads GeoTrellis Avro catalogs
- GDALRasterSource uses the GDAL library through JNI bindings
- MosaicRasterSource composes multiple RasterSource instances to build a simple mosaic view
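As a rough illustration of the "describe a view first, read a window later" pattern, here is a minimal sketch. It assumes the geotrellis-raster 3.0 artifact is on the classpath and that a GeoTiff exists at the hypothetical path `/tmp/example.tif`; the target projection and the window are chosen only for demonstration.

```scala
import geotrellis.proj4.WebMercator
import geotrellis.raster.{MultibandTile, Raster, RasterSource}
import geotrellis.raster.geotiff.GeoTiffRasterSource
import geotrellis.vector.Extent

object RasterSourceSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local path; replace with a file or URI you actually have.
    val source: RasterSource = GeoTiffRasterSource("/tmp/example.tif")

    // Describe the desired view: the same raster, reprojected to Web Mercator.
    // This returns another RasterSource rather than a materialized raster.
    val view: RasterSource = source.reproject(WebMercator)

    // Pick a window in the view's coordinates: the lower-left quarter of its extent.
    val e: Extent = view.extent
    val window: Extent = Extent(e.xmin, e.ymin, e.xmin + e.width / 2, e.ymin + e.height / 2)

    // Window read: the result is None if the window does not intersect the raster.
    val result: Option[Raster[MultibandTile]] = view.read(window)

    result match {
      case Some(raster) => println(s"Read ${raster.tile.cols} x ${raster.tile.rows} pixels")
      case None         => println("Window does not intersect the raster")
    }
  }
}
```

Because the view is only a description of the source plus the requested projection, it is the view, not the pixels, that would be passed around in a distributed job, with each worker issuing its own window reads.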
This GitHub issue describes many of the goals in more technical detail.
For the 3.0 release, it was a top priority to enable efficient use of GDAL through GeoTrellis and to integrate it with the new RasterSource API. This feature is delivered in the new "org.locationtech.geotrellis %% geotrellis-gdal % 3.0.0" artifact. Finally, you can read every supported raster format directly in GeoTrellis without a separate translation or ingest step!
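In an sbt build, the artifact coordinates quoted above would typically be declared as follows. This is a configuration sketch only; it assumes a Scala version for which the 3.0.0 artifacts are published and a GDAL installation available on the machine.

```scala
// build.sbt
libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-gdal" % "3.0.0"
```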
Because it is critical that GDALRasterSource behaves predictably and reliably in a multi-threaded environment, we created our own GDAL bindings, which negotiate the mismatch between the GDAL library and JVM applications. This native library and its JNI bindings are packaged in a jar and allow GeoTrellis users to use vanilla GDAL packages directly, without the special installation steps required by traditional GDAL JNI bindings.
In addition to RasterSources, significant changes have been made to the package structure to reduce complexity through more intuitive naming conventions and decomposition. Additionally, there are a number of new features, bug fixes, and performance improvements. This post provides an overview of the most important improvements, but you can find the complete list of changes here.
Increasingly, GeoTrellis users are finding it helpful to write GeoTrellis applications that do not require Apache Spark, such as REST services like those that back Raster Foundry. To support this use case in GeoTrellis 3.0, we introduced a clearer separation between functionality that does and does not require Spark as a dependency. We tried to combine this with efforts to simplify the package structure, naming conventions, and imports.
Functionality related to tiled layers now lives in its own package, and the geotrellis.store package contains interfaces for reading and writing GeoTrellis layers. Each backend package has been split in twain, one half requiring Spark as a dependency and one not. For instance, the geotrellis-cassandra artifact contains the half that does not depend on Spark, while the geotrellis-cassandra-spark artifact contains the half that does.
This increased modularity should enable the GeoTrellis library to be used in a wider range of use cases and enable integration with other distributed and stream processing frameworks.
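As a sketch of what this split might look like in practice, a service that reads from Cassandra without Spark would depend only on the first artifact below, while a Spark job would add the second. The group ID and version are assumed to match those of the geotrellis-gdal artifact mentioned earlier.

```scala
// build.sbt

// Spark-free service: Cassandra backend without a Spark dependency.
libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-cassandra" % "3.0.0"

// Spark job: add the Spark-dependent half of the same backend.
libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-cassandra-spark" % "3.0.0"
```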
One of the features of GeoTrellis from version 1.x through 2.x was a set of Scala-friendly wrappers around JTS geometry classes. These wrappers provided type safety for JTS operations, which otherwise often return the general Geometry type (which could be null), along with an API consistent with Scala conventions.
However, in GeoTrellis 3.0 we have decided to remove this API wrapper and use JTS types directly throughout the library. Increasingly, GeoTrellis code intersects with Spark DataFrames, where the org.locationtech.geomesa:geomesa-spark-jts project provides the Geometry UDT. The requirement to constantly wrap, unwrap, and keep track of two geometry types in such applications quickly becomes overwhelming. Additionally, geometry objects are often light, sometimes containing only a couple of coordinates, and can number in the millions. In high-throughput applications, wrapping every geometry instance in an adapter class places a significant burden on the garbage collector.
While this is a major API change, we are confident it actually reduces complexity, given the foundational nature of the JTS library itself and the improved interoperability and performance.
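To make the trade-off concrete, here is a minimal sketch that uses JTS types directly, with no wrapper objects. It assumes only that the JTS core library (org.locationtech.jts) is on the classpath; the object and method names are illustrative and not part of GeoTrellis.

```scala
import org.locationtech.jts.geom.{Coordinate, Geometry, GeometryFactory, Point, Polygon}

object JtsDirectSketch {
  private val factory = new GeometryFactory()

  // A 2 x 2 square with its lower-left corner at the origin (closed ring).
  val square: Polygon = factory.createPolygon(Array(
    new Coordinate(0.0, 0.0),
    new Coordinate(2.0, 0.0),
    new Coordinate(2.0, 2.0),
    new Coordinate(0.0, 2.0),
    new Coordinate(0.0, 0.0)
  ))

  // A point inside the square.
  val point: Point = factory.createPoint(new Coordinate(1.0, 1.0))

  // JTS operations return the general Geometry type, so the caller
  // narrows the result with a pattern match where a specific type is needed.
  def intersect(): Geometry = square.intersection(point)

  def main(args: Array[String]): Unit =
    intersect() match {
      case p: Point => println(s"Intersection is a point at (${p.getX}, ${p.getY})")
      case other    => println(s"Intersection is a ${other.getGeometryType}")
    }
}
```

The pattern match is the price of losing the wrapper's static typing; in exchange, the same objects can be handed to any other JTS-based library without conversion.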
In GeoTrellis 3.0 we have removed the geotrellis.spark.etl package from the library. The code is still available here, but we will not be actively maintaining it further. Should a JSON-based, configuration-driven pipeline be a major requirement for you, please look at the geotrellis.pipeline package, which performs this role in a more composable manner.
Overall, we are striving to remove the requirement of an ETL process as a prerequisite for working with raster data in GeoTrellis by providing powerful ways to read and co-register data through interfaces like RasterSource.
While not part of the official GeoTrellis 3.0 release, we have continued to improve the geotrellis-server project, which provides utilities for creating XYZ, WCS, WMS, and WMTS endpoints using GeoTrellis code. This project now centers around usage of the RasterSource interface, which enables it to source data both from GeoTrellis catalogs and COGs.
This project is still in the beginning stages and has some rough edges, but it will continue to be a focus for the GeoTrellis team at Azavea in the future.
The priorities for the next GeoTrellis release are not yet set and are still being negotiated. However, some themes will certainly play a major role.
Integration of RasterSource at the core of all input operations throughout the library will continue, involving some major refactors. Additionally, the API simplification started in that interface will continue to propagate throughout the library. Some API inconsistencies still remain in GeoTrellis 3.0 between RasterSource and COGLayers. In the future, these should be reconciled through usage of the STAC standard.
The idea of working with a "virtual layer mosaic", where the user is able to compose and work with the metadata of rasters and the operations on those rasters while delaying reads until the last possible moment, will continue to be fleshed out.
Work on a better underlying structure for raster data, one that can facilitate easy interchange between Python and the JVM, is already in progress, with attention towards the Apache Arrow format.
We did our best to update the documentation with each change. However, there is always room for improvement with docs, so we expect that questions about migrating will surface as users begin the process. Feedback and suggestions are welcome here.
- GeoTrellis.io – Links to demos, GitHub, and selected projects
- GitHub – Issues, codebase, and documentation
- Gitter – Scala is hard. We can help. Come ask questions about your GeoTrellis project
- Twitter – We send team members to conferences, workshops, and share Big Data Open Source Geo project news.
- Email – Have questions about a project idea that could benefit from processing rasters at scale? Reach out to us via email; we'd love to hear from you!