What is STAC?
The SpatioTemporal Asset Catalog (STAC) is an open standard for exchanging catalogs of raster and vector data. The goal of the standard is to increase “the interoperability of searching for satellite imagery.” The potential applications of the analysis of satellite imagery are far-reaching. Yet, few are engaging with the multitude of data available.
A major impediment is the difficulty of searching and working with the data: the variety of formats and descriptions can flummox even the most experienced of users.
An unpredictable environment
Why do we need STAC?
I asked James Santucci, a software engineer from Raster Foundry, to describe the need for interoperability. He told me about a recent redistricting conference he attended. The conference convened data scientists and activists. The activists brought their advocacy skills and the engineers brought a trove of vector data. The opponents of extreme gerrymandering, well-versed in political fighting, had far less experience working with the technology.
The activists were eager to explore the data, but inconsistent file formats and metadata made the experience challenging. So challenging that some were unable to clear the first hurdle in manipulating their data: looking at it.
While the data are different, potential users of satellite imagery face similar obstacles. The STAC community hopes to create an ecosystem where users are able to locate and use the data they need. Even lay users will be able to discover relevant spatiotemporal assets if they are well indexed and consistently formatted.
Why Azavea supports STAC
Azavea fosters open-source software and standards for geospatial data. We are proud to be a part of the STAC community and contribute to this incredible resource. Raster Foundry, Azavea’s product for organizing and analyzing satellite and aerial imagery, is a locus of our work to advance STAC. We’re currently working to make it more compatible with the specification.
Evolving a taxonomy
The goals of the sprint were to:
- create more catalogs,
- improve software tools,
- enhance the specification.
James Santucci and Aaron Su (also a Raster Foundry software engineer) attended to share Azavea’s work and learn what partners such as Planet, The Climate Corporation, Radiant Earth Foundation, Astraea, Element 84, and CosmiQ Works are developing.
Aaron (Engineer in the Wild)
At Azavea, we have software and tools that can manage, generate, train, and visualize machine learning data quickly and accurately. Working with a group of engineers with similar interests, I contributed to one of the more visible catalog additions to emerge from the sprint: support for a new Label extension. My team also put up a pull request addressing missing components in the Label extension. The PR was merged after the sprint. Finally, we added a new feature in Raster Foundry to export STAC-compliant label data items, sources, and assets to support the new extension.
I also gave a lightning talk about Azavea’s machine learning workflow and how STAC fits into it at a happy hour hosted by The Climate Corporation. During this talk, I introduced the STAC community to Azavea’s machine learning applications: Raster Foundry, Annotate, and Vision.
James (Remote Cataloger)
I worked on a server that accepts STAC items as posts and returns a TMS tile layer. While tiles.rdnt.io already does a nice job with COGs when you have a URL to point to, I wanted something that met two requirements:
- I wanted to be able to serialize and deserialize the core of the STAC spec with type safety
- I wanted to be able to use Azavea’s GeoTrellis and GeoTrellis Server for serving tiles
The last requirement may be a shameless plug. But Azavea has done a lot of work in the last few years to make serving tiles to different sorts of consumers easier. By proving a few simple things about your data (the properties named by these .scala files), you can render them
The initial work focused on getting out TMS tiles by posting a STAC item:
And it went pretty well! In particular, GDAL does a great job caching parts of the .tif that it’s already read. After that initial read, things are pretty snappy.
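For readers who haven't seen one, here is a rough sketch of what a minimal STAC item and a checked round trip through JSON might look like. It is written in Python for brevity and is not the server's actual code, which builds on GeoTrellis. The `StacItem` class, the example id, and the example.com asset URL are all made up for illustration, and the list of required fields reflects my reading of the core item spec, so check it against the spec version you target.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List

# Core fields this sketch insists on (an assumption; verify against your spec version).
REQUIRED_ITEM_FIELDS = ("id", "type", "geometry", "bbox", "properties", "links", "assets")


@dataclass(frozen=True)
class StacItem:
    """Hypothetical typed wrapper around the core of a STAC item."""

    id: str
    type: str
    geometry: Dict[str, Any]
    bbox: List[float]
    properties: Dict[str, Any]
    links: List[Dict[str, Any]]
    assets: Dict[str, Dict[str, Any]]

    @staticmethod
    def from_json(text: str) -> "StacItem":
        raw = json.loads(text)
        missing = [key for key in REQUIRED_ITEM_FIELDS if key not in raw]
        if missing:
            raise ValueError(f"not a STAC item, missing fields: {missing}")
        if raw["type"] != "Feature":
            raise ValueError("a STAC item must be a GeoJSON Feature")
        if "datetime" not in raw["properties"]:
            raise ValueError("properties.datetime is required")
        # Any keys outside the core fields are dropped in this sketch.
        return StacItem(**{key: raw[key] for key in REQUIRED_ITEM_FIELDS})

    def to_json(self) -> str:
        return json.dumps(asdict(self))


example = StacItem(
    id="example-scene",
    type="Feature",
    geometry={
        "type": "Polygon",
        "coordinates": [[
            [-75.2, 39.9], [-75.1, 39.9], [-75.1, 40.0], [-75.2, 40.0], [-75.2, 39.9],
        ]],
    },
    bbox=[-75.2, 39.9, -75.1, 40.0],
    properties={"datetime": "2019-06-01T00:00:00Z"},
    links=[],
    assets={"cog": {"href": "https://example.com/scene.tif"}},  # placeholder URL
)

payload = example.to_json()                 # the body you would POST
round_tripped = StacItem.from_json(payload)  # what a server would decode
```

The point of the typed wrapper is that a malformed payload fails loudly at the boundary instead of somewhere deep in the tile rendering code.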
This was my first time joining a STAC sprint, and I had a great time! Each day had a clear and feasible goal. This made the working groups and discussions productive and efficient. On the first day, each contributor expressed what they were most interested in, then chose one or more sprint topics to take on individually or on a team. STAC spec and ecosystem beginners were also given an introduction on the first morning. Daily retros served as a friendly opportunity for cross-group discussion.
Working in the Label extension spec improvement group was a valuable learning experience and opportunity to work in a team. The group was composed of software engineers and data scientists who work regularly with label data, including Nick Weir from CosmiQ Works (Spacenet), Dave Luo of Anthropocene Labs and World Bank (OpenDRI), and Phil Varner from Astraea.
Matthew Hanson (Element 84) and Seth Fitzsimmons also provided helpful pointers. Though we usually work with different label formats, at STAC we worked together and found common ground. We suggested changes to the proposed Label extension after plugging and playing with our real-world data. The working group was a success, producing:
- a PR full of fruitful discussion and experiment results,
- STAC catalog examples,
- and an implementation in a production web application (Raster Foundry).
The most challenging aspect of the sprint was the remote component. Remote collaboration will always be difficult for short code sprints. Even more so when they involve people who don’t normally work together. As a remote attendee, I wasn’t sure how to find collaborators or how to make my work visible to others. This meant I wound up working on my own.
Overall the experience was very positive. The current work with STAC aligns well with the challenges we’ve tried to solve with Raster Foundry. Examples include work involving a visualization extension to STAC (which would allow shipping visualization instructions with geospatial assets
Thanks to the extensions, I also considered how to handle heterogeneous mixtures of JSON for the first time. This problem occurs when you have a flexible schema and want to get as much information out of it as possible. If there are five possible extensions for the “properties” field in a STAC item, then there are ten different two-extension schemas, ten different three-extension schemas, five different four-extension schemas, and one five-extension schema. Writing a decoder by hand for each of those 26 different record shapes sounded tedious. It might as well have been an invitation to throw up our hands and toss unparsed JSON values around. Working with a colleague, I came up with a strategy involving shapeless’s records that the team is excited to attempt in the future.
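The arithmetic above is easy to check, and it hints at why decoding each extension independently scales better than writing one decoder per combination. The Python sketch below counts the combinations and then groups a "properties" object by extension prefix. To be clear, this is not the shapeless-based strategy mentioned above, just a simpler stand-in for the same idea. The five prefixes and the function names are illustrative assumptions.

```python
from itertools import combinations
from math import comb

# Five illustrative extension prefixes (an assumption for this example).
EXTENSIONS = ["eo", "label", "sar", "pc", "dtr"]


def count_shapes(n: int, min_size: int = 2) -> int:
    """Number of ways to combine at least `min_size` of `n` extensions."""
    return sum(comb(n, k) for k in range(min_size, n + 1))


# Enumerate the combinations explicitly as a cross-check on the formula.
shapes = [
    combo
    for size in range(2, len(EXTENSIONS) + 1)
    for combo in combinations(EXTENSIONS, size)
]


def split_by_extension(properties: dict) -> tuple:
    """Group "prefix:name" keys by known prefix; everything else is core."""
    core, by_extension = {}, {}
    for key, value in properties.items():
        prefix, sep, name = key.partition(":")
        if sep and prefix in EXTENSIONS:
            by_extension.setdefault(prefix, {})[name] = value
        else:
            core[key] = value
    return core, by_extension


core, by_extension = split_by_extension({
    "datetime": "2019-06-01T00:00:00Z",
    "eo:cloud_cover": 12,
    "label:description": "building footprints",
})
```

With this approach, supporting a sixth extension means adding one prefix and one small decoder rather than doubling the number of record shapes. The count of 26 covers only items using two or more extensions; the five single-extension shapes and the bare core item come on top of that.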
STAC at work: the “Azavea-sphere”
Work to use STAC to support data exchange across the Raster Foundry ecosystem is ongoing. Our initial focus is on exporting STAC catalogs from Raster Foundry data as frozen inputs to machine learning pipelines for Raster Vision.
After we can export STAC catalogs from Raster Foundry, we have two other goals. The first is updating our STAC importer to make viewing STAC catalogs from Raster Foundry easier. The second goal involves Vision, our application for managing machine learning experiments. Vision is still in development, but we hope to make it public soon. We plan to use Vision to produce a feed of STAC-compliant vector data for predictions. We’ll be sure to post updates on our progress as it happens.
STAC sprints allow organizations to ensure that the services and products we build can communicate with each other, even without prior coordination. The fifth sprint will take place November 5-7 in Arlington, Virginia, and will be a combined effort of the STAC and OGC communities. We’re sending Aaron again to represent Azavea. Drop us a line if we’ll see you there!