Betting Big on the Spatiotemporal Asset Catalog (STAC) Standard

Franklin, a new open source STAC and OGC API Features compliant server from Azavea.

A few weeks ago, several dozen software engineers gathered for the fifth “STAC Sprint,” hosted by IQT CosmiQ Works at their office in Arlington, VA. The work accomplished over those three days was exciting, to say the least: new libraries were announced, important decisions about the direction of the specification itself were made, and, perhaps most importantly, the foundations of the community that will carry the STAC spec forward were reinforced. Azavea is proud to be an ongoing sponsor of and participant in these events, and we’re particularly grateful to Chris Holmes for continuing to organize around this effort.

While the work that happened at the recent STAC Sprint was impressive, we believe it’s only the beginning of what will be a period of accelerating adoption across the geospatial engineering community more broadly. That’s why we’re dedicating a significant portion of our R&D investment to making adoption of the STAC standard dramatically easier in four key ways:

  1. Creating and interacting with STAC data;
  2. Dynamically serving and visualizing STAC data;
  3. Extending STAC to support machine learning workflows; and
  4. Validating the correctness of STAC data.

We believe that if there are free, well-maintained, openly licensed tools for each of these tasks, adoption of the STAC standard across industry and government will simply be a matter of time. As the universe of STAC-compliant datasets, services, and tools grows, we expect adoption of the standard to accelerate. But to reach that tipping point, the barrier to entry for getting started with STAC must be lowered, and that’s why we’ve decided to focus on the key challenges enumerated above.

Creating and interacting with STAC data – PySTAC

Earlier this month we released PySTAC 0.3, a Python library that allows users to read, write, and manipulate STAC data. For the full background on PySTAC, check out our official announcement of the library, which nicely covers both PySTAC’s goals and features. During the STAC Sprint, we spent time extending and improving PySTAC, and also had the chance to collaborate with others in the industry on applying it to real use cases.

One such collaborator is Nick Weir, Senior Data Scientist at IQT CosmiQ Works and maintainer of the open source Solaris machine learning analysis toolkit: 

“We used PySTAC to create a STAC catalog for the SpaceNet 2 dataset, which includes both satellite imagery and the building labels. We were able to do so in a few short hours, including time spent getting familiar with the very intuitive, well-documented, and straightforward PySTAC API.”

– Nick Weir, IQT CosmiQ Works

Another STAC Sprint attendee, Kevin Booth from the Radiant Earth Foundation, had this to say about his experience with PySTAC:

“PySTAC allowed us to easily and quickly create catalogs for our machine learning training data without having to worry about the minutia of the STAC spec. PySTAC makes generating STAC catalogs as easy as requests makes making network requests.”

– Kevin Booth, Radiant Earth Foundation
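
To give a sense of the detail a library like PySTAC takes care of, here is a minimal sketch of a single STAC Item assembled by hand with only the Python standard library. It does not use PySTAC's API. The helper name `make_item`, the item ID, the coordinates, the asset URL, and the `0.9.0` spec version are all assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone


def make_item(item_id, bbox, acquired, image_href):
    """Build a minimal STAC Item (a GeoJSON Feature) as a plain dict.

    Hypothetical helper for illustration; not part of PySTAC.
    """
    west, south, east, north = bbox
    # Footprint polygon derived from the bounding box. The ring is closed
    # by repeating the first corner at the end.
    footprint = {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south],
        ]],
    }
    return {
        "type": "Feature",
        "stac_version": "0.9.0",  # assumed spec version
        "id": item_id,
        "geometry": footprint,
        "bbox": [west, south, east, north],
        "properties": {"datetime": acquired.isoformat()},
        "links": [],
        "assets": {
            "image": {
                "href": image_href,
                "type": "image/tiff; application=geotiff",
            },
        },
    }


item = make_item(
    "example-scene",                        # made-up item ID
    (-77.12, 38.83, -77.03, 38.93),         # made-up bounding box
    datetime(2019, 11, 5, tzinfo=timezone.utc),
    "https://example.com/example-scene.tif",  # placeholder asset URL
)
print(json.dumps(item, indent=2))
```

Multiply this by thousands of items, plus the catalogs, collections, and links that tie them together, and the appeal of a library that manages those details becomes clear.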

Dynamically serving and visualizing STAC data – Franklin

At the STAC Sprint, we created Franklin, an easy-to-use server that provides a web service compliant with both STAC and OGC API Features. Skip straight to Franklin’s documentation or the associated GitHub repository if you’re ready to spin up an instance.

Using PySTAC, you can easily create STAC datasets of your own, like satellite imagery archives or training data for machine learning workflows. But once you’ve got a STAC dataset, the next step is often to build a web service on top of it so that you can easily visualize it and so that other applications can access it without having to copy all of the data. That’s where Franklin comes in.

You can get started with Franklin in just a few minutes:

  1. Make sure Docker is installed on your machine in order to take advantage of the published Franklin containers.
  2. To get started, simply follow these instructions and use the provided docker run command to start a live server already connected to an example STAC catalog hosted for free by the Radiant Earth Foundation on their ML Hub (special thanks to Kevin Booth for helping make that happen).
  3. Once you’ve explored the example STAC service and are ready to point Franklin at your own STAC dataset, you can follow these steps to import that data and create a dynamic service.
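
Once a server is running, other applications can talk to it over plain HTTP. The sketch below builds a request for the items in one collection using the `/collections/{collectionId}/items` path and the `bbox` and `limit` query parameters defined by OGC API Features. The base URL (including the port), the collection ID, and the helper names `items_url` and `fetch_items` are assumptions for illustration; check Franklin's documentation for the address your instance actually listens on.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:9090"  # assumed address of a local server


def items_url(base_url, collection_id, bbox=None, limit=10):
    """Build an OGC API Features style URL for a collection's items."""
    params = {"limit": limit}
    if bbox is not None:
        # bbox is sent as "west,south,east,north"
        params["bbox"] = ",".join(str(v) for v in bbox)
    path = f"/collections/{quote(collection_id, safe='')}/items"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


def fetch_items(url):
    """Fetch a page of items; the response is a GeoJSON FeatureCollection."""
    with urlopen(url) as response:
        return json.load(response)["features"]


url = items_url(
    BASE_URL,
    "my-collection",                      # hypothetical collection ID
    bbox=(-77.12, 38.83, -77.03, 38.93),  # made-up area of interest
    limit=5,
)
print(url)
# With a server running, the matching items could then be read with:
# for feature in fetch_items(url):
#     print(feature["id"])
```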

As we continue to invest in more features for Franklin, like raster and vector tile serving, we intend to follow a simple constraint: launching the server should only ever require a single command. The goal behind Franklin is to make the experience of building an application on top of STAC data approachable for the broadest audience possible.

Extending STAC to support machine learning – Label Extension

One of the benefits of STAC is that it’s designed to be extensible–developers can contribute “STAC Extensions” to make the standard more useful for their particular needs (a full list of candidate extensions is available here). One such extension we are investing heavily in is the Label Extension, which graduated from a Proposal stage to a Pilot phase during STAC Sprint 5.

At Azavea, one of the fastest-growing parts of our business is custom machine learning work. Creating and organizing training data is the lion’s share of the work of creating a new machine learning model from scratch, and STAC gives us the ability to standardize how we represent data for all of our projects and across all of our tooling for that work. For instance, our internal annotation tool allows users to export labels as STAC-compliant datasets which can be fed directly into Raster Vision, our library for training deep learning models on geospatial data. STAC has become the standard by which all of our development efforts can communicate with each other, making us more efficient over time without handcuffing ourselves to proprietary, idiosyncratic design patterns.

Exporting a STAC dataset from our internal annotation tool.
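
For readers who have not seen the Label Extension, the sketch below shows roughly what a label item might look like as a plain Python dict. It is not the output of our annotation tool. The `label:` field names follow our reading of the extension around STAC 0.9 and may differ in other versions, and the helper name `make_label_item`, the IDs, the URLs, and the class names are all made up for illustration.

```python
import json


def make_label_item(item_id, bbox, labeled_on, labels_href, source_href, class_names):
    """Build a STAC Item dict describing one set of vector labels.

    Hypothetical helper; field names assume the Label Extension as of STAC 0.9.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "0.9.0",       # assumed spec version
        "stac_extensions": ["label"],  # declares use of the Label Extension
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {
            "datetime": labeled_on,
            "label:type": "vector",
            "label:description": "Hand-drawn labels over one image chip",
            "label:tasks": ["classification"],
            # Name of the GeoJSON property that holds each label's class
            "label:properties": ["class"],
            "label:classes": [{"name": "class", "classes": list(class_names)}],
        },
        # Points back at the imagery item the labels were drawn over
        "links": [{"rel": "source", "href": source_href}],
        "assets": {
            "labels": {"href": labels_href, "type": "application/geo+json"},
        },
    }


label_item = make_label_item(
    "chip-001-labels",
    (-77.12, 38.83, -77.03, 38.93),
    "2019-11-05T00:00:00Z",
    "https://example.com/chip-001-labels.geojson",  # placeholder URL
    "https://example.com/chip-001.json",            # placeholder URL
    ["building", "background"],
)
print(json.dumps(label_item["properties"], indent=2))
```

Because the labels and the link to their source imagery travel together in one item, a training library only needs to understand this one structure to pair each label set with its imagery.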

Validating the correctness of STAC data – Next up

Without the ability to validate that a dataset is STAC-compliant, adopting the STAC specification becomes extremely tedious. Further, excellent validation tooling is a prerequisite for STAC compliance to become the core assumption that lets a large ecosystem of tools and services integrate with each other. We’re fans and avid users of the stac-validator Python library from SparkGeo.

In the course of building and launching Franklin, we codified much of the core logic for validating the correctness of STAC datasets and surfacing inaccuracies in a human-friendly way. Breaking this functionality out into a separate Scala library is on our long-term roadmap. We use Scala’s type system to ensure that STACs we create are correct at the time of construction, instead of after they’ve already been persisted somewhere. We believe that the more implementations of STAC exist, and the more tooling springs up around them, the better: whether an engineer is more comfortable pulling in a Python or Scala library, they should have options!
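
The idea of rejecting bad data at construction time is not specific to Scala. As a rough Python analogue (not Franklin's code, and with runtime checks standing in for what Scala's type system can enforce at compile time), the sketch below uses frozen dataclasses whose `__post_init__` methods refuse to build an invalid value. The class names `BoundingBox` and `ItemSummary` and the specific rules checked are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float

    def __post_init__(self):
        # west may exceed east for boxes that cross the antimeridian,
        # so only the valid range of each longitude is checked.
        if not (-180 <= self.west <= 180 and -180 <= self.east <= 180):
            raise ValueError("longitudes must be within [-180, 180]")
        if not (-90 <= self.south <= self.north <= 90):
            raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")


@dataclass(frozen=True)
class ItemSummary:
    id: str
    bbox: BoundingBox
    acquired: datetime

    def __post_init__(self):
        if not self.id:
            raise ValueError("id must be a non-empty string")
        if self.acquired.tzinfo is None:
            raise ValueError("acquired must be timezone-aware")


# A valid value can be constructed...
good = ItemSummary(
    "example-scene",
    BoundingBox(-77.12, 38.83, -77.03, 38.93),
    datetime(2019, 11, 5, tzinfo=timezone.utc),
)

# ...while an invalid one never comes into existence.
try:
    BoundingBox(-77.12, 95.0, -77.03, 96.0)
except ValueError as err:
    print(f"rejected: {err}")
```

Any `ItemSummary` that exists has already passed its checks, so code that later writes it to disk or serves it over HTTP does not need to validate it again.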

What’s next

Over the coming months, we’ll be continuing to invest in PySTAC, Franklin, and the Label Extension as part of our broader R&D investments in the STAC specification. All three of these efforts are open to outside contributions, and we welcome collaboration from the broader engineering community. The long-term goal of adding a STAC validation library written in Scala is also on our roadmap. We look forward to the next opportunity to participate in a STAC sprint and are excited to keep investing in the projects we have underway.