Introducing PySTAC: A Core Library for SpatioTemporal Asset Catalogs

Azavea has released PySTAC 0.3, a Python library that allows users to read, write, and manipulate STAC data. Check out the documentation and GitHub repository!

STAC: The future of discoverable geospatial data

The STAC (SpatioTemporal Asset Catalog) specification was born out of the recognition that many organizations were independently solving problems around representing large collections of satellite and aerial imagery, as well as other similar formats such as SAR, DEMs, and point clouds. These organizations ranged from large imagery providers such as Maxar, Planet, and Hexagon, to data aggregators and consumers such as Azavea’s Raster Foundry platform and Google Earth Engine. The creation of STAC brought together these members of the geospatial community to create an implementation-focused standard that solves the core needs across all of our use cases. Read more about the origins of STAC in Chris Holmes’s blog post that announced the specification.

The vision of STAC is one where all geospatial data exposed and consumed on the web is accessible in a common, searchable format. Azavea believes strongly in this vision of the future, and we are integrating STAC as a fundamental component into our open source and internal tooling. Azavea has written previously about why we care about STAC and about our participation in the community that is pushing the specification forward. We are pleased to continue working alongside many others in the community to increase adoption and lower the barriers to entry for STAC.

Moving STAC adoption forward

While STAC is already used and supported by many of us who produce and consume geospatial data, it is still in the early stages of adoption. The easier it is to understand and use, the more value the community will get from it and the more widely it will be adopted. This is why we invested in the future of STAC by implementing PySTAC, which provides core STAC functionality as a Python library.

PySTAC was initially developed as a contribution to sat-stac, developed by Matt Hanson as part of the awesome sat-utils suite of tools for working with satellite imagery. Some of the changes required refactoring the core architecture, and after discussing it with Matt, we decided it made sense to release the work under the more generally named PySTAC.

On the STAC homepage, the answers to the question “Why STAC?” include the following: “The STAC community has defined this specification to remove … complexity and spur common tooling.” Our goal with PySTAC is to be part of this answer by providing a core building block for the Python STAC ecosystem. We developed PySTAC as a stepping stone toward more common Python tooling and to help make the vision of STAC a reality.

PySTAC: Simple and complete

PySTAC aims to be a simple and complete Python encoding of the core STAC specification. This includes types for Catalogs, Collections, and Items, a way to traverse a STAC’s linking mechanism, and serialization and deserialization. PySTAC implements version 0.8.1 of the core STAC specification, and it also includes the STAC extensions for machine learning training data, earth observation data, and Single File Catalogs. It also encodes concepts from the best practices document, both to encourage canonical use of the specification and to let users easily write STACs that follow the best practices the community has settled on.
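As a rough illustration of what traversing a STAC’s linking mechanism involves, the stdlib-only sketch below walks a tiny in-memory catalog by following its `child` and `item` links, resolving each relative href against the document that contains it. This is not PySTAC’s API: the documents, hrefs, and IDs are made up, and the `resolve` and `walk` helpers are hypothetical names chosen for this example.

```python
import posixpath
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical in-memory "filesystem": href -> STAC JSON document (as dicts).
DOCS: Dict[str, dict] = {
    "catalog.json": {
        "stac_version": "0.8.1",
        "id": "root",
        "description": "Hypothetical root catalog",
        "links": [{"rel": "child", "href": "./area-1/catalog.json"}],
    },
    "area-1/catalog.json": {
        "stac_version": "0.8.1",
        "id": "area-1",
        "description": "Hypothetical sub-catalog",
        "links": [
            {"rel": "root", "href": "../catalog.json"},
            {"rel": "parent", "href": "../catalog.json"},
            {"rel": "item", "href": "./scene-1/scene-1.json"},
        ],
    },
    "area-1/scene-1/scene-1.json": {
        "stac_version": "0.8.1",
        "type": "Feature",
        "id": "scene-1",
        "geometry": {"type": "Point", "coordinates": [-75.16, 39.95]},
        "bbox": [-75.16, 39.95, -75.16, 39.95],
        "properties": {"datetime": "2019-10-01T00:00:00Z"},
        "links": [{"rel": "parent", "href": "../catalog.json"}],
        "assets": {},
    },
}


def resolve(base_href: str, href: str) -> str:
    """Resolve a relative link href against the path of the document holding it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(base_href), href))


def walk(href: str, read: Callable[[str], dict]) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Depth-first: yield (catalog id, child catalog ids, item ids) per catalog."""
    doc = read(href)
    links = doc.get("links", [])
    # Only "child" and "item" links point downward; "root"/"parent" are ignored.
    children = [resolve(href, link["href"]) for link in links if link["rel"] == "child"]
    items = [resolve(href, link["href"]) for link in links if link["rel"] == "item"]
    yield doc["id"], [read(c)["id"] for c in children], [read(i)["id"] for i in items]
    for child in children:
        yield from walk(child, read)


if __name__ == "__main__":
    # DOCS.__getitem__ stands in for reading a JSON file from disk or a URL.
    for cat_id, child_ids, item_ids in walk("catalog.json", DOCS.__getitem__):
        print(cat_id, child_ids, item_ids)
```

Keeping the `read` function as a parameter separates link traversal from I/O, so the same walk could run over local files or HTTP once a suitable reader is plugged in.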

PySTAC is an attempt to lower the barrier to entry for Python developers who use STAC. It should do this not just by providing much of the nuts-and-bolts functionality for working with STAC, but also by guiding understanding of the STAC spec through familiar Python concepts and real implementations. Studying the STAC JSON schemas and specification markdown can be challenging, so we provide extensive documentation to help Python developers better understand and get the most out of STAC.

PySTAC aims to be a virtually dependency-free implementation of the core concepts of STAC; we designed it so that almost nothing outside the STAC specification is included in the library. That way, anyone familiar with the STAC specification should recognize the language and concepts in PySTAC, and anyone who learns to use PySTAC will become well versed in the concepts of the specification. We hope that keeping PySTAC tightly scoped to the fundamentals will open it up to more uses in the community.
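To show how far plain Python gets you when a library sticks to the specification’s own concepts, here is a minimal, dependency-free sketch of serializing and deserializing a STAC Item with only `json` and `dataclasses`. `SimpleItem` is a hypothetical stand-in, not PySTAC’s `Item` class; it keeps `datetime` as an ISO 8601 string to avoid extra parsing, whereas a real library would likely convert it to a `datetime` object.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SimpleItem:
    """Hypothetical minimal stand-in for a STAC Item (not PySTAC's Item class)."""
    id: str
    geometry: Dict[str, Any]
    bbox: List[float]
    datetime: str  # ISO 8601 text, kept as a string in this sketch
    properties: Dict[str, Any] = field(default_factory=dict)
    assets: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    links: List[Dict[str, str]] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # STAC stores "datetime" inside the Item's "properties" object.
        props = dict(self.properties)
        props["datetime"] = self.datetime
        return {
            "stac_version": "0.8.1",
            "type": "Feature",
            "id": self.id,
            "geometry": self.geometry,
            "bbox": self.bbox,
            "properties": props,
            "links": self.links,
            "assets": self.assets,
        }

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "SimpleItem":
        # Pull "datetime" back out of properties so it is not stored twice.
        props = dict(d.get("properties", {}))
        dt = props.pop("datetime")
        return cls(
            id=d["id"],
            geometry=d["geometry"],
            bbox=d["bbox"],
            datetime=dt,
            properties=props,
            assets=d.get("assets", {}),
            links=d.get("links", []),
        )


# Usage: round-trip a hypothetical Item through JSON text.
item = SimpleItem(
    id="scene-1",
    geometry={"type": "Point", "coordinates": [-75.16, 39.95]},
    bbox=[-75.16, 39.95, -75.16, 39.95],
    datetime="2019-10-01T00:00:00Z",
    assets={"image": {"href": "./scene-1.tif"}},
)
text = json.dumps(item.to_dict(), indent=2)
restored = SimpleItem.from_dict(json.loads(text))
```

Because the in-memory shape mirrors the JSON field names from the spec, reading the code doubles as reading the specification, which is the same trade-off the paragraph above describes.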

The road ahead for PySTAC

We plan to make PySTAC more complete in future releases. This includes providing complete implementations of all extensions, as well as integrations with projects that implement the STAC API. We encourage users and developers to contribute their own ideas and code to the repository so that PySTAC becomes a community-driven project, much like STAC itself.

We’re excited to collaborate with the open source community of developers working on STAC tooling to help lay the necessary foundations that will lead us into the future of geospatial data interoperability. If you have ideas or questions, please open an issue on the repository or come say hello in the Gitter channel!