Last year we released PySTAC, an open source Python library for working with geospatial data encoded in Spatio-Temporal Asset Catalogs (STACs). Azavea has been using PySTAC ever since to power the STAC tooling that helps us manage geospatial data across a wide variety of projects. We’ve been improving PySTAC, keeping it up to date with changes to the spec itself, and rounding out the library with new extensions and features. In this post, we’ll catch you up on what’s new.
You can check out this post to get a full understanding of what the library is all about. But as a recap, here are the basics:
STAC is an open specification for metadata about geospatial imagery and labels. The objective of the project is to standardize the way that the earth observation industry stores data and to empower developers to build STAC-based tools like Franklin, sat-search, stac-browser, and PySTAC.
PySTAC makes it easy for Python developers to work with STACs. Primarily, it is a tool to read STACs from JSON files, as well as to create and manipulate them in Python. It abstracts away the complexity of the STAC spec so that developers can concentrate on building software and doing data science against geospatial data while always maintaining STAC compliance.
We built PySTAC because we know, first-hand, that designing systems for storing earth observation data on each project is a lot of effort. We believe that STAC is the future, and PySTAC makes it easy to consistently use it across software projects like Raster Foundry and Raster Vision. We open sourced the work with the hopes that it would make implementing software based on STAC easier for others, and to help contribute to the future of geospatial data interoperability through STAC.
STAC 1.0.0-beta.2 has been released! Over the last year, we’ve been keeping PySTAC up-to-date with STAC-spec version releases, culminating in the release of PySTAC 0.4.0 for STAC 0.9.0 last month and PySTAC 0.5.0 for STAC 1.0.0-beta.2 this month. Below are some highlights of the features and changes released with the most recent PySTAC versions. You can find a detailed list of changes in the PySTAC changelog.
PySTAC Catalogs, Collections, and Items now have .validate() methods that will validate against the community-maintained schemas at https://schemas.stacspec.org, as long as the jsonschema dependency is installed. We also included functionality that makes it easy to validate STAC JSON from version 0.8.0-rc1 onward. Users can also create their own custom validators and Schema URI maps to work with any custom schemas.
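To give a feel for what a Schema URI map does, here is a minimal, standalone sketch in plain Python. It is not PySTAC's own class: the name SketchSchemaUriMap, its method, the file:// override path, and the default URL layout are all assumptions made for illustration. The idea is simply to resolve an object type and STAC version to a schema location, with custom entries taking precedence over the defaults.

```python
from typing import Dict, Optional, Tuple

# Base URL taken from the post; the path layout built below is an assumption.
DEFAULT_BASE = "https://schemas.stacspec.org"


class SketchSchemaUriMap:
    """Illustrative stand-in for a schema URI map (hypothetical, not PySTAC's API)."""

    def __init__(self, overrides: Optional[Dict[Tuple[str, str], str]] = None):
        # Overrides are keyed by (object_type, stac_version).
        self.overrides = dict(overrides or {})

    def get_schema_uri(self, object_type: str, stac_version: str) -> str:
        key = (object_type, stac_version)
        if key in self.overrides:
            # A custom schema was registered for this type and version.
            return self.overrides[key]
        # Otherwise fall back to an assumed default layout.
        return (
            f"{DEFAULT_BASE}/v{stac_version}/"
            f"{object_type}-spec/json-schema/{object_type}.json"
        )


# Point Items at a hypothetical local schema; everything else uses the default.
uri_map = SketchSchemaUriMap(
    {("item", "1.0.0-beta.2"): "file:///schemas/custom-item.json"}
)
custom_uri = uri_map.get_schema_uri("item", "1.0.0-beta.2")
default_uri = uri_map.get_schema_uri("catalog", "0.9.0")
```

A validator built on a map like this only needs to ask for a URI, so swapping in custom schemas does not require touching the validation logic itself.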
In early versions of PySTAC, we composed extensions out of new classes that inherited from the STAC objects that they extended (e.g. the “eo” extension included “EOItem” and “EOCollection” classes). This implementation caused problems when an object included multiple extensions – this was specifically a problem when the “eo” extension split some of its properties into the “view” extension.
We refactored Extensions for the 0.4 and 0.5 versions of PySTAC in order to better support multiple extension implementations. All STAC Objects have an extensions property (.ext) that provides an interface to the object’s extensions for easy access, for example:
    import pystac

    item = pystac.read_file('/a/path/to/an/item.json')

    # Get the band information from the EO extension
    bands = item.ext.eo.bands

    # Get the off nadir angle from the View Geometry extension
    off_nadir = item.ext.view.off_nadir

    # Setting properties is done through this property as well
    item.ext.view.sun_azimuth = 40.3
See the documentation on the extension architecture for more information.
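To illustrate why an accessor-based design copes with several extensions at once, here is a rough sketch in plain Python. It is not how PySTAC is actually implemented: SketchItem, ExtensionIndex, EOView, and ViewGeometryView are hypothetical names. Each extension is a thin view over the same properties dictionary, so an item never has to inherit from more than one extension class.

```python
class EOView:
    """Hypothetical view over an item's "eo:" fields."""

    def __init__(self, item):
        self.item = item

    @property
    def bands(self):
        return self.item.properties.get("eo:bands")


class ViewGeometryView:
    """Hypothetical view over an item's "view:" fields."""

    def __init__(self, item):
        self.item = item

    @property
    def off_nadir(self):
        return self.item.properties.get("view:off_nadir")

    @property
    def sun_azimuth(self):
        return self.item.properties.get("view:sun_azimuth")

    @sun_azimuth.setter
    def sun_azimuth(self, value):
        # Writes go straight to the shared properties dictionary.
        self.item.properties["view:sun_azimuth"] = value


class ExtensionIndex:
    """Looks up an extension view by its short name, e.g. ext.eo or ext.view."""

    _registry = {"eo": EOView, "view": ViewGeometryView}

    def __init__(self, item):
        self._item = item

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. extension names.
        try:
            view_cls = self._registry[name]
        except KeyError:
            raise AttributeError(f"Unknown extension: {name}")
        return view_cls(self._item)


class SketchItem:
    """Stand-in for a STAC Item that holds only a properties dictionary."""

    def __init__(self, properties):
        self.properties = properties
        self.ext = ExtensionIndex(self)


item = SketchItem({"eo:bands": [{"name": "B1"}], "view:off_nadir": 3.0})
bands = item.ext.eo.bands
item.ext.view.sun_azimuth = 40.3
```

Because both views read and write the same dictionary, adding a third extension means registering one more view class rather than introducing another subclass of the item.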
As mentioned above, we also added the CommonMetadata class to PySTAC. Common Metadata includes frequently used item and asset properties that are not part of the base spec. In PySTAC, common metadata behaves similarly to extensions: a common metadata property acts as an interface to the common metadata fields within an item. For example:
    # Gets the GSD for the item
    item.common_metadata.gsd

    # Sets the GSD for the item
    item.common_metadata.gsd = 0.5
The Item Asset Properties feature released in STAC 1.0.0-beta.1 allows Assets to define values for properties normally defined at the Item level. This allows Item properties to vary based on what Asset the property is associated with. For example, if an Item represents Sentinel-2 imagery, it may contain an Asset for each of the band rasters. These rasters have different Ground Sample Distances, so a single Item-level value can’t accurately describe these assets. With the Item Asset Properties feature, PySTAC can read property values off of the Asset if the Asset defines its own value, or else fall back to the Item property value. For example:
    # Get the GSD from the item
    item_gsd = item.common_metadata.gsd

    # Get the GSD from the asset "B1" from the item, if available
    asset_gsd = item.common_metadata.get_gsd(item.assets['B1'])
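The fallback rule itself is small enough to sketch with plain dictionaries. The helper below is hypothetical and is not PySTAC's implementation; the asset names, hrefs, and GSD values are made up for illustration. It returns the Asset's own value when one is defined and otherwise falls back to the Item-level value.

```python
from typing import Any, Dict, Optional


def get_property(
    item_properties: Dict[str, Any],
    asset_fields: Optional[Dict[str, Any]] = None,
    key: str = "gsd",
) -> Any:
    """Return the asset-level value for key if present, else the item-level one."""
    if asset_fields is not None and key in asset_fields:
        return asset_fields[key]
    return item_properties.get(key)


# Item-level GSD, with one asset overriding it and one asset inheriting it.
item_properties = {"gsd": 10.0}
assets = {
    "B1": {"href": "./B1.tif", "gsd": 60.0},
    "B2": {"href": "./B2.tif"},
}

b1_gsd = get_property(item_properties, assets["B1"])  # asset defines its own value
b2_gsd = get_property(item_properties, assets["B2"])  # falls back to the item value
item_gsd = get_property(item_properties)              # no asset given
```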
PySTAC functionality is useful, but only if you have the time or know-how to spin up a Python script using it. We’d like to expose functionality such as copying and modifying catalogs, validating STAC, and migrating versions into a command-line utility that will make it easy for anyone to run quickly and integrate into data processing pipelines.
We’ll be working on the start of the CLI at the STAC tooling sprint starting August 18th. If you’re interested in contributing ideas or code to this feature, we hope to see you there!
Another priority for upcoming releases is to increase the number of extensions that PySTAC supports. This will always be a moving target, given that we expect the number of STAC extensions to grow with its user base. In the short term, we would like to narrow the gap between the extensions present in the spec and those implemented in our library.
If you want to help out and contribute to PySTAC, a great place to start is by contributing an extension implementation. Check out this list of issues and let us know if you need help! We’d be happy to connect at the STAC tooling sprint to help you get spun up. We hope to see you there!