Highlighting NumFocus Projects: How Open Source Projects Empower Our Work

NumFocus is a nonprofit that supports open practices in research, data, and scientific computing. It provides a stable, independent, and professional home for open source projects that empower much of our work, including pandas, Jupyter Notebooks, Matplotlib, and NumPy.

As a company that strongly advocates for open source and maintains some open source tools of our own, we find that NumFocus’s mission resonates with us. In this post, we highlight some of their projects and how they have supported our work.

NumFocus Projects that Support Azavea’s Work

Rasterio

We use Rasterio, a library that reads raster data stored as GeoTIFF files and returns it as NumPy arrays. It supplements our own Python library, Raster Vision, an open source framework we support that also helps us work with geospatial imagery. Raster Vision was created to make it easy for Python developers to build computer vision models on satellite, aerial, and other large imagery sets, with built-in support for chip classification, object detection, and semantic segmentation using PyTorch.

NumPy

Raster Vision uses NumPy to store and manipulate images and labels. These are eventually converted to PyTorch tensors, but by using NumPy we can decouple parts of the codebase from PyTorch, one of many deep learning libraries.
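As a rough illustration of that decoupling, the sketch below cuts an image into chips and prepares them for a model using NumPy alone. The function names, chip size, and scaling are assumptions made for this example and are not taken from the Raster Vision codebase.

```python
import numpy as np


def extract_chips(image, chip_size, stride):
    """Yield (row, col, chip) windows from an (H, W, C) array.

    Hypothetical helper for illustration; not Raster Vision's API.
    """
    height, width = image.shape[:2]
    for row in range(0, height - chip_size + 1, stride):
        for col in range(0, width - chip_size + 1, stride):
            yield row, col, image[row:row + chip_size, col:col + chip_size]


def to_channels_first(chip):
    """Convert an (H, W, C) uint8 chip to a (C, H, W) float32 array in [0, 1]."""
    return np.ascontiguousarray(chip.transpose(2, 0, 1)).astype(np.float32) / 255.0


# A blank stand-in image; real imagery would come from a raster reader.
image = np.zeros((512, 512, 3), dtype=np.uint8)
chips = [to_channels_first(chip) for _, _, chip in extract_chips(image, 256, 256)]
```

Nothing here imports a deep learning library. Only the final hand-off to the model needs one, for example by passing each array to torch.from_numpy.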

Jupyter, Matplotlib, and pandas

We often use Jupyter notebooks, Matplotlib, and pandas to preprocess data and to help analyze experimental results. Matplotlib is also helpful when preparing training data for deep learning, as we use it to visualize training “chips,” small windows of the image, along with their training labels.
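A chip and its label can be drawn together with a few lines of Matplotlib. This is a minimal sketch that assumes the label is a binary mask with the same height and width as the chip; the function name and the random stand-in data are made up for the example.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_chip_with_label(chip, label_mask):
    """Draw a chip with its label mask as a semi-transparent overlay.

    Hypothetical helper; assumes label_mask is 0 for background, 1 for the class.
    """
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(chip)
    # Mask out background pixels so only labeled pixels are tinted.
    overlay = np.ma.masked_where(label_mask == 0, label_mask)
    ax.imshow(overlay, cmap="autumn", alpha=0.5)
    ax.set_axis_off()
    return fig


# Random stand-in data in place of a real chip and label.
rng = np.random.default_rng(0)
chip = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
label_mask = np.zeros((64, 64), dtype=np.uint8)
label_mask[16:48, 16:48] = 1

fig = plot_chip_with_label(chip, label_mask)
```

In a Jupyter notebook the figure displays inline, so the backend line can be dropped there.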

Use Case: Advancing Ulcerative Colitis Monitoring with Deep Learning Models

We received an NIH award for a research project in which we relied heavily on NumFocus projects. In our research, we used deep learning to predict the severity of a disease from large whole slide images of tissue samples. pandas was integral in creating data frames that tracked all the models we trained, which we could then query and plot in various ways. We used Matplotlib to create heatmaps showing how much a model attends to different parts of an image, which allowed us to compare different model types. We also used it to visualize the output of a figure/ground segmentation process for separating tissue samples from the slide background. You can learn more about the project here.
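Tracking trained models in a data frame might look something like the sketch below. The column names and every value in the table are placeholders invented for illustration; they are not results from the project.

```python
import pandas as pd

# Placeholder records: one row per training run (all values are invented).
runs = pd.DataFrame(
    [
        {"run_id": "run-01", "model": "model_a", "lr": 1e-3, "val_accuracy": 0.71},
        {"run_id": "run-02", "model": "model_a", "lr": 1e-4, "val_accuracy": 0.74},
        {"run_id": "run-03", "model": "model_b", "lr": 1e-3, "val_accuracy": 0.78},
        {"run_id": "run-04", "model": "model_b", "lr": 1e-4, "val_accuracy": 0.76},
    ]
)

# Query: keep the best run for each model type, highest accuracy first.
best_rows = runs.groupby("model")["val_accuracy"].idxmax()
best_per_model = runs.loc[best_rows].sort_values("val_accuracy", ascending=False)
```

From there, a call such as best_per_model.plot.bar(x="model", y="val_accuracy") hands the comparison to Matplotlib.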

Open source projects are integral to our work. Without them, we could not do much of that work as effectively. We will always be strong supporters of the open source community, whether through our own contributions, including GeoTrellis, OpenTreeMap, and more, or by supporting organizations like NumFocus.