Generating an accurate, open source map of all the buildings in the world and keeping it up-to-date is a grand challenge within the field of GIS. Today, maps are typically developed by manually tracing polygons over overhead imagery taken from airplanes and satellites. Because this process is so labor intensive, it is difficult to maintain a high level of coverage. Deep learning can be used to partially automate this process. For example, in the RapiD project, researchers from Meta and Microsoft trained models that predict building and road geometries, which are then edited and verified by OpenStreetMap (OSM) contributors. However, more research is needed to generate building footprints that are on par with those drawn by humans.
In this three-part blog series, we summarize some of the latest research on automated building footprint extraction. We discuss open source datasets in the first part (i.e., this post), evaluation metrics in the second, and model architectures in the third.
In the past several years, various open datasets containing building footprints have been released. These datasets can be used to train models, and have accelerated research in this area by providing a standardized benchmark.
To support polygon footprint extraction, labels should be in vector format. In contrast, some datasets, such as Inria Aerial Image Labeling, have raster-based labels in which each pixel is classified as building or background. There are two main ways of obtaining labels for training models: manually drawing them using a labeling tool such as GroundWork, or extracting them from OSM. Extracting labels from OSM saves effort, but is only possible for the small percentage of areas with good coverage, and OSM labels are sometimes misaligned with the imagery, missing, or inaccurate, which requires manual correction. Beyond OSM, there are several open datasets of building footprints that were themselves generated using a deep learning model. For example, Google Open Buildings contains ~500M building footprints in Africa, and Microsoft Building Footprints contains ~700M footprints around the world. Because these are model outputs, they inevitably contain errors, which makes them less suitable for training new models.
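To make the vector/raster distinction concrete, the sketch below burns a vector (polygon) building label into a raster mask using a simple even-odd point-in-polygon test. The footprint coordinates and grid size are made up for illustration; real pipelines would typically use a library routine such as rasterio's rasterize instead of hand-rolled geometry code.

```python
def point_in_polygon(x, y, poly):
    """Even-odd rule: count how many polygon edges a rightward ray crosses."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(poly, width, height):
    """Burn one vector polygon into a binary mask (1 = building pixel)."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, poly) else 0
         for col in range(width)]
        for row in range(height)
    ]

# A made-up rectangular footprint in pixel coordinates.
footprint = [(1, 1), (5, 1), (5, 4), (1, 4)]
mask = rasterize(footprint, 8, 6)
for row in mask:
    print("".join(str(v) for v in row))
```

Going in this direction (vector to raster) is lossless at a given resolution, but the reverse, recovering clean polygons from a predicted mask, is the hard part of footprint extraction.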
In order to see buildings in enough detail to extract accurate footprints, the resolution of the imagery must be relatively high: typically <= 50 cm per pixel. This includes drone and aerial imagery, and very high resolution satellite imagery such as that captured by the Maxar WorldView-3 satellite. Open source imagery with sufficient resolution is typically only available for a small number of locations around the world. This is problematic because it is important to train on a geodiverse dataset that contains a variety of rural and urban locations with different cultures and building styles. On the other hand, open satellite imagery with global coverage, such as Sentinel-2 (10 m resolution for the RGB bands), lacks sufficient resolution.
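A quick back-of-the-envelope calculation shows why ground sample distance (GSD) matters so much. The 8 m building size below is an assumption chosen for illustration, not a figure from any of the datasets discussed here:

```python
def pixels_per_building(building_m=8.0, gsd_m=0.5):
    """Approximate pixel count for a square building at a given
    ground sample distance (GSD, in meters per pixel)."""
    side_px = building_m / gsd_m
    return side_px * side_px

# Hypothetical 8 m x 8 m building at different resolutions.
for name, gsd in [("drone (10 cm)", 0.10),
                  ("50 cm aerial/satellite", 0.50),
                  ("Sentinel-2 RGB (10 m)", 10.0)]:
    print(f"{name}: ~{pixels_per_building(gsd_m=gsd):.1f} pixels")
```

At 50 cm the building spans a few hundred pixels, enough to trace its outline, while at 10 m it occupies less than a single pixel, which is why Sentinel-2 works for detecting built-up areas but not for delineating individual footprints.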
In this section we summarize several open datasets which combine high resolution imagery and validated vectorized labels. These datasets are most convenient for training and evaluating models for building footprint extraction.
The Replicable AI for Microplanning (Ramp) dataset is the most recently published open dataset (as of September 2022) containing imagery and polygon annotations of buildings. It is geographically diverse, covering urban and rural areas in a variety of low- and middle-income places including Dhaka, Shanghai, Paris, Accra, Kinshasa, Kampala, Oman, India, and the Philippines. The dataset was built on several existing open datasets, with new labels added for some of the regions. The labels were corrected to be consistent, and are now all aligned with rooftops. All of the imagery has < 60 cm resolution, is < 15 degrees off nadir, and is nearly cloud-free. The dataset is distributed as ~50,000 256×256-pixel chips. It is accessible in STAC format via the MLHub, and more information about how the dataset was constructed can be found in this blog post. Since Ramp was built from other datasets, it inherits a combination of different licenses for different parts. The most restrictive license used is CC BY-NC 4.0 (derived from the use of Maxar's Open Data Program), which does not allow commercial use.
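As a rough sanity check on the dataset's scale, we can estimate the total labeled area from the chip count. The 50 cm GSD below is an assumption (the post only states that the imagery is finer than 60 cm), so the result is an order-of-magnitude figure:

```python
chips = 50_000      # approximate number of chips in Ramp
chip_px = 256       # chips are 256x256 pixels
gsd_m = 0.5         # assumed GSD; the actual imagery is < 60 cm

chip_side_m = chip_px * gsd_m                 # 128 m per chip side
chip_area_km2 = (chip_side_m / 1000) ** 2     # km^2 per chip
total_km2 = chips * chip_area_km2
print(f"~{total_km2:.0f} km^2 of labeled area")
```

Roughly 800 km² of labeled, geodiverse imagery is substantial, though still tiny compared to the area that global building mapping ultimately has to cover.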
SpaceNet was a series of geospatial machine learning challenges with accompanying open datasets made available under a Creative Commons Attribution-ShareAlike 4.0 license. SpaceNet 1 and 2 together have over 685,000 polygon building footprints for Rio de Janeiro, Las Vegas, Paris, Shanghai, and Khartoum, along with 50 cm multiband satellite imagery from Maxar, distributed as 600×600-pixel chips. SpaceNet 4 focused on mapping buildings using off-nadir imagery over Atlanta. SpaceNet 6 was concerned with mapping buildings over Rotterdam using a combination of optical and synthetic aperture radar (SAR) imagery; because SAR is an active radar sensor, it can image through clouds. Finally, SpaceNet 7 was about mapping changes in building footprints over time. This dataset covers ~100 AOIs and has an image for each month over a two-year period. The imagery in this dataset is from Planet and has a resolution of 4 m, which is too coarse for extracting accurate, detailed building footprints.
The OpenCities AI Challenge was another geospatial machine learning contest centered on building footprint extraction. The associated dataset, released under a Creative Commons BY 4.0 license, contains drone imagery from 10+ cities and regions in Africa, along with polygon labels extracted from OSM (except Zanzibar). The resolution varies between regions from 2 to 20 cm, and the dataset covers 790k building footprints over 400 km².
The CrowdAI Mapping Challenge dataset has 280k 300×300 RGB images for training and 60k images for validation, along with polygon building annotations. This dataset was used in a contest hosted by the AI contest platform AIcrowd (formerly known as CrowdAI). A more detailed description of the dataset does not seem to be available.