Free and open-source map data has become a keystone for research across diverse fields. The extensive coverage of OpenStreetMap (OSM) data allows scientists to conduct independent studies without relying on corporate collaborations or investing heavily in proprietary datasets. However, the completeness of OSM data varies significantly by location, and there is no guarantee that the data for a given area is up to date.
Before embarking on a project using OSM data, it is essential to evaluate the data’s completeness within your area of interest. Traditionally, this has been a time-intensive task that involved extracting and analyzing data from multiple databases. To address this challenge, we have developed a pipeline that efficiently assesses the completeness of OSM building data for any region worldwide, using the Overture Maps database as the reference.
What Is Overture Maps?
The Overture Maps Foundation is a collaborative initiative that provides open and accessible global map data. By integrating contributions from various organizations, Overture Maps offers a reliable resource for mapping data. This makes it a valuable tool for analyzing OSM data quality and completeness, among other applications.
A Simplified Pipeline for OSM Building Data Completeness
Detailed instructions for using the pipeline are provided in the accompanying Jupyter notebook.
We implemented this solution as a Python script that downloads building data for a user-defined area of interest from Overture Maps. To store and process the data locally, you need a database management system that supports Structured Query Language (SQL). We recommend DuckDB for its simplicity, but you can use any system you’re comfortable with.
The pipeline offers significant flexibility. Users can define custom areas of interest and even compare multiple regions simultaneously. These areas are specified as a list of bounding boxes, which can be adjusted as needed.
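As an illustration of what such a list might look like, here is a minimal sketch. The BBox type, the validate helper, the area names, and the coordinates are all hypothetical and are not taken from the pipeline itself. The sketch assumes the common (west, south, east, north) ordering in decimal degrees; check the notebook for the ordering the pipeline actually expects.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    """One area of interest (hypothetical structure, for illustration only)."""
    name: str
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude


def validate(box: BBox) -> BBox:
    """Reject boxes with swapped or out-of-range coordinates."""
    if not (-180 <= box.west < box.east <= 180):
        raise ValueError(f"{box.name}: longitudes must satisfy -180 <= west < east <= 180")
    if not (-90 <= box.south < box.north <= 90):
        raise ValueError(f"{box.name}: latitudes must satisfy -90 <= south < north <= 90")
    return box


# Placeholder areas with arbitrary coordinates; replace with your own regions.
areas: List[BBox] = [
    validate(BBox("area_a", 13.30, 52.45, 13.50, 52.55)),
    validate(BBox("area_b", 36.75, -1.35, 36.95, -1.20)),
]

for box in areas:
    print(box.name, (box.west, box.south, box.east, box.north))
```

Validating each box up front catches swapped coordinates before any download starts, which is cheaper than discovering an empty result later.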
For demonstration purposes, we downloaded building data for five cities worldwide and analyzed OSM data completeness for each location. Using SQL in DuckDB, we categorized the data by source, separating it into OSM data and all other sources. Additionally, we determined the total count of buildings within the study area. These metrics allowed us to calculate the percentage completeness of OSM building data for each region.
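The source-based split described above can be sketched in a few lines of SQL. The example below is an illustration only, not the pipeline's actual code. It uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies, and it assumes a hypothetical flattened table buildings(area, source) filled with made-up rows. In real Overture data the source information is nested, so a flattening step would come first. The aggregate query uses only standard SQL and should carry over to DuckDB largely unchanged.

```python
import sqlite3

# In-memory database as a stand-in for DuckDB (assumption for this sketch).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE buildings (area TEXT, source TEXT)")

# Made-up rows: one row per building, labeled with its area and data source.
# The table layout and the 'other' label are hypothetical.
rows = [
    ("area_a", "OpenStreetMap"),
    ("area_a", "OpenStreetMap"),
    ("area_a", "OpenStreetMap"),
    ("area_a", "other"),
    ("area_b", "OpenStreetMap"),
    ("area_b", "other"),
    ("area_b", "other"),
    ("area_b", "other"),
]
con.executemany("INSERT INTO buildings VALUES (?, ?)", rows)

# Per area: total buildings, buildings sourced from OSM, and the OSM share in percent.
QUERY = """
SELECT area,
       COUNT(*) AS total_buildings,
       SUM(CASE WHEN source = 'OpenStreetMap' THEN 1 ELSE 0 END) AS osm_buildings,
       ROUND(100.0 * SUM(CASE WHEN source = 'OpenStreetMap' THEN 1 ELSE 0 END)
             / COUNT(*), 1) AS osm_pct
FROM buildings
GROUP BY area
ORDER BY area
"""
results = con.execute(QUERY).fetchall()

for area, total, osm, pct in results:
    print(f"{area}: {osm} of {total} buildings from OSM ({pct}%)")
```

Computing the total and the OSM count in a single grouped query keeps the two numbers consistent, since both come from the same scan of the table.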
Visualizing Completeness
To enhance clarity, we visualized the results using a bar chart, which highlights how strongly OSM data completeness varies between regions.
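For a quick look at such percentages without any plotting library, a plain-text bar chart is often enough. The sketch below is a dependency-free stand-in, not the chart used in this post. The function name text_bar_chart and the example values are hypothetical.

```python
from typing import Mapping


def text_bar_chart(pct_by_area: Mapping[str, float], width: int = 40) -> str:
    """Render completeness percentages (0-100) as horizontal text bars."""
    label_width = max(len(name) for name in pct_by_area)
    lines = []
    # Sort from most to least complete so the ranking is visible at a glance.
    for name, pct in sorted(pct_by_area.items(), key=lambda kv: kv[1], reverse=True):
        filled = round(width * pct / 100)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{name:<{label_width}} |{bar}| {pct:5.1f}%")
    return "\n".join(lines)


# Made-up example values, not results from the study.
example = {"area_a": 75.0, "area_b": 25.0}
chart = text_bar_chart(example)
print(chart)
```

The same dictionary of percentages can be passed to a plotting library when a publication-quality figure is needed.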
For more detailed insights, the results can be visualized on an interactive map. We used Lonboard, a Python library for interactive geospatial visualization, for this purpose. Its layer toggle lets users compare the buildings present in OSM on one layer with the buildings missing from OSM (those that come only from other sources) on another. This makes it possible to see exactly where within a region OSM building coverage is complete and where gaps remain.
Conclusion
This pipeline offers a quick and efficient solution for researchers and practitioners. It supports informed decision-making regarding the suitability of OSM for specific projects while also serving as a valuable tool for advancing ongoing research on OSM data.
To keep up with future developments and releases related to this project as well as other efforts to advance geospatial technology in the mobility, humanitarian aid, and data analytics space, follow our social media channels and stay up to date on our blog.