New Paper: "Paved or unpaved? A Deep Learning derived Road Surface Global Dataset from Mapillary Street-View Imagery"

The paper addresses the global shortage of detailed road surface data by applying deep learning to street-view imagery from Mapillary. Existing datasets such as OpenStreetMap (OSM) often lack road surface attributes: surface information is available for only about 30–40% of roads, which hinders applications such as travel time estimation, disaster response routing, urban planning, and environmental assessment. To fill this gap, the paper proposes an approach that uses heterogeneous, crowdsourced imagery to classify roads as paved or unpaved. The method is a hybrid deep learning framework: a SWIN-Transformer-based model predicts the road surface, while thresholding based on CLIP and deep learning segmentation filters out low-quality images.

The study integrates diverse data sources and applies deep learning to predict road surface types at a global scale. Model performance is evaluated with standard tools (confusion matrices, F1 scores, and the Matthews correlation coefficient), and the predictions remain robust under challenging conditions such as occlusions, varying lighting, and low-quality imagery. The analysis also identifies key limitations: image heterogeneity, spatial bias in Mapillary coverage (urban areas are generally better represented than rural ones), and difficulties in map matching between OSM and Mapillary data.
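For readers less familiar with these metrics, the following minimal sketch shows how F1 and the Matthews correlation coefficient follow from the four cells of a binary confusion matrix, with "paved" treated as the positive class. The function name `binary_metrics` and the counts are invented for illustration and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and MCC for the positive class.

    tp, fp, fn, tn are the four cells of a binary confusion matrix.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Made-up counts for illustration only; "paved" is the positive class.
m = binary_metrics(tp=90, fp=10, fn=5, tn=45)
print({name: round(value, 3) for name, value in m.items()})
```

Because paved roads dominate many regions, a high F1 for the paved class alone can hide weak performance on unpaved roads; MCC is commonly reported alongside it for that reason.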

The predicted road surface data has been aligned and merged with OpenStreetMap (OSM) road geometries. Validation against existing OSM surface tags showed high accuracy, with F1 scores for paved roads ranging from 91% to 97% across continents. The dataset significantly expands global road surface coverage, adding nearly four million kilometers beyond existing OSM data and now representing approximately 36% of the total global road network. While most regions exhibit moderate to high paved road coverage (60–80%), notable gaps remain in parts of Africa and Asia. Urban areas generally have near-complete paved coverage, whereas rural regions show greater variability.
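To make the merging step more concrete, here is a minimal sketch of one plausible way to turn per-image predictions into a single label per road segment: a simple majority vote. This is an assumption for illustration; the paper's actual aggregation rule may differ. The function name `aggregate_by_segment` and the segment identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_by_segment(predictions):
    """Majority vote over image-level labels, grouped by road segment.

    predictions: iterable of (segment_id, label) pairs, where label is
    "paved" or "unpaved". Ties are reported as "unknown".
    Note: majority voting is an illustrative assumption, not the paper's
    documented method.
    """
    votes = defaultdict(Counter)
    for segment_id, label in predictions:
        votes[segment_id][label] += 1

    result = {}
    for segment_id, counts in votes.items():
        paved, unpaved = counts["paved"], counts["unpaved"]
        if paved == unpaved:
            result[segment_id] = "unknown"
        else:
            result[segment_id] = "paved" if paved > unpaved else "unpaved"
    return result


# Hypothetical image-level predictions already matched to segments.
sample = [
    ("way/1", "paved"), ("way/1", "paved"), ("way/1", "unpaved"),
    ("way/2", "unpaved"),
    ("way/3", "paved"), ("way/3", "unpaved"),
]
surface = aggregate_by_segment(sample)
print(surface)
```

The sketch assumes the images have already been matched to road segments, which is the harder part in practice given the map-matching challenges between OSM and Mapillary data noted above.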

This enriched dataset is particularly useful for regions where road data is scarce and can support a wide array of applications, from improving disaster management strategies to informing sustainable development policies.

Dataset: HeiGIT (Heidelberg Institute for Geoinformation Technology) Humanitarian Data on HDX

Reference: https://www.sciencedirect.com/science/article/pii/S0924271625000784

Title image: Map visualizations of road sequence data from various global urban areas. Each panel displays sequences that have been color-coded where possible, although the high volume of sequences in areas such as Tokyo and Moscow prevents distinct color coding. In New York and Paris, different colors indicate different sequences; because the color palette is limited, a single color may represent multiple sequences.