OEV: Analysing attributes of remarkable elements (2/4)

After introducing the OSM Element Vectorisation Tool earlier this week, we now want to show possible use cases and specific examples of what the tool can do. This first of three use cases takes a closer look at the data in the region of Heidelberg, Germany. We will use the concept of archetypes to identify a few contrasting elements. Please visit the repository for detailed information.

Archetypes

Archetypes in this context can be thought of as a group of elements that frame the population. One can think of them as the points that span a (multidimensional) convex hull around the data, but with a flexible number of corner points. That is, they mark the (multidimensional) extremes of a distribution. Therefore, any element within the population can be described as a combination of the archetypes:


Figure 1: One of the simplest examples of archetypes is the RGB colour spectrum, with red, green and blue being the archetypes. All other colours are a certain combination of these archetypes. Image Source: Wikimedia Commons.
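To make the idea of a "combination of archetypes" concrete, here is a minimal Python sketch of the colour analogy. It assumes the usual convention of archetypal analysis that the weights are non-negative and sum to one; the colour picture in Figure 1 is looser than that, so not every RGB colour can be reached this way. The function name mix and the example weights are purely illustrative.

```python
# Archetypes: the three primary colours as (R, G, B) tuples.
ARCHETYPES = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}


def mix(weights):
    """Blend the archetypes with non-negative weights that sum to one."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    # Weighted sum per colour channel (0 = R, 1 = G, 2 = B).
    return tuple(
        sum(weights.get(name, 0.0) * colour[channel]
            for name, colour in ARCHETYPES.items())
        for channel in range(3)
    )


# Half red, a quarter green, a quarter blue.
print(mix({"red": 0.5, "green": 0.25, "blue": 0.25}))  # (127.5, 63.75, 63.75)
```

In the same way, each OSM element can be described by one weight per archetype.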

Archetypical objects, which are located at the edges of a distribution, are in a sense the opposite of “typical” objects within the distribution. This analysis is therefore similar to the paper by Peter Mooney and Padraig Corcoran (2012), Characteristics of Heavily Edited Objects in OpenStreetMap, which looks into the special case of heavily edited objects.

Data

The data for this example covers the region of Heidelberg, extracted on 1 January 2022, and contains 8,838 elements. To reproduce this analysis, first download the example heidelberg from the API and convert it to a GeoPackage (see the README). For an overview of the exact implementation for the different data aspects, please refer to the documentation.

Figure 2: Overview of the data coloured by CORINE class. Blank spaces on the map represent either unmapped areas or elements that extend beyond the bounding box used and were therefore removed by the tool. The data represents a snapshot taken in 2022, which may become relevant later. Base layer: OSM Carto.

Preparation

Using the archetypes library, we can estimate a good choice for the number of archetypes:

Figure 3: Scree plot of the residual sum of squares in relation to the number of archetypes chosen. The dashed line indicates the number of archetypes picked for the further analysis.
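Reading a knee off a scree plot is usually done by eye, as we did here. For readers who prefer a rule of thumb, the following Python sketch shows one common heuristic: pick the point that lies farthest below the straight line joining the first and the last point of the curve. This is not what the archetypes library does, the function name knee_by_chord_distance is hypothetical, and the residual values are invented toy numbers, not the values behind Figure 3.

```python
def knee_by_chord_distance(ks, rss):
    """Return the k whose point lies farthest below the first-to-last chord."""
    k_first, k_last = ks[0], ks[-1]
    rss_first, rss_last = rss[0], rss[-1]
    # Rescale both axes to [0, 1] so that neither dominates the comparison.
    xs = [(k - k_first) / (k_last - k_first) for k in ks]
    ys = [(r - rss_last) / (rss_first - rss_last) for r in rss]
    # After rescaling, the chord runs from (0, 1) to (1, 0), i.e. y = 1 - x.
    gaps = [(1 - x) - y for x, y in zip(xs, ys)]
    best = max(range(len(ks)), key=gaps.__getitem__)
    return ks[best]


# Invented toy values for illustration only.
toy_ks = [1, 2, 3, 4, 5, 6, 7]
toy_rss = [90, 50, 30, 14, 12, 11, 10]
print(knee_by_chord_distance(toy_ks, toy_rss))  # 4
```

Such a heuristic can support the choice, but on real data it is worth checking the result against the plot.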

As always with real data, the result of these helper functions is subject to interpretation and not as clear as most examples show. We will use the first noticeable knee at five archetypes (indicated by the dashed line in the scree plot). Given the high dimensionality of the data, a larger number would arguably be more fitting. Yet these five archetypes already separate the data well into similarly sized regions.

Archetype    Nearest Neighbour Count
1            2120
2            1856
3            1627
4            1406
5            1829
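One plausible reading of the table above is that every element is assigned to the archetype closest to it, and the elements are then counted per archetype. The following Python sketch shows that idea with plain Euclidean distance; both the distance measure and the function name nearest_archetype_counts are assumptions for illustration, and the points are toy values rather than Heidelberg data.

```python
from collections import Counter
from math import dist


def nearest_archetype_counts(elements, archetypes):
    """Count how many elements have each archetype as their nearest one."""
    counts = Counter()
    for vector in elements:
        # Index of the archetype with the smallest Euclidean distance.
        nearest = min(range(len(archetypes)),
                      key=lambda j: dist(vector, archetypes[j]))
        counts[nearest + 1] += 1  # archetypes are numbered from 1
    return counts


# Toy vectors for illustration only.
toy_elements = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
toy_archetypes = [(0.0, 0.0), (1.0, 1.0)]
print(nearest_archetype_counts(toy_elements, toy_archetypes))
```

With real data, the features would typically need to be on comparable scales before distances are meaningful.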

However, the method does not necessarily use real data points to represent the archetypes. We will therefore use the closest real data points (nearest neighbours) as an approximation in the following analyses.
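Replacing a computed archetype with its closest real data point can be sketched as follows. This is a minimal illustration that assumes Euclidean distance in the feature space; the function name snap_to_data and the toy vectors are hypothetical and not part of the tool.

```python
from math import dist


def snap_to_data(archetypes, elements):
    """For each archetype, return the index of the closest real element."""
    return [
        min(range(len(elements)), key=lambda i: dist(elements[i], archetype))
        for archetype in archetypes
    ]


# Toy vectors for illustration only.
toy_archetypes = [(0.0, 0.0), (1.0, 1.0)]
toy_elements = [(0.2, 0.1), (0.9, 0.8), (0.5, 0.5)]
print(snap_to_data(toy_archetypes, toy_elements))  # [0, 1]
```

The returned indices point at real elements that can then be inspected in detail.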

To simplify the following plots, we will crop the names of the elements to their significant part, which in this case is the first two digits of the ID. Because IDs are assigned sequentially (by type), sorting by ID also orders the data by object age:

Figure 4: Detailed view of the five archetype objects.
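The label cropping used for these plots could be sketched in Python as below. The name format "way/24612345" is an assumption made for illustration, as is the function name short_label; the sketch simply keeps the first two digits of the first number it finds in the name.

```python
import re


def short_label(element_name, digits=2):
    """Keep only the leading digits of the numeric ID inside a name."""
    match = re.search(r"\d+", element_name)
    if match is None:
        return element_name  # no ID found, leave the name unchanged
    return match.group()[:digits]


# Hypothetical element names for illustration only.
print(short_label("way/24612345"))  # 24
print(short_label("relation/9876"))  # 98
```

Note that such cropped labels are only comparable between IDs of the same type and the same number of digits.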