Conceptual compliance analysis with the OSHDB, Part 1

Conceptual compliance measures to what degree contributors of volunteered geographic information (VGI) use proposed tagging-standards. Here, we look into OpenStreetMap (OSM) as the most well-known example of VGI. In OSM, the most important tagging-guideline is defined by its wiki. In addition, OSM editors like iD or JOSM provide default options that help contributors adhere to tagging-standards. Using the OSHDB API, developed at the Heidelberg Institute for Geoinformation Technology, we can analyze how compliant the highway-tags in the OSM history data are with these tagging-standards.

We can calculate the compliance based on tags: if a tag is not compliant with the tagging-standard, it is counted as invalid. The number of valid tags divided by the number of all tags gives the ratio of conceptual compliance.
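As an illustration independent of the OSHDB, the tag-based ratio can be sketched in plain Java. The reference map `ALLOWED` is a small, hypothetical stand-in for a real tagging-standard, the entities are plain key/value maps rather than OSHDB objects, and the class and method names are made up for this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch of tag-based conceptual compliance (not OSHDB code).
class TagCompliance {
    // Hypothetical tagging-standard: allowed values per key.
    static final Map<String, Set<String>> ALLOWED = Map.of(
            "highway", Set.of("residential", "primary", "track"),
            "surface", Set.of("asphalt", "gravel"));

    // A tag is valid if its key is known and its value is allowed for that key.
    static boolean isValid(String key, String value) {
        Set<String> values = ALLOWED.get(key);
        return values != null && values.contains(value);
    }

    // Number of valid tags divided by the number of all tags.
    static double tagCompliance(List<Map<String, String>> entities) {
        int valid = 0;
        int total = 0;
        for (Map<String, String> tags : entities) {
            for (Map.Entry<String, String> tag : tags.entrySet()) {
                total++;
                if (isValid(tag.getKey(), tag.getValue())) {
                    valid++;
                }
            }
        }
        // avoid division by zero when there are no tags at all
        return total == 0 ? 0.0 : (double) valid / total;
    }

    public static void main(String[] args) {
        List<Map<String, String>> ways = List.of(
                Map.of("highway", "residential", "surface", "asphalt"),
                Map.of("highway", "track", "surface", "dirt_road"));
        // 3 of 4 tags are valid
        System.out.println(tagCompliance(ways)); // prints 0.75
    }
}
```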

Alternatively, we can calculate the compliance based on entities. Here, the entity is a way, that is, a line, tagged with the key highway. It represents a street or a part of a street, and we call it a highway-object. Besides the highway tag, it can carry further tags that provide additional information about the street. If a highway-object contains even one invalid tag, the whole highway-object is considered invalid. The number of valid entities divided by the number of all entities gives the ratio of conceptual compliance regarding entities.
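A minimal sketch of the entity-based variant, again in plain Java and using a hypothetical reference map and hypothetical key/value maps as stand-ins for OSHDB entities: an entity only counts as valid if all of its tags pass the check.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch of entity-based conceptual compliance (not OSHDB code).
class EntityCompliance {
    // Hypothetical tagging-standard: allowed values per key.
    static final Map<String, Set<String>> ALLOWED = Map.of(
            "highway", Set.of("residential", "primary", "track"),
            "surface", Set.of("asphalt", "gravel"));

    static boolean isValidTag(String key, String value) {
        Set<String> values = ALLOWED.get(key);
        return values != null && values.contains(value);
    }

    // A single invalid tag makes the whole entity invalid.
    static boolean isValidEntity(Map<String, String> tags) {
        return tags.entrySet().stream()
                .allMatch(tag -> isValidTag(tag.getKey(), tag.getValue()));
    }

    // Number of valid entities divided by the number of all entities.
    static double entityCompliance(List<Map<String, String>> entities) {
        if (entities.isEmpty()) {
            return 0.0;
        }
        long valid = entities.stream().filter(EntityCompliance::isValidEntity).count();
        return (double) valid / entities.size();
    }

    public static void main(String[] args) {
        List<Map<String, String>> ways = List.of(
                Map.of("highway", "residential", "surface", "asphalt"),
                Map.of("highway", "track", "surface", "dirt_road"));
        System.out.println(entityCompliance(ways)); // prints 0.5
    }
}
```

For these two example ways, 3 of 4 tags are valid, so a tag-based ratio would be 0.75, while the entity-based ratio drops to 0.5, because the single invalid surface tag invalidates the whole second way.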

In our example analysis, we compute the entity-based compliance of the OSM history data of Mecklenburg Western Pomerania for every month from 2013 to 2017. The analysis is implemented in Java using the OSHDB API.

To carry out the analysis, we create a class with a main-method. To count the entities per regular time interval (snapshot), as well as the valid and invalid ones, we define an enum type that stores this information. At the beginning of the main-method, we open the database. Since comparing numbers is easier and faster than comparing strings, we use a tag translator: it transforms the key and value strings into integer values, so that they can be compared directly with the integer-encoded OSM tags stored in the OSHDB.
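The idea behind the tag translator can be sketched with a small in-memory class. `SimpleTagTranslator` is hypothetical and is not the OSHDB TagTranslator: strings are mapped to integer ids once, after which checking a tag reduces to comparing two ints. Unlike the real translator, which presumably looks up the ids already stored in the OSHDB, this sketch simply hands out a new id the first time it sees a string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a tag translator (not the OSHDB TagTranslator):
// key and value strings are mapped to integer ids, so later comparisons are int comparisons.
class SimpleTagTranslator {
    private final Map<String, Integer> keyIds = new HashMap<>();
    // value ids are assigned per key, because a value only has a meaning together with its key
    private final Map<String, Map<String, Integer>> valueIds = new HashMap<>();

    // Returns the id of a key, assigning the next free id on first use.
    int keyId(String key) {
        return keyIds.computeIfAbsent(key, k -> keyIds.size());
    }

    // Returns the id of a value within the scope of its key.
    int valueId(String key, String value) {
        Map<String, Integer> ids = valueIds.computeIfAbsent(key, k -> new HashMap<>());
        return ids.computeIfAbsent(value, v -> ids.size());
    }

    public static void main(String[] args) {
        SimpleTagTranslator tt = new SimpleTagTranslator();
        // translate the reference tag highway=residential once ...
        int refKey = tt.keyId("highway");
        int refValue = tt.valueId("highway", "residential");
        // ... then each tag of an entity only needs two int comparisons
        boolean matches = tt.keyId("highway") == refKey
                && tt.valueId("highway", "residential") == refValue;
        System.out.println(matches); // prints true
    }
}
```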

public class OSMEntityComplianceWithExternalSource {
  // helper to collect information of an entity
  enum EntityObject {
    INVALID_HIGHWAY_TAG_VALUE,
    INVALID_CHECK_TAG,
    INVALID_TAG_KEY,
    VALID,
    IS_PRESENT
  }
  public static void main(String[] args) throws Exception {
    OSHDBH2 oshdb = new OSHDBH2("path/to/osm-history-extract.oshdb");
    TagTranslator tt = new TagTranslator(oshdb.getConnection());
    // . . . read tags of the editor tag lists

The results are stored in a sorted map in which each key combines a timestamp with a value of the helper enum and maps to the number of entities with that value at that time. Before running the comparison, we specify the area of interest and the time of interest, and we filter by OSM type and OSM tag. These settings limit the calculation to a specific part of the OSM history data.

    SortedMap<OSHDBCombinedIndex<OSHDBTimestamp, EntityObject>, Integer> result = OSMEntitySnapshotView.on(oshdb)
      // bounding box (minLon, minLat, maxLon, maxLat) for the analysis area, in this case Mecklenburg Western Pomerania
      .areaOfInterest(new OSHDBBoundingBox(9.94, 52.18, 15.3, 54.86))
      // select timeperiod and step interval
      .timestamps("2013-01-01", "2017-12-01", Interval.MONTHLY)
      // filter: we focus on osm ways
      .osmType(OSMType.WAY)
      // filter: we want only ways with a highway tag
      .osmTag("highway")

Within the mapping function, the OSM history data is processed. In the first mapping step (a map operation produces exactly one output value for each input value), we record for each entity which kinds of invalid tags, if any, it contains.

      // mapping function
      .map(snapshot -> {
        EnumSet<EntityObject> complianceInformation = EnumSet.noneOf(EntityObject.class);
        OSMEntity entity = snapshot.getEntity();
        // analyze all tags of the entity one by one
        for (OSHDBTag tag : entity.getTags()) {
          // integer ids of the tag's key and value
          int key = tag.getKey();
          int value = tag.getValue();
          /* comparison of osm tags to tags of source lists
          . . .
          every time we find an invalid tag, we store the information in the complianceInformation
          */
        }
        return complianceInformation;
      })
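The comparison itself is elided in the code above. Purely as a hypothetical illustration, and not the original implementation, a check that fills such an EnumSet could look like the following. It works on strings for readability, whereas the OSHDB code compares the integer ids produced by the tag translator. The reference sets `KNOWN_KEYS` and `HIGHWAY_VALUES` are invented stand-ins for the editor and wiki lists, and `INVALID_CHECK_TAG` is left unused because its meaning is not described here.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class TagCheck {
    // Same constants as the helper enum of the analysis
    enum EntityObject { INVALID_HIGHWAY_TAG_VALUE, INVALID_CHECK_TAG, INVALID_TAG_KEY, VALID, IS_PRESENT }

    // Hypothetical reference lists; in the analysis these come from iD, JOSM or the OSM wiki.
    static final Set<String> KNOWN_KEYS = Set.of("highway", "name", "surface", "maxspeed");
    static final Set<String> HIGHWAY_VALUES = Set.of("primary", "secondary", "residential", "track");

    // Records every kind of invalid tag found on one entity; an empty set means "no problems found".
    static EnumSet<EntityObject> check(Map<String, String> tags) {
        EnumSet<EntityObject> info = EnumSet.noneOf(EntityObject.class);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            if (!KNOWN_KEYS.contains(tag.getKey())) {
                info.add(EntityObject.INVALID_TAG_KEY);
            } else if (tag.getKey().equals("highway") && !HIGHWAY_VALUES.contains(tag.getValue())) {
                info.add(EntityObject.INVALID_HIGHWAY_TAG_VALUE);
            }
        }
        return info;
    }

    public static void main(String[] args) {
        System.out.println(check(Map.of("highway", "residential", "surface", "asphalt"))); // prints []
        System.out.println(check(Map.of("highway", "residentail", "foo", "bar")));
        // prints [INVALID_HIGHWAY_TAG_VALUE, INVALID_TAG_KEY]
    }
}
```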

In the second step, we use a flat map, an operation that produces an arbitrary number (zero or more) of output values for each input value. For every entity, it adds VALID if no invalid tag was found, and IS_PRESENT in any case, so that both the valid entities and all entities can be counted.

      // collect valid entities as well as all present entities
      .flatMap(entityInformation -> {
        // if no invalid tag was recorded, the entity is valid
        if (entityInformation.isEmpty()) {
          entityInformation.add(EntityObject.VALID);
        }
        // mark every entity, so that the total number of entities can be counted
        entityInformation.add(EntityObject.IS_PRESENT);
        return new ArrayList<>(entityInformation);
      })
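The effect of this flat map step can be reproduced with plain Java streams, outside the OSHDB. In this sketch the class and method names are hypothetical and the input is a made-up list of per-entity findings: each entity contributes one element per flag, and the per-entity lists are concatenated into one stream of values.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapStep {
    enum EntityObject { INVALID_HIGHWAY_TAG_VALUE, INVALID_CHECK_TAG, INVALID_TAG_KEY, VALID, IS_PRESENT }

    // One entity's findings -> several output values (here always at least IS_PRESENT).
    static List<EntityObject> expand(EnumSet<EntityObject> findings) {
        // copy, so that the caller's set is left unchanged
        EnumSet<EntityObject> flags = EnumSet.copyOf(findings);
        if (flags.isEmpty()) {
            flags.add(EntityObject.VALID);
        }
        flags.add(EntityObject.IS_PRESENT);
        return new ArrayList<>(flags);
    }

    // Applies expand to every entity and concatenates the results, like a flat map.
    static List<EntityObject> flatten(List<EnumSet<EntityObject>> perEntity) {
        return perEntity.stream()
                .flatMap(findings -> expand(findings).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<EnumSet<EntityObject>> perEntity = List.of(
                EnumSet.noneOf(EntityObject.class),          // a way without invalid tags
                EnumSet.of(EntityObject.INVALID_TAG_KEY));   // a way with an unknown key
        System.out.println(flatten(perEntity));
        // prints [VALID, IS_PRESENT, INVALID_TAG_KEY, IS_PRESENT]
    }
}
```

Counting IS_PRESENT in the output gives the number of entities (2), and counting VALID gives the number of valid ones (1).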

Finally, we aggregate the results per timestamp and per EntityObject value and count them.

      // aggregate results per timestamp
      .aggregateByTimestamp()
      // and also per EntityObject type
      .aggregateBy(entityObject -> entityObject)
      .count();
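What the two aggregation steps and the final count compute can be mimicked in plain Java with a nested sorted map. This is a sketch under simplifying assumptions: timestamps are ISO date strings (which sort chronologically as text), the input is a hypothetical list of (timestamp, value) pairs, and a nested map replaces the combined timestamp/enum index of the OSHDB result.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class AggregateStep {
    enum EntityObject { INVALID_HIGHWAY_TAG_VALUE, INVALID_CHECK_TAG, INVALID_TAG_KEY, VALID, IS_PRESENT }

    // Counts how often each EntityObject value occurs per timestamp.
    // Input rows are (timestamp, value) pairs as produced by the flat map for every snapshot.
    static SortedMap<String, Map<EntityObject, Integer>> count(List<Map.Entry<String, EntityObject>> rows) {
        SortedMap<String, Map<EntityObject, Integer>> result = new TreeMap<>();
        for (Map.Entry<String, EntityObject> row : rows) {
            result.computeIfAbsent(row.getKey(), ts -> new EnumMap<>(EntityObject.class))
                    .merge(row.getValue(), 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, EntityObject>> rows = List.of(
                Map.entry("2013-01-01", EntityObject.VALID),
                Map.entry("2013-01-01", EntityObject.IS_PRESENT),
                Map.entry("2013-01-01", EntityObject.IS_PRESENT),
                Map.entry("2013-02-01", EntityObject.IS_PRESENT));
        System.out.println(count(rows));
        // prints {2013-01-01={VALID=1, IS_PRESENT=2}, 2013-02-01={IS_PRESENT=1}}
    }
}
```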

Calculate the compliance per month

We divide the number of valid entities by the total number of entities for each month.
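A minimal sketch of this last step, assuming the counts have been collected into a nested sorted map (timestamp to counts per enum value; this layout and the class name are assumptions of the sketch, not the OSHDB result type). The counts in `main` are illustrative only and are not results of the analysis.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class MonthlyCompliance {
    enum EntityObject { INVALID_HIGHWAY_TAG_VALUE, INVALID_CHECK_TAG, INVALID_TAG_KEY, VALID, IS_PRESENT }

    // compliance(month) = number of VALID entities / number of IS_PRESENT entities
    static SortedMap<String, Double> compliance(SortedMap<String, Map<EntityObject, Integer>> counts) {
        SortedMap<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Map<EntityObject, Integer>> month : counts.entrySet()) {
            int valid = month.getValue().getOrDefault(EntityObject.VALID, 0);
            int total = month.getValue().getOrDefault(EntityObject.IS_PRESENT, 0);
            // months without any highway ways have no defined ratio and are skipped
            if (total > 0) {
                result.put(month.getKey(), (double) valid / total);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // illustrative counts, not results of the analysis
        Map<EntityObject, Integer> january = new EnumMap<>(EntityObject.class);
        january.put(EntityObject.VALID, 3);
        january.put(EntityObject.IS_PRESENT, 4);
        SortedMap<String, Map<EntityObject, Integer>> counts = new TreeMap<>();
        counts.put("2013-01-01", january);
        System.out.println(compliance(counts)); // prints {2013-01-01=0.75}
    }
}
```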

Figure 1: Conceptual Compliance of OSM history data of Mecklenburg Western Pomerania with the tagging-guidelines for highway-objects of the editors iD, JOSM and the OSM Wiki.

Figure 1 shows the compliance of the highway-tagging in the OSM data of Mecklenburg Western Pomerania from 2013 to 2017 with the tagging-guidelines of the iD and JOSM editors and of the OSM wiki. The compliance values are based on entities. Between mid-2015 and 2016, the compliance rate drops for both the iD and the JOSM guidelines. A further investigation conducted with the OSHDB API reveals that this decrease is mainly caused by the key de:strassenschluessel, which was used in Mecklenburg Western Pomerania during that period and is not part of the iD and JOSM tagging-guidelines.
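How the investigation was carried out is not shown here. As a hypothetical sketch of one way to find such a key, one can count, for each key missing from an editor's key list, how many highway-objects use it; a key that occurs on many ways stands out immediately. The key list, class name and example tags are assumptions made for illustration, and in an OSHDB analysis this count would be computed per snapshot.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

class OffendingKeys {
    // Hypothetical key list of an editor preset
    static final Set<String> KNOWN_KEYS = Set.of("highway", "name", "surface", "maxspeed", "lanes");

    // For every key that is missing from the reference list, count the entities using it.
    static SortedMap<String, Integer> countUnknownKeys(List<Map<String, String>> entities) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> tags : entities) {
            for (String key : tags.keySet()) {
                if (!KNOWN_KEYS.contains(key)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // tag values are placeholders; only the keys matter here
        List<Map<String, String>> ways = List.of(
                Map.of("highway", "residential", "de:strassenschluessel", "placeholder"),
                Map.of("highway", "track", "de:strassenschluessel", "placeholder"),
                Map.of("highway", "primary", "foo", "bar"));
        System.out.println(countUnknownKeys(ways)); // prints {de:strassenschluessel=2, foo=1}
    }
}
```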

This analysis provides interesting results regarding conceptual compliance and showcases the capabilities of the OSHDB API for conceptual quality analysis.

Part 2 of the conceptual compliance analysis with the OSHDB can be found here:

heigit.org/spatial-conceptual-compliance-analysis-with-the-openstreetmap-history-database-oshdb/

Related work: Ballatore, A.; Zipf, A. (2015): A Conceptual Quality Framework for Volunteered Geographic Information. In: COSIT 2015, Conference on Spatial Information Theory XII, October 12-16, 2015, UC Santa Barbara. Lecture Notes in Computer Science.
