In the GIScience research group at Heidelberg University, a recent PhD project by Clemens Jacobs has been investigating the data quality of citizen science observations of organisms. The research aims to use geographic context as a source of information for estimating the plausibility of an observation (e.g., of a bird) that a volunteer has reported to a citizen science portal. To this end, approaches were developed that tap into different sources of Volunteered Geographic Information (VGI) to capture geographic context. One of these sources is OpenStreetMap (OSM). HeiGIT’s OpenStreetMap History Database (OSHDB), part of the ohsome OpenStreetMap History Analytics platform, was used to derive the typical context of a species in the form of OSM tags that are frequently found close to observations of that species. This information can then be used to examine the tag context of a new observation of the same species: if the observation’s context fits the species’ typical OSM context well, this indicates that the observation is highly plausible.
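Deriving a typical context from many observations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the research. It assumes that each observation’s context has already been retrieved (for example, via the OSHDB) as a set of `key=value` tag strings. It also assumes that a tag counts as typical if it occurs near at least a share `min_share` of the observations. The function name, the threshold, and the example tags are all hypothetical.

```python
from collections import Counter


def typical_context(observation_contexts, min_share=0.5):
    """Return the tags that occur near at least `min_share` of the observations.

    `observation_contexts` is a list of tag sets, one per observation
    (hypothetical input format: "key=value" strings of nearby OSM features).
    `min_share` is a hypothetical threshold, not a value from the research.
    """
    if not observation_contexts:
        return set()
    counts = Counter()
    for tags in observation_contexts:
        # Count each tag at most once per observation.
        counts.update(set(tags))
    n = len(observation_contexts)
    return {tag for tag, count in counts.items() if count / n >= min_share}


# Hypothetical tag contexts of four observations of one species.
contexts = [
    {"landuse=residential", "leisure=park", "natural=tree"},
    {"landuse=residential", "natural=tree", "highway=residential"},
    {"landuse=residential", "leisure=park"},
    {"landuse=farmland", "natural=tree"},
]

print(sorted(typical_context(contexts)))
# ['landuse=residential', 'leisure=park', 'natural=tree']
```

Counting each tag once per observation means that a single observation surrounded by many parks does not dominate the result. The frequency threshold is only one plausible way to decide which tags are "typical."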
Let’s look at an example. Rose-ringed Parakeets (Psittacula krameri) are a species of parrot. In Germany, they have been spreading since the late 1960s. They live mostly in urban areas, where they breed in tree holes but also in holes in buildings. They reached the region around Heidelberg and Mannheim in the early 1970s. The map below shows plausibility estimates based on OSM tag context for 111 observations of Rose-ringed Parakeets in and around Heidelberg and Mannheim. These observations were reported to iNaturalist, an American project that collects observations of organisms from volunteers all over the world. The Rose-ringed Parakeet’s typical OSM context, on which this estimation is based, was extracted using observations from ArtenFinder. ArtenFinder is a citizen science project with aims and properties similar to those of iNaturalist and is active in the federal state of Rheinland-Pfalz, which covers the parts of the area shown in the map below that lie west of the river Rhine.
In this example, plausibility was estimated by comparing each observation’s OSM context with the Rose-ringed Parakeet’s typical OSM context using the Jaccard index. This is a well-known coefficient for measuring the similarity of biodiversity between sites. For two sets, it is the size of their intersection divided by the size of their union. Observations in or close to urban centers show relatively high or medium plausibility, while some observations in more rural settings in the western part of the focus area present cases of lower plausibility. These locations away from urban centers appear less plausible because their OSM context differs from the typical situations in which Rose-ringed Parakeets are usually observed. This approach can therefore be used to automatically identify unusual observations that may need closer examination during quality assurance of such observation data.
The approach can also confirm the high plausibility of observations made in typical situations, which then need not be scrutinized further.
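The Jaccard comparison between an observation’s context and a species’ typical context can be sketched as follows. This is a minimal sketch that assumes contexts are represented as sets of `key=value` tag strings. The example tags are hypothetical. The index is undefined for two empty sets, so returning 0 in that case is a convention chosen for this sketch.

```python
def jaccard(a, b):
    """Jaccard index of two tag sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # The index is undefined for two empty sets; returning 0.0 here
        # is an assumption made for this sketch.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical typical context of a species and two hypothetical observations.
typical = {"landuse=residential", "leisure=park", "natural=tree"}
urban_obs = {"landuse=residential", "leisure=park", "natural=tree", "highway=residential"}
rural_obs = {"landuse=farmland", "natural=tree"}

for name, context in [("urban", urban_obs), ("rural", rural_obs)]:
    print(name, jaccard(typical, context))
# urban 0.75
# rural 0.25
```

A higher score means that an observation’s surroundings resemble the species’ typical context more closely. How scores are mapped to low, medium, or high plausibility is not specified here.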
This research, presented here in a much simplified form, contributes to the development of approaches that allow for the automatic quality assessment of citizen science observations of organisms. Such approaches can support quality assurance by identifying unusual observations and by confirming the high quality of others. This may reduce the workload of the experts in charge of quality control and make data quality more transparent to any user of such data. HeiGIT’s OSHDB and the tools surrounding it, especially the ohsome API, provide the necessary infrastructure for implementing this approach as a sound methodology.