In times of natural disasters and other emergencies, access to reliable data on critical infrastructure can mean the difference between a swift recovery and a prolonged crisis. Building on the success of previous initiatives, the GeoAI4Water III project is taking a step forward by developing a comprehensive dataset of Wastewater Treatment Plants (WTPs) across Thailand. This effort follows the path laid by GeoAI4Water II, which established a model for WTP detection. One of its key achievements was the identification of WTPs that were previously undocumented in OpenStreetMap (OSM) or in the other databases (such as Hydrowaste and ClimateTrace) that we used to define our ground truth. These new findings provided insights into WTP distribution and revealed gaps in existing datasets, adding knowledge about critical infrastructure in the tested areas.
GeoAI4Water III builds on these advancements by employing Smart Filtering of Sentinel-2 imagery and pre-trained deep learning models applied to Bing imagery. To further enhance accuracy, the project integrates a human-in-the-loop process to validate and refine the dataset, ensuring reliable classifications.
Methodology
-
The workflow for detecting WTPs begins with “Smart Filtering,” in which the Near Infrared (NIR) band is used to identify potential water bodies. This step exploits the NIR band’s sensitivity to water, allowing possible WTP locations to be filtered efficiently from medium-resolution imagery.
-
The second step, “Extraction,” uses very high-resolution optical images provided by Bing, with resolutions of approximately 30 cm at zoom level 19 and 1.2 meters at zoom level 17. Both zoom levels were used because zoom level 19 imagery was available only for selected regions of Thailand; the remaining areas were covered only by zoom level 17 imagery.
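The quoted resolutions can be checked against the standard Web Mercator ground-resolution formula. The sketch below is only an illustration and assumes 256-pixel tiles and the WGS 84 equatorial radius; the function name `ground_resolution` is hypothetical and not part of the project code.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 equatorial radius used by Web Mercator
TILE_SIZE_PX = 256          # assumption: 256 x 256 pixel tiles


def ground_resolution(lat_deg: float, zoom: int) -> float:
    """Approximate ground resolution in metres per pixel."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    map_size_px = TILE_SIZE_PX * 2 ** zoom
    return math.cos(math.radians(lat_deg)) * circumference / map_size_px


if __name__ == "__main__":
    # Compare the equator with a latitude close to Bangkok (about 13.75 N).
    for zoom in (17, 19):
        print(zoom, ground_resolution(0.0, zoom), ground_resolution(13.75, zoom))
```

At the equator this gives roughly 1.19 m per pixel at zoom level 17 and roughly 0.30 m at zoom level 19, in line with the figures above; at Thailand's latitudes the values are slightly smaller.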
This two-step approach is essential when working with large-scale areas spanning tens of millions of square kilometers, where data volume and processing demands grow rapidly. Filtering candidate locations before running the high-resolution model keeps the workflow scalable and saves time and resources. It also supports more sustainable practices, minimizes false positives, and maintains detection accuracy. Furthermore, the method uses an algorithm that reconstructs a high-resolution image of a given size around each potential water point. This resolves issues related to map tiling, so the imagery integrates seamlessly and detection accuracy is not lost at tile edges.
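The reconstruction algorithm itself is not described here, but the tile bookkeeping it needs can be sketched with standard Web Mercator tile math. The example below is a minimal sketch, assuming 256-pixel tiles and no clamping at the map boundary; `latlon_to_pixel` and `tiles_for_window` are hypothetical helper names.

```python
import math

TILE_SIZE_PX = 256  # assumption: 256 x 256 pixel tiles


def latlon_to_pixel(lat_deg, lon_deg, zoom):
    """Global Web Mercator pixel coordinates of a point at a zoom level."""
    map_size = TILE_SIZE_PX * 2 ** zoom
    sin_lat = math.sin(math.radians(lat_deg))
    px = (lon_deg + 180.0) / 360.0 * map_size
    py = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * map_size
    return px, py


def tiles_for_window(lat_deg, lon_deg, zoom, window_px):
    """Tiles needed to rebuild a window_px x window_px image centred on a point.

    Returns the (tile_x, tile_y) indices in row-major order and the pixel
    offset of the window's top-left corner inside the stitched mosaic.
    Points close to the edge of the world map are not handled.
    """
    cx, cy = latlon_to_pixel(lat_deg, lon_deg, zoom)
    left = int(round(cx)) - window_px // 2
    top = int(round(cy)) - window_px // 2
    right, bottom = left + window_px - 1, top + window_px - 1
    x0, x1 = left // TILE_SIZE_PX, right // TILE_SIZE_PX
    y0, y1 = top // TILE_SIZE_PX, bottom // TILE_SIZE_PX
    tiles = [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]
    offset = (left - x0 * TILE_SIZE_PX, top - y0 * TILE_SIZE_PX)
    return tiles, offset


if __name__ == "__main__":
    tiles, offset = tiles_for_window(13.75, 100.5, 17, 512)
    print(len(tiles), "tiles to fetch; crop starts at", offset)
```

Stitching the listed tiles and cropping at the returned offset yields an image centred on the candidate point, so a plant that straddles a tile boundary is seen whole by the detector.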
The model consists of two parts:
-
Zoom 17 Part (for all of Thailand): The whole country is processed with the Zoom 17 model at an image size of 512, using the YOLOv6 deep learning object detection model to identify WTPs.
-
Zoom 19 Part (for Bangkok and Rayong only): A more detailed analysis is conducted in the selected regions, using Zoom 19 (image size 1024) alongside Zoom 17 to enhance detection accuracy.
The workflow involves detecting true positives (TPs) and false positives (FPs), followed by post-processing, which merges and refines the results into geo-referenced final detections. Different confidence and intersection-over-union (IoU) thresholds are applied to ensure optimal and precise detection and classification.
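The role of the confidence and IoU thresholds can be illustrated with a greedy non-maximum suppression pass. This is a generic technique shown under assumptions, not the project's actual post-processing: the threshold values of 0.5 and the `filter_and_merge` name are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_merge(detections, conf_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then suppress overlapping duplicates.

    detections is a list of (box, score) pairs. Both thresholds are
    illustrative defaults, not the values used in the project.
    """
    confident = [d for d in detections if d[1] >= conf_threshold]
    confident.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in confident:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    sample = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8),
              ((50, 50, 60, 60), 0.7), ((0, 0, 5, 5), 0.2)]
    print(filter_and_merge(sample))
```

Raising the confidence threshold trades recall for fewer false positives, while the IoU threshold controls how aggressively overlapping detections of the same plant are collapsed into one.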
The reflectance characteristics of water bodies in the NIR channel are unique, making this spectral band highly effective for their detection. In the NIR range, water absorbs most of the incoming light, resulting in very low reflectance, which contrasts sharply with other land cover types like vegetation or urban areas that tend to reflect more light. This contrast facilitates the efficient identification and mapping of water bodies, which appear dark or nearly black in NIR imagery, while non-water surfaces are brighter. Consequently, the NIR channel is a vital component of the initial “Smart Filtering” step, as it improves the ability to distinguish water from surrounding features, thereby narrowing down the search for potential WTP locations.
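The idea of separating dark water pixels from brighter land can be sketched as a simple reflectance threshold. The example uses plain Python lists to stay self-contained; the threshold of 0.1 and the function names are assumptions for illustration, and a real pipeline would typically operate on raster arrays.

```python
def water_mask(nir, threshold=0.1):
    """Mark pixels whose NIR reflectance is below the threshold as water.

    nir is a 2D list of surface reflectance values in the range 0 to 1.
    The default threshold is an illustrative assumption.
    """
    return [[value < threshold for value in row] for row in nir]


def candidate_pixels(mask):
    """Row and column indices of all pixels flagged as potential water."""
    return [(r, c)
            for r, row in enumerate(mask)
            for c, is_water in enumerate(row)
            if is_water]


if __name__ == "__main__":
    # Toy 3 x 3 scene: bright vegetation with a dark pond in one corner.
    scene = [[0.35, 0.40, 0.38],
             [0.33, 0.06, 0.04],
             [0.36, 0.05, 0.03]]
    print(candidate_pixels(water_mask(scene)))
```

Only the flagged pixels need to be passed on to the high-resolution extraction step, which is what keeps the search space small.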
Sentinel-2 and PlanetScope are the satellite missions used here to screen water bodies as potential wastewater treatment plant locations. Sentinel-2 provides free, medium-resolution imagery (10 m) with a 5-day revisit cycle, covering large areas efficiently, while PlanetScope offers higher resolution (4 m) with daily revisits, enabling detection of smaller features.
For Thailand, Sentinel-2 data (124 products) and PlanetScope strips (9 total for Bangkok and Rayong) were analyzed using strict cloud cover limits (≤5%), specific timeframes (11/2023–07/2024 for Sentinel-2; 01/2024–07/2024 for PlanetScope), and thresholds for surface reflectance. The data integration allows for consistent and accurate identification of water bodies.
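The selection criteria above amount to a filter over product metadata. The sketch below assumes a hypothetical `Product` record with an acquisition date and a cloud cover percentage; the field names and sample identifiers are invented for illustration, while the 5% limit and the Sentinel-2 time window come from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Product:
    """Hypothetical metadata record for one satellite product."""
    product_id: str
    acquired: date
    cloud_cover_pct: float


def select_products(products, start, end, max_cloud_pct=5.0):
    """Keep products acquired within [start, end] with low cloud cover."""
    return [p for p in products
            if start <= p.acquired <= end and p.cloud_cover_pct <= max_cloud_pct]


if __name__ == "__main__":
    catalogue = [
        Product("example-a", date(2023, 12, 14), 2.1),
        Product("example-b", date(2024, 3, 2), 17.5),   # too cloudy
        Product("example-c", date(2023, 9, 30), 0.4),   # outside the window
    ]
    # Sentinel-2 window from the text: 11/2023 to 07/2024.
    chosen = select_products(catalogue, date(2023, 11, 1), date(2024, 7, 31))
    print([p.product_id for p in chosen])
```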
Results for the entire Thailand dataset
Following the verification step, we implemented a human-in-the-loop process to categorize each prediction into one of four classifications:
● ‘proper WTP’
● false positive (FP)
● man-made ponds
● unsure case
Out of a total of 2,952 predictions, GeoAI4Water identified 66 ‘proper WTPs’ across Thailand (139, or 4.7%, before spatial clustering). Among the remaining predictions, 7 sites (0.23%) were FPs, 2,701 (91.5%) were categorized as man-made ponds that used water treatment for other applications, and the other 93 sites (3.57%) resembled proper wastewater treatment plants but could serve other purposes.
We compared our results with well-known open-source datasets on wastewater treatment plants, including OpenStreetMap (OSM), Hydrowaste, and ClimateTrace. During the analysis, no significant differences were observed between the Hydrowaste and ClimateTrace datasets.
The Hydrowaste dataset was re-categorized into “proper WTPs,” natural-looking ponds, and enclosed systems, reducing its entries from 44 to 26 proper WTPs. Similarly, the OSM dataset yielded 39 proper WTPs from 61 features. A human-in-the-loop process refined GeoAI4Water predictions, narrowing 139 initial sites to 66 proper WTPs. A Venn diagram highlighted overlaps: Hydrowaste and OSM shared more sites, while GeoAI4Water identified more new locations, underscoring the outdated nature of open-source datasets.
Additionally, several insights emerged during the human-in-the-loop process. The process involves reclassifying predictions into specific categories: false positives (FP), proper WTPs, ponds or natural-looking WTPs (including shrimp farms), and unsure cases.
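One way to count overlaps between two point datasets is to treat sites within a fixed distance of each other as the same plant. The sketch below uses the haversine formula; the 500 m matching radius and the function names are assumptions for illustration and are not stated in the text.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_by_overlap(ours, reference, radius_m=500.0):
    """Split our (lat, lon) sites into those matching a reference site and new ones.

    The matching radius is an illustrative assumption.
    """
    matched, new = [], []
    for lat, lon in ours:
        near = any(haversine_m(lat, lon, rlat, rlon) <= radius_m
                   for rlat, rlon in reference)
        (matched if near else new).append((lat, lon))
    return matched, new


if __name__ == "__main__":
    detections = [(13.750, 100.500), (12.680, 101.280)]
    osm_sites = [(13.751, 100.500)]
    print(split_by_overlap(detections, osm_sites))
```

Running the same split against each reference dataset gives the counts needed for the regions of a Venn diagram.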
Key Findings
-
Thailand, similar to much of the Global South, lacks complete coverage of very high-resolution imagery (zoom level 19) from providers such as Bing and ESRI.
-
The GeoAI4Water model identified 66 “proper” WTPs, over 50% more than the OSM dataset (39 sites) and more than twice as many as the Hydrowaste dataset (26 sites). GeoAI4Water and OSM overlap at 8 sites.
-
“Unsure” cases (93 sites) include palm oil sites equipped with advanced aeration systems.
-
Shrimp farms, part of the pond classification, are a major source of methane emissions in Thailand. They utilize advanced aeration technologies similar to those employed in WTPs.
-
The introduction of the smart filtering step significantly reduced false positive cases, bringing the total down to 19 across all of Thailand.
-
There were no major differences between the results from the PlanetScope and Sentinel-2 NIR bands; the higher spatial resolution of PlanetScope did not make a significant difference.
This work reveals how AI models can uncover valuable data and hidden insights and pave the way for smarter, more sustainable solutions in urban planning, aquaculture, methane emissions monitoring and broader environmental management.
To keep up with future developments and releases related to this project as well as other efforts to advance geospatial technology in the mobility, humanitarian aid, and data analytics space, follow our social media channels and stay up to date on our blog.
Related posts & publications:
Sukanya Randhawa, Guntaj Randhawa, Olena Sivak, Johannes Zech, Maria Martin, Alexander Zipf, and Yuze Li. 2023. Multiscale Multifeature Vision Learning for Scalable and Efficient Wastewater Treatment Plant Detection using Hi-Res Satellite Imagery and OSM. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Advances in Urban-AI (UrbanAI ’23). Association for Computing Machinery, New York, NY, USA, 10–21. https://doi.org/10.1145/3615900.3628772