Kalye - Space Apps Challenge

Kalye | Automated Detection of Hazards

Awards & Nominations

Kalye has received the following awards and nominations. Way to go!

Global Nominee

Automated Detection of Hazards

Countless phenomena such as floods, fires, and algae blooms routinely impact ecosystems, economies, and human safety. Your challenge is to use satellite data to create a machine learning model that detects a specific phenomenon and build an interface that not only displays the detected phenomenon, but also layers it alongside ancillary data to help researchers and decision-makers better understand its impacts and scope.

Automated Road Hazard Detection with Overlaid Open-Source Datasets and Satellite Images

Summary

The main objective is to devise an approach to automatically assess road safety levels of a city’s road networks. Metro Manila was the case study. The approach:

Does not rely heavily on existing records of accidents
Reduces or eliminates manual/local surveillance for measuring model variables
Has outputs delivered straight to users to directly inform them of Kalye’s insights

This was achieved with open-source data from OpenStreetMap and satellite imagery from MODIS, Worldview-2, and NASA’s ASTER. Kalye’s methodology is a holistic approach to evaluate road safety levels of driveable networks in cities. It can be easily adopted in different sections of a road network despite big variances in thei

How We Addressed This Challenge

SUMMARY INFOGRAPHIC:

Please view this graphic for an interactive overview of what Kalye is.

https://my.visme.co/projects/w4jnq1ok-project-kalye

WHAT IS THE PROBLEM?

There are two sides to the story. Let us tell you a set of stories from the two perspectives to put things into context.

The Road User’s Story

Navigation apps help Helen find directions to places she has never been before. She lives in Metro Manila, a megacity in the Philippines. The city is incredibly busy and its roads are networked like cobwebs, with unexpected intersections, rough and slippery surfaces, hard-to-manage inclines, and hard-to-navigate turns and exits. The country is also often flooded during the monsoon. Roads in nearby Tagaytay and, a little farther away, Baguio City are also affected by fog, which causes poor visibility.

The combined difficulty of navigating unfamiliar roads under largely unevaluated road conditions adds stress. Helen has often encountered accidents caused by these factors and, on several occasions, has been involved in one herself. In these situations, she has always wished that navigation apps would not just direct her but also tell her the Road Safety Levels of the road sections she is traversing.

Some apps do warn her of existing traffic, blockades, or accidents already reported by other users. But they do not warn her on normal days whether the road ahead is dangerous because it is slippery, has a big bend, has high pedestrian density, and so on. If these insights were accessible to her, they would not only ease her navigation but also guide her driving behaviour, such as reducing speed when necessary or being more vigilant in a high-risk road section.

If only the navigation app could give her insights not just on reported or current accidents ahead, but also guide her to be more careful by telling her the risk level of the road sections ahead.

The Road Safety Evaluator’s Story

Arthur is part of an urban development team at the city council, where he works as a researcher measuring the safety levels of different road sections in the metropolis. He happens to be a colleague of Helen, and the two have been exchanging stories about how complex the city’s roads are and how great it would be if everyday road users could also know which roads are risky and which sections are most prone to road hazards.

This prompted Arthur to initiate localized research on the city’s road networks to create a holistic database of road safety levels for the city’s roads. He wanted to see whether he could use a machine learning approach to model or evaluate the different hazard risk levels of Metro Manila’s roads using historical accident records from past years. It would have been a straightforward classification approach, where features are used to determine whether a road section is accident-prone or not. However, right from the start of the research, he ran into several challenges:

There is no comprehensive database of past accidents. The city has several reporting agencies, each recording accidents in its own locality. Without a centralized record host, these records are stored in local databases in very different formats, making them difficult to consolidate.
A review of related literature shows that many similar studies aim to grade or quantify the hazard risks of road sections. However, these methodologies require so much tailoring that they are difficult to apply to other parts of the network.
Furthermore, measuring the model features requires a lot of manual and localized surveillance, for example deploying a team to physically survey all road sections for surface type, conditions, elevations, infrastructure characteristics, and traffic-related statistics. To evaluate the city’s network effectively, Arthur would need a large team and a long timeframe.
Some models did away with the features that needed manual labour by idealizing them. This made most models methodologically simple but too idealized for real-life contexts, such as real-time, location-specific evaluation for road users anywhere, anytime.

APPROACHING THE LACK OF COMPREHENSIVE DATABASE, DIFFICULTY OF MANUAL SURVEILLANCE, AND ACCESSIBILITY OF HAZARD INSIGHTS TO USERS

Kalye had to think creatively. The problem can’t be approached blindly with machine learning techniques alone. Instead, the methodology was devised by taking into account the above challenges of both stakeholders. Kalye’s methodology:

Evaluates and scores road-related hazards through Synthetic Risk Scores, without needing to model from historical records of traffic accidents
Is simple and automated enough to reach an acceptable level of granularity (i.e. it can evaluate individual road sections instead of giving an “average” rating for a big road network), which makes it useful for users in varying locations
Does away with manual surveillance by leveraging satellite data from MODIS, ASTER, and Worldview-2 to generate features on road infrastructure and characterization, fog detection/visibility, and traffic statistics, along with open-source data from OpenStreetMap
Provides a platform (Kalye App and Kalye Cloud) that addresses the lack of centralized road accident records going forward, enabling automated, centralized data collection so that the problem can be revisited with alternative methodologies (mainly ML approaches)

How We Developed This Project

Recall that the first layer of Kalye’s problem is how to build a Synthetic Risk Score model without relying on historical road accident data (of which little or none is available). If this were approached as a supervised classification problem, you could simply extract features such as the characteristics of the roads where accidents happened and the user-related variables of each accident, then fit a model to determine how those variables impact road hazards.

Since it can’t be approached this way, Kalye’s methodology instead creates a Synthetic Risk Score used to grade the level of risk of road-related hazards for each road section in a network. The framework for calculating the risk score is described as follows.

For the full methodology, please refer to https://github.com/HQuizzagan/kalye/blob/master/Kalye's%20Road%20Safety%20Scoring%20Framework.pdf

Note that the Synthetic Risk Score is not the information shown to the user, because raw scores are not easy to interpret. For example, if Helen is driving along Road Section A and sees a hazard risk level of 0.987, she wouldn’t easily know what it means. Instead, the raw scores are encoded into interpretable risk classifications:

Low Risk
Medium Risk
High Risk
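The encoding step can be sketched as a simple thresholding function. This is a minimal illustration only: the source does not state the score range or the cutoffs, so the [0, 1] range and the `low_cutoff` and `high_cutoff` values below are assumptions, and `classify_risk` is a hypothetical name.

```python
def classify_risk(score, low_cutoff=0.33, high_cutoff=0.66):
    """Map a raw Synthetic Risk Score to one of three labels.

    Assumes scores lie in [0, 1]; both cutoffs are illustrative
    placeholders, not values from Kalye's framework.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < low_cutoff:
        return "Low Risk"
    if score < high_cutoff:
        return "Medium Risk"
    return "High Risk"


# Helen's example: a raw score of 0.987 on Road Section A.
print(classify_risk(0.987))  # High Risk
```

In practice the cutoffs would need to be calibrated against the distribution of scores across the road network.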

Note that Kalye’s labelling adopts a three-level system for ease of interpretation, which makes it user-friendly. Had we used a five-level categorization (very low, low, medium, high, and very high risk), it would be very hard for users to tell how a driver should behave differently in a High Risk versus a Very High Risk environment.

Justification of the Weighted Combination Modelling Approach

When using impact factors such as vehicle-user factors, road-related factors, environmental factors, and time-varying factors, most models simply take the product of the factors to generate the risk score. The rationale is that each impact factor simply scales the overall risk.

However, Kalye opted for a weighted combination modelling approach, which has the following advantages:

The weights w_j, j = 1, 2, 3, 4, 5, for each factor allow feature importance to be determined. Feature importance helps justify the validity of the model (i.e. why is the risk score that high or low?). In other words, it aids the interpretability of the model.
Feature importance helps identify which of the four impact factors has the most impact on road hazards. This leads to design applications. For example:
If road-related factors are deemed the most impactful, then during urban development you can use the model to optimize road features to minimize road hazards.
If vehicle-user factors are seen to be very impactful, this will help policy makers craft public policies that control vehicle-user behaviour to avoid hazards, such as implementing speed limits or monitoring driver behaviour.
Lastly, the model Kalye devised accounts for the interdependence of the impact factors, which is unavoidable and needs to be accounted for to get a better picture of the risk.
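To make the contrast concrete, here is a minimal sketch comparing a weighted sum with a plain product and reading per-factor contributions off the weights. It is not Kalye's actual formula, which is in the linked framework document: the factor values, the weights, and all function names are hypothetical, and the fifth weight and any interdependence terms are left out because the text here does not spell them out.

```python
from math import prod

FACTORS = ("vehicle_user", "road", "environment", "time_varying")


def weighted_score(factors, weights):
    """Weighted combination: sum of w_j * f_j over the impact factors."""
    return sum(weights[name] * factors[name] for name in FACTORS)


def product_score(factors):
    """Plain product of the impact factors, shown for comparison."""
    return prod(factors[name] for name in FACTORS)


def contributions(factors, weights):
    """Share of the weighted score attributable to each factor."""
    total = weighted_score(factors, weights)
    if total == 0:
        raise ValueError("score is zero, shares are undefined")
    return {name: weights[name] * factors[name] / total for name in FACTORS}


# Hypothetical factor values and weights for one road section.
factors = {"vehicle_user": 0.5, "road": 0.8, "environment": 0.2, "time_varying": 0.5}
weights = {"vehicle_user": 0.25, "road": 0.5, "environment": 0.125, "time_varying": 0.125}

score = weighted_score(factors, weights)
shares = contributions(factors, weights)
print(round(score, 4))              # weighted risk score
print(max(shares, key=shares.get))  # factor contributing the most
print(round(product_score(factors), 4))
```

With the product form, one small factor pulls the whole score down and there is no per-factor share to inspect, whereas the weighted form exposes which factor drives a given score.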

Kalye’s Innovative Feature: Automated Generation of Road Accident Records

To maximize Kalye’s user base, the app features an easy-to-access “Report an Accident” floating button. This feature allows crowd-sourcing of data to centralize records of road accidents, which can later be used for further studies. Why is there no well-maintained database of road accidents? For context, we call a database well-maintained if it contains features that describe each incident well.

Each reporting agency (police stations, local road authorities, smaller road safety management offices, etc.) maintains its own reporting methodology, so records vary across the city.
It is hard to manually record incident data such as the type of vehicle involved, the features of the road where the accident happened, the degree of damage to people and property, and the environmental conditions around the accident. These are meaningful factors for understanding what causes accidents, but they are not logged because they are hard to record manually.

So how does Kalye’s Report an Accident help?

Reporting can now be done by anyone who uses the Kalye app. Road users who encounter a road accident can simply follow these steps:

Grab their phone, where the Kalye app is already open for navigation.
Click on the Report an Accident button to activate the phone's camera.
Take one or multiple snapshots of the incident on the road they have encountered.
Upload these images into the Kalye cloud.
Become a road accident incident reporter.

How essential is this crowd-sourced and Kalye-managed database of road accidents?

As mentioned above, it is very hard to manually extract important information about road accidents for record-keeping. With Kalye’s backend, the report is generated automatically by:

Leveraging the fact that a single accident on the road is likely to be photographed by different users from multiple angles.
Using image processing instead of manual labour to extract the key information describing an incident, such as the types of vehicles involved, the road features where the incident happened, and the degree of damage done.
Removing the need for anyone to manually encode these data, as they are generated directly from the multiple images. The result is an automated record of traffic accidents, made by people in the field (i.e. everyday road users).
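One way the multi-angle idea could work is to merge the attributes extracted from each photo into a single incident record by majority vote. The sketch below assumes an upstream image-processing step has already produced one dictionary per photo; the field names, the `consolidate_reports` function, and the voting rule are all hypothetical and are not taken from Kalye's backend.

```python
from collections import Counter


def consolidate_reports(extractions):
    """Merge per-photo extraction dicts into one record by majority vote.

    Each field keeps the most common value plus the fraction of photos
    that agreed with it, as a rough confidence indicator.
    """
    record = {}
    fields = sorted({key for item in extractions for key in item})
    for field in fields:
        values = [item[field] for item in extractions if item.get(field) is not None]
        if values:
            value, votes = Counter(values).most_common(1)[0]
            record[field] = {"value": value, "support": votes / len(values)}
    return record


# Hypothetical outputs for three photos of the same incident.
photos = [
    {"vehicle_type": "motorcycle", "damage": "moderate"},
    {"vehicle_type": "motorcycle", "damage": "severe"},
    {"vehicle_type": "car", "damage": "moderate"},
]
record = consolidate_reports(photos)
print(record["vehicle_type"]["value"], record["damage"]["value"])
```

Fields with low support could be flagged for review rather than written straight into the database.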

How We Used Space Agency Data in This Project

The main role of satellite imagery is to measure the model features (i.e. the variables under each of the four impact factors) without the need for manual surveillance. This makes the methodology easy to adopt, since it is an Automated Hazard Detection Framework. Presented below are brief overviews of how Kalye used satellite images to assess road hazards.

Worldview-2 Satellite Imagery and GIS Roadway Maps

Curve identification and information extraction can be achieved by applying curve extraction methodologies to satellite images and GIS shapefiles of road networks. We determined three features that contribute to the project goals: curve type, curve length, and curve degree (basically, “sharpness”). Satellite images and GIS shapefiles each present different advantages and disadvantages, which we discuss below.

Easa, S. M., Dong, H., & Li, J. (2007) presented a method for establishing road horizontal curves from satellite imagery. The Canny edge detector was implemented, which involved converting the colored image to a gray image and creating an edge image by locating abrupt changes in the intensity function. The Hough transform, a popular algorithm for detecting features in raster images, was used to detect the tangent straight lines and the corresponding horizontal curves. The authors were able to accurately establish simple and reverse curves for a complex freeway interchange (Fig. 1) and to extract the parameters of the curves, including radii, start point, and end point. However, the paper focused on establishing horizontal curves for only one side of the road: the inside or the outside edge. While it is possible to apply the proposed method twice, there is no guarantee that the two sides of the curve will share a similar center, as they should.

Fig. 1. Results of establishing (a) a simple horizontal curve and (b) a reverse horizontal curve at a freeway interchange.
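The line-detection part of the Hough transform can be illustrated with a small pure-Python sketch. It is a toy version for intuition, not the pipeline used by Easa et al. or by Kalye: it assumes the edge pixels have already been extracted (for example by a Canny detector), works only on straight lines, and uses hypothetical function names. A real implementation would more likely rely on an image library such as OpenCV.

```python
import math
from collections import Counter


def hough_votes(edge_points, theta_step_deg=1):
    """Vote for lines in normal form: rho = x*cos(theta) + y*sin(theta)."""
    votes = Counter()
    for x, y in edge_points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            # Quantize rho to whole pixels so nearby points share a bin.
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(theta_deg, rho)] += 1
    return votes


def strongest_line(edge_points):
    """Return (theta_deg, rho, votes) for the most-voted line."""
    (theta_deg, rho), count = hough_votes(edge_points).most_common(1)[0]
    return theta_deg, rho, count


# Toy edge pixels: a vertical road edge at x = 5, 101 pixels long.
vertical_edge = [(5, y) for y in range(0, 101)]
print(strongest_line(vertical_edge))  # (0, 5, 101)
```

Each edge pixel votes for every line that could pass through it, so pixels lying on the same tangent pile their votes into one (theta, rho) bin. Detecting the curves themselves needs an extension of the same voting idea to circular arcs.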

CurveFinder, an ArcGIS add-in tool developed by Li et al. (2012, 2015), uses GIS roadway maps to extract horizontal curve data. The fully automatic tool uses a curve data-extraction algorithm that: (a) detects all curves on each road in a selected roadway layer, regardless of the type of curve; (b) classifies each curve into one of two categories, simple or compound; (c) computes the radius and degree of curvature for each simple curve, as well as the curve length for simple and compound curves; and (d) creates curve features and layers for all identified curves in the GIS. The authors fully demonstrated horizontal curve data extraction and curve type identification. However, low-quality GIS roadway segments are a major cause of error in two scenarios (Fig. 2): (a) deviation from the actual roadway alignment; and (b) a vertex resolution of the GIS roadway centerline that is too low to describe the actual alignment of the roadway.

Fig. 2. Typical scenarios of low-quality GIS source data: (a) Scenario 1 and (b) Scenario 2.
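Step (c) above, computing the radius and degree of curvature of a simple curve, can be sketched from three vertices on the roadway polyline. This is a geometric illustration rather than CurveFinder's actual algorithm. The function names are hypothetical, and the 100-unit base arc in `degree_of_curvature` is an assumed convention, since the source does not say which definition is used.

```python
import math


def circumradius(p1, p2, p3):
    """Radius of the circle through three points; inf if they are collinear."""
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    # Cross product gives twice the signed area of the triangle.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    area = abs(cross) / 2.0
    if area == 0.0:
        return math.inf  # straight segment, no curvature
    return (a * b * c) / (4.0 * area)


def degree_of_curvature(radius, base_arc=100.0):
    """Central angle in degrees subtended by an arc of length base_arc."""
    return math.degrees(base_arc / radius)


# Three vertices on a circle of radius 1 centred at (1, 0).
r = circumradius((0, 0), (1, 1), (2, 0))
print(round(r, 6))
print(round(degree_of_curvature(100.0), 4))
```

A smaller radius gives a larger degree of curvature, which matches the intuition that sharper bends should push the risk score up.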

We propose a methodology that combines both of these approaches to make full use of the data. We use satellite imagery to create an edge map of the road network. The centerline of the edges is then computed, creating a roadway polyline. This polyline network is converted to a shapefile that can be fed into CurveFinder, from which we subsequently extract the relevant information.
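The centerline step could be approximated by taking midpoints between the two detected road edges. The sketch below is a simplification under stated assumptions: both edges are given as polylines with the same number of vertices, sampled at matching positions along the road, and `centerline` is a hypothetical helper rather than part of Kalye's code.

```python
def centerline(left_edge, right_edge):
    """Midpoint polyline between two road edges sampled at matching stations."""
    if len(left_edge) != len(right_edge):
        raise ValueError("edges must have the same number of vertices")
    return [
        ((xl + xr) / 2.0, (yl + yr) / 2.0)
        for (xl, yl), (xr, yr) in zip(left_edge, right_edge)
    ]


# Toy edges of a road that widens from 4 to 6 units.
left = [(0, 0), (10, 0)]
right = [(0, 4), (10, 6)]
print(centerline(left, right))
```

Real edge maps rarely come with matched vertices, so a resampling or nearest-point pairing step would be needed before this averaging.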

Using ASTER data for elevation data

Base reference: Extracting Topographic Data from Online Sources to Generate a Digital Elevation Model for Highway Preliminary Geometric Design

Two ways to extract ASTER data from Google Earth:

Traditional

Google Earth does not provide an API for users to connect to the data directly.
Instead, Google Earth connects users through the Google API.
This is not practical because of the processing time needed to extract the data.

Standard

The smallest DEM scale commonly used for preliminary geometric design is 1:50000. Such DEMs have an absolute error of less than 5 m, which is the minimum requirement for preliminary design. The geographic data extracted from Google Earth Pro has an absolute error of less than 1 m, which is sufficient to generate a DEM.
The absolute error of the data consists of two parts:
Intrinsic error: cannot be controlled by the user; it results from the manner of data collection.
Manual error: can be controlled by the user; it is generated mostly during Google Earth Pro data extraction and is associated with the interval between data points.

Data Extraction by Image Recognition

When the mouse pointer is placed over a target location in the Google Earth Pro interface, the status bar displays the longitude, latitude, elevation, and eye altitude. Moving the mouse pointer therefore makes it possible to generate a data matrix from which a DEM can be extracted.
The point profiles are extracted as the mouse pointer is moved automatically by a Python program written for this purpose. An Excel file is generated automatically to save the geographic data, and the interval of each increment of mouse pointer movement is fixed.

Data Storage and Management

The position and elevation data are extracted and saved once the pointer position on the screen has been translated into actual geographic coordinates. AutoCAD is used to simplify the task of extracting data. The data should be translated into radians to reduce or eliminate the possibility of errors and to increase robustness. An adequate number of decimal places is retained, which depends on the position of a given point. Consequently, the error generated by omitting decimals (within 5 m) is minimized.

Validation of Extraction

The accuracy of the extracted 3D data has to be verified before generating the DEM. One thousand points are selected randomly from the extracted data set. Control elevation data are established using field surveys, and the elevation difference between each control point and the corresponding extracted point is calculated. Points are matched on their positional data (i.e., the longitude and latitude of every point).
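The validation step can be sketched as a comparison of extracted and surveyed elevations on a random sample. This is a minimal illustration, assuming both data sets are keyed by identical (longitude, latitude) pairs. The function name, the error statistics reported, and the toy coordinates are assumptions, not details from the referenced paper.

```python
import math
import random


def elevation_errors(extracted, control, sample_size=1000, seed=0):
    """Compare extracted and surveyed elevations on a random sample.

    Both arguments map (longitude, latitude) to elevation in metres.
    """
    shared = sorted(set(extracted) & set(control))
    if not shared:
        raise ValueError("no points in common between the two data sets")
    rng = random.Random(seed)
    sample = rng.sample(shared, min(sample_size, len(shared)))
    diffs = [extracted[p] - control[p] for p in sample]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"n": len(diffs), "mae": mae, "rmse": rmse,
            "max_abs": max(abs(d) for d in diffs)}


# Toy data: three points with made-up elevations.
extracted = {(121.0, 14.5): 12.0, (121.1, 14.6): 20.5, (121.2, 14.7): 31.0}
control = {(121.0, 14.5): 11.0, (121.1, 14.6): 21.0, (121.2, 14.7): 30.0}
stats = elevation_errors(extracted, control)
print(stats["n"], round(stats["mae"], 3), stats["max_abs"] < 5.0)
```

The final comparison against 5 m mirrors the absolute-error requirement for preliminary design mentioned earlier.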

DEM Generation

The extracted data must be pretreated before it can be imported into AutoCAD (the software used as a platform to generate the DEM). The file is then transformed into a text file of prototype commands, in accordance with the standards of Lisp (the programming language used to generate commands for extracting bulk data). Finally, once the geographic data are imported into AutoCAD, the DEM is generated.

NASA’s MODIS Daytime Data for Ground Fog Detection in Roads

Why use MODIS?

The MODIS instrument is aboard the Terra and Aqua spacecraft. Its geographical coverage lets it view the entire surface of the Earth every one to two days (NASA). In the context of this application, that can allow the average visibility on major road networks to be calculated every day.

How to measure ground fog levels using MODIS data?

Kalye adopted the algorithm implemented by Bendix et al. in their research entitled “Ground Fog Detection from Space Based on MODIS Daytime Data—A Feasibility Study”. This provides an efficient approach with proven accuracy for nowcasting ground visibility reduced by fog.

The strength of the methodology is that, even though it uses daytime data, it can exclude mid- and high-level clouds from the detection. As a result, it does more than calculate the probability of poor ground visibility: it has demonstrated promising nowcasting of true visibility at ground level. Kalye will extend this to grade road visibility during heavy downpours.

Project Demo

The video demonstration can be viewed here:

https://drive.google.com/drive/folders/14dOiNukjSIRy3zwHop9yHv24im5pCSzT?usp=sharing

If you want to download and play with the prototype app, please access the APK here:

https://drive.google.com/drive/folders/1RC2UIzXbTCYX4SJalN00g4XESIiWeEai?fbclid=IwAR2SVIVW9TgKgnp07CwXJCHvN5mn__dXAOxHGOdoeg7U36U1BRJCCQqnn04

Experience the Kalye app’s UI here: http://bit.ly/KalyePrototype

Data & Resources

Basis of Methodologies

Bendix, J., Thies, B., Cermak, J., & Nauß, T. (2005). Ground Fog Detection from Space Based on MODIS Daytime Data—A Feasibility Study. Weather and Forecasting, 20(6), 989-1005. doi:10.1175/waf886.1
Chen, S., Tang, Z., Zhou, H., & Cheng, J. (2019). Extracting Topographic Data from Online Sources to Generate a Digital Elevation Model for Highway Preliminary Geometric Design. Journal of Transportation Engineering, Part A: Systems, 145(4), 04019003. doi:10.1061/jtepbs.0000212
Easa, S. M., Dong, H., & Li, J. (2007). Use of Satellite Imagery for Establishing Road Horizontal Alignments. Journal of Surveying Engineering, 133(1), 29–35. doi:10.1061/(asce)0733-9453(2007)133:1(29)
Jurewicz, C., & Excel, R. (2016). Application of a Crash-predictive Risk Assessment Model to Prioritise Road Safety Investment in Australia. Transportation Research Procedia, 14, 2101-2110. doi:10.1016/j.trpro.2016.05.225
Li, Z., Chitturi, M. V., Bill, A. R., & Noyce, D. A. (2012). Automated Identification and Extraction of Horizontal Curve Information from Geographic Information System Roadway Maps. Transportation Research Record: Journal of the Transportation Research Board, 2291(1), 80–92. doi:10.3141/2291-10
Li, Z., Chitturi, M. V., Bill, A. R., Zheng, D., & Noyce, D. A. (2015). Automated Extraction of Horizontal Curve Information for Low-Volume Roads. Transportation Research Record: Journal of the Transportation Research Board, 2472(1), 172–184. doi:10.3141/2472-20
Torre, F. L., Domenichini, L., Meocci, M., Graham, D., Karathodorou, N., Richter, T., . . . Laiou, A. (2016). Development of a Transnational Accident Prediction Model. Transportation Research Procedia, 14, 1772-1781. doi:10.1016/j.trpro.2016.05.143
Tripodi, A., Mazzia, E., Reina, F., Borroni, S., Fagnano, M., & Tiberi, P. (2020). A simplified methodology for road safety risk assessment based on automated video image analysis. Transportation Research Procedia, 45, 275-284. doi:10.1016/j.trpro.2020.03.017

Datasets Considered

OpenStreetMap Data for Metro Manila Case Study - Sample Python Notebook for Extraction -
https://github.com/HQuizzagan/kalye/blob/master/Metro%20Manila%20-%20Road%20Network%20Risk%20Assessment.ipynb
NASA’s MODIS Data for January to December 2019 for the Philippines -
https://ladsweb.modaps.eosdis.nasa.gov/search/order/4/MODATML2--61/2020-09-20..2020-10-04/DB/117.1,19.2,127.8,4.7
GIS Data on Metro Manila Road Networks - https://github.com/HQuizzagan/kalye/tree/master/datasets
NASA’s ASTER Dataset for Measuring Elevation of Road Segments - https://search.earthdata.nasa.gov/search/
Satellite Image from Worldview-2 for Road Geometry Detection - https://mb.com.ph/2020/04/14/satellite-images-of-metro-manila-during-ecq-available/

Judging

This project was submitted for consideration during the Space Apps Judging process.