We developed a Random Forest algorithm to automatically detect air quality conditions in parts of the United States, along with a Tableau dashboard to visualize the detections and assess their impact.
This tool is important because it enables researchers and key decision-makers to easily assess the scope and impact of air quality conditions in specific regions of the United States. For instance, we designed our solution to show that states with moderate air quality conditions have high population density, a high number of automobile registrations, and a high number of individuals at risk of chronic lower respiratory diseases.
The solution was demonstrated using an interactive Tableau dashboard. The first interface visualizes the air quality category of each of the states we incorporated. Based on the recommended breakpoints for the 24-hour average air quality index (AQI), the model's predictions fall into three categories: good, moderate, and unhealthy. Upon clicking on a state, one can view the forecast of air quality conditions for that state, produced by the Random Forest algorithm.
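To illustrate how predictions are bucketed for the dashboard, here is a minimal sketch in Python. The exact thresholds used in the project are not stated above; the values below follow the commonly used AQI convention (0–50 good, 51–100 moderate, above 100 unhealthy) and should be treated as an assumption.

```python
# Hypothetical helper that maps a predicted 24-hour average AQI value to one
# of the three categories shown on the dashboard. The breakpoints are an
# assumption based on the standard AQI convention, not confirmed project values.
def aqi_category(aqi: float) -> str:
    if aqi <= 50:
        return "good"
    elif aqi <= 100:
        return "moderate"
    else:
        return "unhealthy"


predictions = [32.5, 78.1, 142.0]                # example model outputs
print([aqi_category(p) for p in predictions])    # ['good', 'moderate', 'unhealthy']
```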

Furthermore, the second interface shows the six states with the worst air quality conditions in 2019. Users can observe that states with a high AQI also tend to have a large number of registered vehicles and a higher incidence of chronic obstructive pulmonary disease (COPD).

We hope that, with further development, this tool can provide relevant information to different groups, both as an archive of past data and as a tool that leverages that data to predict future trends. We hope that other phenomena will be included in the tool as time goes on.
Our team took on this challenge because we were inspired by the idea of building a tool that could potentially save many lives just by automatically analyzing data from a variety of sources and putting this analysis into the hands of key decision-makers, as well as the general public.
Our approach to solving the problem involved investigating several machine learning models to automate the detection of hazards, building a dashboard to visualize the detections, and incorporating ancillary data to show the scope and impact of the detected hazards.
To develop the machine learning model, we utilized popular Python libraries such as scikit-learn and TensorFlow, which allowed us to explore the data, construct new features, and evaluate several machine learning models. Throughout the hackathon we tried the following models: Linear Regression, Support Vector Regression, Random Forest Regression, Gradient Boosting Regression, XGBoost Regression, K-Nearest Neighbours Regression, a Bidirectional LSTM network, a Bidirectional GRU network, a Multilayer Perceptron network, and a 1-D Convolutional network. We utilized Google Colab to train and evaluate these models, using the R² value as the primary evaluation metric. After many attempts at optimizing the models, our highest-performing model was the Random Forest Regression model, which achieved an R² of 0.56.
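A minimal sketch of how the best-performing model was trained and scored with scikit-learn is shown below. The feature matrix here is random stand-in data; in the real pipeline, the features come from the air quality dataset and the ancillary open data described elsewhere, and the hyperparameters are placeholders rather than the values actually used.

```python
# Sketch of the Random Forest training and R^2 evaluation loop.
# X and y below are synthetic placeholders for the real features and AQI target.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))                    # placeholder feature matrix
y = rng.normal(loc=60, scale=25, size=1000)       # placeholder AQI target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# R^2 on held-out data was the primary metric; our best run reached about 0.56.
print("R2:", r2_score(y_test, model.predict(X_test)))
```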
We utilized Tableau to create a dashboard for visualizing the detections. Since our model is not highly accurate, we demonstrated the impact of the solution by visualizing an aggregation of the detections. We also included ancillary data such as the population density of U.S. states (2019) and an estimate of the forest cover in each state. These ancillary data are intended as supplementary information for assessing the impact of the detections.
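A hypothetical version of the aggregation step that prepares the dashboard extract is sketched below: per-station predictions are averaged per state and joined with the ancillary open data before being handed to Tableau. Column names, file names, and the sample values are illustrative assumptions, not the project's actual schema.

```python
# Aggregate station-level predictions to state level and attach ancillary data
# (population density, forest cover) for the Tableau dashboard. Illustrative only.
import pandas as pd

predictions = pd.DataFrame({
    "state": ["CA", "CA", "TX"],
    "predicted_aqi": [82.0, 75.5, 48.3],
})
ancillary = pd.DataFrame({
    "state": ["CA", "TX"],
    "population_density_2019": [253.7, 112.8],
    "forest_cover_pct": [32.7, 37.3],
})

state_summary = (
    predictions.groupby("state", as_index=False)["predicted_aqi"].mean()
    .merge(ancillary, on="state", how="left")
)
state_summary.to_csv("dashboard_extract.csv", index=False)  # consumed by Tableau
```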
The main problems we faced while working on this challenge relate to the data itself. Firstly, none of our team members had domain expertise in geological and meteorological data, so we did not have a clear idea of the importance of the dataset's features or how to combine them into more meaningful ones. Secondly, the data had an inconsistent temporal resolution (data points are not consistently sampled every hour from each station), which was especially difficult for our machine learning models to handle. Finally, we also struggled to decide which ancillary data to include in the visualization tool to create a more impactful solution.
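One way to mitigate the inconsistent temporal resolution is to resample each station's readings onto a regular hourly grid and interpolate only short gaps; the sketch below shows this approach with pandas. The column names and gap limit are assumptions for illustration, not a description of the preprocessing we actually shipped.

```python
# Resample irregularly timed station readings onto an hourly grid.
# Column names (timestamp, station_id, aqi) are assumed, not the dataset's real schema.
import pandas as pd

readings = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-07-01 00:00", "2019-07-01 00:40", "2019-07-01 03:15"]
    ),
    "station_id": ["S1", "S1", "S1"],
    "aqi": [42.0, 45.0, 51.0],
})

hourly = (
    readings.set_index("timestamp")
    .groupby("station_id")["aqi"]
    .resample("1h")
    .mean()                    # average readings that fall within the same hour
    .interpolate(limit=2)      # fill short gaps; longer gaps remain missing
    .reset_index()
)
print(hourly)
```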
Despite these challenges, our team put together a tool that we believe will become increasingly useful with further development.
In analyzing the problem, we used the space agency's air quality data. This data was used, in addition to other open data (see below), to train and evaluate several machine learning algorithms. Parts of the data were also used in the visualization tool.