Automated Detection of Hazards

Countless phenomena such as floods, fires, and algae blooms routinely impact ecosystems, economies, and human safety. Your challenge is to use satellite data to create a machine learning model that detects a specific phenomenon and build an interface that not only displays the detected phenomenon, but also layers it alongside ancillary data to help researchers and decision-makers better understand its impacts and scope.

Heuristic Approach To Flood Prediction

Summary

In this project, we used a unique combination of Artificial Neural Networks (ANNs) and Recurrent Neural Networks (RNNs) to predict the world's precipitation pattern and, from that, identify the areas with the highest plausibility of a flood based on inferential statistical analysis.

How I Addressed This Challenge

1.What is it? :

We have developed a unique interconnected algorithm which takes the present precipitation distribution into account to predict future precipitation. It does this with two parts working in sync: machine learning and statistical inference.

2.How is it important? :

This combination allows for more dynamic predictions than either technique on its own, giving better predictions of rainfall patterns. This is crucial for the future of flood prevention: floods cause enormous destruction each year, and we believe our model can help provide warnings sooner.

3.What does it do? :

It predicts how the rainfall pattern evolves over time and then uses statistical inferences to label each coordinate with a plausibility and an intensity of flooding, which are the metrics we decided to use for prediction.

4.How are the inferences made? :

One of the biggest challenges we faced was deciding how the inferences were going to be made. We eventually boiled it down to three key factors:

  1. The variance and mean of precipitation at a location over the past week
  2. The number of pairs of similar consecutive precipitation values at a location over the past week
  3. The difference of the temporal range from the present value (how far ahead the prediction lies)

*How the inferences interact is discussed later; a small sketch of the factors follows.*
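
As a rough illustration, here is a minimal sketch of how the first two factors could be computed for a single location from its past week of precipitation. The function name and the similarity tolerance are placeholders of ours; the third factor depends only on how far ahead the prediction lies and is handled as a bias later.

```python
import numpy as np

def weekly_features(past_week, similarity_tol=1.0):
    """past_week: daily precipitation values (e.g. mm) for one location."""
    past_week = np.asarray(past_week, dtype=float)
    mean = past_week.mean()          # factor 1: mean of the past week
    variance = past_week.var()       # factor 1: variance of the past week
    # factor 2: pairs of consecutive days with similar rainfall
    similar_pairs = int((np.abs(np.diff(past_week)) <= similarity_tol).sum())
    return mean, variance, similar_pairs
```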

5.Why this? :

We specifically chose this as our model because we wanted to take a unique, out-of-the-box approach. We believe that testing a model people wouldn't think of at first might be one of the better paths to prediction. The core concept of our model can also be combined with pre-existing models to make even better predictions. We knew our model probably wasn't the best one out there, but it might be a precursor to better models for the future of hazard prediction. It is by no means limited to flood prediction: the same approach can be applied to almost any kind of hazard, from landslides to forest/bush fires, and this property of being fluid and dynamic while still being powerful is why we decided on this specific model.

6.Aim :

We hoped to achieve accurate prediction of floods for the betterment of the world.

How I Developed This Project

This project has been filled with ups and downs for our team, and it is one of the most intricate and convoluted projects we have ever worked on. The topic piqued our interest as it combined everything we enjoy: machine learning, real-world application, and thinking outside the box.

Working on disaster prediction exposed us to real-world problem solving and critical thinking, which was a new and exciting feeling.

Project Details

Our project was made up of three main parts:

  1. Getting and parsing data
  2. Training and prediction
  3. Visualization of prediction

Getting and Parsing Data

In this part we had to download the data files from NASA [1]. For that we created a Chrome extension that downloaded all the data in the background while we focused on other parts, such as how to parse the obtained data. The data was in netCDF4 format, so it needed special treatment. We created a Python script to read the precipitation values and the date, converted the precipitation values to RGB values, and overwrote those values on a world image to produce the training data. We chose this over converting to other formats such as CSV because each image takes up only 60 KB while the original netCDF4 file averaged 30 MB, a roughly 99.8% reduction in storage. Part of the visualization was also completed in this step. *See our GitHub repository NasaPrecipitationDataGet-Parse.*
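
A rough sketch of this parsing step is shown below. The variable name "precipitation", the grayscale scaling, and the maximum rainfall cut-off are assumptions, since the exact variable names and color mapping depend on the dataset and on our conversion code in the repository.

```python
import numpy as np
from netCDF4 import Dataset
from PIL import Image

def netcdf_to_image(nc_path, png_path, max_mm=100.0):
    # Read one day's precipitation grid from the netCDF4 file.
    ds = Dataset(nc_path)
    precip = np.array(ds.variables["precipitation"][:]).squeeze()
    ds.close()
    # Scale precipitation into 0-255 so it can be stored as a small 8-bit image.
    scaled = np.clip(precip / max_mm, 0.0, 1.0) * 255.0
    Image.fromarray(scaled.astype(np.uint8)).save(png_path)  # ~60 KB vs ~30 MB raw
```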

Separately, we also needed the world's precipitation in near real time, which NASA Earthdata didn't provide (it lagged by about three months). After a lot of searching we found [2], which was a life saver, as no other service provided near-real-time rainfall data for the whole world.

Training and Prediction

This section was probably one of the easier parts of the project, but it still had its own challenges. Our first idea was to build our own ANN from scratch and train it, but that wasn't so straightforward: our ANN had one major problem, memory usage. Our system only had 8 GB of RAM, while our unoptimized ANN was memory hungry and needed 40.56 GB of RAM to function, which was a bit unrealistic. We updated the code, made it shorter and faster, and got it down to 30 GB of RAM, but that was still too much. So in the end we decided to use pre-optimized ML libraries such as TensorFlow to do the dirty work for us. It still needed 5 GB of RAM, but that was manageable on our system. With the easy part done, what we needed now was to actually train it on our data. We went with a single hidden layer of 10,000 nodes as the architecture (see image).
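
A minimal TensorFlow/Keras sketch of this architecture is shown below. The layer sizes come from the description here and in the next paragraph (a 180x360 world image flattened to 64,800 pixels, one hidden layer of 10,000 nodes); the activations and optimizer are our assumptions, not the exact ones we used.

```python
import tensorflow as tf

N_PIXELS = 180 * 360  # 64,800 pixels per flattened world image

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_PIXELS,)),
    tf.keras.layers.Dense(10_000, activation="relu"),       # the single hidden layer
    tf.keras.layers.Dense(N_PIXELS, activation="sigmoid"),  # next day's normalized pixels
])
model.compile(optimizer="adam", loss="mse")
```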

We had to decode (parse) the saved encoded image data and then flatten each image from a 180x360 matrix to a 1x64800 pixel array; these are the inputs (after normalization). The output is another 1x64800 pixel array that is reconstructed to form the image shifted in the temporal range, meaning the network learns to create the image of the next day using the image of the previous day. This additional step of parsing all the images and then iterating over all of them took over 10 hours and used more than 10 GB of RAM, so we had to use our friend's system. This was the hard bit: the training failed once and we were reluctant to rerun it at first, but the second run worked with a loss of 0.06! Our model was saved and is over 4 GB in size!
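
Conceptually, the training pairs are built by shifting the image sequence by one day, roughly as in the sketch below (file loading omitted; `images` is assumed to be a chronologically ordered list of 180x360 arrays).

```python
import numpy as np

def make_training_pairs(images):
    # Flatten each 180x360 image to a 64,800-pixel row and normalize to [0, 1].
    frames = np.stack([img.reshape(-1) for img in images]).astype("float32") / 255.0
    X = frames[:-1]  # day t     -> input
    y = frames[1:]   # day t + 1 -> target (the same sequence shifted by one day)
    return X, y

# X, y = make_training_pairs(images)
# model.fit(X, y, ...)
```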

Furthermore, we still needed to transform this into an RNN to produce the next week's predictions. The RNN structure we used was the "linked one to many" type, as shown in the following image.

*T_y = 6 in our case*

The input is the latest rainfall data and the 6 outputs are the predictions for each day of the following week. On these predicted images we apply an inferential statistical analysis.
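
In practice this amounts to feeding the next-day model its own output repeatedly, roughly as in the sketch below (`latest` is assumed to be the latest flattened, normalized rainfall image).

```python
import numpy as np

def predict_week(model, latest, t_y=6):
    frame = np.asarray(latest, dtype="float32").reshape(1, -1)
    predictions = []
    for _ in range(t_y):                         # T_y = 6 days ahead
        frame = model.predict(frame, verbose=0)  # next day's image from the previous one
        predictions.append(frame.reshape(180, 360))
    return predictions
```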

Inferences:

It was found that:

  • The variance of the precipitation values for the past days of a location is inversely proportional to the plausibility of a flood.
  • The mean of the precipitation values for the past days of a location is directly proportional to the intensity of a flood.
  • The number of pairs of similar consecutive precipitation values for the past days of a location is directly proportional to both intensity and plausibility.
  • Intensity is directly proportional to plausibility.
  • The difference of the temporal range from the present value directly affects plausibility (it acts as a bias).

With these inferences in mind, we created an algorithm in Python that takes in the prediction from our network and parses it to create new images where points with the highest flood plausibility (greater than 30%) are marked in red, with the shade depicting the intensity. (This is slightly flawed, as it takes in the whole world image, so it also marks areas of high flood plausibility that are in the ocean.)
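
The sketch below captures the spirit of that algorithm but not its exact weighting: only the directions of the relationships listed above are used, so the combination formula, constants, and tolerance are placeholders.

```python
import numpy as np

def flood_maps(history, days_ahead, similarity_tol=2.0, eps=1e-6):
    """history: (days, 180, 360) array of recent observed + predicted precipitation."""
    mean = history.mean(axis=0)
    var = history.var(axis=0)
    pairs = (np.abs(np.diff(history, axis=0)) <= similarity_tol).sum(axis=0)
    intensity = mean + pairs                                          # directly proportional factors
    plausibility = (intensity + pairs) / ((var + eps) * days_ahead)   # variance inverse, temporal bias
    return plausibility / plausibility.max(), intensity

def mark_floods(world_rgb, plausibility, intensity, threshold=0.30):
    marked = world_rgb.copy()
    mask = plausibility > threshold                                   # the 30% cut-off
    shade = np.clip(intensity / intensity.max() * 255, 80, 255).astype(np.uint8)
    marked[mask, 0] = shade[mask]                                     # red channel encodes intensity
    marked[mask, 1:] = 0
    return marked
```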

Visualization

The last step of the project was visualization of all the data. This step was relatively straightforward. We made an Express server in JavaScript which takes in the parsed images (the same images the ANN trained on) and displays them at a user-defined FPS. There is also a feature to view the flood intensity images created earlier. If desired, the user can view just the predicted images or set how far into the future they wish to predict. The process for setting this up is explained on our GitHub (*see repository section below*).

Limitations

Some of the problems we faced included:

  1. Overfitting
  2. Out-of-memory (OOM) errors
  3. Limited data

but these were solved by better code management, better hardware, and reducing the size of the network.

Future Improvements

This project is dynamic and is open to change with better code or even a better model. Due to time constraints and our unfortunately late start, we could not test our model to the best of our abilities, but that won't stop us from continuing this project.

Some possible improvements:

  1. Using a CNN and comparing results with the current model
  2. Using an LSTM RNN and comparing results with the current model
  3. Using data from other sources and testing on other sources
  4. Better inferences
  5. Implementing existing models together with our model


How I Used Space Agency Data in This Project

Our data source was NASA, and it was amazing to see how much data exists and how the data we used was such a small part of all the datasets. The data came from earthdata.nasa.gov, and it gave our project the information we needed and really made it all come together. The data we used for training [1] was 250 GB in total and consisted of 7,500 netCDF4 files, which were parsed into 7,500 60 KB images for training and visualization.

Tags
#AI, #Neural Networks, #Floods, #Flood Prediction, #Rain, #Automated Detection of Hazards