We propose AQUDA, a data integration and prediction system that combines state-of-the-art machine learning algorithms for novelty detection with a web app whose dashboards let researchers make real-time decisions. The system is also designed so that any user, regardless of academic background, can see the current situation of a geographic zone, with the goal of raising awareness of climate change. We consider this an important problem to solve because pollution is worsening as vehicles and industries release harmful gases into the atmosphere, and if we do not act now the damage will be irreversible.
We trained an unsupervised model to learn a decision function for novelty detection. In this case, we consider novelties those data points (Humidity and goes_mesurement) that are harmful to humans. We also developed a deep neural network based on ResNet-50 to perform dust detection in the air.
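An unsupervised SVM that learns a decision function for novelty detection corresponds to scikit-learn's OneClassSVM. The sketch below is a minimal, illustrative version of that idea: the two features stand in for humidity and the GOES measurement, and the synthetic data, feature scales, and the `nu` parameter are all assumptions, not our exact pipeline.

```python
# Minimal sketch of One-Class SVM novelty detection, assuming two features
# (humidity and a GOES-derived measurement); data and parameters are illustrative.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic "normal" training data: humidity (%) and a GOES measurement.
X_train = rng.normal(loc=[50.0, 0.2], scale=[5.0, 0.05], size=(200, 2))

# nu bounds the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(X_train)

# Two new observations: one typical, one extreme (potentially harmful).
X_new = np.array([[51.0, 0.21], [90.0, 0.9]])
pred = model.predict(X_new)  # +1 = inlier, -1 = novelty
```

After fitting, `predict` flags points far from the learned region as novelties, which is how harmful readings can be surfaced in real time.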


Finally, our architecture can store and preprocess different kinds of data from many sources: a solution that offers real-time insights together with artificial intelligence models for anomaly detection and image analysis.

Figure 1. AQUDA Architecture
We chose this challenge because we are passionate about data science, and by applying our knowledge to real-world problems that concern us all, like climate change (specifically air quality), we can improve people's lives and make the world a better place to live. The challenge was very interesting because we could explore the state of the art and experiment with advanced machine learning algorithms such as unsupervised Support Vector Machines (SVMs) and deep learning. We used Python as our programming language along with libraries such as scikit-learn, NumPy, Matplotlib, Dash, torchvision, PIL, Django, Bootstrap, PyTorch, pandas, and seaborn.
One interesting achievement was putting all of these wonderful tools together to make a product that is simple and that anyone can use.
We had two main problems. The first was with the Biomass Burning Smoke dataset: its TIF format was not suitable for the library, so to save time we chose another dataset (High Latitude Dust) that was better suited to the project's objective.
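One common workaround for an unsupported TIF format is converting the images to a friendlier format such as PNG with PIL before feeding them to the training library. The sketch below is only illustrative (it builds a tiny in-memory TIF so it is self-contained), not the exact step we took.

```python
# Hedged sketch: converting a TIF image to PNG with PIL so that libraries
# which reject TIFF input can read it; the image here is a stand-in.
import io
from PIL import Image

# Create a tiny in-memory TIF to keep the example self-contained.
tif_bytes = io.BytesIO()
Image.new("RGB", (4, 4), color=(120, 60, 30)).save(tif_bytes, format="TIFF")
tif_bytes.seek(0)

# Open the TIF and re-save it as PNG.
img = Image.open(tif_bytes)
png_bytes = io.BytesIO()
img.convert("RGB").save(png_bytes, format="PNG")
```

In practice the same `open` / `convert("RGB")` / `save` steps would run over the files on disk.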
The second problem, which can also be seen as an opportunity for future work, was that the new dataset (High Latitude Dust) contained very few images.
We used the datasets that NASA provided for the Automated Detection of Hazards challenge:
The datasets were the main pillars of our project. With [1] we trained an unsupervised version of the SVM algorithm for novelty detection (flagging data points considered harmful to human health), and with [2] we trained a deep neural network based on ResNet-50 to perform dust detection in the air. We also used [1] for an exploratory data analysis. Finally, we explored the [3] dataset, but encountered a problem with its TIF image format.
Datasets
Phenomena Detection Challenge Resources
References
San José, R., Baklanov, A., Sokhi, R. S., Karatzas, K., & Pérez, J. L. (2006). Air quality modelling: state-of-the-art.
NASA, & NOAA. (2018). Aerosol Optical Depth. Via http://srt.marn.gob.sv/SHOWCast/HTML/Guides/ABIQuickGuide_BaselineAerosolOpticalDepth.pd
United States Environmental Protection Agency (EPA). Particulate Matter (PM) Basics. Via https://www.epa.gov/pm-pollution/particulate-matter-pm-basics#PM
Abulude, F. O. (2016). Particulate Matter: An approach to air pollution.