A One Health Approach

Air pollution is a major global environmental health risk, causing an estimated seven million deaths across the globe annually. Your challenge is to take an interdisciplinary approach, using both Earth science and health science, and integrate different types of datasets and applications to study the effects of air pollution.

A one health approach

Summary

We use the K-nearest neighbors (KNN) machine learning algorithm to solve this challenge. KNN is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function). It has been used in statistical estimation and pattern recognition as a non-parametric technique since the early 1970s. A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function. We also produced several data visualizations with R packages to make our solution easier to understand.

How We Addressed This Challenge

Our model achieves an accuracy of 95%. We attach some screenshots as proof.

Below is a screenshot of our dataset.

How We Developed This Project

We use the K-nearest neighbors (KNN) machine learning algorithm to solve this challenge. KNN is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function). It has been used in statistical estimation and pattern recognition as a non-parametric technique since the early 1970s. A case is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function. If K = 1, the case is simply assigned to the class of its nearest neighbor.

The distance measures commonly used with KNN (such as Euclidean, Manhattan, and Minkowski distance) are only valid for continuous variables. For categorical variables, the Hamming distance must be used instead. When a dataset mixes numerical and categorical variables, the numerical variables should also be standardized to the range 0 to 1 so that they are on a scale comparable to the categorical ones.

Choosing the optimal value for K is best done by first inspecting the data. In general, a larger K value is more precise because it reduces the overall noise, but there is no guarantee. Cross-validation is another way to determine a good K value retrospectively, by using an independent dataset to validate it. Historically, the optimal K for most datasets has been between 3 and 10, which produces much better results than 1-NN.

The tools we used are Google Colab and RStudio.
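
The KNN procedure described above can be sketched in plain Python. This is a minimal illustration and not the project's actual notebook: the function names, the toy pollutant readings, and the "good"/"poor" labels are all hypothetical, and leave-one-out cross-validation stands in for validation on a separate dataset.

```python
from collections import Counter
import math


def fit_min_max(rows):
    """Return per-column minimum and maximum of the numeric rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, lows, highs):
    """Rescale one row to the range 0..1 using the fitted bounds."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]


def euclidean(a, b):
    """Distance for continuous variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Distance for categorical variables: number of mismatched fields."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k, distance=euclidean):
    """Majority vote among the k training cases closest to the query.

    Ties go to the label that appears first among the nearest cases.
    """
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, k, distance=euclidean):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k, distance) == ys[i]
    return hits / len(xs)


# Hypothetical toy readings (two pollutant columns), NOT the project's data.
features = [
    [12.0, 8.0], [15.0, 10.0], [18.0, 9.0], [20.0, 12.0],
    [85.0, 40.0], [90.0, 45.0], [110.0, 50.0], [95.0, 38.0],
]
labels = ["good"] * 4 + ["poor"] * 4

lows, highs = fit_min_max(features)
scaled = [scale(row, lows, highs) for row in features]

# Compare candidate K values by cross-validated accuracy.
for k in (1, 3, 5, 7):
    print(k, loo_accuracy(scaled, labels, k))

# Classify a new reading, scaled with the same bounds as the training data.
new_reading = scale([100.0, 42.0], lows, highs)
print(knn_predict(scaled, labels, new_reading, k=3))

# Categorical cases use the Hamming distance instead.
cat_x = [["urban", "winter"], ["urban", "summer"], ["rural", "summer"]]
cat_y = ["poor", "poor", "good"]
print(knn_predict(cat_x, cat_y, ["urban", "monsoon"], k=3, distance=hamming))
```

On this toy set, K = 7 scores worse than the smaller values because each held-out case is then outvoted by the four cases of the other class. That illustrates why a larger K is not guaranteed to be better.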

How We Used Space Agency Data in This Project

We used a dataset related to COVID-19. It contains many columns that describe the composition of the air.

Project Demo

https://drive.google.com/file/d/1RYwCwjPmntJXDhtBc0lbESHXu-ZrNSOc/view?usp=sharing



https://github.com/RaviPrakash1264/NSAC-Environmental-club

Data & Resources

COVID-19 dataset

https://archive.ics.uci.edu/ml/index.php

https://www.kaggle.com/rohanrao/air-quality-data-in-india?select=station_day.csv

Tags
#Machine Learning, #Data Visualization
Judging
This project was submitted for consideration during the Space Apps Judging process.