Pablo and the Aikmen has received the following awards and nominations. Way to go!
Local Peoples' Choice Winner
Home Planet at Your Fingertips
Develop a user-friendly application or tool to discover, visualize, and analyze NASA Earth data for monitoring our home planet.
Terra Tinder: Matching you with the most suitable data set
Summary
Terra Tinder aims to match users with datasets that suit their preferences. Users browse our web app to find the best dataset in an efficient and accessible format. We used the CMR API to extract metadata on the NASA Earth datasets for use in document similarity analysis. We were inspired by the popular dating app Tinder and the quick, efficient matching service it provides. We believe this is a unique way to match users with NASA Earth datasets, and in the process it lets them see many datasets in quick succession. We believe that Terra Tinder meets the key requirements of the task, as it allows users to both discover and visualise datasets.
How We Addressed This Challenge
What did you develop?
We developed Terra Tinder, a web app that matches users to suitable datasets. Our app learns the user's preferences through their likes and dislikes, so the user is shown datasets that fit those preferences.
How does it work?
The backend is responsible for computing recommendations based on the positive and negative feedback given by the user on previous datasets (i.e. swiping right or left). First, metadata on all ~30,000 NASA Earth Science datasets was gathered via the associated API and saved in data/data.csv. Each dataset was mapped to a point in a 15-dimensional embedding space using the word2vec package in R. These embeddings were saved in a CSV file, data/embeddings.csv.
The backend server runs on Python. An adjacency matrix is calculated from the cosine distance between all dataset embeddings. This matrix contains roughly 900 million entries and would be too large to commit to version control, so it is calculated once upon startup of the server and stored in memory.
The server provides an API that the frontend uses to request dataset recommendations. Recommendation is achieved using an algorithm that takes into account the previously liked datasets (which are supplied as part of the API request) and their similarity scores to other datasets. Furthermore, the backend extracts keywords from the recommended dataset's title using NLP techniques and uses them in a Bing image search to obtain a relevant image to display in the web app.
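The similarity and recommendation steps described above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the function names `cosine_similarity_matrix` and `recommend` and the scoring rule (sum of similarities to liked datasets minus similarities to disliked ones) are assumptions, and it works with cosine similarity, which is one minus the cosine distance.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between all rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against all-zero rows so we never divide by zero.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def recommend(similarity, liked, disliked=(), seen=()):
    """Return the index of the best unseen dataset.

    Hypothetical scoring rule: add the similarity to every liked dataset
    and subtract the similarity to every disliked one.
    """
    scores = np.zeros(similarity.shape[0])
    for i in liked:
        scores += similarity[i]
    for i in disliked:
        scores -= similarity[i]
    # Never recommend something the user has already swiped on or seen.
    excluded = sorted(set(liked) | set(disliked) | set(seen))
    if excluded:
        scores[excluded] = -np.inf
    return int(np.argmax(scores))


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for the 15-dimensional ones.
    toy = np.array([
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.5, 0.0, 0.5],
    ])
    sim = cosine_similarity_matrix(toy)
    print(recommend(sim, liked=[0]))
```

Computing the matrix once at startup trades memory for fast lookups: with roughly 30,000 datasets the full matrix has about 900 million entries, which is about 3.6 GB if stored as 32-bit floats.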
The frontend is written in JavaScript using React. It establishes a connection to the backend server and uses the backend API to deliver information.
Why is it important?
Our application is important because it displays NASA Earth datasets in an accessible way so that all users (regardless of their scientific background) can access relevant data. This means that more users can be reached.
What do you hope to achieve?
We hope that our application is helpful for users who are unsure of which dataset is for them. The NASA Earth datasets can be matched appropriately to users using our algorithms.
We also hope that users can learn more about datasets as they swipe through different datasets.
Our project addresses this challenge in the following ways:
Our application allows users to discover new NASA Earth datasets, as the format of our app directs users towards datasets that are suitable for their needs. When a user "likes" a dataset, our web app recommends similar datasets using similarity scores generated through word embedding, which let us compare datasets and assess their similarities. As users swipe, they can learn more about a dataset with the 'See More' feature. They can also discover new datasets quickly and efficiently with the "like" button.
Our application allows users to visualize NASA Earth datasets in an innovative way. Our application uses dataset titles to fetch images from Bing Images. When users are "liking" or "disliking" datasets, the image search feature allows them to visualize the NASA Earth datasets. We believe this is an accessible feature that will help users from a range of scientific backgrounds to understand and visualize NASA Earth datasets, as the images are clear and consistent with the dataset titles.
Our application helps users find data relevant to their needs. We believe our application does this well, as we use natural language processing and word embedding to find new datasets based on those viewed by the user. This means that the datasets shown are relevant to users' needs.
Our application allows users to analyse the data in a quick and efficient way. The 'See More' feature allows for in-depth understanding of each dataset, while the "like/dislike" format means that users only look into the datasets that interest them.
Our application is user-friendly, as it is simple and accessible and its format is efficient.
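To illustrate how a dataset title might be turned into an image search query, here is a small sketch. It is an assumption-laden stand-in for the team's NLP step: the stopword list, the `title_keywords` and `image_search_query` helpers, and the example title are all hypothetical, and the actual call to the image search service is left out.

```python
import re
from typing import List

# Hypothetical stopword list: common function words plus terms that appear
# in many dataset titles and add little to an image search.
STOPWORDS = frozenset({
    "and", "the", "for", "from", "with", "data", "dataset",
    "daily", "monthly", "yearly", "global", "level", "version",
})


def title_keywords(title: str, max_keywords: int = 4) -> List[str]:
    """Return up to `max_keywords` distinct, lower-cased keywords from a title."""
    keywords: List[str] = []
    for token in re.findall(r"[A-Za-z]+", title):
        word = token.lower()
        # Skip very short fragments (e.g. "V", "km"), stopwords and repeats.
        if len(word) < 3 or word in STOPWORDS or word in keywords:
            continue
        keywords.append(word)
    return keywords[:max_keywords]


def image_search_query(title: str) -> str:
    """Join the extracted keywords into a single search string."""
    return " ".join(title_keywords(title))


if __name__ == "__main__":
    # Made-up title, used only for illustration.
    example = "Monthly Sea Surface Temperature and Sea Ice Data for the Arctic V2"
    print(image_search_query(example))
```

Keeping the query to a few keywords is one way to work around the limitation of searching on titles alone, since long technical titles tend to return less relevant images.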
How We Developed This Project
What inspired your team to choose this challenge?
We chose this challenge because our team has a fair amount of interest and expertise in this area. We felt that we could develop skills we already possess and learn a lot from the NASA Earth datasets.
We felt that in order for NASA Earth datasets to reach more users, we needed to create an application that was accessible to people with a range of scientific backgrounds. That's why we developed Terra Tinder, as we felt it was a familiar and straightforward format.
What was your approach to developing this project?
Our approach involved delegation and splitting up tasks. We decided to split the task up into different areas that would suit our skills.
Georg created the front end, back end and architecture of the web app.
Ben used natural language processing and word embedding to create a way of comparing datasets and recommending new ones to the user.
Jourdain and Ollie made a dataset image search using machine learning with natural language processing techniques.
Ryan designed the UI, UX and graphic design of the web app.
Maddy worked with Ryan to create the brand name and logo using Canva. She created the presentation and project submission.
We felt this was the best approach because there were many different parts to our application, so by splitting the task up we were efficient and the tasks were done thoroughly by someone with expertise.
What tools, coding languages, hardware, and software did you use to develop your project?
Coding languages:
Python
R
JavaScript
Software:
Canva
Node
React
What problems and achievements did your team have?
Problems included: collaboration between different coding languages and code compatibility.
The main problem we had was the difficulty of retrieving relevant images using the titles of the datasets alone. Implementing similarity scoring with word embedding, and choosing between the different options available, was also challenging.
Achievements included: getting accurate image results, having accurate dataset recommendations.
How We Used Space Agency Data in This Project
How did you use space agency data in your project?
We used NASA's Common Metadata Repository (CMR) to link users with relevant datasets based on their past choices.