Data Discovery for Earth Science

Websites like the NASA Earth Observatory showcase the many uses of satellite data to highlight interesting natural events. International partner instruments on NASA satellites such as Japan’s ASTER instrument and Canada’s MOPITT instrument, both onboard the Terra satellite, are also included as part of the Observatory. This challenge will ask you to devise a tool or technique to guide users to relevant datasets to study specific events.

Dataset Recommendation and Bookmarklet on Earth Observatory

Summary

Currently, Earth Observatory offers readers no dataset recommendations for deeper research on the events it covers. Readers’ only recourse is to search other web resources, such as NASA AppEEARS, Earthdata, Worldview, and the metadata in NASA’s repositories, for datasets related to an article’s subject. Because searching NASA’s databases for these resources takes a great deal of time, we devised a way to help readers of Earth Observatory articles find more: a dataset recommendation list and an easy route to each dataset, which together guide them through the beginning of their research.

How We Addressed This Challenge

The challenge is to help researchers, and anyone else interested in the topics of Earth Observatory articles, find the datasets they are looking for, and to give readers a separate dataset recommendation list at the bottom of each article. Through our service, general readers can reach these datasets as well.


We developed a built-in service that NASA can deploy on Earth Observatory. When run on an article page, it searches the article text and extracts the topics and keywords that a researcher would use. For the search, the program would use Elasticsearch and score the keywords by tf-idf. The 10 highest-scoring keywords are then extracted from the text.
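The keyword-scoring step can be illustrated with a minimal sketch. It computes tf-idf in plain Python rather than through Elasticsearch, and the function names, the tokenizer, and the background corpus are all assumptions made for illustration, not part of our service.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(article, corpus, k=10):
    """Return the k highest-scoring (term, tf-idf score) pairs for `article`.

    `corpus` is a hypothetical list of background documents (for example,
    other articles) used to estimate how common each term is.
    """
    background = [set(tokenize(doc)) for doc in corpus]
    counts = Counter(tokenize(article))
    total = sum(counts.values())
    n_docs = len(background) + 1  # background documents plus the article itself
    scores = {}
    for term, count in counts.items():
        # Document frequency: the article always contains the term, hence the 1.
        df = 1 + sum(term in doc for doc in background)
        idf = math.log(n_docs / df)
        scores[term] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

A term that appears in every document, such as "the", gets an idf of zero and drops to the bottom of the ranking, which is why tf-idf tends to surface article-specific words.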


In scoring the keywords, the program would also search other web sources, such as Nature and Science news journals, to collect recently raised topics and match them against the keywords generated from the Earth Observatory article. The keywords with the most matches then receive higher scores.
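One way to express this boosting step is sketched below. The multiplicative formula and the `weight` parameter are assumptions chosen for illustration, since the exact scoring rule is not fixed here, and `trending_texts` stands in for headlines or abstracts collected from the external sources.

```python
import re


def boost_by_trending(scored_keywords, trending_texts, weight=0.5):
    """Raise each keyword's score by the number of external texts that mention it.

    `scored_keywords` is a list of (term, score) pairs; `weight` is a
    hypothetical tuning parameter controlling how strongly matches count.
    """
    # Tokenize each external text once so matching is on whole words.
    token_sets = [set(re.findall(r"[a-z]+", text.lower())) for text in trending_texts]
    boosted = []
    for term, score in scored_keywords:
        matches = sum(term.lower() in tokens for tokens in token_sets)
        boosted.append((term, score * (1 + weight * matches)))
    # Highest boosted score first; ties are broken alphabetically.
    return sorted(boosted, key=lambda kv: (-kv[1], kv[0]))
```

With this rule, a keyword that no external source mentions keeps its original score, so the boost can reorder keywords but never removes one.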


The final list of keywords would then be passed to Aleph (a search API) to check whether any datasets can be linked to them. Through Aleph’s internal search filtering, the most relevant dataset is provided for each keyword.


Each keyword would carry both a link to download the dataset and a description of the dataset. If no dataset matches a keyword, that keyword would instead be listed in a separate field within the article as a suggested search term.
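The split between matched and unmatched keywords could look like the sketch below. The `search` argument is a hypothetical callable standing in for the dataset search. It is not the Aleph API, and the hit fields ("title", "url", "description") are likewise assumed for illustration.

```python
def recommend(keywords, search):
    """Split keywords into dataset recommendations and leftover search suggestions.

    `search` is any callable that maps a keyword to a list of hits, best
    first. Each hit is assumed to be a dict with "title", "url", and
    "description" keys.
    """
    recommendations = {}  # keyword -> most relevant dataset hit
    suggestions = []      # keywords for which no dataset was found
    for keyword in keywords:
        hits = search(keyword)
        if hits:
            recommendations[keyword] = hits[0]
        else:
            suggestions.append(keyword)
    return recommendations, suggestions
```

Passing the search in as a callable keeps the sketch independent of any particular backend, so the same logic can be exercised against an in-memory catalog during testing.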


The keywords would be embedded in the article text as hyperlinks, so readers who click a word in the text go directly to the download link for that dataset. The recommendation list would appear at the bottom of the article, so readers can explore the topics at their discretion.
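A minimal sketch of the hyperlinking step follows, assuming the article body is available as plain text and that `links` maps each matched keyword to its dataset URL. The function name and the choice to link every occurrence are assumptions, not a description of the deployed front end.

```python
import html
import re


def link_keywords(text, links):
    """Return `text` as HTML with each keyword occurrence wrapped in an anchor.

    `links` maps keyword -> dataset URL. Matching is case-insensitive and
    limited to whole words; all other text is HTML-escaped.
    """
    if not links:
        return html.escape(text)
    by_lower = {keyword.lower(): url for keyword, url in links.items()}
    # Try longer keywords first so "sea ice" wins over "ice".
    alternatives = sorted(by_lower, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in alternatives) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[pos:match.start()]))
        url = by_lower[match.group(0).lower()]
        parts.append(
            '<a href="{}">{}</a>'.format(html.escape(url), html.escape(match.group(0)))
        )
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)
```

For example, `link_keywords("Smoke from the wildfire", {"wildfire": "https://example.org/d/1"})` wraps only the word "wildfire" in a link and leaves the rest of the sentence unchanged.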


Project Web page: https://hanbba92.github.io/spaceapps2020openingsky/

How We Developed This Project

The huge number of datasets in the NASA archive and on the Earthdata website is not easy to handle without prior experience in earth science research. General readers are more likely to read Earth Observatory articles, find topics that interest them, and then pursue their questions in NASA’s repositories. Earth Observatory is a great archive for general readers to learn about and experience NASA’s projects and research. We therefore wanted to give readers a more comprehensive tool on the page to help them navigate their journey into NASA datasets.


We decided to provide a bookmarklet for each article page, so readers don’t have to open another application to search for the research keywords in the article. We consider this the most efficient and user-friendly way to interact with readers.


The tools we used include Apache Airflow (to create a pipeline that processes the text of each article) and Aleph (an open-source search engine that can filter down to the most relevant datasets within NASA repositories).


We had difficulty deciding how to implement the program inside the NASA Earth Observatory page. We did not want to add more pages, but rather to integrate the recommendation list into the existing ones. Additional front-end work would be needed from the web-page editors, who can add our program’s features to the existing pages.

How We Used Space Agency Data in This Project

We mainly used AppEEARS, Earthdata, and Earth Observatory resources. We looked into the structure of the datasets provided on each platform in order to map keywords in the articles to the most relevant and useful datasets for readers. Readers might need to download additional tools such as QGIS or Panoply, and be able to use Python, in order to access some datasets, but the recommended datasets would also include links to Worldview, where they can see visualizations of satellite data related to the article.

Project Demo

https://docs.google.com/presentation/d/1xrEUtLIbPiWUHO7yAllowIFhQxV9WHbSRBPp_O39mm4/edit?usp=sharing

Data & Resources

EARTHDATA, AppEEARS, WORLDVIEW, ALEPH, LP DAAC, TERRA

Tags

#Dataset #Recommendation #Data Access

Judging

This project was submitted for consideration during the Space Apps Judging process.