iSpeakFree.ly | Better Together

Better Together

Your challenge is to create a tool, app, or resource that helps close a gap that causes people to experience inequality. This combination of humanity and technology should eliminate or lessen a systemic issue and educate the user so they can grow.

iSpeakFree.ly - Your Pocket Interpreter

Summary

The solution is a mobile application that translates sign language to speech and vice versa. It is designed to facilitate bi-directional conversation between the hearing and speech impaired community and hearing people. To use the app, the camera is pointed towards the person performing signs; it records the gestures and the key points of the hands and facial expressions while the signs are being performed, and translates them to text in real time. Similarly, when the other person responds, the speech is captured and the corresponding signs are performed by our 3D character.

How We Addressed This Challenge

According to survey reports, 466 million people globally, and around 2 million people in India, belong to the hearing and speech impaired community, of whom about 11% are children. This is one of the most ignored communities, especially in India, and its members find it very difficult to communicate with others. The best option they currently have is to hire a sign language expert who translates sign language to speech and vice versa, but the expert has to be paid on an hourly basis and cannot always be physically present. To remove this social barrier, we came up with the solution of developing a mobile application that converts sign language to speech and vice versa in real time. Our main goal is to enable complete bi-directional communication and remove any physical dependencies for the people belonging to this community. The application consists of two parts:


  1. Sign Language to Speech.
  2. Speech to Sign Language.

For the first part, the user simply points the camera towards themselves; the application records the gestures and, using deep learning algorithms, converts them to the corresponding speech. For the second part, once the hearing person says something, it is recorded by the phone and the corresponding signs are performed, word by word, by our 3D character.

Our main goal is to bridge the social gap between the hearing and speech impaired community and hearing people by providing an application, combining humanity and technology, that empowers this community to interact with the world freely and independently, because alone we can do so little; together we can do so much. We are team iSpeakFree.ly and we want you to speak freely!

How We Developed This Project

As technology enthusiasts inspired by the advances made in deep learning and NLP, we were drawn to solving the communication problems faced by the hearing and speech impaired community.


Our complete solution consists of two parts:


Part 1: Sign Language to Speech:

This part consists of two algorithms. The first performs feature extraction using the open source tool MediaPipe: it takes an image as input, extracts the key points of the hands, and outputs their coordinates.
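
The submission does not include code, but a minimal sketch of this extraction step, assuming MediaPipe's Python Hands solution and an OpenCV-captured frame, might look like the following (the function name is illustrative, not taken from our codebase):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_hand_keypoints(frame_bgr):
    """Return a flat list of normalized (x, y, z) hand landmark coordinates for one frame."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
        # MediaPipe expects RGB images; OpenCV frames are BGR.
        results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    keypoints = []
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            for lm in hand.landmark:          # 21 landmarks per detected hand
                keypoints.extend([lm.x, lm.y, lm.z])
    return keypoints
```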

These coordinates are then fed to the second algorithm, a transformer-based neural network built with TensorFlow, which outputs the corresponding translated text; the text is then converted to speech using a text-to-speech API.
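
The exact architecture is not detailed in this write-up, so the sketch below is only a hedged illustration, in TensorFlow/Keras, of a clip-level transformer classifier over sequences of keypoint vectors. SEQ_LEN, NUM_KEYPOINTS, and VOCAB_SIZE are assumed placeholder values; the actual model and the downstream text-to-speech call may differ.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN = 64         # frames sampled per sign clip (assumed)
NUM_KEYPOINTS = 126  # 2 hands x 21 landmarks x (x, y, z)
VOCAB_SIZE = 2000    # number of sign glosses in the training vocabulary (assumed)

def positional_encoding(length, depth):
    """Fixed sinusoidal positional encoding so the model can use frame order."""
    pos = np.arange(length)[:, None]
    dim = np.arange(depth)[None, :]
    angles = pos / np.power(10000.0, (2 * (dim // 2)) / depth)
    enc = np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))
    return tf.constant(enc, dtype=tf.float32)

def build_sign_classifier(d_model=128, num_heads=4, ff_dim=256):
    """One transformer encoder block over per-frame keypoint vectors, one gloss per clip."""
    inputs = layers.Input(shape=(SEQ_LEN, NUM_KEYPOINTS))
    x = layers.Dense(d_model)(inputs)                  # project keypoints into the model dimension
    x = x + positional_encoding(SEQ_LEN, d_model)
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(x, x)
    x = layers.LayerNormalization()(x + attn)          # self-attention + residual
    ff = layers.Dense(ff_dim, activation="relu")(x)
    x = layers.LayerNormalization()(x + layers.Dense(d_model)(ff))  # feed-forward + residual
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)     # predicted sign gloss
    return tf.keras.Model(inputs, outputs)

model = build_sign_classifier()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```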


Part 2: Speech to Sign Language:

Similarly, when the hearing person responds, the speech is recorded and converted to text using a speech-to-text library. Each word of this text is then looked up in our database of sign language animations, and the matching animations are signed by the 3D model. This part uses Blender for modelling and animating the 3D character and Unity for playing the animations.
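
As a rough illustration of the lookup logic only (in the real pipeline the clips are Blender animations played on the 3D character inside Unity), here is a hedged Python sketch that assumes the SpeechRecognition library for the speech-to-text step; SIGN_CLIPS and the clip paths are hypothetical placeholders:

```python
import speech_recognition as sr  # pip install SpeechRecognition

# Hypothetical word-to-animation mapping; in the app this lookup is backed by a
# database of sign animation clips authored in Blender and played back in Unity.
SIGN_CLIPS = {
    "hello": "clips/hello.anim",
    "thank": "clips/thank.anim",
    "you": "clips/you.anim",
}

def speech_to_sign_clips():
    """Record one utterance, transcribe it, and return the animation clips to sign."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)          # capture the hearing person's reply
    text = recognizer.recognize_google(audio)      # speech-to-text step
    clips = []
    for word in text.lower().split():              # word-by-word lookup
        clip = SIGN_CLIPS.get(word)
        if clip is not None:
            clips.append(clip)
        # words with no clip could fall back to fingerspelling letter by letter
    return text, clips
```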


Initially, one of our biggest problems was feature extraction from the image of the user performing sign language. We started with a simple image-processing approach using OpenCV to classify letter gestures, but this was not suitable because it did not support word-level signs and was vulnerable to noise. To overcome this, we switched to MediaPipe, an open source tool, to extract the coordinates of the key points of the hand, and classified gestures based on these coordinates. Because sign language varies from region to region, it was difficult to procure a dataset. Our current model is trained on American Sign Language, but the architecture lets us incorporate other sign languages simply by changing the dataset.

How We Used Space Agency Data in This Project

We haven't used any Space Agency data, as our problem has a wider horizon and is centered on the hearing and speech impaired community of the world. The current data covers American Sign Language, and the goal is to address this community's issue by combining humanity and technology.

Project Demo

https://drive.google.com/file/d/16SpBbsRCMtSqfDOAUB7RURR6ILF8Z3xC/view

Data & Resources

Dataset: https://github.com/dxli94/WLASL


Mediapipe: https://github.com/google/mediapipe


Audio to Sign: https://github.com/sahilkhoslaa/AudioToSignLanguageConverter

Tags
#artificial_intelligence, #social_impact, #disability, #healthcare, #accessibility, #better_together
Judging
This project was submitted for consideration during the Space Apps Judging process.