Voco

This is a hackathon project developed during TartanHacks 2018 at Carnegie Mellon University. The team members are Bruce Liu, Justin Chu, Qingyi Dong, and Jacky Zou. We won the Best Social Impact Award sponsored by APT.

Inspiration

Speech and hearing impaired people communicate with each other through sign language. But how can a person who does not know sign language talk to them? There are several existing solutions, such as typing the conversation as text or using computer vision to recognize sign language. I noticed that these solutions are either inefficient or hard to use, so I came up with the idea of using a wearable device to recognize and translate sign language.

What it does

Voco is an assistive app that translates sign language into words using data from Myo, a Bluetooth-connected wearable armband. We built a classifier using an SVM (support vector machine) model implemented with LIBSVM. The prototype was able to distinguish 6 words (HELLO, MY, NAME, TARTAN, HACKS, WHATSUP). Speech recognition was implemented using Google's Speech Recognition API so that people who don't sign can communicate more easily with hearing and speech impaired people.

How we built it

1. Collecting data

The Myo armband provides three kinds of raw data: gyroscope, accelerometer, and EMG readings of muscle activity. For sign language recognition, the gyroscope data give us the general path of the arm's movement, the EMG data are used to detect finger activity (since finger movements are driven by forearm muscles), and the accelerometer tells us when a sign starts and ends. We repeated each sign 20 times and collected data for 6 words in total ("tartan", "hello", "my", "name", "is", "what's up").
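
For reference, here is a minimal sketch of how a collection session can be structured. The read_myo_frame() helper is a hypothetical stand-in for the Myo SDK's streaming callbacks (it returns random values so the sketch runs standalone); the channel counts (3-axis gyro, 3-axis accelerometer, 8 EMG pods) match the actual armband.

```python
import numpy as np

# The six signs we recorded, 20 repetitions each.
WORDS = ["tartan", "hello", "my", "name", "is", "whats_up"]
REPS_PER_WORD = 20

def read_myo_frame():
    # Placeholder so the sketch runs standalone: in the real app this
    # value comes from the Myo armband's Bluetooth stream.
    gyro = np.random.randn(3)   # 3-axis gyroscope
    accel = np.random.randn(3)  # 3-axis accelerometer
    emg = np.random.randn(8)    # 8 EMG pods around the forearm
    return gyro, accel, emg

def record_sign(num_frames=200):
    """Record one repetition of a sign as a (num_frames, 14) array:
    3 gyro + 3 accel + 8 EMG channels per time step."""
    frames = []
    for _ in range(num_frames):
        gyro, accel, emg = read_myo_frame()
        frames.append(np.concatenate([gyro, accel, emg]))
    return np.stack(frames)

def collect_dataset():
    # One list of raw recordings per word.
    return {w: [record_sign() for _ in range(REPS_PER_WORD)] for w in WORDS}
```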

2. Data processing

The first problem we ran into was how to standardize the data. Since people may perform the same sign at different speeds, we cannot simply use raw timestamps as features, so we interpolated each recording onto a fixed number of samples. It is also hard to determine when a sign starts and ends. We did not find an efficient way to separate one sign from the next during continuous movement, so we require the signer to put their hand down at the start and end of each sign.
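
A minimal sketch of the resampling step, assuming each recording is stored as a (frames, channels) array as in the collection sketch above; FIXED_LEN is an illustrative value, not the one we actually tuned.

```python
import numpy as np

FIXED_LEN = 50  # assumed number of samples per channel after resampling

def resample_recording(rec, fixed_len=FIXED_LEN):
    """Interpolate a (frames, channels) recording onto a fixed time grid
    so that signs performed at different speeds yield same-length features."""
    frames, channels = rec.shape
    old_t = np.linspace(0.0, 1.0, frames)
    new_t = np.linspace(0.0, 1.0, fixed_len)
    resampled = np.column_stack(
        [np.interp(new_t, old_t, rec[:, c]) for c in range(channels)]
    )
    return resampled.ravel()  # one flat feature vector per sign
```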

3. Training (SVM model)

We used LIBSVM, a library for support vector machines, to train our model. Specifically, we used the one-vs-all strategy and created 6 binary classifiers, each with an RBF kernel. Our error rate is around 0.05 on the training set and 0.06~0.07 on the test set.
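
Here is a rough sketch of the one-vs-all setup using LIBSVM's Python interface; the cost and gamma values are placeholders for illustration, not the parameters we actually used.

```python
import numpy as np
from libsvm.svmutil import svm_train, svm_predict  # pip install libsvm-official

WORDS = ["tartan", "hello", "my", "name", "is", "whats_up"]

def train_one_vs_all(features, labels, params="-t 2 -c 10 -g 0.01 -q"):
    """Train one binary RBF-kernel SVM per word (one-vs-all).
    features: list of flat feature vectors; labels: word indices (0..5)."""
    x = [list(f) for f in features]
    models = []
    for cls in range(len(WORDS)):
        y = [1 if lab == cls else -1 for lab in labels]
        # LIBSVM treats the first label it sees as the "+1" side of the
        # decision function, so list the positive examples first.
        order = sorted(range(len(y)), key=lambda i: -y[i])
        models.append(svm_train([y[i] for i in order],
                                [x[i] for i in order], params))
    return models

def classify(models, feature):
    """Pick the word whose classifier returns the largest decision value."""
    scores = []
    for m in models:
        _, _, dec = svm_predict([0], [list(feature)], m, "-q")
        scores.append(dec[0][0])
    return WORDS[int(np.argmax(scores))]
```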

4. Application

We built a prototype that integrates both the "sign to text" feature described above and "speech to text" (via Google's Cloud Speech Recognition API), so that people who don't sign can communicate fluently with hearing and speech impaired users.
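
As an illustration of the speech-to-text half, here is a minimal sketch using the Python SpeechRecognition package's Google recognizer as a simplified stand-in for the Cloud API call in the prototype (no credential handling shown).

```python
import speech_recognition as sr  # pip install SpeechRecognition (plus PyAudio for mic input)

def speech_to_text():
    """Capture one phrase from the microphone and transcribe it with
    Google's speech recognition service."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
```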

Accomplishments 

Our data processing and training were successful. The error rate is low on both the training and test data, with no significant overfitting or underfitting. In real-time tests (as in the video below), it gives the correct result most of the time.

We won the Best Social Impact Award at TartanHacks (1 out of 65 submissions; 9 winners in total).

What's next for Voco

1. Extend the dictionary.

2. Integrate Natural Language Processing methods so that even if a word is misclassified, the app can still infer the correct meaning from its context.

3. Explore the possibility of recognizing continuous movements.