A real-time sign language recognition system that converts hand gestures into text and speech using computer vision and machine learning.
This project focuses on improving accessibility by enabling basic communication through sign language recognition. It captures hand gestures using a webcam, processes them using MediaPipe, and predicts corresponding characters using a trained machine learning model.
The system is designed to work in real time and supports continuous sentence formation along with speech output.
This project aims to bridge the communication gap between individuals using sign language and those unfamiliar with it.
By combining real-time gesture recognition with speech output, the system enables more natural and accessible interaction. It is an early-stage idea with potential applications in assistive technology for people with hearing or speech impairments.
The dataset used in this project was created manually by capturing hand gesture samples using a webcam.
- Custom dataset built for multiple hand gestures
- Landmark data extracted using MediaPipe
- Data stored and organized for training the ML model
This approach allowed better control over the data and helped in understanding how data quality affects model performance.
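As a rough illustration of how landmark samples could be stored for training, the sketch below flattens one hand's landmarks into a labelled CSV row. MediaPipe Hands reports 21 landmarks per hand, with landmark 0 at the wrist. Everything else here is an assumption rather than the project's actual format: the helper names `landmarks_to_row` and `write_samples`, the column layout, and the choice to store coordinates relative to the wrist.

```python
import csv
import io

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand


def landmarks_to_row(landmarks, label):
    """Flatten (x, y, z) landmarks into one row, relative to the wrist.

    Hypothetical helper: subtracting the wrist position is an assumed
    normalization so that samples do not depend on where the hand sits
    in the frame.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    wrist_x, wrist_y, wrist_z = landmarks[0]  # landmark 0 is the wrist
    row = [label]
    for x, y, z in landmarks:
        row.extend([x - wrist_x, y - wrist_y, z - wrist_z])
    return row


def write_samples(samples, stream):
    """Write (landmarks, label) pairs to a CSV stream with a header row."""
    writer = csv.writer(stream)
    # Column order matches landmarks_to_row: x0, y0, z0, x1, y1, z1, ...
    header = ["label"] + [f"{axis}{i}" for i in range(NUM_LANDMARKS) for axis in "xyz"]
    writer.writerow(header)
    for landmarks, label in samples:
        writer.writerow(landmarks_to_row(landmarks, label))


# Usage with made-up coordinates standing in for a captured frame.
fake_hand = [(0.5 + 0.01 * i, 0.5 - 0.01 * i, 0.0) for i in range(NUM_LANDMARKS)]
buffer = io.StringIO()
write_samples([(fake_hand, "A")], buffer)
```

In a real capture loop, the `(x, y, z)` tuples would come from the landmarks MediaPipe returns for each frame, and the stream would be a file under `dataset/`.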
- Real-time hand gesture recognition
- Text generation from sign language
- Word suggestion system
- Text-to-speech output
- Two-hand gesture commands (space, clear, delete, speak)
- Python
- OpenCV
- MediaPipe
- Scikit-learn
- NumPy
- pyttsx3
dataset/ → Gesture datasets
src/ → Source code
model/ → Trained ML model
predict_sign.py → Main application
- MediaPipe extracts hand landmarks from webcam input
- Landmark data is processed and passed to a trained ML model
- The model predicts the corresponding character
- Characters are combined to form words and sentences
- A simple suggestion system assists in completing words
- Final text can be converted into speech output
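The steps from per-frame prediction to text can be sketched as follows. A real-time classifier produces one prediction per frame, so some rule is needed to decide when a character is actually "typed". The example below assumes a simple rule: commit a character once it has been predicted on a fixed number of consecutive frames. The `CharStabilizer` class, the `hold_frames` value, and the prefix-based `suggest` function are all hypothetical and may differ from what the project does.

```python
class CharStabilizer:
    """Commit a character only after N consecutive identical predictions."""

    def __init__(self, hold_frames=15):
        self.hold_frames = hold_frames  # assumed threshold, tune per camera FPS
        self._candidate = None
        self._count = 0

    def update(self, predicted):
        """Feed one per-frame prediction (or None when no hand is visible).

        Returns the character on the frame it becomes stable, else None.
        Holding the same sign longer does not repeat the character.
        """
        if predicted != self._candidate:
            self._candidate = predicted
            self._count = 0
        self._count += 1
        if predicted is not None and self._count == self.hold_frames:
            return predicted
        return None


def suggest(prefix, vocabulary, limit=3):
    """Return up to `limit` vocabulary words that start with `prefix`."""
    prefix = prefix.lower()
    if not prefix:
        return []
    return [word for word in vocabulary if word.startswith(prefix)][:limit]


# Usage: a stream of noisy per-frame predictions becomes two typed letters.
stabilizer = CharStabilizer(hold_frames=3)
frames = ["H", "H", "H", "H", None, "I", "H", "I", "I", "I"]
typed = []
for frame_prediction in frames:
    committed = stabilizer.update(frame_prediction)
    if committed is not None:
        typed.append(committed)

word_so_far = "".join(typed)
suggestions = suggest(word_so_far, ["hi", "high", "hello", "world"])
```

With this rule, typing the same letter twice requires briefly lowering the hand or changing the sign in between, which is one possible trade-off between speed and accidental repeats.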
Install dependencies:
pip install -r requirements.txt
Run the program:
python src/predict_sign.py
SPACE → Separate words
CLEAR → Clear sentence
DELETE → Remove last letter
SPEAK → Convert sentence to speech
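One way to think about these commands is as small operations on the sentence buffer. The sketch below is an assumption about how such a dispatcher might look, not the code in `predict_sign.py`: the function name `apply_command` is hypothetical, and speech is passed in as a callback so the example runs without an audio device.

```python
def apply_command(sentence, command, speak=print):
    """Return the sentence after applying one gesture command.

    `speak` is any callable that takes the text to be spoken; it defaults
    to print so this sketch has no audio dependency.
    """
    if command == "SPACE":
        return sentence + " "   # separate words
    if command == "CLEAR":
        return ""               # clear the whole sentence
    if command == "DELETE":
        return sentence[:-1]    # remove the last character
    if command == "SPEAK":
        speak(sentence)         # hand the text to the speech backend
        return sentence
    raise ValueError(f"unknown command: {command!r}")


# Usage: collect "spoken" text in a list instead of playing audio.
spoken = []
sentence = apply_command("HI", "SPACE")
sentence = apply_command(sentence + "YOX", "DELETE")
sentence = apply_command(sentence, "SPEAK", speak=spoken.append)
```

With pyttsx3, the `speak` callback could wrap `engine.say(text)` followed by `engine.runAndWait()` on an engine created with `pyttsx3.init()`.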
- Limited gesture vocabulary (not full A–Z)
- Performance depends on lighting and hand visibility

Future improvements:
- Expand gesture dataset
- Improve model accuracy
- Integrate deep learning models
- Build a GUI for better usability
Real-time gesture recognition, text formation, and command execution:
Example frame from the system:

