This project uses deep learning to recognize American Sign Language (ASL) alphabet gestures from a real-time webcam feed. It uses MediaPipe for hand tracking, OpenCV for image capture, and a Keras LSTM model to learn gesture sequences.
- 🖐️ Real-time hand gesture recognition using webcam
- 🧠 Deep learning model (LSTM) trained on 25 static ASL letters (A–Z excluding J)
- 💡 Feedback overlay showing predicted letter and confidence
- 📁 Easy-to-use scripts for data collection, keypoint extraction, model training, and deployment
SignLanguageDetectionUsingML/
├── Image/ # Raw gesture images captured via webcam
├── MP_Data/ # Extracted MediaPipe keypoints (.npy files)
├── Logs/ # TensorBoard logs
├── model.h5 # Trained LSTM model weights
├── model.json # Model architecture (optional)
├── app.py # Real-time sign detection and prediction
├── collectiondata.py # Image collection script
├── data.py # Keypoint extraction from images
├── train_model.py # LSTM model training
├── modelsummary.py # To view model structure
├── function.py # Shared utility functions (MediaPipe logic)
├── requirements.txt # Required Python packages
└── README.md # You're here!
git clone https://github.com/Shifaliii/ASL_using_LSTM.git
cd ASL_using_LSTM
python -m venv venv
venv\Scripts\activate # On Windows
pip install -r requirements.txt

Run the script below to collect images for each alphabet gesture using your webcam:
python collectiondata.py
- Press keys like `a`, `b`, `c`, ... to save corresponding gesture images.
- Images are saved in `Image/A/`, `Image/B/`, etc.
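The key-to-folder mapping described above could look roughly like the sketch below. It is not taken from `collectiondata.py`: the helper name `target_dir` and the choice to ignore `J` at collection time are assumptions, and the integer argument stands in for the key code that OpenCV's `cv2.waitKey` returns.

```python
import string

# Assumption: one folder per trained letter, J excluded (see the feature list).
LETTERS = [c for c in string.ascii_uppercase if c != "J"]


def target_dir(key_code, root="Image"):
    """Map a key code (e.g. cv2.waitKey(1) & 0xFF) to a save folder, or None."""
    if key_code < 0 or key_code > 255:
        return None  # no key pressed, or not a single-byte key
    letter = chr(key_code).upper()
    if letter not in LETTERS:
        return None  # ignore digits, punctuation, and the untrained letter J
    return f"{root}/{letter}"
```

Returning `None` for unmapped keys lets the capture loop skip the save step without special-casing each key.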
Convert images to hand landmark keypoints using MediaPipe:
python data.py
- Saves 21 keypoints per frame as `.npy` files inside `MP_Data/`.
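To make the 21-keypoint layout concrete, here is a minimal sketch of how one frame's landmarks might be flattened. It assumes each keypoint carries `x`, `y`, `z` (which matches the model's 63-value input, 21 × 3) and that a frame with no detected hand is stored as zeros. The function name `flatten_hand` is illustrative, not necessarily what `function.py` defines. Plain lists are used so the snippet runs without MediaPipe; in the real pipeline the result would presumably be written with `numpy.save`.

```python
NUM_KEYPOINTS = 21      # hand landmarks per frame, as stated above
VALUES_PER_POINT = 3    # assumption: x, y, z per landmark (21 * 3 = 63)


def flatten_hand(landmarks):
    """Turn [(x, y, z), ...] into a flat list of 63 floats.

    `landmarks` is None when no hand was detected; zeros keep the
    per-frame vector length constant in that case (an assumption).
    """
    size = NUM_KEYPOINTS * VALUES_PER_POINT
    if landmarks is None:
        return [0.0] * size
    if len(landmarks) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} landmarks, got {len(landmarks)}")
    return [float(v) for point in landmarks for v in point]


# Example with synthetic landmarks standing in for MediaPipe output.
fake_hand = [(i / 100, i / 50, 0.0) for i in range(NUM_KEYPOINTS)]
frame_vector = flatten_hand(fake_hand)
```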
Train an LSTM model on the extracted sequences:
python train_model.py
- Trains a 3-layer LSTM with dense layers
- Saves model as `model.h5` and `model.json`
- Logs progress to `Logs/` for TensorBoard
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Input: sequences of 30 frames, each with 63 keypoint values
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(30,63)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(25, activation='softmax')) # 25 signs (J excluded)

- ✅ Accuracy after 200 epochs: ~97–99%
- 📉 Loss steadily decreased across epochs
- 📂 You can visualize training with:
tensorboard --logdir Logs/
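As a sanity check on the architecture listed above, the trainable parameter count can be worked out by hand. The sketch below uses the standard formulas for an LSTM layer with bias, 4 × (units × (units + input_dim) + units), and for a dense layer, inputs × units + units. The helper names are illustrative. The total should agree with what `modelsummary.py` reports, assuming the saved model matches this listing.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (units + input_dim) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


# (kind, input_dim, units) for each layer, read off the model definition.
layers = [
    ("lstm", 63, 64),
    ("lstm", 64, 128),
    ("lstm", 128, 64),
    ("dense", 64, 64),
    ("dense", 64, 32),
    ("dense", 32, 25),
]

counts = [
    lstm_params(i, u) if kind == "lstm" else dense_params(i, u)
    for kind, i, u in layers
]
total = sum(counts)
print(counts, total)  # [32768, 98816, 49408, 4160, 2080, 825] 188057
```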
Run the main app to detect ASL gestures live:
python app.py
- Opens the webcam and shows the predicted letter with its confidence
- Only works for the 25 trained letters (J excluded)
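
The live loop can be pictured as a rolling window of the last 30 keypoint frames feeding the model, followed by an arg-max over the 25 softmax outputs. The sketch below shows only that bookkeeping and is not the code in `app.py`: the alphabetical label order, the `0.8` threshold, and the names `push_frame` and `decode` are assumptions, and the probabilities are made up to stand in for `model.predict` output.

```python
from collections import deque
import string

SEQUENCE_LENGTH = 30  # frames per sequence, from input_shape=(30, 63)
LABELS = [c for c in string.ascii_uppercase if c != "J"]  # assumed order

window = deque(maxlen=SEQUENCE_LENGTH)  # oldest frame drops out automatically


def push_frame(frame_vector):
    """Add one 63-value frame; return True once a full sequence is buffered."""
    window.append(frame_vector)
    return len(window) == SEQUENCE_LENGTH


def decode(probs, threshold=0.8):
    """Return (letter, confidence), or None if the model is not confident."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None
    return LABELS[best], probs[best]


# Synthetic example: 35 blank frames, then a fake softmax output favouring "C".
ready = [push_frame([0.0] * 63) for _ in range(35)]
fake_probs = [0.004] * 25
fake_probs[2] = 0.904
result = decode(fake_probs)
```

Suppressing low-confidence predictions is one way to keep the overlay from flickering between letters while the hand is moving into position.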
| Problem | Solution |
|---|---|
| Webcam not opening | Make sure it's not in use by another app |
| MediaPipe errors | Use Python 3.10 with mediapipe==0.10.14 |
| Permission error on `venv\Scripts\activate` | Run PowerShell as Admin: `Set-ExecutionPolicy RemoteSigned` |
| Model not detecting well | Collect more diverse images; train longer |
These are available in requirements.txt:
opencv-python==4.12.0.88
mediapipe==0.10.14
tensorflow==2.11.0
keras==2.11.0
numpy==1.23.5
protobuf==3.20.3

Pull requests are welcome.
For major changes, please open an issue first to discuss your ideas.
Shifali Florine Lobo
GitHub