A powerful REST API for Optical Character Recognition (OCR) specifically optimized for handwritten text extraction. Built with Go and powered by Microsoft's TrOCR (Transformer-based OCR) model running on CPU via ONNX Runtime.
- Handwriting Recognition: State-of-the-art accuracy for handwritten text using TrOCR
- Image Preprocessing: Advanced preprocessing pipeline including:
  - Noise reduction
  - Contrast enhancement (CLAHE)
  - Adaptive thresholding
  - Automatic deskewing
  - Smart polarity detection
- Confidence Scores: Returns confidence levels for OCR results
- Multiple Input Formats: Supports file upload and base64-encoded images
- Batch Processing: Process multiple images in a single request
- CPU Optimized: Runs efficiently on CPU without requiring GPU
- RESTful API: Clean and easy-to-use REST API
- Docker Support: Ready-to-deploy Docker container
- Go 1.21 or later
- OpenCV 4.x
- ONNX Runtime 1.16+
- Python 3.8+ (for model download script)
- Clone the repository:
git clone <repository-url>
cd OCR
- Install dependencies:
make install-deps
- Download TrOCR models:
make download-models
- Run the server:
make run

The API will be available at http://localhost:8080
The easiest way to run the API is using Docker:
# Build and run with Docker Compose
make docker-build
make docker-run
# Or manually
docker build -t ocr-api .
docker run -p 8080:8080 -v $(pwd)/models:/app/models ocr-api

GET /health
Returns API health status and model information.
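Once the server or container is up, a client can poll the health endpoint before sending images. The following is a minimal Go sketch, not part of this repository. It assumes that an HTTP 200 status means "healthy" and ignores the response body, since the exact body shape is not documented here. The helper names `healthURL` and `isHealthy` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// healthURL joins a base URL and the /health path, tolerating a trailing slash.
func healthURL(base string) string {
	return strings.TrimRight(base, "/") + "/health"
}

// isHealthy reports whether GET /health answers with HTTP 200.
// Assumption: a 200 status means the API is ready; the body is not inspected.
func isHealthy(client *http.Client, base string) (bool, error) {
	resp, err := client.Get(healthURL(base))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	// A short timeout keeps the check from hanging if the server is down.
	client := &http.Client{Timeout: 2 * time.Second}
	ok, err := isHealthy(client, "http://localhost:8080")
	if err != nil {
		fmt.Println("API not reachable:", err)
		return
	}
	fmt.Println("healthy:", ok)
}
```

Calling `isHealthy` in a retry loop is one way to wait for the model to finish loading after `docker run`.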
POST /api/v1/ocr
Content-Type: multipart/form-data
Parameters:
- image: Image file (required)
- enable_deskew: Enable automatic deskewing (default: true)
- return_processed: Return preprocessed image (default: false)

Example:
curl -X POST \
  -F "image=@handwritten.jpg" \
  -F "enable_deskew=true" \
  http://localhost:8080/api/v1/ocr

POST /api/v1/ocr/json
Content-Type: application/json
{
  "image": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
  "enable_deskew": true,
  "return_processed": false
}

Example:
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"image": "data:image/jpeg;base64,...", "enable_deskew": true}' \
  http://localhost:8080/api/v1/ocr/json

POST /api/v1/ocr/batch
Content-Type: multipart/form-data
Parameters:
- images: Multiple image files (up to 10)

Example:
curl -X POST \
  -F "images=@image1.jpg" \
  -F "images=@image2.jpg" \
  -F "images=@image3.jpg" \
  http://localhost:8080/api/v1/ocr/batch

GET /api/v1/model/info
Returns information about the loaded TrOCR model.
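The JSON endpoint can also be called from Go. The sketch below is illustrative only and is not taken from this repository: it builds the request body for POST /api/v1/ocr/json and decodes the response envelope documented in this README. The type and function names (`ocrRequest`, `ocrResponse`, `dataURI`, `newJSONRequest`, `decodeResponse`) are hypothetical, and the image is assumed to be a JPEG.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
)

// ocrRequest mirrors the JSON body accepted by POST /api/v1/ocr/json.
type ocrRequest struct {
	Image           string `json:"image"`
	EnableDeskew    bool   `json:"enable_deskew"`
	ReturnProcessed bool   `json:"return_processed"`
}

// ocrResponse mirrors the success and error envelopes shown in this README.
type ocrResponse struct {
	Success bool `json:"success"`
	Result  *struct {
		Text       string  `json:"text"`
		Confidence float64 `json:"confidence"`
		Language   string  `json:"language"`
	} `json:"result,omitempty"`
	Error            string  `json:"error,omitempty"`
	ProcessingTimeMs float64 `json:"processing_time_ms"`
}

// dataURI encodes raw image bytes as a base64 data URI.
func dataURI(mimeType string, data []byte) string {
	return "data:" + mimeType + ";base64," + base64.StdEncoding.EncodeToString(data)
}

// newJSONRequest builds, but does not send, the POST request.
// Assumption: the image bytes are JPEG data.
func newJSONRequest(baseURL string, image []byte, deskew bool) (*http.Request, error) {
	body, err := json.Marshal(ocrRequest{Image: dataURI("image/jpeg", image), EnableDeskew: deskew})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/ocr/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// decodeResponse parses a response body into ocrResponse.
func decodeResponse(body []byte) (ocrResponse, error) {
	var r ocrResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	req, err := newJSONRequest("http://localhost:8080", []byte("placeholder, not a real JPEG"), true)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)

	// Decode a sample body instead of calling a live server.
	sample := []byte(`{"success": true, "result": {"text": "hello", "confidence": 0.95, "language": "en"}, "processing_time_ms": 234.56}`)
	r, err := decodeResponse(sample)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(r.Result.Text, r.Result.Confidence)
}
```

To send the request, pass it to `http.DefaultClient.Do` and feed the response body to `decodeResponse`. Check `Success` before reading `Result`, which is nil in error responses.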
Success response:
{
  "success": true,
  "result": {
    "text": "Recognized text from the image",
    "confidence": 0.95,
    "language": "en"
  },
  "processing_time_ms": 234.56
}

Error response:
{
  "success": false,
  "error": "Error description",
  "processing_time_ms": 12.34
}

OCR/
├── cmd/
│   └── server/              # Main application entry point
│       └── main.go
├── internal/
│   ├── ocr/                 # TrOCR engine and inference
│   │   └── trocr.go
│   ├── preprocessing/       # Image preprocessing
│   │   └── image.go
│   └── models/              # Model definitions
├── pkg/
│   └── api/                 # API handlers and routes
│       ├── handlers.go
│       └── routes.go
├── configs/                 # Configuration files
├── scripts/                 # Setup and utility scripts
│   ├── download_models.sh
│   └── test_api.sh
├── examples/                # Example clients
│   └── client.py
├── models/                  # TrOCR model files (gitignored)
├── Dockerfile
├── docker-compose.yml
├── Makefile
└── README.md
Configuration is done via environment variables:
# Server
PORT=8080 # Server port
GIN_MODE=release # Gin mode (debug/release)
# Model
MODEL_PATH=./models # Path to model files
# Processing
MAX_UPLOAD_SIZE=10485760 # Max upload size (10MB)
MAX_BATCH_SIZE=10 # Max batch size

Create a .env file from the example:
cp .env.example .env

make test

# Basic API test
./scripts/test_api.sh
# Test with an image
TEST_IMAGE=./sample.jpg ./scripts/test_api.sh

python3 examples/client.py path/to/image.jpg

This API uses Microsoft TrOCR (Transformer-based OCR), specifically the trocr-base-handwritten variant, which is optimized for handwritten text recognition.
- Base Model: microsoft/trocr-base-handwritten
- Architecture: Vision Transformer (ViT) encoder + Transformer decoder
- Input Size: 384x384 pixels
- Runtime: ONNX Runtime (CPU optimized)
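To make the 384x384 input size concrete, here is a minimal sketch of turning an image into an encoder input tensor using only the Go standard library. It is not this project's pipeline, which relies on OpenCV. Two assumptions are made: nearest-neighbor resizing stands in for a proper interpolating resize, and pixel values are normalized with mean 0.5 and standard deviation 0.5 per channel, which should be verified against the preprocessing configuration shipped with the model. The names `toTensor`, `norm`, and `solidGray` are hypothetical.

```go
package main

import (
	"fmt"
	"image"
)

const inputSize = 384 // model input is 384x384 pixels

// toTensor resizes img to 384x384 with nearest-neighbor sampling and returns
// a float32 slice in NCHW layout (1x3x384x384), each channel scaled to [-1, 1].
func toTensor(img image.Image) []float32 {
	b := img.Bounds()
	plane := inputSize * inputSize
	out := make([]float32, 3*plane)
	for y := 0; y < inputSize; y++ {
		sy := b.Min.Y + y*b.Dy()/inputSize
		for x := 0; x < inputSize; x++ {
			sx := b.Min.X + x*b.Dx()/inputSize
			r, g, bl, _ := img.At(sx, sy).RGBA() // 16-bit channel values
			i := y*inputSize + x
			out[i] = norm(r)
			out[plane+i] = norm(g)
			out[2*plane+i] = norm(bl)
		}
	}
	return out
}

// norm maps a 16-bit channel value to [-1, 1]: (v/65535 - 0.5) / 0.5.
// Assumption: mean 0.5 and std 0.5; check the model's preprocessing config.
func norm(v uint32) float32 {
	return (float32(v)/65535.0 - 0.5) / 0.5
}

// solidGray builds a uniformly colored grayscale test image.
func solidGray(w, h int, v uint8) image.Image {
	img := image.NewGray(image.Rect(0, 0, w, h))
	for i := range img.Pix {
		img.Pix[i] = v
	}
	return img
}

func main() {
	tensor := toTensor(solidGray(100, 40, 255))
	fmt.Println("tensor length:", len(tensor), "first value:", tensor[0])
}
```

The channel-first layout matches what ViT-style encoders exported to ONNX typically expect, but the exact input name and shape should be read from the exported model.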
The models are downloaded automatically using the setup script:
./scripts/download_models.sh

This script:
- Downloads the TrOCR model from Hugging Face
- Converts it to ONNX format for CPU inference
- Saves the models to the ./models directory
Models are approximately 500MB in size.
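After the download finishes, it can be useful to confirm the files are in place before starting the server. The sketch below is a hypothetical pre-flight check, not code from this repository. It looks for encoder_model.onnx and decoder_model.onnx (the file names listed in the troubleshooting notes of this README) under MODEL_PATH, falling back to ./models.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// modelFiles lists the ONNX files this README expects in the models directory.
var modelFiles = []string{"encoder_model.onnx", "decoder_model.onnx"}

// missingModels returns the expected files that are absent, empty,
// or not regular files in dir.
func missingModels(dir string) []string {
	var missing []string
	for _, name := range modelFiles {
		info, err := os.Stat(filepath.Join(dir, name))
		if err != nil || info.IsDir() || info.Size() == 0 {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	dir := os.Getenv("MODEL_PATH")
	if dir == "" {
		dir = "./models" // default value of MODEL_PATH
	}
	if m := missingModels(dir); len(m) > 0 {
		fmt.Println("missing model files:", m)
		return
	}
	fmt.Println("all model files present in", dir)
}
```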
Performance metrics on standard hardware (4-core CPU):
- Single image processing: ~200-300ms
- Batch processing (5 images): ~800-1000ms
- Memory usage: ~500MB-1GB (depending on batch size)
- JPEG / JPG
- PNG
- BMP
- TIFF / TIF
- Maximum image upload size: 10MB
- Maximum batch size: 10 images
- CPU-only inference (no GPU acceleration in current version)
- English language focus (can be extended to other languages)
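A client can check the format and size limits above before uploading, which avoids a round trip for requests the server would reject. This is a minimal sketch under stated assumptions: the 10MB limit is treated as applying to each image individually, formats are judged by file extension only, and the `upload` type and `validateBatch` function are hypothetical names, not part of the API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

const (
	maxUploadBytes = 10 * 1024 * 1024 // 10MB, matching MAX_UPLOAD_SIZE=10485760
	maxBatchSize   = 10               // matching MAX_BATCH_SIZE=10
)

// supportedExt holds the extensions of the supported image formats.
var supportedExt = map[string]bool{
	".jpg": true, ".jpeg": true, ".png": true, ".bmp": true, ".tiff": true, ".tif": true,
}

// upload describes a file a client intends to send (hypothetical type).
type upload struct {
	Name string
	Size int64 // size in bytes
}

// validateBatch returns the first problem found, or nil if the batch
// fits the documented limits.
func validateBatch(files []upload) error {
	if len(files) == 0 {
		return fmt.Errorf("no images given")
	}
	if len(files) > maxBatchSize {
		return fmt.Errorf("%d images exceeds the batch limit of %d", len(files), maxBatchSize)
	}
	for _, f := range files {
		ext := strings.ToLower(filepath.Ext(f.Name))
		if !supportedExt[ext] {
			return fmt.Errorf("%s: unsupported format %q", f.Name, ext)
		}
		if f.Size > maxUploadBytes {
			return fmt.Errorf("%s: %d bytes exceeds the %d byte limit", f.Name, f.Size, maxUploadBytes)
		}
	}
	return nil
}

func main() {
	batch := []upload{
		{Name: "page1.jpg", Size: 2 << 20},
		{Name: "notes.gif", Size: 1 << 10},
	}
	if err := validateBatch(batch); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("batch looks acceptable")
}
```

The server remains the authority on these limits, since they can be changed through MAX_UPLOAD_SIZE and MAX_BATCH_SIZE.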
Ensure model files are present:
ls -lh models/
# Should show: encoder_model.onnx, decoder_model.onnx

Install OpenCV development libraries:
# Ubuntu/Debian
sudo apt-get install libopencv-dev
# macOS
brew install opencv

Ensure CGO is enabled:
export CGO_ENABLED=1
go build ./cmd/server

- Microsoft TrOCR - Transformer-based OCR model
- ONNX Runtime - Cross-platform ML inference
- GoCV - Go bindings for OpenCV
- Gin - Web framework