OCR API - Handwriting Text Recognition

A powerful REST API for Optical Character Recognition (OCR) specifically optimized for handwritten text extraction. Built with Go and powered by Microsoft's TrOCR (Transformer-based OCR) model running on CPU via ONNX Runtime.

Features

Handwriting Recognition: Accurate recognition of handwritten text using Microsoft's TrOCR model
Image Preprocessing: Advanced preprocessing pipeline including:
- Noise reduction
- Contrast enhancement (CLAHE)
- Adaptive thresholding
- Automatic deskewing
- Smart polarity detection
Confidence Scores: Returns confidence levels for OCR results
Multiple Input Formats: Accepts multipart file uploads and base64-encoded images sent as JSON
Batch Processing: Process multiple images in a single request
CPU Optimized: Runs efficiently on CPU without requiring GPU
RESTful API: Versioned JSON endpoints under /api/v1
Docker Support: Ready-to-deploy Docker container

Quick Start

Prerequisites

Go 1.21 or later
OpenCV 4.x
ONNX Runtime 1.16+
Python 3.8+ (for model download script)

Installation

Clone the repository:

git clone <repository-url>
cd OCR

Install dependencies:

make install-deps

Download TrOCR models:

make download-models

Run the server:

make run

The API will be available at http://localhost:8080

Using Docker

The easiest way to run the API is using Docker:

# Build and run with Docker Compose
make docker-build
make docker-run

# Or manually
docker build -t ocr-api .
docker run -p 8080:8080 -v "$(pwd)/models:/app/models" ocr-api

API Documentation

Endpoints

Health Check

GET /health

Returns API health status and model information.
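A minimal Go client for the health check, using only the standard library. The response schema is not documented here, so this sketch returns the body as raw text; the five-second timeout and the checkHealth name are illustrative choices.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// checkHealth calls GET {baseURL}/health and returns the status code and the
// unparsed response body.
func checkHealth(baseURL string) (int, string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/health")
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	status, body, err := checkHealth("http://localhost:8080")
	if err != nil {
		fmt.Fprintln(os.Stderr, "health check failed:", err)
		return
	}
	fmt.Println(status, body)
}
```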

OCR with File Upload

POST /api/v1/ocr
Content-Type: multipart/form-data

Parameters:
- image: Image file (required)
- enable_deskew: Enable automatic deskewing (default: true)
- return_processed: Return preprocessed image (default: false)

Example:

curl -X POST \
  -F "image=@handwritten.jpg" \
  -F "enable_deskew=true" \
  http://localhost:8080/api/v1/ocr
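The same upload can be built in Go with the standard mime/multipart package, using the form fields from the curl example. This is a sketch: newOCRUploadRequest is a hypothetical helper, and it only constructs the request; sending it is left to http.DefaultClient.Do or your own client.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"strconv"
)

// newOCRUploadRequest builds POST {baseURL}/api/v1/ocr with an "image" file
// part and an "enable_deskew" field, matching the curl example above.
func newOCRUploadRequest(baseURL, filename string, data []byte, deskew bool) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("image", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(data); err != nil {
		return nil, err
	}
	if err := w.WriteField("enable_deskew", strconv.FormatBool(deskew)); err != nil {
		return nil, err
	}
	// Close writes the terminating boundary; it must run before sending.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/ocr", &body)
	if err != nil {
		return nil, err
	}
	// The Content-Type header carries the boundary the server needs to parse parts.
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newOCRUploadRequest("http://localhost:8080", "handwritten.jpg", []byte("...image bytes..."), true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Send with http.DefaultClient.Do(req) once the server is running.
	fmt.Println(req.Method, req.URL)
}
```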

OCR with JSON/Base64

POST /api/v1/ocr/json
Content-Type: application/json

{
  "image": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
  "enable_deskew": true,
  "return_processed": false
}

Example:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"image": "data:image/jpeg;base64,...", "enable_deskew": true}' \
  http://localhost:8080/api/v1/ocr/json

Batch OCR

POST /api/v1/ocr/batch
Content-Type: multipart/form-data

Parameters:
- images: One or more image files, repeating the images field once per file (up to 10, set by MAX_BATCH_SIZE)

Example:

curl -X POST \
  -F "images=@image1.jpg" \
  -F "images=@image2.jpg" \
  -F "images=@image3.jpg" \
  http://localhost:8080/api/v1/ocr/batch
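In Go, the batch request repeats the images form field once per file, mirroring the repeated -F flags above. The sketch below also enforces the 10-image limit on the client side; newBatchRequest and imageFile are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// maxBatch mirrors the documented default of 10 images per request.
const maxBatch = 10

// imageFile is a hypothetical helper type: a file name plus its raw bytes.
type imageFile struct {
	Name string
	Data []byte
}

// newBatchRequest builds POST {baseURL}/api/v1/ocr/batch with one "images"
// part per file.
func newBatchRequest(baseURL string, files []imageFile) (*http.Request, error) {
	if len(files) == 0 || len(files) > maxBatch {
		return nil, fmt.Errorf("batch must contain 1-%d images, got %d", maxBatch, len(files))
	}
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	for _, f := range files {
		part, err := w.CreateFormFile("images", f.Name)
		if err != nil {
			return nil, err
		}
		if _, err := part.Write(f.Data); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/ocr/batch", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	files := []imageFile{
		{Name: "image1.jpg", Data: []byte("...")},
		{Name: "image2.jpg", Data: []byte("...")},
	}
	req, err := newBatchRequest("http://localhost:8080", files)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```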

Model Information

GET /api/v1/model/info

Returns information about the loaded TrOCR model.

Response Format

Success response:

{
  "success": true,
  "result": {
    "text": "Recognized text from the image",
    "confidence": 0.95,
    "language": "en"
  },
  "processing_time_ms": 234.56
}

Error response:

{
  "success": false,
  "error": "Error description",
  "processing_time_ms": 12.34
}
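A client can decode both response shapes into one Go struct, since the error response simply omits result. The JSON field names follow the examples above; the type and function names are hypothetical. The batch endpoint's response shape is not documented here, so this sketch does not cover it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// OCRResult mirrors the "result" object of a success response.
type OCRResult struct {
	Text       string  `json:"text"`
	Confidence float64 `json:"confidence"`
	Language   string  `json:"language"`
}

// OCRResponse covers both the success and the error response shapes.
type OCRResponse struct {
	Success          bool       `json:"success"`
	Result           *OCRResult `json:"result,omitempty"`
	Error            string     `json:"error,omitempty"`
	ProcessingTimeMs float64    `json:"processing_time_ms"`
}

// parseOCRResponse decodes a response body and turns "success": false into a Go error.
func parseOCRResponse(body []byte) (*OCRResult, error) {
	var r OCRResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if !r.Success {
		return nil, fmt.Errorf("ocr failed after %.2f ms: %s", r.ProcessingTimeMs, r.Error)
	}
	if r.Result == nil {
		return nil, errors.New("success response without a result")
	}
	return r.Result, nil
}

func main() {
	body := []byte(`{"success": true, "result": {"text": "Recognized text from the image",
		"confidence": 0.95, "language": "en"}, "processing_time_ms": 234.56}`)
	res, err := parseOCRResponse(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q (confidence %.2f)\n", res.Text, res.Confidence)
}
```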

Project Structure

OCR/
├── cmd/
│   └── server/           # Main application entry point
│       └── main.go
├── internal/
│   ├── ocr/             # TrOCR engine and inference
│   │   └── trocr.go
│   ├── preprocessing/   # Image preprocessing
│   │   └── image.go
│   └── models/          # Model definitions
├── pkg/
│   └── api/             # API handlers and routes
│       ├── handlers.go
│       └── routes.go
├── configs/             # Configuration files
├── scripts/             # Setup and utility scripts
│   ├── download_models.sh
│   └── test_api.sh
├── examples/            # Example clients
│   └── client.py
├── models/              # TrOCR model files (gitignored)
├── Dockerfile
├── docker-compose.yml
├── Makefile
└── README.md

Configuration

Configuration is done via environment variables:

# Server
PORT=8080                   # Server port
GIN_MODE=release            # Gin mode (debug/release)

# Model
MODEL_PATH=./models         # Path to model files

# Processing
MAX_UPLOAD_SIZE=10485760    # Max upload size in bytes (10 MB)
MAX_BATCH_SIZE=10           # Max images per batch request

Create a .env file from the example:

cp .env.example .env
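A Go program might read these variables as follows. This is a minimal sketch, not the project's configuration code: it treats the example values above as defaults (an assumption), takes getenv as a parameter so it can be tested without touching the process environment, and does not parse the .env file itself (Docker Compose or a dotenv loader would have to supply those values).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds the settings listed above.
type Config struct {
	Port          string
	GinMode       string
	ModelPath     string
	MaxUploadSize int64
	MaxBatchSize  int
}

// loadConfig reads each variable via getenv (os.Getenv in practice) and falls
// back to the example values shown above, which are assumed to be defaults.
func loadConfig(getenv func(string) string) (Config, error) {
	str := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		Port:      str("PORT", "8080"),
		GinMode:   str("GIN_MODE", "release"),
		ModelPath: str("MODEL_PATH", "./models"),
	}
	upload, err := strconv.ParseInt(str("MAX_UPLOAD_SIZE", "10485760"), 10, 64)
	if err != nil {
		return Config{}, fmt.Errorf("MAX_UPLOAD_SIZE: %w", err)
	}
	batch, err := strconv.Atoi(str("MAX_BATCH_SIZE", "10"))
	if err != nil {
		return Config{}, fmt.Errorf("MAX_BATCH_SIZE: %w", err)
	}
	cfg.MaxUploadSize, cfg.MaxBatchSize = upload, batch
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```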

Development

Running Tests

make test

Testing the API

# Basic API test
./scripts/test_api.sh

# Test with an image
TEST_IMAGE=./sample.jpg ./scripts/test_api.sh

Using the Python Client

python3 examples/client.py path/to/image.jpg

Model Information

This API uses Microsoft TrOCR (Transformer-based OCR), specifically the trocr-base-handwritten variant, which is optimized for handwritten text recognition.

Model Details

Base Model: microsoft/trocr-base-handwritten
Architecture: Vision Transformer (ViT) encoder + Transformer decoder
Input Size: 384x384 pixels
Runtime: ONNX Runtime (CPU optimized)

Downloading Models

Download the models by running the setup script:

./scripts/download_models.sh

This script:

Downloads the TrOCR model from Hugging Face
Converts it to ONNX format for CPU inference
Saves the models to the ./models directory

Models are approximately 500MB in size.

Performance

Approximate performance on a 4-core CPU (actual timings vary with hardware, image size and text length):

Single image processing: ~200-300ms
Batch processing (5 images): ~800-1000ms
Memory usage: ~500MB-1GB (depending on batch size)

Supported Image Formats

JPEG / JPG
PNG
BMP
TIFF / TIF
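To reject unsupported files before uploading them, a client can check the leading magic bytes of the formats listed above. This Go sketch is a client-side convenience with a hypothetical function name; it does not describe how the server itself validates input.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectImageFormat returns "jpeg", "png", "bmp" or "tiff" based on the file
// signature, or "" for anything else.
func detectImageFormat(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xFF, 0xD8, 0xFF}):
		return "jpeg"
	case bytes.HasPrefix(b, []byte("\x89PNG\r\n\x1a\n")):
		return "png"
	case bytes.HasPrefix(b, []byte("BM")):
		return "bmp"
	case bytes.HasPrefix(b, []byte("II*\x00")), bytes.HasPrefix(b, []byte("MM\x00*")):
		// Little-endian ("II") and big-endian ("MM") TIFF headers.
		return "tiff"
	default:
		return ""
	}
}

func main() {
	fmt.Println(detectImageFormat([]byte{0xFF, 0xD8, 0xFF, 0xE0})) // jpeg
}
```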

Limitations

Maximum image upload size: 10 MB (default, set by MAX_UPLOAD_SIZE)
Maximum batch size: 10 images (default, set by MAX_BATCH_SIZE)
CPU-only inference (no GPU acceleration in current version)
English language focus (can be extended to other languages)
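Both size limits can be respected on the client side: check each file against the upload limit, and split long lists into groups of at most 10 for the batch endpoint. The constants below hard-code the default values (an assumption if the server is configured differently), and chunk and checkSize are hypothetical helpers.

```go
package main

import "fmt"

const (
	maxUploadBytes = 10 << 20 // 10 MB, matching MAX_UPLOAD_SIZE=10485760
	maxBatchSize   = 10
)

// chunk splits items into consecutive groups of at most size elements, so a
// long list can be sent as several /api/v1/ocr/batch requests.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		panic("chunk: size must be positive")
	}
	var out [][]T
	for len(items) > size {
		out = append(out, items[:size])
		items = items[size:]
	}
	if len(items) > 0 {
		out = append(out, items)
	}
	return out
}

// checkSize rejects files larger than the upload limit before sending them.
func checkSize(n int64) error {
	if n > maxUploadBytes {
		return fmt.Errorf("file is %d bytes; limit is %d", n, maxUploadBytes)
	}
	return nil
}

func main() {
	paths := make([]string, 23)
	for _, group := range chunk(paths, maxBatchSize) {
		fmt.Println(len(group)) // 10, 10, 3
	}
	fmt.Println(checkSize(12 << 20))
}
```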

Troubleshooting

Models not loading

Ensure model files are present:

ls -lh models/
# Should show: encoder_model.onnx, decoder_model.onnx

OpenCV errors

Install OpenCV development libraries:

# Ubuntu/Debian
sudo apt-get install libopencv-dev

# macOS
brew install opencv

Go build errors

GoCV relies on cgo, so make sure CGO is enabled:

export CGO_ENABLED=1
go build ./cmd/server

License

MIT License

Acknowledgments

Microsoft TrOCR - Transformer-based OCR model
ONNX Runtime - Cross-platform ML inference
GoCV - Go bindings for OpenCV
Gin - Web framework
