Vision-Edge

Computer vision edge deployment pipeline: MobileNetV3 fine-tuning, TFLite conversion, and quantization benchmarks.

Architecture

Vision-Edge implements a lightweight object detection pipeline optimized for edge devices:

Input Image (320x320x3)
        |
  MobileNetV3Small (backbone)
        |
  +-----------+-----------+
  |                       |
Feature Map 1       Feature Map 2
(mid-level)         (high-level)
  |                       |
SSD Head 1          SSD Head 2
  |                       |
  +-----------+-----------+
        |
  Concatenated Predictions
  (class_scores + box_offsets)
        |
  Non-Maximum Suppression
        |
  Final Detections

MobileNetV3Small Backbone

The backbone uses tf.keras.applications.MobileNetV3Small pretrained on ImageNet. It extracts multi-scale feature maps at two spatial resolutions, providing both fine-grained and semantic features for detection.

Key features:

  • Width multiplier (alpha) for model size control
  • Optional backbone freezing for transfer learning
  • Lightweight inverted residual blocks with squeeze-and-excitation
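
The two feature maps' spatial resolutions follow directly from the backbone's cumulative stride at each tap point. A minimal sketch of the arithmetic, assuming the typical MobileNetV3 strides of 16 (mid-level) and 32 (high-level), which are illustrative rather than taken from this repository:

```python
def feature_map_size(input_size: int, stride: int) -> int:
    """Spatial resolution of a feature map after downsampling by `stride`."""
    return input_size // stride

# Assumed strides for the mid-level and high-level taps.
sizes = [feature_map_size(320, s) for s in (16, 32)]
print(sizes)  # [20, 10]
```

At 320x320 input this yields 20x20 and 10x10 grids: the finer grid helps localize small objects, while the coarser one carries more semantic context.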

SSD Detection Head

Each feature map feeds into a Single Shot Detector (SSD) head that predicts:

  • Class scores: num_anchors * num_classes per spatial location
  • Box offsets: num_anchors * 4 (dx, dy, dw, dh) per spatial location
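
The (dx, dy, dw, dh) offsets are applied to anchor boxes at decode time. This sketch assumes the standard SSD parameterization (center shift scaled by anchor size, exponential scaling for width and height); the repository's exact decode rule may differ:

```python
import math

def decode_box(anchor, offsets):
    """Decode (dx, dy, dw, dh) offsets against an anchor (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (
        cx + dx * w,        # shift center x, scaled by anchor width
        cy + dy * h,        # shift center y, scaled by anchor height
        w * math.exp(dw),   # scale width
        h * math.exp(dh),   # scale height
    )

# Zero offsets reproduce the anchor exactly.
print(decode_box((0.5, 0.5, 0.2, 0.2), (0.0, 0.0, 0.0, 0.0)))
# (0.5, 0.5, 0.2, 0.2)
```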

The SSD loss combines:

  • Localization: Smooth L1 (Huber) loss on positive anchor box offsets
  • Classification: Cross-entropy with hard negative mining (neg:pos ratio = 3:1)
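
Hard negative mining keeps all positive anchors and only the highest-loss negatives, capped at the 3:1 ratio, so easy background anchors do not swamp the classification loss. A sketch of the standard selection scheme with a hypothetical helper (not the repository's API):

```python
def hard_negative_mask(cls_losses, positive_mask, neg_pos_ratio=3):
    """Select anchors to keep: all positives plus the hardest negatives.

    cls_losses: per-anchor classification loss values.
    positive_mask: True where the anchor matched a ground-truth box.
    """
    num_neg = neg_pos_ratio * sum(positive_mask)
    # Rank negative anchors by loss, highest (hardest) first.
    neg_indices = [i for i, pos in enumerate(positive_mask) if not pos]
    neg_indices.sort(key=lambda i: cls_losses[i], reverse=True)
    keep = set(neg_indices[:num_neg])
    return [pos or (i in keep) for i, pos in enumerate(positive_mask)]

losses = [0.1, 2.0, 0.5, 1.5, 0.2, 0.05]
positives = [True, False, False, False, False, False]
print(hard_negative_mask(losses, positives))
# [True, True, True, True, False, False]
```

With one positive anchor, the three hardest negatives (losses 2.0, 1.5, 0.5) are kept and the rest are ignored.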

Quantization Pipeline

Three export variants are supported for deployment flexibility:

| Variant | Precision     | Typical Size Reduction | Use Case                         |
|---------|---------------|------------------------|----------------------------------|
| FP32    | 32-bit float  | 1x (baseline)          | Development, accuracy validation |
| FP16    | 16-bit float  | ~2x                    | GPU-capable edge devices         |
| INT8    | 8-bit integer | ~4x                    | Microcontrollers, mobile CPUs    |
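
The size reductions follow from weight precision alone: halving or quartering the bits per weight halves or quarters the serialized weight size. A quick sketch of the ideal arithmetic (the parameter count is illustrative, and real files fall slightly short of 2x/4x because of metadata and non-quantized ops):

```python
def weight_bytes(num_params: int, bits: int) -> int:
    """Ideal serialized size of the weights alone, in bytes."""
    return num_params * bits // 8

params = 1_500_000  # illustrative parameter count, not the actual model's
fp32 = weight_bytes(params, 32)
for name, bits in [("FP16", 16), ("INT8", 8)]:
    print(f"{name}: {fp32 / weight_bytes(params, bits):.0f}x smaller")
# FP16: 2x smaller
# INT8: 4x smaller
```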

Conversion Process

from src.export.tflite_converter import TFLiteExporter
from src.export.quantization import create_representative_dataset

exporter = TFLiteExporter(model)

# Export all variants
rep_dataset = create_representative_dataset(calibration_data, input_shape=(320, 320, 3))
results = exporter.convert_all("exported_models/", representative_dataset=rep_dataset)

INT8 quantization uses a representative dataset (100 samples by default) to calibrate activation ranges, which keeps accuracy degradation small.
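
The TFLite converter consumes the representative dataset as a Python generator that yields one model input per call. A minimal sketch of what such a generator looks like, using plain lists in place of tensors; the real `create_representative_dataset` helper may differ in shape handling:

```python
def representative_dataset(samples, num_calibration=100):
    """Build a calibration generator capped at `num_calibration` samples.

    The TFLite converter iterates this generator to observe activation
    ranges for INT8 quantization; each yield is one model input.
    """
    def gen():
        for sample in samples[:num_calibration]:
            yield [sample]  # converter expects a list of input arrays
    return gen

gen = representative_dataset(list(range(250)))
print(sum(1 for _ in gen()))  # 100
```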

Benchmark Results

Example results on MobileNetV3-SSD (320x320, 10 classes):

| Variant | Size (MB) | Latency (ms) | mAP@0.5 | Size Reduction | Speedup |
|---------|-----------|--------------|---------|----------------|---------|
| FP32    | 5.80      | 28.3         | 0.6820  | 1.0x           | 1.0x    |
| FP16    | 3.10      | 22.1         | 0.6815  | 1.9x           | 1.3x    |
| INT8    | 1.55      | 12.4         | 0.6680  | 3.7x           | 2.3x    |

Benchmarks measured on CPU with 50 inference runs and 5 warmup iterations.
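
The derived columns are simple ratios against the FP32 baseline; a quick check of the INT8 row:

```python
def ratios(baseline_size, baseline_ms, size_mb, latency_ms):
    """Size reduction and speedup relative to the FP32 baseline."""
    return baseline_size / size_mb, baseline_ms / latency_ms

reduction, speedup = ratios(5.80, 28.3, 1.55, 12.4)
print(f"{reduction:.1f}x smaller, {speedup:.1f}x faster")
# 3.7x smaller, 2.3x faster
```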

Installation

pip install -e .

# With development dependencies
pip install -e ".[dev]"

Quick Start

Training

from src.model.mobilenet_ssd import MobileNetSSD
from src.model.losses import SSDLoss
from src.data.dataset import create_synthetic_dataset

# Build model
model = MobileNetSSD(input_size=320, num_classes=10)

# Create dataset
dataset = create_synthetic_dataset(num_samples=1000, image_size=320, num_classes=10)

# Train
loss_fn = SSDLoss(num_classes=10)
model.compile(optimizer="adam", loss=loss_fn)
model.fit(dataset, epochs=10)  # adjust epochs as needed

Export

from src.export.tflite_converter import TFLiteExporter

exporter = TFLiteExporter(model)
exporter.convert_all("exported_models/")

Inference

from src.inference.tflite_engine import TFLiteEngine
from src.inference.nms import non_max_suppression

engine = TFLiteEngine("exported_models/model_fp32.tflite")
image = engine.preprocess(raw_image)  # raw_image: HxWxC image array
outputs = engine.predict(image)

# Post-process with NMS
boxes, scores, indices = non_max_suppression(
    outputs["boxes"], outputs["scores"], iou_threshold=0.45
)
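
Greedy NMS keeps the highest-scoring box and discards lower-scoring boxes that overlap it beyond the IoU threshold. A self-contained pure-Python sketch of the idea; the repository's `non_max_suppression` signature is assumed from the snippet above and is likely vectorized:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the best box, drop overlapping lower-scoring rivals."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU ≈ 0.68, above the 0.45 threshold, so it is suppressed; the third box is disjoint and survives.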

Benchmarking

from src.benchmark.latency import measure_tflite_latency

stats = measure_tflite_latency("exported_models/model_fp32.tflite")
print(f"Mean latency: {stats['mean_ms']:.1f} ms")

Project Structure

vision-edge/
+-- src/
|   +-- data/           # Data schemas, tf.data pipeline, augmentation
|   +-- model/          # MobileNetV3 backbone, SSD head, losses
|   +-- export/         # TFLite conversion, quantization, SavedModel
|   +-- benchmark/      # Latency, accuracy (mAP), report generation
|   +-- inference/      # TFLite engine, NMS post-processing
|   +-- deploy/         # Hugging Face Hub push utilities
|   +-- utils/          # Device config, visualization
+-- tests/              # Comprehensive test suite
+-- configs/            # YAML configuration files
+-- app.py              # Gradio demo application
+-- Dockerfile          # Container deployment

Testing

pytest tests/ -v

Docker

docker build -t vision-edge .
docker run -p 7860:7860 vision-edge

License

Apache-2.0
