Vision-Edge

Computer vision edge deployment pipeline: MobileNetV3 fine-tuning, TFLite conversion, and quantization benchmarks.

Architecture

Vision-Edge implements a lightweight object detection pipeline optimized for edge devices:

Input Image (320x320x3)
        |
  MobileNetV3Small (backbone)
        |
  +-----------+-----------+
  |                       |
Feature Map 1       Feature Map 2
(mid-level)         (high-level)
  |                       |
SSD Head 1          SSD Head 2
  |                       |
  +-----------+-----------+
        |
  Concatenated Predictions
  (class_scores + box_offsets)
        |
  Non-Maximum Suppression
        |
  Final Detections

MobileNetV3Small Backbone

The backbone uses tf.keras.applications.MobileNetV3Small pretrained on ImageNet. It extracts multi-scale feature maps at two spatial resolutions, providing both fine-grained and semantic features for detection.

Key features:

  • Width multiplier (alpha) for model size control
  • Optional backbone freezing for transfer learning
  • Lightweight inverted residual blocks with squeeze-and-excitation
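
The two feature maps' spatial resolutions follow directly from the backbone's cumulative stride at each tap point. A minimal sketch of the arithmetic, assuming the typical MobileNetV3 strides of 16 (mid-level) and 32 (high-level), which are illustrative rather than taken from this repository:

```python
def feature_map_size(input_size: int, stride: int) -> int:
    """Spatial resolution of a feature map after downsampling by `stride`."""
    return input_size // stride

# Assumed strides for the mid-level and high-level taps.
sizes = [feature_map_size(320, s) for s in (16, 32)]
print(sizes)  # [20, 10]
```

At 320x320 input this yields 20x20 and 10x10 grids: the finer grid helps localize small objects, while the coarser one carries more semantic context.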

SSD Detection Head

Each feature map feeds into a Single Shot Detector (SSD) head that predicts:

  • Class scores: num_anchors * num_classes per spatial location
  • Box offsets: num_anchors * 4 (dx, dy, dw, dh) per spatial location
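
The (dx, dy, dw, dh) offsets are applied to anchor boxes at decode time. This sketch assumes the standard SSD parameterization (center shift scaled by anchor size, exponential scaling for width and height); the repository's exact decode rule may differ:

```python
import math

def decode_box(anchor, offsets):
    """Decode (dx, dy, dw, dh) offsets against an anchor (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (
        cx + dx * w,        # shift center x, scaled by anchor width
        cy + dy * h,        # shift center y, scaled by anchor height
        w * math.exp(dw),   # scale width
        h * math.exp(dh),   # scale height
    )

# Zero offsets reproduce the anchor exactly.
print(decode_box((0.5, 0.5, 0.2, 0.2), (0.0, 0.0, 0.0, 0.0)))
# (0.5, 0.5, 0.2, 0.2)
```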

The SSD loss combines:

  • Localization: Smooth L1 (Huber) loss on positive anchor box offsets
  • Classification: Cross-entropy with hard negative mining (neg:pos ratio = 3:1)
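
Hard negative mining keeps all positive anchors and only the highest-loss negatives, capped at the 3:1 ratio, so easy background anchors do not swamp the classification loss. A sketch of the standard selection scheme with a hypothetical helper (not the repository's API):

```python
def hard_negative_mask(cls_losses, positive_mask, neg_pos_ratio=3):
    """Select anchors to keep: all positives plus the hardest negatives.

    cls_losses: per-anchor classification loss values.
    positive_mask: True where the anchor matched a ground-truth box.
    """
    num_neg = neg_pos_ratio * sum(positive_mask)
    # Rank negative anchors by loss, highest (hardest) first.
    neg_indices = [i for i, pos in enumerate(positive_mask) if not pos]
    neg_indices.sort(key=lambda i: cls_losses[i], reverse=True)
    keep = set(neg_indices[:num_neg])
    return [pos or (i in keep) for i, pos in enumerate(positive_mask)]

losses = [0.1, 2.0, 0.5, 1.5, 0.2, 0.05]
positives = [True, False, False, False, False, False]
print(hard_negative_mask(losses, positives))
# [True, True, True, True, False, False]
```

With one positive anchor, the three hardest negatives (losses 2.0, 1.5, 0.5) are kept and the rest are ignored.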

Quantization Pipeline

Three export variants are supported for deployment flexibility:

| Variant | Precision     | Typical Size Reduction | Use Case                         |
|---------|---------------|------------------------|----------------------------------|
| FP32    | 32-bit float  | 1x (baseline)          | Development, accuracy validation |
| FP16    | 16-bit float  | ~2x                    | GPU-capable edge devices         |
| INT8    | 8-bit integer | ~4x                    | Microcontrollers, mobile CPUs    |
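
The size reductions follow from weight precision alone: halving or quartering the bits per weight halves or quarters the serialized weight size. A quick sketch of the ideal arithmetic (the parameter count is illustrative, and real files fall slightly short of 2x/4x because of metadata and non-quantized ops):

```python
def weight_bytes(num_params: int, bits: int) -> int:
    """Ideal serialized size of the weights alone, in bytes."""
    return num_params * bits // 8

params = 1_500_000  # illustrative parameter count, not the actual model's
fp32 = weight_bytes(params, 32)
for name, bits in [("FP16", 16), ("INT8", 8)]:
    print(f"{name}: {fp32 / weight_bytes(params, bits):.0f}x smaller")
# FP16: 2x smaller
# INT8: 4x smaller
```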

Conversion Process

from src.export.tflite_converter import TFLiteExporter
from src.export.quantization import create_representative_dataset

exporter = TFLiteExporter(model)

# Export all variants
rep_dataset = create_representative_dataset(calibration_data, input_shape=(320, 320, 3))
results = exporter.convert_all("exported_models/", representative_dataset=rep_dataset)

INT8 quantization uses a representative dataset (100 samples by default) to calibrate activation ranges, which keeps accuracy degradation small.
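
The TFLite converter consumes the representative dataset as a Python generator that yields one model input per call. A minimal sketch of what such a generator looks like, using plain lists in place of tensors; the real `create_representative_dataset` helper may differ in shape handling:

```python
def representative_dataset(samples, num_calibration=100):
    """Build a calibration generator capped at `num_calibration` samples.

    The TFLite converter iterates this generator to observe activation
    ranges for INT8 quantization; each yield is one model input.
    """
    def gen():
        for sample in samples[:num_calibration]:
            yield [sample]  # converter expects a list of input arrays
    return gen

gen = representative_dataset(list(range(250)))
print(sum(1 for _ in gen()))  # 100
```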

Benchmark Results

Example results on MobileNetV3-SSD (320x320, 10 classes):

| Variant | Size (MB) | Latency (ms) | mAP@0.5 | Size Reduction | Speedup |
|---------|-----------|--------------|---------|----------------|---------|
| FP32    | 5.80      | 28.3         | 0.6820  | 1.0x           | 1.0x    |
| FP16    | 3.10      | 22.1         | 0.6815  | 1.9x           | 1.3x    |
| INT8    | 1.55      | 12.4         | 0.6680  | 3.7x           | 2.3x    |

Benchmarks measured on CPU with 50 inference runs and 5 warmup iterations.
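
The derived columns are simple ratios against the FP32 baseline; a quick check of the INT8 row:

```python
def ratios(baseline_size, baseline_ms, size_mb, latency_ms):
    """Size reduction and speedup relative to the FP32 baseline."""
    return baseline_size / size_mb, baseline_ms / latency_ms

reduction, speedup = ratios(5.80, 28.3, 1.55, 12.4)
print(f"{reduction:.1f}x smaller, {speedup:.1f}x faster")
# 3.7x smaller, 2.3x faster
```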

Installation

pip install -e .

# With development dependencies
pip install -e ".[dev]"

Quick Start

Training

from src.model.mobilenet_ssd import MobileNetSSD
from src.model.losses import SSDLoss
from src.data.dataset import create_synthetic_dataset

# Build model
model = MobileNetSSD(input_size=320, num_classes=10)

# Create dataset
dataset = create_synthetic_dataset(num_samples=1000, image_size=320, num_classes=10)

# Train
loss_fn = SSDLoss(num_classes=10)
model.compile(optimizer="adam", loss=loss_fn)
model.fit(dataset, epochs=10)  # adjust epochs as needed

Export

from src.export.tflite_converter import TFLiteExporter

exporter = TFLiteExporter(model)
exporter.convert_all("exported_models/")

Inference

from src.inference.tflite_engine import TFLiteEngine
from src.inference.nms import non_max_suppression

engine = TFLiteEngine("exported_models/model_fp32.tflite")
image = engine.preprocess(raw_image)  # raw_image: HxWxC image array
outputs = engine.predict(image)

# Post-process with NMS
boxes, scores, indices = non_max_suppression(
    outputs["boxes"], outputs["scores"], iou_threshold=0.45
)
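
Greedy NMS keeps the highest-scoring box and discards lower-scoring boxes that overlap it beyond the IoU threshold. A self-contained pure-Python sketch of the idea; the repository's `non_max_suppression` signature is assumed from the snippet above and is likely vectorized:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the best box, drop overlapping lower-scoring rivals."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU ≈ 0.68, above the 0.45 threshold, so it is suppressed; the third box is disjoint and survives.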

Benchmarking

from src.benchmark.latency import measure_tflite_latency

stats = measure_tflite_latency("exported_models/model_fp32.tflite")
print(f"Mean latency: {stats['mean_ms']:.1f} ms")

Project Structure

vision-edge/
+-- src/
|   +-- data/           # Data schemas, tf.data pipeline, augmentation
|   +-- model/          # MobileNetV3 backbone, SSD head, losses
|   +-- export/         # TFLite conversion, quantization, SavedModel
|   +-- benchmark/      # Latency, accuracy (mAP), report generation
|   +-- inference/      # TFLite engine, NMS post-processing
|   +-- deploy/         # Hugging Face Hub push utilities
|   +-- utils/          # Device config, visualization
+-- tests/              # Comprehensive test suite
+-- configs/            # YAML configuration files
+-- app.py              # Gradio demo application
+-- Dockerfile          # Container deployment

Testing

pytest tests/ -v

Docker

docker build -t vision-edge .
docker run -p 7860:7860 vision-edge

License

Apache-2.0
