All-in-one desktop toolkit for AI model inference, evaluation, analysis, data management, training & deployment
| Feature | What it does | |
|---|---|---|
| π¬ | Real-time Inference Viewer? | Load an ONNX model, open a video or image, see detections/classifications live |
| π | Multi-Model Evaluation? | Compare multiple models side-by-side with mAP, Precision, Recall, F1 |
| π¬ | Inference Analysis? | Inspect letterbox, tensor heatmap, and detection results on a single image |
| βοΈ | Model A/B Compare? | Run two models on the same images, navigate with a slider |
| π― | FP/FN Error Analysis? | Auto-classify false positives & negatives by size (S/M/L) and position |
| π | Confidence Optimizer? | Sweep thresholds per class, find the F1-maximizing confidence with PR curves |
| πΊοΈ | Embedding Visualization? | t-SNE / UMAP / PCA 2D scatter plots from any feature extractor |
| β‘ | Benchmark? | Measure FPS, latency (P50/P95/P99), CPU/GPU usage with system info export |
| πΌοΈ | Segmentation Evaluation? | mIoU, mDice, per-class IoU/Dice against GT masks |
| π€ | CLIP Zero-Shot? | Load image + text encoders, evaluate zero-shot classification |
| π§ | Vision-Language (VLM)? | Caption & VQA via CLIP, local Qwen-VL (transformers), or OpenAI-compatible API |
| π§² | Embedder Evaluation? | Retrieval@1/@K, cosine similarity, multi-image comparison |
| π | Dataset Explorer? | Gallery with multi-class filter, box filter, class/size/aspect distribution charts |
| βοΈ | Dataset Splitter? | Random or stratified train/val/test split with progress tracking |
| π | Format Converter? | YOLO β COCO JSON β Pascal VOC XML batch conversion |
| π·οΈ | Class Remapper? | Remap, merge, or delete class IDs in bulk |
| π | Dataset Merger? | Combine datasets with dHash duplicate detection |
| π | Smart Sampler? | Balanced (equal per-class + diversity), Random, Stratified sampling |
| π‘οΈ | Label Anomaly Detector? | Find OOB boxes, size outliers, excessive overlaps |
| πΌοΈ | Image Quality Checker? | Detect blur, brightness issues, overexposure, abnormal aspect ratios |
| π― | Near-Duplicate Detector? | dHash perceptual hashing with configurable threshold |
| π | Leaky Split Detector? | Cross-split (train/val/test) duplicate detection |
| π | Similarity Search? | Query any image β top-K most similar results |
| π¨ | Augmentation Preview? | Mosaic, flip, rotate, Albumentations β preview before applying |
| ποΈ | Train a Model? | YOLO detect/segment/pose/classify + timm/torchvision classifiers, live per-epoch metrics (docs) |
| π | Export / Deploy? | .pt/.onnx β onnxruntime (CPU/CUDA/DirectML) / TensorRT / OpenVINO / CoreML / TorchScript with configurable opset/batch/input/precision (docs) |
All in one window. No code required.
Training & export frameworks (torch / ultralytics / openvino / β¦) are optional β install with
pip install -r requirements-train.txt. Unavailable trainers/targets are greyed-out with an explanation; the ONNX-runtime-only core never breaks.
| Task | Model Format | Metrics |
|---|---|---|
| Detection | YOLO v5/v8/v9/v11, CenterNet (Darknet), Custom ONNX | mAP@50, mAP@50:95, P/R/F1 |
| Classification | ONNX (2D output) | Accuracy, per-class P/R/F1 |
| Segmentation | ONNX (CΓHΓW output) | mIoU, mDice, per-class IoU/Dice |
| VLM / CLIP | CLIP ONNX, local transformers (Qwen-VL), or OpenAI-compatible API | Zero-shot Classification, Captioning, VQA |
| Embedder | ONNX (feature extractor) | Retrieval@1/@K, Cosine Similarity |
Fixed-batch models (e.g., batch=4) are automatically detected and handled.
The VLM tab supports three pluggable backends. Pick one in the Backend dropdown:
| Backend | What it runs | Tasks | Setup |
|---|---|---|---|
| clip (default) | CLIP image + text encoder ONNX | Zero-shot classification, template-based caption / VQA | None β works out of the box, no extra deps |
| transformers | Local generative VLM (e.g. Qwen2.5-VL) via π€ transformers | Captioning, VQA | pip install -r requirements-vlm.txt (CUDA build of torch for GPU) |
| openai | Any OpenAI-compatible chat endpoint β Ollama, vLLM, LM Studio, etc. | Captioning, VQA | pip install httpx (or requirements-vlm.txt); set endpoint URL + optional API key |
- clip is fully self-contained: just supply the image and text encoder ONNX files.
- transformers downloads / loads a HuggingFace image-text-to-text model by ID or local path and runs it locally (GPU auto-detected via CUDA).
- openai sends frames as base64 JPEG to
{endpoint_url}/chat/completions. Point it at a local server (http://localhost:11434/v1for Ollama,http://localhost:8000/v1for vLLM) or a remote one with a Bearer API key.
# Enable the transformers + OpenAI-compatible backends (CLIP needs nothing):
pip install -r requirements-vlm.txtDownload the latest release from Releases:
- Windows:
.msiinstaller or.zipportable - macOS:
.dmgdisk image
Just run β no Python needed.
Requires Python 3.10+.
git clone https://github.com/surrealier/ssook.git
cd ssook
pip install -r requirements-web.txt
# Optional extras
pip install matplotlib scikit-learn openpyxl # charts & Excel export
pip install umap-learn # UMAP embedding
pip install pywebview # native desktop window
pip install onnxruntime-gpu # CUDA acceleration
pip install -r requirements-vlm.txt # transformers / OpenAI VLM backends
# EP venv 격리 μ€μΉ (GPU/DirectML/OpenVINO/CoreML λμ 곡쑴)
python scripts/setup_ep.py # νλ«νΌ μ 체 EP μ€μΉ
python scripts/setup_ep.py cuda cpu # νΉμ EPλ§ μ€μΉ
python scripts/setup_ep.py --status # μ€μΉ μν νμΈ
python run_web.py| Flag | Description |
|---|---|
--port 9000 |
Custom port (default: 8765) |
--browser |
Force browser mode instead of native window |
1. Launch β Settings tab β Download test models & sample data
2. Viewer tab β Open video/image β See real-time inference
3. Evaluation tab β Add models, set GT labels β Run evaluation
4. Analysis tab β Dive into FP/FN, confidence optimization, embeddings
5. Data tab β Explore, split, convert, clean your dataset
| Document | Topics |
|---|---|
| Model Optimization | Quantization (INT8, FP16, Mixed Precision), Pruning, Graph Optimization |
| Model Analysis | Model Diagnosis, Profiler, Inspector |
| Evaluation Metrics | mAP, IoU, P/R/F1, Confidence Optimizer, FP/FN Error Analysis |
| Embedding & CLIP | t-SNE / UMAP / PCA, CLIP Zero-Shot, Embedder Evaluation |
| Execution Providers | Auto EP Selection, venv Isolation, GPU Acceleration |
| Tracking & Sampling | ByteTrack / SORT, Smart Sampler, dHash Duplicate Detection |
| General Features | Viewer, Explorer, Splitter, Converter, Quality Tools, Benchmark |
Settings are stored in settings/app_config.yaml and persist across sessions:
model_type: yolo
conf_threshold: 0.25
batch_size: 1
box_thickness: 2
label_size: 0.55
show_labels: true
show_confidence: true| Package | Purpose |
|---|---|
| fastapi | Web backend |
| uvicorn | ASGI server |
| opencv-python | Image/video processing |
| numpy | Numerical operations |
| onnxruntime | ONNX model inference |
| psutil | System resource monitoring |
| PyYAML | Configuration management |
| Package | Purpose |
|---|---|
| pywebview | Native desktop window (instead of browser) |
| matplotlib | Charts, scatter plots, PR curves |
| scikit-learn | t-SNE, PCA dimensionality reduction |
| openpyxl | Excel report export |
| umap-learn | UMAP embedding visualization |
| onnxruntime-gpu | CUDA/TensorRT acceleration |
| transformers / torch / accelerate | Local generative VLM backend (Qwen-VL) β see requirements-vlm.txt |
| httpx | OpenAI-compatible VLM backend (Ollama / vLLM / LM Studio) |
python -m pytest tests/ -v- QC release: P0 crash fixes across viewer, evaluation, data, and analysis flows
- Security hardening: path-safety validation on all user-supplied paths (traversal prevention)
- 7 specialized tabs now reachable: CLIP Zero-Shot, Embedder, Segmentation, Tracking, VLM, and more are registered in the sidebar
- Pluggable VLM backends: choose clip (dependency-free), transformers (local Qwen-VL), or openai (OpenAI-compatible β Ollama / vLLM / LM Studio) β see VLM Backends
- EP venv Isolation: onnxruntime λ³μ’
λ³ λ
립 venv 격리 (
ep_venvs/) β GPU/DirectML/OpenVINO/CoreML/CPU λμ 곡쑴 - Auto EP Selection: νλ«νΌΒ·νλμ¨μ΄ κΈ°λ° μ΅μ Execution Provider μλ μ ν
- CoreML Support: macOS Apple Silicon CoreMLExecutionProvider μ§μ
- OpenVINO GPU-first: OpenVINO EPκ° Intel iGPU μ°μ μλ, λΆκ° μ OpenVINO CPU ν΄λ°±
- Cross-platform Setup:
python scripts/setup_ep.pyλ¨μΌ μ€ν¬λ¦½νΈλ‘ Windows/Linux/macOS EP μ€μΉ
- Bugfix: Fix Internal Server Error (index.html missing from build)
- Bugfix: Fix frozen exe path resolution (
sys._MEIPASS) - pywebview: Native desktop window as default, browser as fallback
- Sample Data: Built-in test images (bus.jpg, zidane.jpg) and video (people.mp4)
- COCO128: Dataset download link in Settings tab
- Bugfix: Fix frozen exe crash (
sys.stderr=Nonein PyInstaller)
- Smart Sampler: Balanced mode now distributes target count equally across classes with farthest-point sampling for spatial diversity
- Progress Bars: All tabs unified to explorer-style progress bar (20px height, % text overlay)
- Remapper: Converted to async with progress tracking
- Removed: Batch Inference tab (redundant with Viewer); Augmentation moved to Data section
- Explorer: Async loading with progress bar, double-click image preview with bbox overlay, multi-class checkbox filter, box operator filter (>=, =, <=), 5 view modes (file list, class distribution by box/image, box size distribution, aspect ratio distribution)
- Splitter: Strategy selection (random / stratified), custom ratio inputs, 0-ratio skip, progress bar
- Conf Optimizer: Per-class PR curve visualization, F1 display fix
- Embedder: Multi-image cosine similarity comparison
- Recursive folder support: Remapper, Merger, Sampler, Anomaly Detector, Quality Checker, Duplicate Detector
- Merger: dHash threshold description and input binding
- i18n: Korean translations for new UI elements
- Web UI overhaul with analysis tabs, class mapping, model downloads
- Benchmark system info export
- Rebrand to ssook






