Skip to content

Nobody-Zhang/DriveVigil

DriveVigil

Fatigue Driving Detection: Cloud-Edge Collaboration

DriveVigil — award-winning drowsy & distracted-driving detection spanning Huawei Cloud ModelArts ☁️ and an NVIDIA Jetson TX2 NX 🚗

License Python CI Tests Score Award

English · 中文版


Table of Contents


Overview

DriveVigil detects fatigue and distracted-driving behaviors — eyes closed, yawning, phone calls, and turning away — and is built as a cloud-edge collaboration:

  • ☁️ Cloud (Huawei Cloud ModelArts). A serverless inference service scores an uploaded dashcam video and returns the exact time periods of each dangerous behavior. A custom divide-and-conquer temporal localizer finds those periods without running detection on every frame — this is the core IP and scored 0.9741 in the preliminary round.
  • 🚗 Edge (NVIDIA Jetson TX2 NX). A real-time DeepStream + TensorRT pipeline runs detection on a live camera, streams the annotated video to a web UI, raises spoken warnings, and keeps itself healthy with a watchdog. An OTA loop ships freshly recorded clips back to the cloud, retrains the model, and hot-swaps the new weights onto the device — all without stopping inference.

Built for the Huawei Cloud track of the 18th "Challenge Cup" National College Students' Academic Science & Technology Competition, where it won Second Prize. This is competition code shared as a reference implementation — expect leaderboard-tuned heuristics.


System Architecture

flowchart TB
    client["🎥 Dashcam video"]

    subgraph cloud["☁️ Cloud — Huawei Cloud ModelArts"]
        direction TB
        svc["Inference service<br/>customize_service.py"]
        algo["yolo_run()<br/>divide-and-conquer + b_search"]
        obs[("OBS bucket<br/>clips + weights")]
        train["Training job<br/>free GPU flavor"]
        svc --> algo
        obs --> train --> obs
    end

    subgraph edge["🚗 Edge — NVIDIA Jetson TX2 NX"]
        direction TB
        cam["Camera /dev/video1"]
        ds["DeepStream + TensorRT<br/>YOLOv5 · 7 classes"]
        rtsp["RTSP server<br/>:8554/ds-test"]
        bridge["Flask MJPEG bridge<br/>:5000/video_feed"]
        voice["Voice assistant<br/>ASR → LLM → TTS"]
        wd["Watchdog<br/>camera · network · capture"]
        cam --> ds --> rtsp --> bridge
        ds --> voice
        wd -.watches.-> ds
    end

    ui["🖥️ Vue.js SPA"]

    client -->|"HTTP POST (multipart)"| svc
    svc -->|"JSON: periods + category"| client
    bridge -->|MJPEG| ui
    ds -->|"record clips"| obs
    obs -->|"download best.pt / best.onnx"| ds
    voice <-->|"Huawei SIS (WebSocket)"| cloud
Loading
Plane Runs on Responsibility
Cloud ModelArts (Python 3.7, PyTorch 1.8 / CUDA 10.2) Offline video scoring + model retraining
Edge Jetson TX2 NX (DeepStream 6.0, CUDA 10.2) Real-time detection, streaming, voice, self-healing
Frontend Browser Live view of the annotated stream + behavior alerts

How It Works

Cloud inference lifecycle

Each cloud variant is a deployable ModelArts "custom AI application" defined by a two-file contract: config.json (runtime + HTTP schema + pip deps) and customize_service.py (a thin PTServingBaseService). All real work is delegated to yolo_run().

sequenceDiagram
    autonumber
    participant C as Client
    participant S as customize_service.py
    participant Y as yolo_run()
    participant M as OpenVINO model

    C->>S: POST / (multipart: input_video)
    S->>S: _preprocess — write temp .mp4
    S->>Y: _inference — yolo_run(source)
    Y->>Y: load every frame into memory
    loop divide_and_conquer(0, N)
        Y->>M: infer a probed frame (memoized)
        M-->>Y: detections
        Y->>Y: YOLO_Status.determin → category
    end
    Y->>Y: b_search — refine each boundary
    Y-->>S: {drowsy:[{periods,category}], duration}
    S->>S: _postprocess — delete temp file
    S-->>C: JSON result
Loading

The response is authoritative in code:

{
  "result": {
    "drowsy": [
      { "periods": [start_ms, end_ms], "category": 1 }  // 0=normal 1=eyes-closed 2=yawn 3=calling 4=turning
    ],
    "duration": 6421                                      // inference time in ms
  }
}

The detector emits 7 YOLO classes in alphabetical index orderclose_eye=0, close_mouth=1, face=2, open_eye=3, open_mouth=4, phone=5, sideface=6 — and YOLO_Status.determin() maps a single frame's boxes to one of 5 behavior categories via geometric heuristics (it selects the driver among multiple faces, requires eyes/mouth to fall inside the face box, treats a phone near the face as "calling", etc.).

Divide-and-conquer localization

Rather than classifying all N frames, the localizer recursively bisects the timeline and only probes where behavior is found, so cost scales with the number of behavior transitions, not the video length.

flowchart TD
    A["divide_and_conquer(lo, hi)"] --> B{"hi − lo < 3s?"}
    B -->|yes| Z["prune — too short"]
    B -->|no| C["classify midpoint<br/>f(mid), memoized"]
    C --> D{"behavior ≠ normal?"}
    D -->|no| E["recurse on both halves"]
    D -->|yes| F["probe outward in 0.375s steps<br/>while the state persists"]
    F --> G{"span ≈ 2.6s+ ?"}
    G -->|no| E
    G -->|yes| H["record candidate"]
    H --> I["b_search — refine exact<br/>start/end (precision 0.05)"]
    I --> J["emit [start_ms, end_ms] + category"]
    H --> E
Loading
  • f(frame_idx) runs the model on one frame and is memoized — each frame is inferred at most once.
  • b_search() binary-searches each candidate's boundary to a precision set by iou_presice_b_search (0.05 = accuracy-first).
  • Weights load as OpenVINO IR (best.xml + .bin) through vendored YOLOv5's DetectMultiBackend. The pure classifier lives in cloud/preliminary/yolo/status.py and is covered by the test suite.

Cloud-edge OTA loop

The edge device closes the data loop: it collects real driving clips, retrains in the cloud, and pulls the improved model back — continuously.

flowchart LR
    A["Edge records clips"] --> B{"≥ 3 clips?<br/>(smart_record.log)"}
    B -->|no| A
    B -->|yes| C["Upload clips + weights → OBS<br/>(AK/SK signed)"]
    C --> D["POST ModelArts<br/>training job"]
    D --> E{"poll status"}
    E -->|running| E
    E -->|done| F["Download best.pt / best.onnx"]
    F --> G["Hot-swap edge model"]
    G --> H["Clear OBS dataset"]
    H --> A
Loading

Features

Feature Description
🎯 YOLOv5 + OpenVINO detection 7-class detector (eyes, mouth, face, side-face, phone) deployed on ModelArts
⏱️ Divide-and-conquer localization Finds exact behavior periods without scanning every frame
Jetson + DeepStream Real-time edge inference via TensorRT with RTSP streaming
🔁 OTA model update Upload → cloud-retrain → hot-swap weights, without stopping inference
🗣️ Voice interaction Huawei SIS ASR → LLM (LLaMA / Qwen) → Huawei SIS TTS
🩺 Watchdog daemon C++ threads guard the camera, network, and capture loop
🖥️ Web dashboard Vue SPA shows the live annotated stream and alerts

Quick Start

Prerequisites

The repo has three independent capabilities with very different requirements — pick what you need:

To run… You need
The test suite / core algorithm Python 3.8+ only (no GPU, cloud, or hardware)
Cloud video scoring A Huawei Cloud account with ModelArts + OBS; downloaded model assets
The edge real-time stack An NVIDIA Jetson TX2 NX (JetPack, DeepStream 6.0, CUDA 10.2) + a camera
OTA / voice Huawei Cloud OBS + SIS, and optionally a DashScope (Qwen) API key

⚠️ Large binaries are not in git. ~40 weights, TensorRT engines, OpenVINO IR, sample videos, and wheels live in Releases v1.0 (see docs/ASSETS.md). Cloud inference, the edge pipeline, and OTA do not run until download_assets.sh has fetched them.

Installation

git clone https://github.com/Nobody-Zhang/DriveVigil.git
cd DriveVigil

# One-shot: create .venv, install requirements.txt, copy .env.example -> .env
bash scripts/setup_env.sh
source .venv/bin/activate

# Pull the large model/video assets from GitHub Releases v1.0
bash scripts/download_assets.sh

# Fill in your credentials (see Configuration below)
$EDITOR .env

Path A — Try the core algorithm (no hardware)

The fastest way to see something working. The test suite exercises the per-frame classifier and geometry helpers with no GPU, cloud, or downloaded assets:

python -m venv .venv && source .venv/bin/activate
pip install -r requirements-dev.txt
pytest                       # 14 tests, runs in well under a second

To run the full localizer on a bundled sample video (needs the runtime deps and downloaded assets):

pip install -r requirements.txt
bash scripts/download_assets.sh
cd cloud/preliminary/yolo && python yolo_divide_and_conquer.py   # scores zipped.mp4

Path B — Deploy cloud inference (ModelArts)

The canonical model is cloud/preliminary/. Deploy it as a ModelArts custom AI application (the config.json + customize_service.py contract). Once a real-time service is running, call it over HTTP:

# Token auth (set HUAWEICLOUD_TOKEN and HUAWEICLOUD_CLOUDINFER_URL in .env)
curl -X POST "$HUAWEICLOUD_CLOUDINFER_URL" \
     -H "X-Auth-Token: $HUAWEICLOUD_TOKEN" \
     -F "input_video=@sample.mp4"
// → response
{ "result": { "drowsy": [ { "periods": [3200, 6800], "category": 2 } ], "duration": 5910 } }

Cloud code targets Python 3.7 (pytorch_1.8.0-cuda_10.2-py_3.7). cloud/semifinal/ is the same algorithm with relaxed thresholds; cloud/baseline/ is an older dlib EAR/MAR approach.

Path C — Run the edge stack (Jetson)

On a Jetson TX2 NX with DeepStream 6.0 installed:

# 1. Build the custom CUDA YOLO TensorRT output parser
CUDA_VER=10.2 make -C edge/deepstream/nvdsinfer_custom_impl_Yolo

# 2. Build & run the DeepStream app: camera -> TensorRT -> RTSP (:8554/ds-test)
cd edge/deepstream
CUDA_VER=10.2 make
./deepstream-app-test5-customized -c deepstream_app_config.txt

# 3. Bridge RTSP -> MJPEG for the browser, then open http://localhost:5000/video_feed
python app.py

Optional companion services:

# Watchdog daemon (needs OpenCV) — restarts/halts on camera or network failure
cd edge/watchdog && mkdir -p build && cd build && cmake .. && make && ./WatchDog

# OTA retraining loop (needs .env credentials)
cd edge/ota && python main.py

# Voice assistant: ASR -> LLM -> TTS
cd edge/voice/scripts && python recognize_generate4.py

Configuration

All secrets are read from a local .env (template: .env.example); .env is gitignored. Never commit credentials — see SECURITY.md.

Variable Purpose
HUAWEICLOUD_AK / HUAWEICLOUD_SK IAM access key / secret — sign OBS + ModelArts requests
HUAWEICLOUD_TOKEN IAM token for token-authenticated inference calls
HUAWEICLOUD_PROJECT_ID ModelArts project id
HUAWEICLOUD_IMA_ID ModelArts algorithm/image id used by the OTA training job
HUAWEICLOUD_REGION Region, e.g. cn-north-4
HUAWEICLOUD_CLOUDINFER_URL Deployed inference service endpoint
HUAWEICLOUD_MTCNN_URL MTCNN face-detection service endpoint
DASHSCOPE_API_KEY Qwen / 通义千问 key for the voice assistant's LLM fallback

Project Structure

DriveVigil/
├── cloud/                       # Huawei Cloud ModelArts inference apps
│   ├── baseline/                # PyTorch + dlib EAR/MAR baseline
│   ├── preliminary/             # ★ best score 0.9741 — divide-and-conquer
│   │   └── yolo/
│   │       ├── status.py        # pure per-frame classifier (unit-tested)
│   │       └── yolo_divide_and_conquer.py
│   └── semifinal/               # 0.8807 — same algorithm, relaxed thresholds
├── edge/                        # NVIDIA Jetson TX2 NX
│   ├── deepstream/              # DeepStream/TensorRT pipeline (see COMPETITION.md)
│   ├── ota/                     # cloud-edge OTA retraining loop
│   ├── cloud_finetune/          # vendored YOLOv5 training code
│   ├── voice/                   # voice assistant (ASR + LLM + TTS)
│   ├── watchdog/                # C++ monitoring daemon
│   ├── mtcnn/                   # MTCNN face detection + Euler angles
│   ├── apigw/                   # Huawei API Gateway SDK wrapper
│   └── frontend/                # prebuilt Vue.js SPA
├── tests/                       # pytest suite for the core classifier
├── scripts/                     # setup_env.sh, download_assets.sh
├── configs/ · utils/ · docs/

Development

Tooling targets Python 3.8 (cloud code deploys to ModelArts on Python 3.7). Work inside a virtual environment:

python -m venv .venv && source .venv/bin/activate
pip install -r requirements-dev.txt

Lint, format, and test — all gated in CI:

ruff check .            # lint    (ruff check --fix .  to auto-fix)
ruff format --check .   # format  (ruff format .       to apply)
pytest                  # unit tests for the core classifier + geometry helpers

See CONTRIBUTING.md for the full guide — vendored-tree boundaries, the Python 3.7/3.8 split, and the near-duplicate cloud variants.


Results

Stage Score Key Approach
Preliminary 0.9741 YOLOv5 + OpenVINO + divide-and-conquer (confidence 0.4)
Semi-final 0.8807 Same algorithm, relaxed thresholds
Final 🏆 Second Prize Full cloud-edge system demo

Citation

@misc{zhang2023fatigue,
    title   = {Fatigue Driving Detection: Cloud-Edge Collaboration},
    author  = {Gongbo Zhang and Shuming Guo and Luran Lv and Aolin Zhang and
               Xingyu Chen and Jintian Wu and Yufan Jia and Zheyu Zhou and
               Jiahao Zhang and Jinshen Zhang},
    year    = {2023},
    url     = {https://github.com/Nobody-Zhang/DriveVigil}
}

License

This project is released under the Apache 2.0 License.

Acknowledgments


Congratulations to Yongye Lai, Xuejia Chen et al. for winning the Grand Prize in the 19th Challenge Cup - Huawei Track!