In this work, we present ArbItro, a multi-task CNN-RNN framework for football foul recognition on the SoccerNet-MVFouls benchmark, designed as a controlled study of what structurally classical architectures can recover under strong visual ambiguity and long-tail supervision.
The proposed system decouples offence detection, action classification, and disciplinary severity assessment into separate prediction heads, with the aim of disentangling infringement recognition from sanction estimation.
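With three separate prediction heads, training typically minimizes a weighted sum of the per-head losses. The sketch below is purely illustrative — the head names match the design above, but the loss values and weights are hypothetical, not the project's actual configuration:

```python
def multitask_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of per-head losses (weights here are illustrative)."""
    return sum(weights[name] * losses[name] for name in losses)

# Hypothetical per-head batch losses and task weights
losses = {"offence": 0.40, "action": 1.20, "severity": 0.90}
weights = {"offence": 1.0, "action": 1.0, "severity": 0.5}

total = multitask_loss(losses, weights)  # 0.40 + 1.20 + 0.45 = 2.05
```

Decoupling the heads this way lets each task keep its own weight, so the dominant class structure of one label (e.g. severity) does not drown out the others.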
Detailed Model Architecture & Parameters (56.1M params)
Model Statistics:
- Total parameters: 56,159,663
- Trainable: 32,622,191
- Non-trainable: 23,537,472
| Layer (type) | Output Shape | Param # | Connected to |
|---|---|---|---|
| video_input (InputLayer) | (None, 4, 16, 224, 398, 3) | 0 | - |
| td_cnn_clips (TimeDistributed) | (None, 4, 16, 5, 11, 1536) | 54,336,736 | video_input[0][0] |
| td_gap_clips (TimeDistributed) | (None, 4, 16, 1536) | 0 | td_cnn_clips[0][0] |
| dropout_video (Dropout) | (None, 4, 16, 1536) | 0 | td_gap_clips[0][0] |
| speed_input (InputLayer) | (None, 1) | 0 | - |
| td_lstm_per_clip (TimeDistributed) | (None, 4, 256) | 1,704,960 | dropout_video[0][0] |
| clip_mask (InputLayer) | (None, 4) | 0 | - |
| speed_embed (Dense) | (None, 32) | 64 | speed_input[0][0] |
| clip_fusion (Lambda) | (None, 256) | 0 | td_lstm_per_clip[0][0], clip_mask[0][0] |
| dropout_speed (Dropout) | (None, 32) | 0 | speed_embed[0][0] |
| fusion (Concatenate) | (None, 288) | 0 | clip_fusion[0][0], dropout_speed[0][0] |
| dense_shared (Dense) | (None, 256) | 73,984 | fusion[0][0] |
| ln_shared (LayerNormalization) | (None, 256) | 512 | dense_shared[0][0] |
| dropout_shared (Dropout) | (None, 256) | 0 | ln_shared[0][0] |
| act_dense (Dense) | (None, 64) | 16,448 | dropout_shared[0][0] |
| off_dense (Dense) | (None, 32) | 8,224 | dropout_shared[0][0] |
| sev_dense (Dense) | (None, 64) | 16,448 | dropout_shared[0][0] |
| act_dropout (Dropout) | (None, 64) | 0 | act_dense[0][0] |
| off_dropout (Dropout) | (None, 32) | 0 | off_dense[0][0] |
| sev_dropout (Dropout) | (None, 64) | 0 | sev_dense[0][0] |
| aux_bodypart (Dense) | (None, 3) | 771 | dropout_shared[0][0] |
| aux_contact (Dense) | (None, 1) | 257 | dropout_shared[0][0] |
| aux_handball (Dense) | (None, 1) | 257 | dropout_shared[0][0] |
| aux_touch_ball (Dense) | (None, 1) | 257 | dropout_shared[0][0] |
| aux_try_play (Dense) | (None, 1) | 257 | dropout_shared[0][0] |
| head_action (Dense) | (None, 4) | 260 | act_dropout[0][0] |
| head_offence (Dense) | (None, 1) | 33 | off_dropout[0][0] |
| head_severity (Dense) | (None, 3) | 195 | sev_dropout[0][0] |
The ArbItro project is trained and evaluated on the official SoccerNet Challenge 2025 Multi-View Foul Recognition dataset.
This dataset provides a realistic and challenging benchmark for identifying fouls from multiple synchronized camera angles. As is typical with sporting event data, there is a significant class imbalance across the various labels, which is addressed through specialized data augmentation and balancing techniques.
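One standard balancing technique for such long-tail labels (illustrative — the project's exact scheme may differ) is inverse-frequency class weighting, equivalent to scikit-learn's "balanced" heuristic:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * count): rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical severity labels with a heavy skew toward "no card"
labels = ["no_card"] * 80 + ["yellow"] * 15 + ["red"] * 5
weights = inverse_frequency_weights(labels)
# "no_card" ~0.42, "yellow" ~2.22, "red" ~6.67
```

The resulting weights can be passed to the loss so that the few red-card examples are not ignored during training.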
- Class Distribution: see the accompanying class-distribution figure for detailed metrics on Severity, Offence Type, Action, and Body Part involved.
- Clip Duration: Approximately 5 seconds per clip, centered precisely on the moment of the action.
- Views per Action: Multiple synchronized camera angles are available for each event, providing a comprehensive view for the model.
To evaluate the effectiveness of ArbItro as a Video Assistant Referee System (VARS) support tool, we measured performance across all four multi-task output heads. Given the standard class imbalance in football foul data, we focus on Balanced Accuracy and Recall per head.
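Balanced accuracy is simply the unweighted mean of per-class recall, which is why it is robust to the skew described above. A minimal reference implementation:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, each class counted equally regardless of size."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        recalls.append(tp / support)
    return sum(recalls) / len(classes)

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
balanced_accuracy(y_true, y_pred)  # (4/4 + 1/2) / 2 = 0.75
```

Plain accuracy on the same example would be 5/6 ≈ 0.83, hiding that half of the minority class was missed.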
The heatmap below provides a detailed breakdown of the per-class recall for Pipeline 1, Pipeline 2, and our Ensemble Model across the three multi-task heads:
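A common way to combine two pipelines into an ensemble — and our hedged assumption for the sketch below, not a confirmed description of the project's ensemble code — is to average the per-head predicted probabilities before taking the argmax:

```python
import numpy as np

def ensemble_probs(p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Average the softmax outputs of two pipelines for one head."""
    return (p1 + p2) / 2.0

# Hypothetical action-head probabilities from each pipeline
p1 = np.array([[0.6, 0.2, 0.1, 0.1]])
p2 = np.array([[0.3, 0.5, 0.1, 0.1]])
avg = ensemble_probs(p1, p2)   # [[0.45, 0.35, 0.1, 0.1]]
pred = avg.argmax(axis=1)      # predicted class index: 0
```

Averaging probabilities (soft voting) tends to outperform majority voting with only two members, since there is no tie-breaking majority to rely on.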
All training and evaluation notebooks are located under model/src/ and are fully
compatible with both local execution and Google Colab.
Clone the repository:

```bash
git clone https://github.com/gallocarmine/ArbItro.git
cd ArbItro
```

Prerequisites (local only):

```bash
pip install -r requirements.txt
```

Pipeline 1:

```
# Training
model/src/pipeline1/arbitro_train.ipynb

# Evaluation
model/src/pipeline1/arbitro_test.ipynb
```

Pipeline 2 follows the identical structure:

```
model/src/pipeline2/arbitro_train.ipynb
model/src/pipeline2/arbitro_test.ipynb
```

Both pipelines share the same notebook conventions. `data_loader.py` and `model.py` in each folder define the data pipeline and architecture respectively. Trained weights are saved to `ArbItro_Training/models/`.

Ensemble Evaluation:

```
model/src/ensemble_test.ipynb
```

Requires both `pipeline1.keras` and `pipeline2.keras` to be present in `ArbItro_Training/models/`.

Mount your Google Drive and update the dataset path in `data_loader.py` accordingly, or store the dataset locally and update the path directly.
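The Colab/local switch can be as simple as preferring the mounted Drive copy when it exists. The snippet below is a sketch of that logic — the variable names and folder layout are hypothetical, so adjust them to match `data_loader.py`:

```python
from pathlib import Path

# Hypothetical roots: Drive as mounted by google.colab.drive, and a local fallback.
COLAB_ROOT = Path("/content/drive/MyDrive")
LOCAL_ROOT = Path.home() / "datasets"

def resolve_dataset_dir(name: str = "SoccerNet-MVFouls") -> Path:
    """Prefer the mounted Google Drive copy when present, else a local folder."""
    colab_path = COLAB_ROOT / name
    return colab_path if colab_path.exists() else LOCAL_ROOT / name

dataset_dir = resolve_dataset_dir()
```

On Colab, run `from google.colab import drive; drive.mount('/content/drive')` first so the Drive path is visible.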
Prerequisites
- Python 3.12
- Node.js ≥ 18
1. Clone the repository

```bash
git clone https://github.com/gallocarmine/ArbItro.git
cd ArbItro
```

2. Install Python dependencies

```bash
pip install -r requirements.txt
```

3. Install Electron dependencies

```bash
cd app
npm install
```

4. Start the inference server

```bash
cd app/server
source ../../.venv/bin/activate
python3 server.py
```

5. Launch the desktop app (in a separate terminal)

```bash
cd app
npm start
```

The app connects to the Flask server at `http://127.0.0.1:5000`. Make sure the server is running before clicking Analyze Action.
The following animated sequences provide a comprehensive overview of the ArbItro system, from its theoretical operation to a practical walkthrough of the VARS interface analyzing a specific foul event in real time.
The VARS interface allows the human operator to manually select and load the video feeds for a specific foul event. The user is responsible for uploading the synchronized camera angles required by the multi-view Deep Learning model for accurate evaluation.
Before triggering the inference phase, the user can set the desired playback speed for manual visual review. Once the analysis is initiated, the system processes the multi-stream video data through the selected pipeline and subsequently outputs the classification results across all four heads.
```
ArbItro/
├── app/
│   ├── client/
│   │   ├── static/
│   │   │   ├── renderer.js
│   │   │   └── style.css
│   │   └── index.html
│   │
│   ├── server/
│   │   └── server.py
│   ├── main.js
│   ├── package.json
│   └── package-lock.json
│
├── asset/
│
└── model/
    └── src/
        ├── pipeline1/
        │   ├── arbitro_test.ipynb
        │   ├── arbitro_train.ipynb
        │   ├── data_loader.py
        │   └── model.py
        │
        ├── pipeline2/
        │   ├── arbitro_test.ipynb
        │   ├── arbitro_train.ipynb
        │   ├── data_loader.py
        │   └── model.py
        │
        └── ensemble_test.ipynb
```
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Copyright (c) 2026 Carmine Gallo, Andrea De Simone, Matteo Trinchese, Salvatore Frontoso, Salvatore Pio Ruggiero
For full legal details, please refer to the LICENSE file included in this repository or visit the Official Creative Commons page.





