End-to-end data pipeline for international volleyball data from FIVB VIS
SetStream ingests, processes, and analyzes international volleyball data from FIVB VIS (Volleyball Information System). Built in R, with SQL for the warehouse marts, it provides an automated ELT pipeline with incremental loads and data quality checks, Elo ratings, upset detection, and interactive analytics.
Project structure:
setstream/
├── R/ # Core pipeline modules
├── sql/marts/ # SQL mart definitions
├── api/ # REST API (Plumber)
├── dashboard/ # Shiny dashboard
├── scripts/ # Execution scripts
├── tests/ # Test suite
├── data/
│ ├── lake/ # Parquet data lake
│ ├── warehouse/ # DuckDB warehouse
│ └── state/ # Pipeline state
├── _targets.R # Pipeline orchestration
└── config.yml # Configuration
Quick start with Docker:
# Build and start all services
docker-compose up -d
# Run the pipeline
docker-compose exec pipeline make run
# View logs
docker-compose logs -f
# Access services:
# - API: http://localhost:8000
# - Dashboard: http://localhost:3838

Local setup:
# 1. Bootstrap environment (first time only)
make bootstrap
# 2. Run the pipeline
make run
# 3. Start API (separate terminal)
make api
# 4. Launch dashboard (separate terminal)
make dashboard

Architecture:
SetStream follows a layered ELT architecture with clear separation of concerns:
📡 FIVB VIS API
↓
🔄 [Extract Layer] ← Rate limiting, retries, caching
↓
💾 [Data Lake] ← Parquet, partitioned (Season/Year)
↓
🗄️ [DuckDB Warehouse] ← Staging tables, upserts
↓
✅ [Quality Checks] ← Schema, uniqueness, referential integrity
↓
📊 [Marts] ← Team form, Elo ratings, upsets, rankings
↓
🚀 [API + Dashboard] ← Analytics interface
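The three check types in the Quality Checks layer can be illustrated with a minimal base-R sketch. SetStream itself uses pointblank for validation; the helper names (check_schema, check_unique, check_foreign_key) and the column names below are hypothetical, chosen only to show what each check asserts.

```r
# Hypothetical base-R illustration of the Quality Checks layer.
# The real pipeline uses pointblank; names here are assumptions.

# Schema: every required column must be present
check_schema <- function(df, required_cols) {
  missing <- setdiff(required_cols, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Uniqueness: no key value may appear more than once
check_unique <- function(df, key) {
  anyDuplicated(df[[key]]) == 0
}

# Referential integrity: every child key must exist in the parent table
check_foreign_key <- function(child, parent, key) {
  all(child[[key]] %in% parent[[key]])
}

tournaments <- data.frame(tournament_id = c(1L, 2L))
matches <- data.frame(
  match_id = c(10L, 11L, 12L),
  tournament_id = c(1L, 1L, 3L)  # 3 has no parent tournament
)

check_schema(matches, c("match_id", "tournament_id"))
check_unique(matches, "match_id")
check_foreign_key(matches, tournaments, "tournament_id")
```

Here the foreign-key check fails because match 12 points at a tournament that is not in the tournaments table.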
Tech stack:
- Language: R (primary), SQL
- Data Source: fivbvis package (openvolley)
- Storage: Parquet (lake) + DuckDB (warehouse)
- Orchestration: targets (functional DAG)
- Quality: pointblank (validation framework)
- API: plumber (REST)
- Dashboard: Shiny
- Testing: testthat
- Logging: logger
- Containers: Docker + Docker Compose
- CI/CD: GitHub Actions
Features:
✅ Fully Automated: No manual data entry
✅ Incremental Loads: Only fetch new data
✅ Idempotent: Safe to rerun
✅ Data Quality: Schema, uniqueness, and referential-integrity checks
✅ Elo Ratings: Team strength evolution
✅ Upset Detection: Surprising match outcomes
✅ Production-Ready: Logging, retries, error handling
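A minimal sketch of how Elo ratings and an upset flag could be computed, assuming the standard logistic Elo formula on a 400-point scale, a hypothetical K-factor of 20, a starting rating of 1500, and a hypothetical rule that a win counts as an upset when the winner's pre-match expected score was below 0.35. SetStream's actual parameters and upset definition are not documented here and may differ.

```r
# Assumed parameters (hypothetical): K = 20, initial rating 1500,
# upset threshold 0.35 on the winner's expected score.

# Expected score of team A against team B (standard Elo, 400-point scale)
elo_expected <- function(r_a, r_b) {
  1 / (1 + 10^((r_b - r_a) / 400))
}

# Update both ratings after one match; points gained by one side are lost by the other
elo_update <- function(r_a, r_b, a_won, k = 20) {
  delta <- k * ((if (a_won) 1 else 0) - elo_expected(r_a, r_b))
  c(a = r_a + delta, b = r_b - delta)
}

# Flag a win as an upset when the winner was a clear underdog before the match
is_upset <- function(r_winner, r_loser, threshold = 0.35) {
  elo_expected(r_winner, r_loser) < threshold
}

# Replay matches in chronological order; unseen teams start at `init`
run_elo <- function(matches, k = 20, init = 1500) {
  ratings <- numeric(0)
  rating_of <- function(team) {
    if (team %in% names(ratings)) ratings[[team]] else init
  }
  for (i in seq_len(nrow(matches))) {
    a <- matches$team_a[i]
    b <- matches$team_b[i]
    upd <- elo_update(rating_of(a), rating_of(b), matches$winner[i] == a, k)
    ratings[a] <- upd[["a"]]
    ratings[b] <- upd[["b"]]
  }
  ratings
}

matches <- data.frame(
  team_a = c("BRA", "BRA"),
  team_b = c("POL", "USA"),
  winner = c("BRA", "USA"),
  stringsAsFactors = FALSE
)
run_elo(matches)
```

Because each update is zero-sum, the total rating in the pool stays constant, which makes the Elo history easy to sanity-check.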
Staging tables:
- stg_tournaments - Tournament metadata
- stg_matches - Match results
- stg_match_details - Detailed match statistics
- stg_tournament_rankings - Final rankings
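Staging tables are loaded with upserts, which is what makes reruns safe. The sketch below shows the idea in base R under assumed names (upsert, match_id, score): incoming rows replace existing rows with the same key, so applying the same batch twice produces the same table. In SetStream the upsert happens inside DuckDB rather than on data frames.

```r
# Hypothetical base-R illustration of a keyed upsert.
# Rows in `incoming` replace rows in `existing` that share the same key.
upsert <- function(existing, incoming, key) {
  kept <- existing[!(existing[[key]] %in% incoming[[key]]), , drop = FALSE]
  out <- rbind(kept, incoming)
  out <- out[order(out[[key]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

stg_matches <- data.frame(
  match_id = c(1L, 2L),
  score = c("3-0", "3-1"),
  stringsAsFactors = FALSE
)
batch <- data.frame(
  match_id = c(2L, 3L),        # match 2 was corrected, match 3 is new
  score = c("3-2", "0-3"),
  stringsAsFactors = FALSE
)

once <- upsert(stg_matches, batch, "match_id")
twice <- upsert(once, batch, "match_id")
identical(once, twice)  # rerunning the same batch changes nothing
```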
Marts:
- mart_team_form - Recent W/L, streaks
- mart_team_elo_history - Elo rating evolution
- mart_upsets - Underdog victories
- mart_tournament_rankings - Tournament results
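mart_team_form reports recent W/L records and streaks. A minimal base-R sketch of that computation for a single team, assuming results arrive as a chronological vector of "W"/"L" values and a hypothetical window of the last 5 matches (the real mart is defined in SQL and may use a different window):

```r
# Hypothetical sketch: recent record and current streak for one team.
# `results` is ordered oldest to newest, e.g. c("W", "L", "W").
team_form <- function(results, n = 5) {
  if (length(results) == 0) {
    return(list(wins = 0L, losses = 0L, streak = NA_character_))
  }
  recent <- tail(results, n)
  runs <- rle(results)              # run-length encoding of consecutive outcomes
  last <- length(runs$lengths)      # the final run is the current streak
  list(
    wins = sum(recent == "W"),
    losses = sum(recent == "L"),
    streak = paste0(runs$values[last], runs$lengths[last])
  )
}

team_form(c("W", "L", "W", "W", "W", "L", "W", "W"))
```

For this input the last five matches are 4 wins and 1 loss, and the current streak is "W2".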
API endpoints:
- GET /health - Service status
- GET /teams/top?limit=20 - Top teams by Elo
- GET /teams/{team}/elo - Team Elo history
- GET /tournaments/recent - Recent tournaments
- GET /upsets/recent?days=30 - Recent upsets
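The endpoints above can be called from R once the API is running. The hypothetical helpers setstream_url() and setstream_team_elo_url() below only build request URLs (the default base http://localhost:8000 matches the default API port); the response format of each endpoint is not specified here.

```r
# Hypothetical helper: build a SetStream API URL with optional query parameters.
setstream_url <- function(path, query = list(), base = "http://localhost:8000") {
  url <- paste0(base, path)
  if (length(query) > 0) {
    values <- vapply(
      query,
      function(v) utils::URLencode(as.character(v), reserved = TRUE),
      character(1)
    )
    url <- paste0(url, "?", paste(paste0(names(query), "=", values), collapse = "&"))
  }
  url
}

# Hypothetical helper: path parameter for GET /teams/{team}/elo, percent-encoded
setstream_team_elo_url <- function(team, base = "http://localhost:8000") {
  setstream_url(paste0("/teams/", utils::URLencode(team, reserved = TRUE), "/elo"),
                base = base)
}

setstream_url("/teams/top", list(limit = 20))
setstream_team_elo_url("United States")
```

With the API running, a URL can be passed to jsonlite::fromJSON(), which accepts URLs directly, for example jsonlite::fromJSON(setstream_url("/upsets/recent", list(days = 30))).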
Dashboard pages:
- Top Teams - Elo leaderboard
- Team Detail - Team profile & history
- Upsets - Surprising results
- Pipeline - Monitoring & stats
Edit config.yml for:
- Rolling window days (default: 365)
- Rate limiting (default: 1 req/sec)
- API port (default: 8000)
- Dashboard port (default: 3838)
To keep the load on the FIVB VIS API low, the pipeline uses:
- Rate limiting (1 req/sec default)
- Local caching (avoid refetches)
- Incremental loads (minimal requests)
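The rate limiting and local caching described above can be sketched as a small closure: a cache hit returns the stored result without making a request, and a cache miss first waits until at least min_interval seconds have passed since the previous request. The name make_polite_fetcher, the .rds file cache, and the key naming are assumptions for illustration, not SetStream's actual implementation.

```r
# Hypothetical sketch of a rate-limited, file-cached fetcher.
# `fetch_fn` would wrap the real data call; keys must be filename-safe.
make_polite_fetcher <- function(cache_dir, min_interval = 1) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  last_call <- -Inf  # time of the previous real request, in seconds

  function(key, fetch_fn) {
    path <- file.path(cache_dir, paste0(key, ".rds"))
    if (file.exists(path)) {
      return(readRDS(path))  # cache hit: no request is made
    }
    # Cache miss: sleep just long enough to respect the minimum interval
    wait <- min_interval - (as.numeric(Sys.time()) - last_call)
    if (wait > 0) Sys.sleep(wait)
    last_call <<- as.numeric(Sys.time())
    result <- fetch_fn()
    saveRDS(result, path)
    result
  }
}

fetch <- make_polite_fetcher(tempfile("setstream_cache_"), min_interval = 1)
```

Keeping the clock and cache inside the closure means every caller that shares one fetcher also shares its 1 req/sec budget.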
Docker deployment:
# Quick deploy (uses deploy.sh)
./deploy.sh
# Or manually:
docker-compose -f docker-compose.yml up -d
# Check service health
docker-compose ps
# Scale services (if needed)
docker-compose up -d --scale api=3
# Stop services
docker-compose down

CI/CD:
The project includes a GitHub Actions workflow (.github/workflows/ci-cd.yml) that:
✅ Runs tests on multiple R versions
✅ Builds Docker images
✅ Performs security scans
✅ Pushes images to GitHub Container Registry
✅ Deploys to production (configurable)
Required GitHub Secrets:
- GITHUB_TOKEN (automatically provided)
- production - Full pipeline runner
- api - REST API service
- dashboard - Shiny dashboard
Development:
# Run tests
make test
# Clean data (reset)
make clean
# View pipeline graph
Rscript -e "targets::tar_visnetwork()"
# Docker development
docker-compose build
docker-compose run --rm pipeline make test

Troubleshooting:
Issue: fivbvis functions fail
Solution: Check internet connection, verify API availability
Issue: DuckDB locked
Solution: Close all R sessions, delete .duckdb.wal file
Issue: Out of memory
Solution: Reduce rolling_window_days in config.yml
Acknowledgments:
- openvolley/fivbvis - Data access
- FIVB VIS - Data source
Built with ❤️ for volleyball analytics
