CognitiveModeling/deep_learning_precipitation_nowcasting

Precipitation Nowcasting with Deep Learning

A probabilistic deep learning project for short-term precipitation forecasting using ConvNeXt U-Net architecture with PyTorch Lightning. The project focuses on predicting precipitation probability distributions using radar data (RADOLAN) and digital elevation models (DEM).

Results

RMSE Skill Score Analysis

The ConvNeXt U-Net model demonstrates consistent improvements over the optical flow extrapolation baseline across different precipitation intensities. This baseline is the method currently used by many operational precipitation nowcasting services. These results were achieved after training for just one epoch on a 2-year dataset (2019-2020) with a daily train (70%) / validation (15%) / test (15%) split. The RMSE Skill Score analysis on a subset of the validation data shows:

  • Consistent outperformance: The model achieves positive skill scores of ~15%, indicating superior performance compared to the baseline
  • Peak performance: Maximum skill improvement (~18%) occurs around moderate precipitation intensities (5 mm/h sample mean)
  • Robust across intensities: The model maintains skill advantages from light drizzle (0.1 mm/h) to heavy precipitation (10 mm/h). Note that these intensities refer to the sample-wise means.
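
The skill scores above can be reproduced from paired predictions in a few lines. The snippet below is an illustrative sketch, not code from this repository; the helper names `rmse` and `rmse_skill_score` and the toy values are made up. A score of 0.15 corresponds to the ~15% improvement quoted above.

```python
import math

def rmse(pred, obs):
    """Root mean square error over paired prediction/observation values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def rmse_skill_score(model_pred, baseline_pred, obs):
    """1 - RMSE_model / RMSE_baseline: positive values mean the model
    beats the baseline; 0.15 corresponds to a ~15% improvement."""
    return 1.0 - rmse(model_pred, obs) / rmse(baseline_pred, obs)

# Toy values for illustration only
obs      = [0.0, 1.0, 2.0, 4.0]
model    = [0.1, 1.2, 1.9, 3.8]
baseline = [0.5, 1.8, 1.2, 3.0]
print(round(rmse_skill_score(model, baseline, obs), 3))  # 0.801
```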

Sample Predictions

Sample Precipitation Predictions

The above samples have been randomly drawn from the validation data and demonstrate the model's ability to predict complex precipitation patterns. Each row shows: input radar sequence (4 time steps), ground truth target (15-min forecast), baseline prediction (optical flow extrapolation), model prediction, and model certainty.

Note: These results demonstrate the probabilistic model's effectiveness after 5 epochs of training. This analysis focuses on RMSE-based performance metrics derived from the most probable rain intensity of the predicted probability distributions. A comprehensive evaluation, including FSS and other meteorological skill scores, can be generated using the evaluation pipeline.

Features

  • End-to-End Deep Learning Pipeline: Complete workflow with dataset creation, model training and evaluation.
  • ConvNeXt U-Net Architecture: Modern CNN implementation
  • Multi-Modal Input Processing: Combines temporal radar data with digital elevation models, extensible for additional data sources
  • MLOps Integration: PyTorch Lightning framework with WandB tracking, Multi-GPU distributed training, and model versioning
  • Multi-Stage Development Support: Debug, local, and cluster execution modes spanning from rapid prototyping on a desktop machine to large-scale training on a compute cluster

Project Structure

├── configs/                    # Configuration files
│   ├── cluster_default.yml    # Default cluster configuration
│   ├── local_config.yml       # Local machine overrides
│   └── debug_config.yml       # Debug mode settings
├── data_pre_processing/        # Data loading and preprocessing
├── model/                      # Neural network models and training logic
├── evaluation/                 # Evaluation metrics and pipelines
├── plotting/                   # Visualization and plotting utilities
├── helper/                     # Utility functions
├── training_utils/             # Training and preprocessing utilities
├── requirements/               # Environment setup files
└── train_lightning.py          # Main training script

Recommended Directory Structure

The project expects a specific directory structure for optimal organization and deployment:

├── results/                    # Training outputs and model artifacts
├── scripts/                    # Deployed code copies (managed by deployment scripts)
└── weather_data/              # Input datasets
    ├── radolan/               # RADOLAN precipitation data
    ├── static/                # Static data (DEM files)
    └── baselines/             # Baseline comparison data

Directory Details:

  • results/: Contains training outputs as specified by s_save_dir in configuration files. Each training run creates a subdirectory with model checkpoints, logs, plots, and evaluation metrics.

  • scripts/: Managed by deployment scripts in push_pull_from_cluster/. When deploying code to remote machines, scripts are copied here with version control (e.g., copy_1_experiment_name/).

  • weather_data/: Houses all input datasets referenced by configuration paths:

    • RADOLAN data (s_folder_path)
    • Digital Elevation Models (s_dem_path)
    • Baseline predictions (s_baseline_path)

This structure aligns with the configuration files and deployment scripts, ensuring consistent data access across different execution environments.

Environment Setup

Prerequisites

  • Python 3.10
  • Conda or Mamba package manager
  • CUDA-compatible GPU (recommended)

Installation

  1. Create and activate the conda environment:
# Create environment with Python 3.10
mamba create -n first_CNN_on_Radolan python=3.10 -y

# Activate the environment
mamba activate first_CNN_on_Radolan
  2. Configure conda channels:
# Configure to use conda-forge with strict priority
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
  3. Install dependencies from conda-forge:
mamba install -y pytorch-lightning xarray zarr numpy matplotlib pandas scipy dask pyarrow psutil h5py pyyaml einops pysteps wandb hurry.filesize
  4. Install PyTorch with CUDA support: choose the CUDA version supported by your hardware, e.g.:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Adding Additional Packages

When installing additional packages, always use mamba to preserve compatibility:

mamba install -y package-name

Usage

The project supports three execution modes, each with its own configuration:

1. Debug Mode

For quick testing and development with minimal data and resources.

python train_lightning.py --mode debug

Debug mode characteristics:

  • Uses very small datasets (5-minute time spans)
  • Small batch sizes (batch_size=4)
  • Minimal epochs (max_epochs=1)
  • 0 data loader workers (for debugging)
  • Small input dimensions
  • Fast execution for testing code changes

2. Local Mode

For running on local machines with moderate datasets.

python train_lightning.py --mode local

Local mode characteristics:

  • Uses larger datasets (10-day time spans)
  • Moderate batch sizes (batch_size=64)
  • Single GPU usage
  • Well suited for memory and computational profiling of new features
  • More realistic training scenarios allow quick prototyping and hyperparameter exploration
  • Suitable for development and small experiments

3. Cluster Mode

For full-scale training on computing clusters.

python train_lightning.py --mode cluster

Cluster mode characteristics:

  • Uses full datasets (2-year time spans, ~1TB size)
  • Large batch sizes (batch_size=128)
  • Multi-GPU support (4 GPUs)
  • Extensive evaluation metrics
  • Production-level training

Remote Development & Cluster Management

The project includes utility scripts in push_pull_from_cluster/ for deploying code and fetching results from remote machines.

Shell Integration

Add these aliases to your .zshrc (or .bashrc) for convenient access:

# Code deployment
alias deploy='/path/to/your/project/push_pull_from_cluster/deploy.sh'

# Result fetching
copy_runs() {
    /path/to/your/project/push_pull_from_cluster/copy_complete_runs.sh "$@"
}

Usage:

deploy                                                    # Deploy code to cluster
copy_runs Run_20250101-120000_experiment_name            # Fetch single run
copy_runs run1 run2 run3                                 # Fetch multiple runs

Note: Adjust paths in scripts and shell config to match your setup.

Configuration

Each mode loads settings from YAML configuration files in the configs/ directory:

  • cluster_default.yml: Base configuration with all default settings
  • local_config.yml: Overrides for local machine execution
  • debug_config.yml: Overrides for debugging and testing
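
Conceptually, a mode-specific file overrides the matching keys of the base configuration. The sketch below illustrates this override semantics only; `merge_configs` and the example keys are illustrative stand-ins, not the project's actual loading code.

```python
def merge_configs(base, override):
    """Return a copy of `base` with entries from `override` applied;
    nested dictionaries are merged recursively, scalars are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative keys taken from the settings documented below
cluster_default = {"s_batch_size": 128, "s_max_epochs": 100, "s_lr_schedule": True}
debug_config = {"s_batch_size": 4, "s_max_epochs": 1}
print(merge_configs(cluster_default, debug_config))
```

Settings absent from the override file keep their base values, which is why debug mode only needs to list what differs from the cluster defaults.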

Key Configuration Options

Data Settings:

s_folder_path: "/path/to/radolan/data"
s_data_file_name: "RV_recalc_rechunked.zarr"
s_dem_path: "/path/to/dem/data.zarr"
s_crop_data_time_span: ["2019-01-01T00:00", "2020-12-01T00:00"]

Model Settings:

s_convnext: true  # Use ConvNeXt architecture
s_input_height_width: 256
s_target_height_width: 32
s_num_input_time_steps: 4
s_num_lead_time_steps: 3
s_num_bins_crossentropy: 32  # Number of precipitation probability bins

Training Settings:

s_max_epochs: 100
s_batch_size: 128
s_learning_rate: 0.001
s_lr_schedule: true

Evaluation Settings:

s_fss: true  # Enable FSS evaluation
s_fss_scales: [5, 11]
s_fss_thresholds: [0.78, 1.51, 3.53, 7.07, 12.14]
s_dlbd_eval: true  # Enable DLBD evaluation
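
For reference, the Fractions Skill Score compares neighborhood exceedance fractions of forecast and observation at a given scale and threshold. The project installs pysteps, which provides an FSS implementation; the pure-Python sketch below only illustrates the metric's definition (the function names and the border handling by valid-pixel count are choices of this sketch).

```python
def neighborhood_fractions(binary_field, scale):
    """Fraction of exceedance pixels in a scale x scale window around
    each pixel; windows are clipped at the borders in this sketch."""
    h, w = len(binary_field), len(binary_field[0])
    r = scale // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0, 0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += binary_field[ii][jj]
                        count += 1
            out[i][j] = total / count
    return out

def fss(forecast, observed, threshold, scale):
    """Fractions Skill Score: 1 - MSE(fractions) / reference MSE."""
    f = neighborhood_fractions([[1 if v >= threshold else 0 for v in row] for row in forecast], scale)
    o = neighborhood_fractions([[1 if v >= threshold else 0 for v in row] for row in observed], scale)
    num = sum((fv - ov) ** 2 for frow, orow in zip(f, o) for fv, ov in zip(frow, orow))
    den = sum(v ** 2 for row in f for v in row) + sum(v ** 2 for row in o for v in row)
    return 1.0 - num / den if den > 0 else 1.0

perfect = [[1 if (r, c) == (2, 2) else 0 for c in range(5)] for r in range(5)]
print(fss(perfect, perfect, 0.5, 3))  # 1.0 for a perfect forecast
```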

Data Requirements

The project expects data in zarr format:

  1. Radar precipitation data: Rechunked zarr files with precipitation measurements (Our 2-year RADOLAN dataset is ~1TB uncompressed)
  2. Digital Elevation Model (DEM): Static elevation data
  3. Baseline data: Optical flow extrapolation or other baselines (required for evaluation) (Our 2-year baseline data is ~13TB uncompressed)

Data should be organized with the following structure:

  • Time dimension for temporal sequences
  • Spatial dimensions (y, x) for geographic coordinates
  • Proper coordinate reference system metadata

Probabilistic Forecasting Approach

This system implements a probabilistic precipitation nowcasting model that predicts probability distributions for each pixel rather than deterministic point forecasts.

Input-Output Architecture

Input Configuration:

  • Input Frames: 4 consecutive radar precipitation frames (s_num_input_time_steps: 4)
  • Target: Single precipitation frame at forecast time
  • Additional Data: Digital Elevation Model (DEM) as static input

Probabilistic Output: The network predicts a probability distribution for each pixel in the form of categorical bins, where each channel represents a different precipitation intensity range. Instead of predicting a single precipitation value, the model outputs probabilities across s_num_bins_crossentropy precipitation bins (default: 32 bins) across a log-normalized number space.
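
As a rough sketch of how such a categorical output maps back to an intensity value (as used for the RMSE-based analysis above): bin centers are laid out in a log-transformed space and the argmax bin is converted back to mm/h. The helper names and the 50 mm/h cut-off below are illustrative assumptions, not the project's actual implementation.

```python
import math

def make_bin_centers(num_bins=32, cut_off_mm_h=50.0):
    """Hypothetical bin centers: a linspace in log1p space up to the
    cut-off (the real cut-off is a configuration setting)."""
    log_edges = [i * math.log1p(cut_off_mm_h) / num_bins for i in range(num_bins + 1)]
    return [math.expm1(0.5 * (log_edges[i] + log_edges[i + 1])) for i in range(num_bins)]

def most_probable_intensity(bin_probs, bin_centers):
    """Map a per-pixel categorical distribution to the mm/h value
    of its most probable bin."""
    best = max(range(len(bin_probs)), key=lambda i: bin_probs[i])
    return bin_centers[best]

centers = make_bin_centers()
probs = [0.0] * 32
probs[10], probs[11] = 0.7, 0.3
print(round(most_probable_intensity(probs, centers), 2))
```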

Training Strategy

Target Processing:

  • Deterministic Targets: Ground truth precipitation values are converted to one-hot encoded vectors
  • Bin Assignment: Each precipitation value is assigned to its corresponding intensity bin
  • Cross-Entropy Loss: Default pixel-wise cross-entropy loss trains the model to predict a probability distribution
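
The target processing above can be sketched as follows; `one_hot_target` and the four-bin edge list are hypothetical stand-ins for the pipeline's actual binning.

```python
import bisect

def one_hot_target(value_mm_h, bin_edges):
    """Assign a ground-truth precipitation value to its intensity bin and
    return a one-hot vector for pixel-wise cross-entropy training."""
    # Index of the bin whose [lower, upper) interval contains the value;
    # values at or above the last edge fall into the final bin.
    idx = min(bisect.bisect_right(bin_edges, value_mm_h) - 1, len(bin_edges) - 2)
    idx = max(idx, 0)
    target = [0.0] * (len(bin_edges) - 1)
    target[idx] = 1.0
    return target

edges = [0.0, 0.5, 1.0, 2.0, 5.0]  # 4 illustrative bins
print(one_hot_target(1.3, edges))
```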

Alternative Dynamic Locally Binned Density (DLBD) Loss: For spatial loss functions, the one-hot target can be pre-processed with channel-wise Gaussian smoothing before applying cross-entropy loss:

# Enable DLBD (Dynamic Locally Binned Density)
s_gaussian_smoothing_target: true
s_sigma_target_smoothing: 1.0  # Gaussian smoothing sigma

This DLBD approach creates spatially smooth probability distributions to account for displacement errors in precipitation forecasting.
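
A 1D illustration of the smoothing idea; the actual pipeline smooths each intensity channel spatially in 2D, and the function names here are illustrative.

```python
import math

def gaussian_kernel(sigma, radius=3):
    """Discrete 1D Gaussian, normalized to sum to 1."""
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_channel(field, sigma=1.0):
    """Smooth one intensity channel of a one-hot target (1D sketch;
    compare s_sigma_target_smoothing above)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for i in range(len(field)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(field):  # zero padding at the borders
                acc += weight * field[j]
        out.append(acc)
    return out

one_hot_channel = [0, 0, 0, 1, 0, 0, 0]  # a single rainy pixel
smoothed = smooth_channel(one_hot_channel)
print([round(v, 3) for v in smoothed])
```

The single spike spreads into a bell shape around the true location, so a slightly displaced prediction is penalized less than a distant one.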

Deep Learning Pipeline

The precipitation nowcasting pipeline consists of four main components that can be executed independently. This section provides a comprehensive overview of each step in the workflow.

By default, the pipeline runs the Dataset Creation Pipeline, Training, and Evaluation in sequence.

Dataset Creation Pipeline

The dataset creation pipeline transforms raw RADOLAN precipitation data into training-ready samples through several automated steps:

Data Preprocessing / Dataset Creation Pipeline: Uses efficient zarr chunking and dask to handle large datasets. The pipeline creates training, validation, and test data based on filter options and sample patching properties, allowing quick dataset prototyping. It also calculates sample statistics that can be used for downstream importance sampling.

Caching: The preprocessed training, validation and test datasets are cached to avoid reprocessing. Consistent seeding ensures reproducible separation between training, validation and test data.

Detailed Processing Steps

The preprocess_data() function in data_pre_processing_pipeline.py performs the following operations:

  1. Data Patching: Raw precipitation data is divided into spatial patches of target size (s_target_height_width). Each patch represents a potential training sample with its corresponding spatial coordinates.

  2. Filtering: Patches are filtered based on precipitation intensity thresholds:

    • s_filter_threshold_mm_rain_each_pixel: Minimum precipitation threshold per pixel
    • s_filter_threshold_percentage_pixels: Minimum percentage of pixels that must exceed the threshold
  3. Temporal Splitting: Data is split daily according to configured ratios (default: 70% train, 15% validation, 15% test) using s_ratio_train_val_test setting with consistent seeding for reproducibility.

  4. Statistics Calculation: Normalization statistics (mean and standard deviation) are calculated on training data only to prevent data leakage, supporting log-normal transformation for precipitation data.

  5. Binning Setup: Creates linspace binning for probabilistic categorical predictions with s_num_bins_crossentropy bins up to s_linspace_binning_cut_off_unnormalized mm/h, enabling the model to output probability distributions over precipitation intensity ranges.

  6. Coordinate Conversion: Spatio-temporal indices are converted to sample coordinates that define slices for data loading. These coordinates specify exactly which patches can be loaded by the data loader.

  7. Oversampling Weights: When enabled (s_oversampling_enabled: true), calculates pixel-wise importance sampling weights based on precipitation intensity distributions to address class imbalance.
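
Step 3 (temporal splitting) can be sketched as a seeded shuffle of days followed by ratio-based partitioning. This mirrors the described behavior but is not the project's actual code; `split_days` and the example day list are illustrative.

```python
import random

def split_days(days, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle the list of days with a fixed seed and split it into
    train/validation/test partitions (compare s_ratio_train_val_test)."""
    days = list(days)
    random.Random(seed).shuffle(days)  # consistent seeding -> reproducible split
    n_train = int(len(days) * ratios[0])
    n_val = int(len(days) * ratios[1])
    return days[:n_train], days[n_train:n_train + n_val], days[n_train + n_val:]

days = [f"2019-01-{d:02d}" for d in range(1, 21)]  # 20 example days
train, val, test = split_days(days)
print(len(train), len(val), len(test))  # 14 3 3
```

Splitting at the day level (rather than per time step) keeps temporally adjacent, highly correlated frames out of opposite partitions.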

Caching Configuration: Enable preprocessing cache by setting s_force_data_preprocessing: false in your configuration. The cache filename automatically incorporates key parameters to ensure consistency.

Model Training

The training pipeline implements a sophisticated data loading and augmentation system that supports multi-modal inputs and distributed training.

Data Loading Architecture

The dataset operates in mode='train' and returns spatio-temporal chunks spanning from the first input frame to the target frame. The core data loading is handled by the get_sample_from_coords() method:

dynamic_samples_dict, static_samples_dict = self.get_sample_from_coords(
    sample_coord,
    load_metadata=False,
    test_metadata_alignment=False,
)

Multi-Source Data Integration

New data sources can be easily added through dictionaries that maintain parallel structures:

self.dynamic_data_dict = {
    'radolan': radolan_data
}

# Variable names in xr.Dataset
self.dynamic_variable_name_dict = {
    'radolan': s_data_variable_name
}

# Normalization statistics dicts
self.dynamic_statistics_dict = {
    'radolan': radolan_statistics_dict
}

This architecture allows seamless integration of additional meteorological variables (satellite data, NWP outputs, etc.) by simply adding entries to these dictionaries.

Augmentations

Data augmentation is applied to both dynamic and static inputs during training:

dynamic_samples_dict, static_samples_dict = self.augment(dynamic_samples_dict, static_samples_dict)

Current augmentations include:

  • Random Cropping: Applied to input data with padding (s_input_padding) to achieve target input size
  • Spatial Augmentations: Consistent transformations across all input modalities
  • Extensible Framework: Additional augmentations can be easily integrated

The random crop augmentations were shown to significantly reduce overfitting on the rare high-intensity samples when oversampling was active.
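
A minimal sketch of the random crop idea (illustrative names; the actual implementation operates on padded tensors and applies the same offsets across all modalities):

```python
import random

def random_crop(frame, crop_size, rng):
    """Randomly crop a 2D field (a list of rows) to crop_size x crop_size.
    In the pipeline the same crop offsets are applied to every input
    modality so radar frames and the DEM stay spatially aligned."""
    height, width = len(frame), len(frame[0])
    top = rng.randrange(height - crop_size + 1)
    left = rng.randrange(width - crop_size + 1)
    return [row[left:left + crop_size] for row in frame[top:top + crop_size]]

rng = random.Random(0)  # fixed seed for a reproducible example
field = [[r * 10 + c for c in range(6)] for r in range(6)]  # padded 6x6 input
crop = random_crop(field, 4, rng)  # random 4x4 view of the field
print(len(crop), len(crop[0]))
```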

Oversampling and Training Configuration

Oversampling Control: Enable/disable through s_oversampling_enabled, with separate controls for training (s_oversample_train) and validation (s_oversample_validation) datasets.

Distributed Training: Uses PyTorch Lightning with 'ddp' strategy for multi-GPU training. In 'cluster' mode, automatically utilizes multiple GPUs (recommended: 4 A100 GPUs for at least one day of training).

Training Architecture: ConvNeXt U-Net processes concatenated inputs to output probabilistic predictions:

  • Dynamic Inputs: 4 temporal frames of normalized RADOLAN data
  • Static Inputs: Digital Elevation Model (DEM) normalized by standard deviation
  • Output: Probability distributions over precipitation intensity bins for each pixel (categorical predictions with s_num_bins_crossentropy channels)

Evaluation

The evaluation pipeline provides comprehensive model assessment through multiple metrics and can be executed independently from any checkpoint.

Evaluation Configuration

To run evaluation on existing checkpoints, configure:

s_plotting_only: true
s_plot_sim_name: "Run_20250509-182459_ID_15005832_years_xentropy"

Evaluation Process

  1. Model Loading: Evaluation runs as a forward pass where the baseline is directly loaded for comparison against model predictions.

  2. Baseline Mode: The dataset operates in mode='baseline' with no augmentation applied

  3. Comprehensive Metrics: Multiple evaluation metrics are calculated:

    • RMSE: Root Mean Square Error for continuous precipitation values
    • FSS: Fractions Skill Score at multiple scales (s_fss_scales) and thresholds (s_fss_thresholds)
    • DLBD: Distance-based evaluation with Channel-Wise Gaussian smoothing at various sigma values (s_sigmas_dlbd_eval)
  4. Baseline Comparison: Direct comparison against optical flow extrapolation baseline across all metrics.

Evaluation Outputs

Results are automatically saved to:

  • CSV Files: Sample-level metrics for detailed analysis of model predictions, baselines, and direct comparisons between the two
  • NetCDF Files: FSS evaluations with compression for efficient storage

Prediction

The prediction mode generates operational forecasts and represents the deployment phase of the pipeline. Predictions can be generated using the evaluation/checkpoint_to_prediction.py module.

Prediction Process

Forecast Generation: The dataset operates in mode='predict' to create actual forecasts saved in zarr format for efficient handling of large spatiotemporal datasets.

Output Characteristics:

  • Format: Zarr arrays with full spatial coverage
  • File Size: Large, depending on the time period the predictions cover, due to the high-resolution spatiotemporal output

Data Flow: Predictions are generated for unfiltered spatial domains, providing complete coverage for operational forecasting applications.

Note: Because of the large files produced, inference speed is limited by disk write speed. The prediction component is still in the development phase.


Model Architecture

The project implements a ConvNeXt U-Net architecture in PyTorch.

ConvNeXt U-Net Overview

The model follows an encoder-decoder structure with skip connections, processing features at multiple spatial resolutions. The architecture is highly configurable, allowing adaptation to different computational constraints and performance requirements.

ConvNeXt Blocks: Modern convolutional blocks featuring depthwise separable convolutions (7×7 kernel), channel-wise layer normalization, inverted bottleneck design, and residual connections.

U-Net Structure: Progressive downsampling/upsampling with skip connections applied before downsampling operations (following ConvNeXt paper), plus an additional input-to-output skip connection.

Model Configuration

The ConvNeXt U-Net architecture can be customized through key parameters. In its default configuration, which was used to generate the results above, the model has 2.7 million parameters.

Channel Configuration (c_list):

c_list=[32, 64, 128, 256]  # Default: channels at each resolution level
  • Length determines U-Net depth (number of downsampling/upsampling stages)

Spatial Downsampling (spatial_factor_list):

spatial_factor_list=[4, 2, 2]  # Default: downsampling factors per stage
  • Controls spatial resolution reduction at each encoder stage
  • Length must equal len(c_list) - 1
  • Initial aggressive downsampling (4) followed by moderate reduction (2)
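
With the defaults, the per-stage factors compound: the 256x256 input is reduced by a total factor of 4 * 2 * 2 = 16, giving 16x16 feature maps at the deepest level. A quick sanity check:

```python
import math

c_list = [32, 64, 128, 256]      # channels per resolution level
spatial_factor_list = [4, 2, 2]  # must have length len(c_list) - 1
assert len(spatial_factor_list) == len(c_list) - 1

size = 256  # s_input_height_width (default)
for factor in spatial_factor_list:
    size //= factor
print(size)                            # 16: spatial size at the deepest level
print(math.prod(spatial_factor_list))  # 16: total downsampling factor
```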

Number of ConvNeXt blocks per downscaling/upscaling stage (num_blocks_list):

num_blocks_list=[1, 2, 4]  # Default: blocks per resolution level

Input/Output Configuration:

  • Input Channels (c_in): 5 channels (4 RADOLAN frames + 1 DEM)
  • Output Channels (c_target): Equals s_num_bins_crossentropy (default: 32 precipitation probability bins)
  • Output Resolution: Determined by s_target_height_width (default: 32×32)

Model Scaling Guidelines

  • Memory Constraints: Reduce c_list values and num_blocks_list depth
  • Accuracy Requirements: Increase model depth and channel counts
  • New Input Modalities: Update c_in to reflect additional channels

The architecture's modularity enables systematic exploration of design trade-offs.


Output Structure

Results are saved in the specified save directory with the following structure:

results/
└── Run_YYYYMMDD-HHMMSS_[suffix]/
    ├── model/              # Model checkpoints
    ├── logs/               # Training logs and metrics
    ├── plots/              # Generated plots
    ├── data/               # Preprocessed data cache
    ├── code/               # Code snapshot
    └── predictions/        # Model predictions (zarr)

Troubleshooting

CUDA Issues:

  • Ensure CUDA_VISIBLE_DEVICES is set correctly
  • Check PyTorch CUDA compatibility with your system

Memory Issues:

  • Reduce batch size
  • Decrease number of data loader workers
  • Use smaller crop time spans for debugging
  • Disable oversampling

Data Loading:

  • Verify zarr file paths and structure
  • Check coordinate reference systems match
  • Ensure proper chunking of zarr files - in practice, one chunk per time step provided the best trade-off between performance and flexibility with respect to batch size. The GPU should be the performance-limiting factor, not the data loading.

Prediction Quality and Training Issues

  • The predicted frame should be spatially significantly smaller than the input frames to avoid issues with boundary effects
  • Increase the number of model parameters
  • Decrease spatial size of the predicted patch
  • From experience, the largest effect on prediction quality comes from the quality of the training data and the way it is presented to the model: experiment with the hyperparameters of the Dataset Creation Pipeline - especially the filtering criteria, normalization (for the input as well as the output binning), and the oversampling settings.
  • In case of overfitting: More training data? Do too harsh oversampling criteria lead to the network seeing rare events too often? Are meaningful data augmentations in place (especially when oversampling is active)?
  • Keep in mind that precipitation roughly follows a long-tailed distribution: even large amounts of data contain only very few heavy rain events.

Funding Support

This project was supported by funding from the Cyber Valley in Tübingen, project number CyVy-RF-2020-15.
