Freq2Clean is a lightweight enhancement module, trained on synthetic data, that operates after a denoiser. In the Fourier domain, it fuses the magnitude of the temporally averaged video, which has high spatial SNR, with that of the denoiser's output, which retains fast transients.
- `1-eda`: contains instructions to download the datasets and visualizes them.
- `2-sota`: denoises the data with state-of-the-art denoisers. These denoised recordings serve as a baseline for Freq2Clean.
- `3-freq2clean`: implements the parameter tuning as a torch module, then trains and tests it against state-of-the-art baselines.
- `A-frequency_fusion`: explores multiple frequency transforms to combine the best features of temporally averaged and denoised videos. Runs a grid search to find a good set of coefficients for the frequency combination.
- `4-segmentation`: shows that Freq2Clean leads to segmentation predictions that more closely match those obtained from the ground-truth frames. Also shows that Freq2Clean does not affect temporal dynamics.

Clone repository:
git clone https://github.com/MrPio/freq2clean
cd freq2clean

Create and activate environment:
conda create -n freq2clean python=3.12
conda activate freq2clean
pip install -r requirements.txt

- Download a dataset (use the notebooks in Section 1 or add your own dataset to `DATASETS`).
- Denoise the recording using any denoiser (as in Section 2) and place the denoised `.tif` file in the same folder as `x.tif` and `gt.tif`.
- Run the inference with the following command, where:
  - `--checkpoint` is the name of the subfolder of `trainings/` where the checkpoint is located;
  - `--dataset` is the key of the dataset in `DATASETS`;
  - `--denoiser` is the name of the denoised `.tif` file.

cd 3-freq2clean
python freq2clean_test.py \
--checkpoint dft1d \
--dataset synthetic \
--denoiser deepcad \
--batch_size 1

- Edit `train_config.json`:
  - `denoiser_variant`: the suffix of the denoised `.tif` file, if any. This is used to train on multiple denoised versions predicted by the same denoiser with different hyperparameter configurations;
  - `frequency_transform`: choose between `dft1d` and `dct3d`;
  - `patch_t` / `patch_xy`: the dimensions of the training patches. Use smaller `patch_t` values for `dct3d` to limit the number of parameters.
- Run `cd 3-freq2clean; python freq2clean_train.py`.
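
Before launching a training run, it can help to sanity-check the configuration. The sketch below is not part of the repository: `check_train_config` is a hypothetical helper, the example values are purely illustrative, and it assumes that the patch sizes are positive integers. The real `train_config.json` may contain additional keys that this check ignores.

```python
import json

# The two transform names come from the README; everything else is an assumption.
ALLOWED_TRANSFORMS = {"dft1d", "dct3d"}


def check_train_config(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    if cfg.get("frequency_transform") not in ALLOWED_TRANSFORMS:
        problems.append("frequency_transform must be 'dft1d' or 'dct3d'")
    for key in ("patch_t", "patch_xy"):
        value = cfg.get(key)
        # Assumption: patch sizes are positive integers (bool is excluded).
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer")
    # denoiser_variant is optional ("if any"); assume it is a string suffix.
    if not isinstance(cfg.get("denoiser_variant", ""), str):
        problems.append("denoiser_variant must be a string")
    return problems


# Illustrative values only, not the repository defaults.
example = json.loads(
    '{"denoiser_variant": "", "frequency_transform": "dct3d",'
    ' "patch_t": 16, "patch_xy": 64}'
)
print(check_train_config(example))
```

To check the actual file, the same function can be applied to `json.load(open("train_config.json"))` from inside `3-freq2clean`.
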
- The input video should be severely noisy, i.e. have a very low input SNR. Otherwise, SOTA denoisers already leave little margin for improvement.
- The recording should be still: both the camera and the objects being recorded should have slow spatial dynamics.

Under extremely low SNR conditions, which are common in in-vivo and miniature-microscope recordings, self-supervised denoisers cannot capture fine details. This is due to the limited temporal context provided during training. The resulting loss of spatial detail can negatively impact downstream analyses such as ROI segmentation, neuron extraction, and morphological assessment.
Temporal averaging reduces noise variance under a Poisson–Gaussian model, commonly assumed in two-photon microscopy (2PM). However, the spatial SNR gain comes at the cost of reduced temporal resolution, which makes averaging unsuitable for applications where preserving neuronal activity patterns is critical.
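
The variance reduction can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not code from the repository: the scene is perfectly static, the noise is Poisson shot noise plus Gaussian read noise, and `simulate_noisy_stack`, `gain`, and `read_sigma` are hypothetical names and values chosen for illustration.

```python
import numpy as np


def simulate_noisy_stack(clean, n_frames, gain=1.0, read_sigma=0.5, seed=0):
    """Draw n_frames noisy observations of a static image.

    Assumed model: gain * Poisson(clean / gain) + Normal(0, read_sigma^2).
    """
    rng = np.random.default_rng(seed)
    shot = rng.poisson(clean / gain, size=(n_frames,) + clean.shape) * gain
    return shot + rng.normal(0.0, read_sigma, size=shot.shape)


clean = np.full((32, 32), 4.0)      # static, low-photon scene
n_frames = 400
stack = simulate_noisy_stack(clean, n_frames)

# Noise variance of individual frames vs. of the temporal average.
single_var = np.var(stack - clean)
averaged_var = np.var(stack.mean(axis=0) - clean)
ratio = single_var / averaged_var
print(f"variance ratio: {ratio:.0f} (ideal for a static scene: {n_frames})")
```

For independent noise, averaging N frames divides the noise variance by roughly N. The same average also smears any signal that changes within those N frames, which is the temporal-resolution cost described above.
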
Freq2Clean explicitly exploits the complementarity between temporally averaged recordings and denoiser outputs through a frequency-domain formulation. In doing so, it increases spatial SNR while preserving temporal resolution, without requiring a clean version of the noisy recording.
One DFT is computed along the temporal dimension for each pixel sequence in the video (a). Then, the magnitude spectra of the temporally averaged signal and the denoised signal (b) are fused by a convex combination of their Fourier magnitudes (c). The coefficients should favor the temporally averaged signal in the low-frequency band and the denoised signal in the high-frequency band (d).
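
The per-pixel magnitude fusion described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the repository's torch module: `fuse_dft1d` is a hypothetical name, the fusion weights `alpha` are hand-set here rather than tuned or learned, the temporally averaged video is approximated by a sliding-window mean, and the phase is taken from the denoised signal (the text only specifies how the magnitudes are combined).

```python
import numpy as np


def fuse_dft1d(avg, den, alpha):
    """Fuse two (T, H, W) videos along time in the Fourier domain.

    alpha has shape (T // 2 + 1,) with values in [0, 1]; alpha[k] is the
    weight of the averaged video's magnitude at temporal frequency bin k.
    """
    A = np.fft.rfft(avg, axis=0)
    D = np.fft.rfft(den, axis=0)
    w = alpha[:, None, None]
    # Convex combination of the two magnitude spectra.
    magnitude = w * np.abs(A) + (1.0 - w) * np.abs(D)
    # Assumption: keep the phase of the denoised signal.
    phase = np.angle(D)
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=avg.shape[0], axis=0)


rng = np.random.default_rng(0)
T, H, W = 64, 8, 8
den = rng.normal(size=(T, H, W))    # stand-in for a denoiser output
kernel = np.ones(9) / 9.0           # stand-in sliding-window average
avg = np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), 0, den)

# Favor the averaged video at low frequencies, the denoised one at high.
alpha = np.linspace(1.0, 0.0, T // 2 + 1)
fused = fuse_dft1d(avg, den, alpha)
print(fused.shape)
```

With `alpha` set to zero everywhere, the function returns the denoised video unchanged, which is a convenient sanity check.
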
The 3D DCT expresses a volumetric video patch as a linear combination of 3D DCT basis functions (a). Accordingly, a 3D DCT is computed for both the temporally averaged and the baseline (denoised) videos, and fusion is then performed by taking a convex combination of the resulting DCT coefficients. These fusion coefficients form a 3D mask (b).
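
A comparable sketch for the 3D DCT variant is given below. It uses only NumPy and builds the orthonormal DCT-II from its matrix definition so that the example stays self-contained. The names `dct_matrix`, `dct3`, and `fuse_dct3d` are hypothetical, and the mask is hand-set for illustration, whereas in Freq2Clean the fusion coefficients are tuned.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C of shape (n, n); C.T is its inverse."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C


def dct3(x, inverse=False):
    """Separable 3D DCT (or its inverse) applied along every axis of x."""
    for axis, n in enumerate(x.shape):
        C = dct_matrix(n)
        M = C.T if inverse else C
        # Contract M with the current axis, then restore the axis order.
        x = np.moveaxis(np.tensordot(M, x, axes=(1, axis)), 0, axis)
    return x


def fuse_dct3d(avg, den, mask):
    """Convex combination of DCT coefficients; mask values lie in [0, 1]."""
    coeffs = mask * dct3(avg) + (1.0 - mask) * dct3(den)
    return dct3(coeffs, inverse=True)


rng = np.random.default_rng(0)
T, H, W = 16, 8, 8
avg = rng.normal(size=(T, H, W))    # stand-in temporally averaged patch
den = rng.normal(size=(T, H, W))    # stand-in denoised patch

# Illustrative mask: trust the averaged patch at low temporal frequencies.
mask = np.linspace(1.0, 0.0, T)[:, None, None] * np.ones((T, H, W))
fused = fuse_dct3d(avg, den, mask)
print(fused.shape)
```

Unlike the 1D DFT variant, the mask here has one entry per coefficient of the patch, so its size grows with `T * H * W`. This matches the advice to use smaller `patch_t` values for `dct3d`.
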
When comparing frames side-by-side from two sample neurons, the Freq2Clean outputs are visibly closer to the ground truth (a). Furthermore, analyzing calcium transients from 80 isolated action potentials (b) reveals that Freq2Clean preserves baseline temporal dynamics. Freq2Clean leads to segmentation predictions that more closely match those obtained from the ground-truth frames (c).
Table 1: Performance on the NAOMi Synthetic Dataset
Freq2Clean consistently improves PSNR3D and SSIM3D when applied to state-of-the-art denoisers.
| Denoiser | Baseline PSNR3D ↑ | Baseline SSIM3D ↑ | Freq2Clean PSNR3D ↑ | Freq2Clean SSIM3D ↑ |
|---|---|---|---|---|
| BM3D | 13.52 | 0.207 | 13.74 | 0.280 |
| BM4D | 14.61 | 0.385 | 14.79 | 0.486 |
| Noise2Void | 16.35 | 0.267 | 17.21 | 0.288 |
| Noise2Noise | 18.64 | 0.499 | 19.13 | 0.594 |
| DeepCAD-RT | 27.94 | 0.760 | 30.04 | 0.880 |
| SRDTrans | 25.48 | 0.635 | 25.57 | 0.658 |
| DeepVIDv2 | 20.30 | 0.455 | 21.19 | 0.486 |
| TeD | 22.64 | 0.546 | 23.22 | 0.597 |
| FAST | 20.91 | 0.362 | 22.19 | 0.495 |
Table 2: Performance on Real Datasets
The enhancement provided by Freq2Clean generalizes to real datasets. Although only pseudo-ground truths, with ten times the SNR of the inputs, are available, Freq2Clean still systematically improves PSNR3D and SSIM3D.
| Dataset | DeepCAD-RT PSNR3D ↑ | DeepCAD-RT SSIM3D ↑ | Freq2Clean PSNR3D ↑ | Freq2Clean SSIM3D ↑ |
|---|---|---|---|---|
| Mouse neuronal populations | 19.33 | 0.210 | 19.52 | 0.244 |
| Zebrafish brain | 16.84 | 0.259 | 16.87 | 0.289 |
| Mouse dend. spines (50 mW) | 13.38 | 0.090 | 13.46 | 0.092 |
| Mouse dend. spines (115 mW) | 13.40 | 0.149 | 13.43 | 0.155 |






