Problem
When running combi_plot() on large SDC analyses (e.g., 6,500+ rows with large lag ranges like ±365), the matplotlib rendering can take 5-10+ minutes or even hang indefinitely. This is because the heatmap matrix becomes enormous (potentially millions of cells).
Proposed Solutions
1. Pillow-based heatmap rendering (fastest)
For matrices exceeding a threshold, bypass matplotlib's pcolormesh and generate the heatmap image directly with Pillow:
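from PIL import Image
import matplotlib
import numpy as np

def _fast_heatmap(matrix, cmap='RdBu_r', vmin=-1, vmax=1):
    """Generate heatmap image using Pillow instead of matplotlib."""
    # Scale values into [0, 1]; anything outside [vmin, vmax] is clipped.
    normalized = np.clip((matrix - vmin) / (vmax - vmin), 0, 1)
    # matplotlib.colormaps replaces plt.cm.get_cmap (deprecated since matplotlib 3.7).
    cm = matplotlib.colormaps[cmap]
    # The colormap returns RGBA floats in [0, 1]; drop alpha, convert to 8-bit RGB.
    colored = (cm(normalized)[:, :, :3] * 255).astype(np.uint8)
    return Image.fromarray(colored)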
This could reduce rendering from 5+ minutes to ~5-10 seconds.
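A pre-rendered heatmap still has to be placed in the figure next to the other combi_plot panels. The sketch below shows one way this might be done with `ax.imshow` on the RGB array. The helper names `matrix_to_rgb` and `draw_prerendered` are hypothetical and not part of sdcpy, and `origin="lower"` is an assumption made to match the row orientation of `pcolormesh`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def matrix_to_rgb(matrix, cmap="RdBu_r", vmin=-1.0, vmax=1.0):
    """Map a 2D float matrix to an (rows, cols, 3) uint8 RGB array."""
    normalized = np.clip((matrix - vmin) / (vmax - vmin), 0, 1)
    rgba = matplotlib.colormaps[cmap](normalized)
    return (rgba[:, :, :3] * 255).astype(np.uint8)


def draw_prerendered(ax, rgb):
    """Draw an already-colored image on ax (hypothetical helper)."""
    # origin="lower" puts row 0 at the bottom, as pcolormesh does.
    # interpolation="nearest" avoids resampling cost and keeps cells crisp.
    return ax.imshow(rgb, aspect="auto", origin="lower", interpolation="nearest")


rng = np.random.default_rng(0)
demo = rng.uniform(-1, 1, size=(50, 80))
demo_rgb = matrix_to_rgb(demo)

fig, ax = plt.subplots()
image = draw_prerendered(ax, demo_rgb)
plt.close(fig)
```

Because the colors are computed once with NumPy, matplotlib only has to draw a single image instead of one quadrilateral per cell.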
2. rasterized=True for matplotlib
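At minimum, pass rasterized=True to the pcolormesh calls. This mainly speeds up savefig() for vector formats (PDF, SVG), where the mesh is otherwise written out cell by cell as vector paths.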
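As a minimal sketch of the rasterized variant, the snippet below draws a random matrix with `pcolormesh(..., rasterized=True)` and saves it to an in-memory PDF. The function name `save_rasterized_heatmap` and the demo data are illustrative assumptions, not existing sdcpy code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def save_rasterized_heatmap(matrix, fmt="pdf"):
    """Draw matrix with a rasterized pcolormesh and return the saved bytes."""
    fig, ax = plt.subplots()
    # rasterized=True embeds the mesh as a bitmap inside vector output,
    # while axes, ticks and labels remain vector graphics.
    mesh = ax.pcolormesh(matrix, cmap="RdBu_r", vmin=-1, vmax=1, rasterized=True)
    fig.colorbar(mesh, ax=ax)
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)
    plt.close(fig)
    return mesh.get_rasterized(), buffer.getvalue()


rng = np.random.default_rng(0)
demo = rng.uniform(-1, 1, size=(200, 200))
is_rasterized, pdf_bytes = save_rasterized_heatmap(demo)
```

The `dpi` argument of `savefig()` controls the resolution of the embedded bitmap, so it is worth setting explicitly for large matrices.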
API Suggestions
# Option A: Add fast parameter
analysis.combi_plot(fast=True) # Uses optimized rendering
# Option B: Auto-detect based on matrix size
analysis.combi_plot() # Automatically uses fast mode when matrix > threshold
# Option C: Expose matrix size for client decisions
if analysis.heatmap_size > 250_000:  # e.g. 500 x 500 cells
    ...  # Use alternative visualization
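Options A and B could share one decision function. The following is only a sketch: `choose_renderer`, `FAST_RENDER_THRESHOLD`, and the returned labels are hypothetical names, and the 250,000-cell default is taken from the example in Option C rather than from any benchmark.

```python
FAST_RENDER_THRESHOLD = 250_000  # hypothetical default: 500 x 500 cells


def choose_renderer(heatmap_size, fast=None, threshold=FAST_RENDER_THRESHOLD):
    """Pick a rendering path for the heatmap.

    fast=True or fast=False forces the choice (Option A).
    fast=None decides from the matrix size (Option B).
    """
    if fast is None:
        fast = heatmap_size > threshold
    return "pillow" if fast else "matplotlib"


small = choose_renderer(100 * 100)            # below threshold
large = choose_renderer(6_000 * 6_000)        # above threshold
forced = choose_renderer(100 * 100, fast=True)  # explicit override
```

Keeping `fast=None` as the default would let existing `combi_plot()` calls pick up the faster path automatically, while callers who need matplotlib's exact output can still pass `fast=False`.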
Helper property suggestion
We could add a property to the SDCAnalysis class that makes the heatmap matrix size easy to query:
@property
def heatmap_size(self) -> int:
    """Number of cells in the correlation heatmap matrix."""
    n_fragments_1 = len(self.ts1) - self.fragment_size + 1
    n_fragments_2 = len(self.ts2) - self.fragment_size + 1
    return n_fragments_1 * n_fragments_2
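To make the scale concrete, here is the same arithmetic as a standalone function, applied to two 6,500-row series. The fragment size of 30 is an assumed value chosen for illustration, `heatmap_cells` is a hypothetical name, and the guard against negative counts is an addition that the property above does not have.

```python
def heatmap_cells(len_ts1, len_ts2, fragment_size):
    """Cell count using the same formula as the proposed heatmap_size property."""
    n_fragments_1 = len_ts1 - fragment_size + 1
    n_fragments_2 = len_ts2 - fragment_size + 1
    # Guard (not in the property above): a series shorter than the
    # fragment size yields zero fragments rather than a negative count.
    return max(n_fragments_1, 0) * max(n_fragments_2, 0)


# Two 6,500-row series with an assumed fragment size of 30:
# 6500 - 30 + 1 = 6471 fragments per series.
cells = heatmap_cells(6500, 6500, 30)
print(f"{cells:,}")  # 41,873,841
```

Note that this counts every fragment pair; if the analysis restricts the lag range, the number of cells actually computed may be smaller.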
Related
sdcpy-app/#4