
feature: Fast combi_plot for large datasets #18

@AlFontal

Problem

When running combi_plot() on large SDC analyses (e.g., 6,500+ rows with large lag ranges like ±365), matplotlib rendering can take 5-10 minutes or more, or appear to hang indefinitely. The cause is the size of the heatmap matrix, which can reach millions of cells.

Proposed Solutions

1. Pillow-based heatmap rendering (fastest)

For matrices exceeding a threshold, bypass matplotlib's pcolormesh and generate the heatmap image directly with Pillow:

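import matplotlib
import numpy as np
from PIL import Image

def _fast_heatmap(matrix, cmap='RdBu_r', vmin=-1, vmax=1):
    """Generate heatmap image using Pillow instead of matplotlib's pcolormesh."""
    # Scale values into [0, 1] so they can be passed to the colormap
    normalized = np.clip((matrix - vmin) / (vmax - vmin), 0, 1)
    # matplotlib.colormaps replaces plt.cm.get_cmap, which was removed in matplotlib 3.9
    cm = matplotlib.colormaps[cmap]
    # Drop the alpha channel and convert RGB floats to 8-bit integers
    colored = (cm(normalized)[:, :, :3] * 255).astype(np.uint8)
    return Image.fromarray(colored)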

This could reduce rendering from 5+ minutes to ~5-10 seconds.
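As a rough sketch of how the threshold switch might look, the helper below picks a renderer from the cell count. The names draw_heatmap and FAST_THRESHOLD are hypothetical and not part of sdcpy, the cutoff value is only an assumption, and matplotlib.colormaps requires matplotlib 3.5 or newer. The fast path hands a precomputed RGB array to imshow, with origin and extent set so the axes roughly match what pcolormesh would produce.

```python
import matplotlib
import numpy as np

FAST_THRESHOLD = 250_000  # hypothetical cutoff (500 x 500 cells), not an sdcpy constant


def draw_heatmap(ax, matrix, cmap="RdBu_r", vmin=-1, vmax=1, threshold=FAST_THRESHOLD):
    """Draw matrix on ax and return the name of the renderer that was used."""
    if matrix.size > threshold:
        # Fast path: colorize with NumPy, then draw a single RGB image
        normalized = np.clip((matrix - vmin) / (vmax - vmin), 0, 1)
        rgb = (matplotlib.colormaps[cmap](normalized)[:, :, :3] * 255).astype(np.uint8)
        n_rows, n_cols = matrix.shape
        ax.imshow(
            rgb,
            origin="lower",                # row 0 at the bottom, as in pcolormesh
            extent=(0, n_cols, 0, n_rows),  # same data coordinates as pcolormesh
            aspect="auto",
            interpolation="nearest",
        )
        return "image"
    # Small matrices: keep the existing pcolormesh behaviour
    ax.pcolormesh(matrix, cmap=cmap, vmin=vmin, vmax=vmax)
    return "pcolormesh"
```

Keeping the choice inside one helper would let either Option A or Option B below reuse the same code path.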

2. rasterized=True for matplotlib

At minimum, use rasterized=True in pcolormesh calls to speed up savefig().
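A minimal illustration of that flag, using random data rather than a real SDC result. With rasterized=True the mesh is embedded as a bitmap when saving to a vector format such as PDF or SVG, while axes and labels stay vector. The matrix shape and the dpi value here are arbitrary assumptions.

```python
import io

import numpy as np
from matplotlib.figure import Figure

# Placeholder data standing in for a correlation matrix
matrix = np.random.default_rng(0).uniform(-1, 1, size=(300, 300))

fig = Figure(figsize=(4, 4))
ax = fig.subplots()
# rasterized=True asks vector backends to store the mesh as an image
mesh = ax.pcolormesh(matrix, cmap="RdBu_r", vmin=-1, vmax=1, rasterized=True)

# dpi controls the resolution of the rasterized mesh inside the PDF
buf = io.BytesIO()
fig.savefig(buf, format="pdf", dpi=150)
```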

API Suggestions

# Option A: Add fast parameter
analysis.combi_plot(fast=True)  # Uses optimized rendering
# Option B: Auto-detect based on matrix size
analysis.combi_plot()  # Automatically uses fast mode when matrix > threshold
# Option C: Expose matrix size for client decisions
if analysis.heatmap_size > 250_000:  # 500x500
    ...  # Use alternative visualization

Helper property suggestion

We could add a property to the SDCAnalysis class that returns the heatmap matrix size:

@property
def heatmap_size(self) -> int:
    """Number of cells in the correlation heatmap matrix."""
    n_fragments_1 = len(self.ts1) - self.fragment_size + 1
    n_fragments_2 = len(self.ts2) - self.fragment_size + 1
    return n_fragments_1 * n_fragments_2
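To give a sense of scale, the same arithmetic can be run standalone. The fragment size of 50 is an assumed value for illustration only; the 6,500-row length comes from the example in the Problem section, and the function name is hypothetical.

```python
def heatmap_cells(len_ts1: int, len_ts2: int, fragment_size: int) -> int:
    """Same formula as the proposed heatmap_size property, as a free function."""
    n_fragments_1 = len_ts1 - fragment_size + 1
    n_fragments_2 = len_ts2 - fragment_size + 1
    return n_fragments_1 * n_fragments_2


# Two 6,500-row series with an assumed fragment size of 50
cells = heatmap_cells(6500, 6500, 50)
print(cells)  # 41615401, i.e. 6451 * 6451
```

At roughly 41.6 million cells, this case sits well above the 250,000-cell threshold used in Option C.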

Related

sdcpy-app/#4
