
Fix resampling of gapped recordings.#4499

Open
grahamfindlay wants to merge 4 commits into SpikeInterface:main from
grahamfindlay:feature/gap_tolerant_resampling

Conversation

@grahamfindlay
Contributor

This handles an edge case that I should have thought of in #4429: when a recording containing gaps is resampled, interpolation of timestamps and data should never span a gap. It should be done on a section-by-section basis, where "section" = "a gapless (contiguously sampled) stretch of data" (I chose the term "section" to avoid confusion with neo/spikeinterface "blocks" or "segments", which are not required to be gapless).

This PR does that, and introduces a new gap_tolerance_ms parameter to resample() that controls the behavior:

sp.resample(recording, resample_rate, gap_tolerance_ms=None, ...)

I tried to make this analogous to and consistent with the gap_tolerance_ms parameter in neo, discussed [here](NeuralEnsemble/python-neo#1773) by @h-mayorquin and @samuelgarcia and implemented in that PR. So the behavior for the following values of gap_tolerance_ms is:

  • None (default): If timestamp gaps are detected in the parent time
    vector, a ValueError is raised with a report of the gaps (count, sizes,
    positions).
  • <value> (e.g., 1.0): Opt in to section-wise resampling. Gaps
    larger than the threshold trigger section splitting; smaller gaps are
    treated as continuous.
  • 0.0: Strict mode — split on any gap >= 1.5 sample periods.
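As a quick illustration of those three modes, a gap-detection helper might look like the sketch below. The names (`gap_threshold_s`, `find_gap_indices`) are mine, not the PR's, and the exact threshold comparison in the real implementation may differ:

```python
import numpy as np

def gap_threshold_s(gap_tolerance_ms, sampling_frequency):
    """Map the gap_tolerance_ms argument to a threshold in seconds (illustrative)."""
    if gap_tolerance_ms is None:
        # Caller must check for gaps and raise if any exist.
        return None
    if gap_tolerance_ms == 0.0:
        # Strict mode: any interval >= 1.5 sample periods counts as a gap.
        return 1.5 / sampling_frequency
    return gap_tolerance_ms / 1000.0

def find_gap_indices(time_vector, sampling_frequency, gap_tolerance_ms):
    """Return indices i where time_vector[i+1] - time_vector[i] exceeds the threshold."""
    deltas = np.diff(time_vector)
    threshold = gap_threshold_s(gap_tolerance_ms, sampling_frequency)
    if threshold is None:
        # Default mode: detect gaps strictly, and refuse to proceed if any exist.
        strict = 1.5 / sampling_frequency
        gaps = np.flatnonzero(deltas >= strict)
        if gaps.size:
            raise ValueError(f"{gaps.size} gap(s) detected at indices {gaps.tolist()}")
        return np.array([], dtype=int)
    return np.flatnonzero(deltas >= threshold)
```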

Internally, gap detection identifies contiguous sections of samples.
Each section is resampled independently --
both timestamps and traces -- then results are concatenated. Margins for
edge-artifact reduction are clamped to section boundaries so that FFT
processing never crosses a gap. This is the most obnoxious part: I really wanted to find a way to let scipy.signal handle the reflect padding in this case, but I could not, so I do it manually, and that adds a fair bit of complexity to the code.
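Stripped of the margin handling, the section-by-section approach can be sketched roughly like this. All names here are illustrative, not the PR's actual code:

```python
import numpy as np
from scipy.signal import resample

def resample_by_section(traces, time_vector, section_bounds, old_rate, new_rate):
    """Resample each gapless section independently, then concatenate.

    Simplified sketch: the real implementation also pads each section with
    margins (clamped to the section boundaries) to reduce FFT edge artifacts,
    then trims the result.
    """
    out_traces, out_times = [], []
    for start, stop in section_bounds:  # [start, stop) frame bounds per section
        section = traces[start:stop]
        n_out = int(round(section.shape[0] * new_rate / old_rate))
        out_traces.append(resample(section, n_out, axis=0))
        # Timestamps are interpolated within the section only, never across a gap.
        t0 = time_vector[start]
        out_times.append(t0 + np.arange(n_out) / new_rate)
    return np.concatenate(out_traces, axis=0), np.concatenate(out_times)
```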

When no gaps are detected (e.g., if the recording has no time vector), the existing code path executes unchanged.

I tested this on a bunch of my own gapped recordings, and it seems to work. Then I had my buddy Claude ;) write a bunch of tests (included here) based on those, using synthetic gapped recordings.

Because of the antialiasing filter applied during decimate(), a similar approach should technically be taken there (so that the AA filter applied to one section isn't influenced by data from a neighboring section), but the scope of this PR is limited to resample(). Errors introduced by gaps in decimate() should be small and mostly negligible.

@alejoe91 alejoe91 requested a review from h-mayorquin April 9, 2026 09:53
Collaborator

@h-mayorquin h-mayorquin left a comment


This makes sense to me. I still can't fully wrap my head around exactly how resampling goes bad at the gaps, but of course we should account for them. Thanks as well for taking a look at the neo implementation regarding gaps; it is great to unify the API.

Some comments on the implementation:

  1. Pre-allocate memory instead of concatenating. In _get_traces_gapped, you know the output size upfront (end_frame - start_frame). Pre-allocating a single array and filling slices would avoid the intermediate allocations from appending to pieces and the final np.concatenate. This is the pattern we use elsewhere, e.g. in NoiseGeneratorRecording.

  2. Unify the code paths. Rather than having get_traces dispatch to _get_traces_gapped, could _get_traces_gapped just become the single get_traces? A gap-free recording is simply one section spanning the entire trace, so the gapped logic handles it naturally with one iteration of the loop. This would eliminate the duplication between the two methods.

  3. Reuse get_chunk_with_margin. The margin clamping, fetching, and reflect-padding logic in _get_traces_gapped reimplements what get_chunk_with_margin already does. The difference is that you clamp to section boundaries instead of recording boundaries, but I think this could be handled by passing the section length as the effective boundary. Worth exploring to avoid duplicating that logic.

  4. Extract gap detection into a shared utility. The gap detection and section boundary computation (~30 lines in __init__) could live as a standalone function in recording_tools.py, something like detect_sections(time_vector, sampling_frequency, gap_tolerance_ms) that returns the (n_sections, 2) boundary array. This would make the __init__ easier to read and the utility reusable if other preprocessing steps that use filters (e.g., FilterRecording) need gap awareness later. What do you think?

  5. Tests. The coverage is solid. A couple of minor notes: there's no test for integer ratio with gaps on traces (the trace-correctness tests all use non-integer ratio), and test_resample_traces_across_gap accesses private attributes (_sec_n_out) to find the split point between sections. You could derive that from the output time vector instead (find the gap in resampled timestamps), which would decouple the test from the internal representation.

All of these are strict improvements and I think the current implementation is good to go as-is. We can track these as a follow-up issue. The only one that I kind of feel should be done here is the pre-allocation (1) and maybe the tests (5) but I will leave that to @alejoe91 and you to decide.
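The pre-allocation pattern suggested in (1) looks roughly like this (an illustrative sketch; function and variable names are mine, not SpikeInterface's):

```python
import numpy as np

def gather_sections_preallocated(section_arrays, start_frame, end_frame,
                                 num_channels, dtype=np.float32):
    """Write per-section outputs into one pre-allocated array, instead of
    appending to a list and calling np.concatenate at the end."""
    out = np.empty((end_frame - start_frame, num_channels), dtype=dtype)
    pos = 0
    for arr in section_arrays:  # each arr has shape (n_i, num_channels)
        out[pos:pos + arr.shape[0]] = arr
        pos += arr.shape[0]
    assert pos == out.shape[0], "sections must exactly tile the requested range"
    return out
```

This avoids one intermediate allocation per section plus the final full-size copy that np.concatenate would make.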

Use `end_frame` - `start_frame` to compute the size of the output
array, pre-allocate, and write each section's output to it directly,
using `pos` to keep track.

Related improvement: Instead of using `len(channel_indices)` to get
the size of the channel dimension of the output array, use the actual
`.shape` of the parent traces, so any indexing method (e.g. slicing)
is handled correctly.
Added testing of both integer and non-integer sample rate ratios.
Also derive section splits from the time vector rather than
accessing a private attribute.
@grahamfindlay
Copy link
Copy Markdown
Contributor Author

Thanks @h-mayorquin ! I just added two new commits to address (1) and (5), respectively.

For (1), to make sure I get the size of the channel dimension right when pre-allocating, I do a 1-sample fetch of the parent's traces and use its shape, instead of using len(channel_indices), in case channel_indices is a slice or something. I do this before the loop over sections, so whenever get_traces() yields any data (which is most of the time), the tiny pre-fetch is technically redundant. But it makes the code much simpler and more readable than lazily allocating the result array from the first section fetch's shape, and I figure the cost of a 1-sample read per get_traces() call is low enough to be worth it.

I think (2), (3), and (4) are all good ideas. I would lean towards tracking them as a follow-up issue like you suggest, and tackling them in a separate PR. The main reason is that unifying get_traces and _get_traces_gapped is a change with the potential to impact many more users, since it touches the super common default (i.e., no-gap) path, so I would want to give it a different level of scrutiny and testing than I am prepared to at this moment. Basically the same reasoning goes for get_chunk_with_margin (though I would prioritize that one; the duplication definitely bothers me).

@grahamfindlay
Copy link
Copy Markdown
Contributor Author

grahamfindlay commented Apr 14, 2026

Oh, also, I forgot... re. "exactly how resampling goes bad at the gaps": scipy.signal.resample and scipy.signal.decimate don't know about the parent's time vector -- they see a 1D array of samples and implicitly assume samples are uniformly spaced.

A gapped recording's traces array is contiguous in memory even though the samples are not contiguous (uniformly spaced) in time: the sample at t=2000s is adjacent to the sample at t=20000s, even though 5 h of (wall) clock time separate them.

So because of the implicit assumption, resample(x, int(len(x) * new_rate / old_rate)) (x is the data) ends up generating outputs that no longer line up with the recording's time grid. It generates data for timestamps/samples inside the gap (that shouldn't exist) and doesn't produce enough samples outside the gap (that should exist).

And because resample is FFT-based, samples on one side of the gap end up affecting samples on the other side, since they are contiguous in memory even though they are far apart in wall-clock time. In the worst case, the temporal discontinuity is also accompanied by a discontinuity in signal level, and then you get Gibbs artifacts.

(Similar situation for decimate: the anti-aliasing filter is a convolution, so it mixes pre-gap samples with post-gap samples, but at least the artifact width in that case is limited to the filter length, so it's not such a big deal.)
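The worst case above is easy to demonstrate: naively resampling across a gap that coincides with a level discontinuity produces ringing, while resampling each section independently does not. A self-contained illustration (not code from the PR):

```python
import numpy as np
from scipy.signal import resample

# Two gapless sections that happen to be adjacent in memory: 100 samples at
# level 0.0, then (after a long recording gap) 100 samples at level 1.0.
section_a = np.zeros(100)
section_b = np.ones(100)
x = np.concatenate([section_a, section_b])

# Naive: resample the whole array as if it were one contiguous signal.
# The FFT sees a step at the gap and produces Gibbs ringing around it.
naive = resample(x, 100)

# Section-wise: each constant section resamples to a constant, with no ringing.
per_section = np.concatenate([resample(section_a, 50), resample(section_b, 50)])

ideal = np.concatenate([np.zeros(50), np.ones(50)])
print("naive max error:       ", np.abs(naive - ideal).max())        # large (Gibbs overshoot)
print("section-wise max error:", np.abs(per_section - ideal).max())  # ~0
```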

Collaborator

@h-mayorquin h-mayorquin left a comment


LGTM

Thanks for addressing the pre-allocation, it looks good. One thought on the 1-sample fetch to get the channel count:

Instead of fetching a sample from the parent, we could pass num_channels from the Recording to the segment at construction time (the Recording already knows this), and then derive the output channel count from channel_indices directly:

if isinstance(channel_indices, slice):
    start, stop, step = channel_indices.indices(self.num_channels)
    num_channels_traces = len(range(start, stop, step))
else:
    num_channels_traces = len(channel_indices)

This handles all cases: slice(None) (all channels), slice(0, 10) (consecutive subset), slice(0, 10, 2) (stepped), and arrays/lists. len(range(...)) is O(1).
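As a quick sanity check of that pattern (the `count_selected` wrapper is mine, just for illustration):

```python
def count_selected(channel_indices, num_channels):
    # Mirrors the snippet above: O(1) count for slices, len() otherwise.
    if isinstance(channel_indices, slice):
        start, stop, step = channel_indices.indices(num_channels)
        return len(range(start, stop, step))
    return len(channel_indices)

num_channels = 16
print(count_selected(slice(None), num_channels))      # 16 (all channels)
print(count_selected(slice(0, 10), num_channels))     # 10
print(count_selected(slice(0, 10, 2), num_channels))  # 5
print(count_selected([3, 7, 11], num_channels))       # 3
```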

This would avoid the 1-sample parent fetch per get_traces call. Not a big deal for this PR, but it's a pattern that could be useful more broadly if we make segments channel-aware in general. I opened #4510 to track this.

@alejoe91 alejoe91 added the preprocessing Related to preprocessing module label Apr 14, 2026


3 participants