
Fix resampling of gapped recordings.#4499

Open
grahamfindlay wants to merge 4 commits into SpikeInterface:main from
grahamfindlay:feature/gap_tolerant_resampling

Conversation

@grahamfindlay
Contributor

This handles an edge case that I should have thought of in #4429: when a recording containing gaps is resampled, interpolation of timestamps and data should never span a gap. It should be done on a section-by-section basis, where "section" = "a gapless (contiguously sampled) stretch of data" (I chose the term "section" to avoid confusion with neo/spikeinterface "blocks" or "segments", which are not required to be gapless).

This PR does that, and introduces a new gap_tolerance_ms parameter to resample() that controls the behavior:

sp.resample(recording, resample_rate, gap_tolerance_ms=None, ...)

I tried to make this analogous to and consistent with the gap_tolerance_ms parameter in neo, discussed [here](NeuralEnsemble/python-neo#1773) by @h-mayorquin and @samuelgarcia and implemented in that PR. So the behavior for the following values of gap_tolerance_ms is:

  • None (default): If timestamp gaps are detected in the parent time
    vector, a ValueError is raised with a report of the gaps (count, sizes,
    positions).
  • <value> (e.g., 1.0): Opt in to section-wise resampling. Gaps
    larger than the threshold trigger section splitting; smaller gaps are
    treated as continuous.
  • 0.0: Strict mode — split on any gap >= 1.5 sample periods.
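As a quick illustration of those three modes, a gap-detection helper might look like the sketch below. The names (`gap_threshold_s`, `find_gap_indices`) are mine, not the PR's, and the exact threshold comparison in the real implementation may differ:

```python
import numpy as np

def gap_threshold_s(gap_tolerance_ms, sampling_frequency):
    """Map the gap_tolerance_ms argument to a threshold in seconds (illustrative)."""
    if gap_tolerance_ms is None:
        # Caller must check for gaps and raise if any exist.
        return None
    if gap_tolerance_ms == 0.0:
        # Strict mode: any interval >= 1.5 sample periods counts as a gap.
        return 1.5 / sampling_frequency
    return gap_tolerance_ms / 1000.0

def find_gap_indices(time_vector, sampling_frequency, gap_tolerance_ms):
    """Return indices i where time_vector[i+1] - time_vector[i] exceeds the threshold."""
    deltas = np.diff(time_vector)
    threshold = gap_threshold_s(gap_tolerance_ms, sampling_frequency)
    if threshold is None:
        # Default mode: detect gaps strictly, and refuse to proceed if any exist.
        strict = 1.5 / sampling_frequency
        gaps = np.flatnonzero(deltas >= strict)
        if gaps.size:
            raise ValueError(f"{gaps.size} gap(s) detected at indices {gaps.tolist()}")
        return np.array([], dtype=int)
    return np.flatnonzero(deltas >= threshold)
```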

Internally, gap detection identifies contiguous sections of samples.
Each section is resampled independently --
both timestamps and traces -- then results are concatenated. Margins for
edge-artifact reduction are clamped to section boundaries so that FFT
processing never crosses a gap. This is the most obnoxious part: I really wanted to find a way to let scipy.signal handle the reflect padding in this case, but I could not, so I do it manually, and that adds a fair bit of complexity to the code.
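Stripped of the margin handling, the section-by-section approach can be sketched roughly like this. All names here are illustrative, not the PR's actual code:

```python
import numpy as np
from scipy.signal import resample

def resample_by_section(traces, time_vector, section_bounds, old_rate, new_rate):
    """Resample each gapless section independently, then concatenate.

    Simplified sketch: the real implementation also pads each section with
    margins (clamped to the section boundaries) to reduce FFT edge artifacts,
    then trims the result.
    """
    out_traces, out_times = [], []
    for start, stop in section_bounds:  # [start, stop) frame bounds per section
        section = traces[start:stop]
        n_out = int(round(section.shape[0] * new_rate / old_rate))
        out_traces.append(resample(section, n_out, axis=0))
        # Timestamps are interpolated within the section only, never across a gap.
        t0 = time_vector[start]
        out_times.append(t0 + np.arange(n_out) / new_rate)
    return np.concatenate(out_traces, axis=0), np.concatenate(out_times)
```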

When no gaps are detected (e.g., if the recording has no time vector), the existing code path executes unchanged.

I tested this on a bunch of my own gapped recordings, and it seems to work. Then I had my buddy Claude ;) write a bunch of tests (included here) based on those, using synthetic gapped recordings.

Because of the antialiasing filter applied during decimate(), a similar approach should technically be taken there (so that the AA filter applied to one section isn't influenced by data from a neighboring section), but the scope of this PR is limited to resample(). Errors introduced by gaps in decimate() should be small and mostly negligible.

@alejoe91 alejoe91 requested a review from h-mayorquin April 9, 2026 09:53
Collaborator

@h-mayorquin h-mayorquin left a comment


This makes sense to me. I still can't fully wrap my head around exactly how resampling goes bad at the gaps, but of course we should account for them. Thanks as well for taking a look at the neo implementation regarding gaps; it is great to unify the API.

Some comments on the implementation:

  1. Pre-allocate memory instead of concatenating. In _get_traces_gapped, you know the output size upfront (end_frame - start_frame). Pre-allocating a single array and filling slices would avoid the intermediate allocations from appending to pieces and the final np.concatenate. This is the pattern we use elsewhere, e.g. in NoiseGeneratorRecording.

  2. Unify the code paths. Rather than having get_traces dispatch to _get_traces_gapped, could _get_traces_gapped just become the single get_traces? A gap-free recording is simply one section spanning the entire trace, so the gapped logic handles it naturally with one iteration of the loop. This would eliminate the duplication between the two methods.

  3. Reuse get_chunk_with_margin. The margin clamping, fetching, and reflect-padding logic in _get_traces_gapped reimplements what get_chunk_with_margin already does. The difference is that you clamp to section boundaries instead of recording boundaries, but I think this could be handled by passing the section length as the effective boundary. Worth exploring to avoid duplicating that logic.

  4. Extract gap detection into a shared utility. The gap detection and section boundary computation (~30 lines in __init__) could live as a standalone function in recording_tools.py, something like detect_sections(time_vector, sampling_frequency, gap_tolerance_ms) that returns the (n_sections, 2) boundary array. This would make the __init__ easier to read and the utility reusable if other preprocessing steps that use filters (e.g., FilterRecording) need gap awareness later. What do you think?

  5. Tests. The coverage is solid. A couple of minor notes: there's no test for integer ratio with gaps on traces (the trace-correctness tests all use non-integer ratio), and test_resample_traces_across_gap accesses private attributes (_sec_n_out) to find the split point between sections. You could derive that from the output time vector instead (find the gap in resampled timestamps), which would decouple the test from the internal representation.

All of these are strict improvements and I think the current implementation is good to go as-is. We can track these as a follow-up issue. The only one that I kind of feel should be done here is the pre-allocation (1) and maybe the tests (5) but I will leave that to @alejoe91 and you to decide.
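The pre-allocation pattern suggested in (1) looks roughly like this (an illustrative sketch; function and variable names are mine, not SpikeInterface's):

```python
import numpy as np

def gather_sections_preallocated(section_arrays, start_frame, end_frame,
                                 num_channels, dtype=np.float32):
    """Write per-section outputs into one pre-allocated array, instead of
    appending to a list and calling np.concatenate at the end."""
    out = np.empty((end_frame - start_frame, num_channels), dtype=dtype)
    pos = 0
    for arr in section_arrays:  # each arr has shape (n_i, num_channels)
        out[pos:pos + arr.shape[0]] = arr
        pos += arr.shape[0]
    assert pos == out.shape[0], "sections must exactly tile the requested range"
    return out
```

This avoids one intermediate allocation per section plus the final full-size copy that np.concatenate would make.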

Use `end_frame` - `start_frame` to compute the size of the output
array, pre-allocate, and write each section's output to it directly,
using `pos` to keep track.

Related improvement: Instead of using `len(channel_indices)` to get
the size of the channel dimension of the output array, use the actual
`.shape` of the parent traces, so any indexing method (e.g. slicing)
is handled correctly.
Added testing of both integer and non-integer sample rate ratios.
Also derive section splits from the time vector rather than
accessing a private attribute.
@grahamfindlay
Copy link
Copy Markdown
Contributor Author

Thanks @h-mayorquin ! I just added two new commits to address (1) and (5), respectively.

For (1), to make sure I get the size of the channel dimension right when pre-allocating, I do a 1-sample fetch of the parent's traces and use its shape, instead of using len(channel_indices), in case channel_indices is a slice or something. I do this before the loop over sections, so whenever get_traces() yields any data (which is most of the time), the tiny pre-fetch is technically redundant. But it makes the code much simpler and more readable than lazily allocating the result array from the first section fetch's shape, and I figure the cost of a 1-sample read per get_traces() call is low enough to be worth it.

I think (2), (3), and (4) are all good ideas. I would lean towards tracking them as a follow-up issue like you suggest, and tackling them in a separate PR. The main reason is that unifying get_traces and _get_traces_gapped is a change with the potential to impact many more users, since it touches the super common default (i.e., no-gap) path, so I would want to give it a different level of scrutiny and testing than I am prepared to at this moment. Basically the same reasoning goes for get_chunk_with_margin (though I would prioritize that one; the duplication definitely bothers me).

@grahamfindlay
Copy link
Copy Markdown
Contributor Author

grahamfindlay commented Apr 14, 2026

Oh, also, I forgot... re. "exactly how resampling goes bad at the gaps": scipy.signal.resample and scipy.signal.decimate don't know about the parent's time vector -- they see a 1D array of samples and implicitly assume samples are uniformly spaced.

A gapped recording's traces array is contiguous in memory even though the samples are not contiguous (uniformly spaced) in time: the sample at t=2000s is adjacent to the sample at t=20000s, even though 5 h of (wall) clock time separate them.

So because of the implicit assumption, resample(x, int(len(x) * new_rate / old_rate)) (x is the data) ends up generating outputs that no longer line up with the recording's time grid. It generates data for timestamps/samples inside the gap (that shouldn't exist) and doesn't produce enough samples outside the gap (that should exist).

And because resample is FFT-based, samples on one side of the gap end up affecting samples on the other side, since they are contiguous in memory even though they are far apart in wall-clock time. In the worst case, the temporal discontinuity is also accompanied by a discontinuity in signal level, and then you get Gibbs artifacts.

(Similar situation for decimate: the anti-aliasing filter is a convolution, so it mixes pre-gap samples with post-gap samples, but at least the artifact width in that case is limited to the filter length, so it's not such a big deal.)
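The worst case above is easy to demonstrate: naively resampling across a gap that coincides with a level discontinuity produces ringing, while resampling each section independently does not. A self-contained illustration (not code from the PR):

```python
import numpy as np
from scipy.signal import resample

# Two gapless sections that happen to be adjacent in memory: 100 samples at
# level 0.0, then (after a long recording gap) 100 samples at level 1.0.
section_a = np.zeros(100)
section_b = np.ones(100)
x = np.concatenate([section_a, section_b])

# Naive: resample the whole array as if it were one contiguous signal.
# The FFT sees a step at the gap and produces Gibbs ringing around it.
naive = resample(x, 100)

# Section-wise: each constant section resamples to a constant, with no ringing.
per_section = np.concatenate([resample(section_a, 50), resample(section_b, 50)])

ideal = np.concatenate([np.zeros(50), np.ones(50)])
print("naive max error:       ", np.abs(naive - ideal).max())        # large (Gibbs overshoot)
print("section-wise max error:", np.abs(per_section - ideal).max())  # ~0
```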

Collaborator

@h-mayorquin h-mayorquin left a comment


LGTM

Thanks for addressing the pre-allocation, it looks good. One thought on the 1-sample fetch to get the channel count:

Instead of fetching a sample from the parent, we could pass num_channels from the Recording to the segment at construction time (the Recording already knows this), and then derive the output channel count from channel_indices directly:

if isinstance(channel_indices, slice):
    start, stop, step = channel_indices.indices(self.num_channels)
    num_channels_traces = len(range(start, stop, step))
else:
    num_channels_traces = len(channel_indices)

This handles all cases: slice(None) (all channels), slice(0, 10) (consecutive subset), slice(0, 10, 2) (stepped), and arrays/lists. len(range(...)) is O(1).
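As a quick sanity check of that pattern (the `count_selected` wrapper is mine, just for illustration):

```python
def count_selected(channel_indices, num_channels):
    # Mirrors the snippet above: O(1) count for slices, len() otherwise.
    if isinstance(channel_indices, slice):
        start, stop, step = channel_indices.indices(num_channels)
        return len(range(start, stop, step))
    return len(channel_indices)

num_channels = 16
print(count_selected(slice(None), num_channels))      # 16 (all channels)
print(count_selected(slice(0, 10), num_channels))     # 10
print(count_selected(slice(0, 10, 2), num_channels))  # 5
print(count_selected([3, 7, 11], num_channels))       # 3
```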

This would avoid the 1-sample parent fetch per get_traces call. Not a big deal for this PR, but it's a pattern that could be useful more broadly if we make segments channel-aware in general. I opened #4510 to track this.

@alejoe91 alejoe91 added the preprocessing Related to preprocessing module label Apr 14, 2026


3 participants