Skip to content

chore: add lower, lift, intersect, dna, expand_coord DIS methods#229

Open
SophiaPerzan-DG wants to merge 12 commits into
mainfrom
diseq-pt3
Open

chore: add lower, lift, intersect, dna, expand_coord DIS methods#229
SophiaPerzan-DG wants to merge 12 commits into
mainfrom
diseq-pt3

Conversation

@SophiaPerzan-DG
Copy link
Copy Markdown
Collaborator

also:

  • chore: remove index direction toggle in DIS

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends DisjointIntervalSequence (DIS) with additional coordinate/interval manipulation and mapping utilities (including genomic ↔ DIS coordinate conversion and DNA extraction), updates documentation accordingly, and adds tests for the new behaviors. It also removes the prior DIS “index direction” toggle and introduces a Genome.dna(...) convenience path intended to accept a DIS directly.

Changes:

  • Added DIS methods: expand_coord, _lower_coord, lower, genomic_span, _lift_position, lift_interval, intersect, and dna; removed the global index-direction toggle.
  • Updated Genome.dna behavior to dispatch when passed a DisjointIntervalSequence.
  • Added/expanded unit tests and documentation for the new DIS coordinate mapping and DNA behaviors.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/test_genome.py Adds coverage for Genome.dna() when called with an Interval vs a DIS.
tests/test_diseq.py Adds extensive tests for DIS coordinate expansion, lowering/lifting, intersection, span, and DNA.
genome_kit/genome.py Changes Genome.dna implementation to dispatch for DIS inputs.
genome_kit/diseq.py Implements new DIS methods (expand/lower/lift/intersect/dna) and removes index-direction configuration.
genome_kit/init.py Exposes DisjointIntervalSequence in __all__.
docs-src/diseq.rst Documents expand_coord, coordinate mapping (lower/lift_interval), and DIS DNA behavior (including Genome.dna(dis) convenience).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread genome_kit/genome.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

docs-src/diseq.rst:819

  • The docs claim Genome.dna accepts a DIS directly (genome.dna(dis)), but GenomeDNA.__call__ only accepts Interval-typed arguments via PyAsInterval (see src/py_interval.h). Unless there’s an unshown overload, this example will raise TypeError. Either remove this claim/example or implement DIS dispatch in the DNA API.
For convenience, :py:meth:`Genome.dna <genome_kit.Genome.dna>` accepts a
DIS directly and dispatches to ``DisjointIntervalSequence.dna``, so
either of the following is equivalent::

    >>> dis.dna()
    >>> genome.dna(dis)

Comment thread genome_kit/diseq.py
Comment thread genome_kit/diseq.py Outdated
Comment thread genome_kit/diseq.py
@SophiaPerzan-DG SophiaPerzan-DG marked this pull request as ready for review May 8, 2026 17:57
@SophiaPerzan-DG SophiaPerzan-DG requested review from ovesh and s22chan May 8, 2026 18:16
Comment thread docs-src/diseq.rst Outdated
Comment thread docs-src/diseq.rst
Comment thread genome_kit/diseq.py Outdated
Comment thread genome_kit/diseq.py
5' -> 3' when ``coord`` falls exactly on an internal boundary between two
adjacent coord intervals — where ``iv_upstream_coord`` is the upstream-most
coordinate boundary, and ``iv_downstream_coord`` is the downstream-most
coordinate boundary (regardless of strand). Touching intervals produce
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that we don't normalize user input when they create a DIS with 2 adjacent intervals. Shouldn't GK normalize to a single interval internally?

@AliceDG any reason to conserve user input in this scenario?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjacent here means "the next coordinate interval in the list of coordinate intervals" not that they are adjacent in genomic space. A 2-element list would be returned even if there was a gap between two coord intervals (in genomic space)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my concern is not so relevant to this PR but more general. I was referring to "touching" rather than "adjacent".

Comment thread genome_kit/diseq.py
Comment on lines +522 to +523
``dnstream`` bases. The segment is expanded an equal amount in the
upstream/downstream direction as the coordinate intervals.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when the segment isn't adjacent to the coord system edge?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it still get's expanded. So if the segment is inset 10bp on both 5' and 3' ends and then you expand the coordinate, it will remain so after the expansion. This also means if you have a 0-length interval and expand the coordinate space, the segment will no longer be 0-length after expansion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AliceDG is this the desired behavior?
(this comment is not blocking)

Copy link
Copy Markdown
Contributor

@ovesh ovesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved, let's wait a bit before merging to allow @AliceDG to comment

Comment thread docs-src/diseq.rst
DIS Coordinates: 0 1 2 3 4 5 6 7
Opposite Strand

::
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
::
.. code-block:: python

Comment thread docs-src/diseq.rst

``lower`` projects the segment back to genomic space. ``lift_interval``
projects a genomic interval into the DIS's index space and clips it
against the segment. They are conceptual inverses, but each has to deal
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are they mechanically inverses as well? I remember the old code merged adjacent intervals. this one returns in sorted order?

Comment thread docs-src/diseq.rst
lift_interval
~~~~~~~~~~~~~

``lift_interval`` is the inverse: it takes a genomic
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``lift_interval`` is the inverse: it takes a genomic
:py:meth:`~genome_kit.DisjointIntervalSequence.lift_interval` is the inverse: it takes a genomic

and etc below

Comment thread genome_kit/diseq.py

Parameters
----------
coord : :py:class:`int`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
coord : :py:class:`int`
coord

sphinx can parse out the type annotations, no need to repeat (and possibly have a conflict)

Comment thread genome_kit/diseq.py
Comment on lines +515 to +521
Args:
upstream: Bases to add at the coordinate space's 5' end.
dnstream: Bases to add at the coordinate space's 3' end.
Defaults to upstream (symmetric).

Raises:
ValueError: If either argument is negative.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing google style docstrings with numpy ones

Comment thread genome_kit/diseq.py
Comment on lines +532 to +546
iv0, ivn = coord_ivs[0], coord_ivs[-1]
chrom = iv0.chromosome
refg = iv0.reference_genome
if self.coord_strand == "+":
if len(coord_ivs) == 1:
coord_ivs = [Interval(chrom, "+", iv0.start - upstream, iv0.end + dnstream, refg)]
else:
coord_ivs[0] = Interval(chrom, "+", iv0.start - upstream, iv0.end, refg)
coord_ivs[-1] = Interval(chrom, "+", ivn.start, ivn.end + dnstream, refg)
else:
if len(coord_ivs) == 1:
coord_ivs = [Interval(chrom, "-", iv0.start - dnstream, iv0.end + upstream, refg)]
else:
coord_ivs[0] = Interval(chrom, "-", iv0.start, iv0.end + upstream, refg)
coord_ivs[-1] = Interval(chrom, "-", ivn.start - dnstream, ivn.end, refg)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to just use the iv0/ivn.expand function here. Less math (which is the point of gk)

Comment thread genome_kit/diseq.py
on_coordinate_strand: bool


class DisjointIntervalSequence:
Copy link
Copy Markdown
Collaborator

@s22chan s22chan May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a DIS supposed to handle the same protocol as a regular interval? EG, you don't need to if/else on the type. If so, I think maybe there's some missing methods like spanning/midpoint/distance etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants