chore: add lower, lift, intersect, dna, expand_coord DIS methods#229
chore: add lower, lift, intersect, dna, expand_coord DIS methods#229SophiaPerzan-DG wants to merge 12 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends DisjointIntervalSequence (DIS) with additional coordinate/interval manipulation and mapping utilities (including genomic ↔ DIS coordinate conversion and DNA extraction), updates documentation accordingly, and adds tests for the new behaviors. It also removes the prior DIS “index direction” toggle and introduces a Genome.dna(...) convenience path intended to accept a DIS directly.
Changes:
- Added DIS methods:
expand_coord,_lower_coord,lower,genomic_span,_lift_position,lift_interval,intersect, anddna; removed the global index-direction toggle. - Updated
Genome.dnabehavior to dispatch when passed aDisjointIntervalSequence. - Added/expanded unit tests and documentation for the new DIS coordinate mapping and DNA behaviors.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/test_genome.py | Adds coverage for Genome.dna() when called with an Interval vs a DIS. |
| tests/test_diseq.py | Adds extensive tests for DIS coordinate expansion, lowering/lifting, intersection, span, and DNA. |
| genome_kit/genome.py | Changes Genome.dna implementation to dispatch for DIS inputs. |
| genome_kit/diseq.py | Implements new DIS methods (expand/lower/lift/intersect/dna) and removes index-direction configuration. |
| genome_kit/init.py | Exposes DisjointIntervalSequence in __all__. |
| docs-src/diseq.rst | Documents expand_coord, coordinate mapping (lower/lift_interval), and DIS DNA behavior (including Genome.dna(dis) convenience). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This reverts commit 572bc63.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
docs-src/diseq.rst:819
- The docs claim
Genome.dnaaccepts a DIS directly (genome.dna(dis)), butGenomeDNA.__call__only acceptsInterval-typed arguments viaPyAsInterval(seesrc/py_interval.h). Unless there’s an unshown overload, this example will raiseTypeError. Either remove this claim/example or implement DIS dispatch in the DNA API.
For convenience, :py:meth:`Genome.dna <genome_kit.Genome.dna>` accepts a
DIS directly and dispatches to ``DisjointIntervalSequence.dna``, so
either of the following is equivalent::
>>> dis.dna()
>>> genome.dna(dis)
| 5' -> 3' when ``coord`` falls exactly on an internal boundary between two | ||
| adjacent coord intervals — where ``iv_upstream_coord`` is the upstream-most | ||
| coordinate boundary, and ``iv_downstream_coord`` is the downstream-most | ||
| coordinate boundary (regardless of strand). Touching intervals produce |
There was a problem hiding this comment.
I forgot that we don't normalize user input when they create a DIS with 2 adjacent intervals. Shouldn't GK normalize to a single interval internally?
@AliceDG any reason to conserve user input in this scenario?
There was a problem hiding this comment.
Adjacent here means "the next coordinate interval in the list of coordinate intervals" not that they are adjacent in genomic space. A 2-element list would be returned even if there was a gap between two coord intervals (in genomic space)
There was a problem hiding this comment.
Yeah, my concern is not so relevant to this PR but more general. I was referring to "touching" rather than "adjacent".
| ``dnstream`` bases. The segment is expanded an equal amount in the | ||
| upstream/downstream direction as the coordinate intervals. |
There was a problem hiding this comment.
What happens when the segment isn't adjacent to the coord system edge?
There was a problem hiding this comment.
it still get's expanded. So if the segment is inset 10bp on both 5' and 3' ends and then you expand the coordinate, it will remain so after the expansion. This also means if you have a 0-length interval and expand the coordinate space, the segment will no longer be 0-length after expansion.
There was a problem hiding this comment.
@AliceDG is this the desired behavior?
(this comment is not blocking)
| DIS Coordinates: 0 1 2 3 4 5 6 7 | ||
| Opposite Strand | ||
|
|
||
| :: |
There was a problem hiding this comment.
| :: | |
| .. code-block:: python |
|
|
||
| ``lower`` projects the segment back to genomic space. ``lift_interval`` | ||
| projects a genomic interval into the DIS's index space and clips it | ||
| against the segment. They are conceptual inverses, but each has to deal |
There was a problem hiding this comment.
are they mechanically inverses as well? I remember the old code merged adjacent intervals. this one returns in sorted order?
| lift_interval | ||
| ~~~~~~~~~~~~~ | ||
|
|
||
| ``lift_interval`` is the inverse: it takes a genomic |
There was a problem hiding this comment.
| ``lift_interval`` is the inverse: it takes a genomic | |
| :py:meth:`~genome_kit.DisjointIntervalSequence.lift_interval` is the inverse: it takes a genomic |
and etc below
|
|
||
| Parameters | ||
| ---------- | ||
| coord : :py:class:`int` |
There was a problem hiding this comment.
| coord : :py:class:`int` | |
| coord |
sphinx can parse out the type annotations, no need to repeat (and possibly have a conflict)
| Args: | ||
| upstream: Bases to add at the coordinate space's 5' end. | ||
| dnstream: Bases to add at the coordinate space's 3' end. | ||
| Defaults to upstream (symmetric). | ||
|
|
||
| Raises: | ||
| ValueError: If either argument is negative. |
There was a problem hiding this comment.
Mixing google style docstrings with numpy ones
| iv0, ivn = coord_ivs[0], coord_ivs[-1] | ||
| chrom = iv0.chromosome | ||
| refg = iv0.reference_genome | ||
| if self.coord_strand == "+": | ||
| if len(coord_ivs) == 1: | ||
| coord_ivs = [Interval(chrom, "+", iv0.start - upstream, iv0.end + dnstream, refg)] | ||
| else: | ||
| coord_ivs[0] = Interval(chrom, "+", iv0.start - upstream, iv0.end, refg) | ||
| coord_ivs[-1] = Interval(chrom, "+", ivn.start, ivn.end + dnstream, refg) | ||
| else: | ||
| if len(coord_ivs) == 1: | ||
| coord_ivs = [Interval(chrom, "-", iv0.start - dnstream, iv0.end + upstream, refg)] | ||
| else: | ||
| coord_ivs[0] = Interval(chrom, "-", iv0.start, iv0.end + upstream, refg) | ||
| coord_ivs[-1] = Interval(chrom, "-", ivn.start - dnstream, ivn.end, refg) |
There was a problem hiding this comment.
is it possible to just use the iv0/ivn.expand function here. Less math (which is the point of gk)
| on_coordinate_strand: bool | ||
|
|
||
|
|
||
| class DisjointIntervalSequence: |
There was a problem hiding this comment.
is a DIS supposed to handle the same protocol as a regular interval? EG, you don't need to if/else on the type. If so, I think maybe there's some missing methods like spanning/midpoint/distance etc
also: