new module: custom/bed12codonpositions #11733
Open
pinin4fjords wants to merge 9 commits into
…pander
Generic helper that walks BED12 block (exon) structure in mRNA order and emits one BED6 row per in-frame mRNA position. Frame, step and span width are configurable via `ext.args`; spans crossing a block boundary are split into one BED row per block so each codon maps back to a contiguous mRNA region. No upstream module covers this transformation: bedtools/makewindows only tiles flat genomic spans, and bedtools/getfasta emits sequence rather than coordinates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Report python as MAJOR.MINOR (matching the env pin) instead of the full micro version, so the same hash is produced under conda (3.11.15) and the python:3.11 biocontainer (3.11.10). Drop the recursive format_yaml_like helper in favour of a direct two-line write now that the payload is a fixed shape. Tidy the script and meta.yml prose to drop residual Ribo-seq / ORF framing.
The stub block was emitting the full python micro version via `python --version`, which drifts between the biocontainer (3.11.10) and conda (3.11.15) and broke the snapshot under the conda CI shard. Switch to the same MAJOR.MINOR string the script writes so the hash is stable across both runtimes.
Pin python (3.12.11), pandas (2.3.0) and pyyaml (6.0.2) in environment.yml and point the container directive at a Wave-built image holding the same trio (community.wave.seqera.io/library/python_pandas_pyyaml, Singularity blob URL via community-cr-prod.seqera.io). Conda and the container now resolve to identical patch versions, which removes the need to report only MAJOR.MINOR in versions.yml and lets the script emit the real platform.python_version() string in a hash-stable way. Switch the BED12 reader/writer to pandas (read_csv with the UCSC BED12 column names, explode the block fields, project mRNA positions back to genomic coords via the existing helper) and write versions.yml with yaml.safe_dump from both the script and the stub so they produce identical YAML. Drop the bespoke yaml string-writer.
…warn on bad BED12
- Drop runs.sort(); per-codon rows that cross a block boundary now stay
in mRNA-traversal order on '-' strand records, matching meta.yml.
- Preserve the input BED12 score column instead of hard-coding 0.
- Emit a stderr warning when blockCount disagrees with the parsed
block fields instead of silently dropping the record.
- Add coverage:
* frame 1 (non-zero --frame)
* intron-bearing fixture (real intron gap, both strands), width 1 and 3
* keep-duplicates (3 single-nt blocks demonstrating dedup vs not)
- Update docstring + meta.yml output description accordingly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pt docstring
Trim the meta.yml module description and the template script docstring to lead with the use case (codon-level work on spliced features: ribo-seq P-site counts per codon, periodicity QC, ORF tiling) instead of recapping the BED12 spec and default args.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nf-core-modules into custom-bed12codonpositions
Why
A pipeline that needs codon-level genomic positions along a spliced transcript (ribo-seq P-site counts per codon, frame / periodicity QC, novel-ORF tiling) has nothing off-the-shelf to reach for.
`bedtools makewindows` takes BED3 and ignores BED12 blocks; chaining `bedtools bed12tobed6 | bedtools makewindows` re-anchors each exon at offset 0, so codons crossing an intron land in the wrong frame. No existing nf-core module emits per-codon BED from a BED12.
What it does
For each BED12 record, the module walks the blocks in mRNA order (5'→3'), emits every `--step`-th mRNA position starting at `--frame`, and projects each back to genomic coordinates. With `--width N` (N > 1) it emits an N-nt span per position, splitting at block boundaries so a codon that crosses an intron becomes two rows whose union still maps to a contiguous mRNA region.
Worked example
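The walk described under "What it does" can be sketched in plain Python. This is an illustrative sketch only, not the module's script (which, per the commits, is pandas-based). The function name `codon_rows`, the toy record in the usage lines, reusing the record name in column 4, and truncating a span that runs past the transcript end are all assumptions made for the example; `--keep-duplicates` handling is left out.

```python
def codon_rows(chrom, chrom_start, name, score, strand,
               block_sizes, block_starts, frame=0, step=3, width=1):
    """Hypothetical helper: BED12 fields in, list of BED6 tuples out (mRNA order)."""
    # BED12 block fields are comma-separated and may carry a trailing comma.
    sizes = [int(x) for x in block_sizes.rstrip(",").split(",")]
    starts = [int(x) for x in block_starts.rstrip(",").split(",")]
    # Half-open genomic intervals, one per block, in genomic order.
    blocks = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    # mRNA order is genomic order on '+', reversed on '-'.
    ordered = blocks if strand == "+" else blocks[::-1]
    mrna_len = sum(sizes)

    def project(pos):
        """Map a 0-based mRNA position to (block index in mRNA order, genomic base)."""
        for index, (start, end) in enumerate(ordered):
            size = end - start
            if pos < size:
                return index, (start + pos if strand == "+" else end - 1 - pos)
            pos -= size
        raise ValueError("mRNA position outside transcript")

    rows = []
    for pos in range(frame, mrna_len, step):
        runs = []  # one [block index, lo, hi] entry per block the span touches
        # Assumption: a span running past the transcript end is truncated.
        for p in range(pos, min(pos + width, mrna_len)):
            index, g = project(p)
            if runs and runs[-1][0] == index:
                runs[-1][1] = min(runs[-1][1], g)
                runs[-1][2] = max(runs[-1][2], g + 1)
            else:
                runs.append([index, g, g + 1])
        # Runs stay in mRNA-traversal order; they are not re-sorted by coordinate.
        rows.extend((chrom, lo, hi, name, score, strand) for _, lo, hi in runs)
    return rows


if __name__ == "__main__":
    # Toy record (made up for this sketch): two blocks of 4 and 5 nt, '-' strand.
    for row in codon_rows("chr1", 100, "tx1", "0", "-", "4,5", "0,10", width=3):
        print(*row, sep="\t")
```

On this toy record the codon starting at mRNA position 3 spans both blocks, so it comes out as two rows, and the `-` strand rows appear with descending genomic coordinates because they follow mRNA order.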
Two-exon `-` strand record with one intron (spliced mRNA length 20):
Default args (`--step 3 --width 1`, one row per codon at its 5'-most nt), i.e. the BED you'd intersect with an offset-corrected ribo-seq BAM to count P-sites per codon:
Rows are in 5'→3' mRNA order, so on the `-` strand the genomic coordinates count down. The jump from `200` back to `107` is the spliced intron.
With `--width 3` (full 3-nt codon spans), the codon at mRNA position 9 straddles the intron: its 5' nucleotide is in the upstream exon and the other two are in the downstream exon, so it is split into two BED rows (4th and 5th below):
Score (column 5) is preserved. Frame is taken from the record's own start, so the module works regardless of any GTF `phase` annotation; running it three times with `--frame 0/1/2` gives you the three per-frame BEDs needed for ribo-seq periodicity QC.
I/O
- Input: `tuple val(meta), path(bed12)`
- Output: `tuple val(meta), path("${prefix}.bed")` (BED6) + versions topic
- `ext.args`: `--frame INT` (default 0), `--step INT` (default 3), `--width INT` (default 1), `--keep-duplicates`
Container
Wave-built `community.wave.seqera.io/library/python_pandas_pyyaml:75514f9f977be607` (with matching `community-cr-prod.seqera.io` Singularity blob).
Test plan