Overview
Want to add two new metrics to describe alignment quality:
- average sequence length
- stddev of sequence lengths
Detail
The cath-align-summary script provides some metrics to describe a (FunFam) sequence alignment: number of sequences, alignment length, dops score, gap positions, total positions.
This summary information for each alignment is currently stored in the cathpy.core.util.AlignmentSummary class:

    class AlignmentSummary(object):
        """Stores summary information about an alignment."""

        def __init__(self, *, path, dops, aln_length, seq_count, gap_count, total_positions):
            self.path = path
            self.dops = float(dops) if dops is not None else None
            self.aln_length = int(aln_length)
            self.seq_count = int(seq_count)
            self.gap_count = int(gap_count)
            self.total_positions = int(total_positions)

This could be changed to include attributes that store average_domain_length and stddev_domain_length.
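As a rough sketch of that change (not the actual cathpy code), the constructor could accept the two new values as optional keyword arguments. Making them optional with a default of None is an assumption here, chosen so that existing callers keep working; this standalone copy also omits anything else the real class defines.

```python
class AlignmentSummary(object):
    """Stores summary information about an alignment (standalone sketch)."""

    def __init__(self, *, path, dops, aln_length, seq_count, gap_count,
                 total_positions, average_domain_length=None,
                 stddev_domain_length=None):
        self.path = path
        self.dops = float(dops) if dops is not None else None
        self.aln_length = int(aln_length)
        self.seq_count = int(seq_count)
        self.gap_count = int(gap_count)
        self.total_positions = int(total_positions)
        # New attributes: optional (an assumption), coerced to float like dops
        self.average_domain_length = (
            float(average_domain_length)
            if average_domain_length is not None else None)
        self.stddev_domain_length = (
            float(stddev_domain_length)
            if stddev_domain_length is not None else None)
```

Whether the new arguments should be required instead is a design choice for the PR; optional arguments simply avoid breaking code that already builds these objects.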
These AlignmentSummary objects are created by AlignmentSummaryRunner (i.e. a process that generates an alignment summary for each STOCKHOLM alignment).
We would need to calculate these values within that runner and add them to the summary object it creates.
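The calculation itself could be sketched as below, assuming the runner can get at each aligned sequence as a plain string. The function names and the choice of `-` and `.` as gap characters are assumptions for illustration, not part of cathpy; only the standard library `statistics` module is used.

```python
import statistics

GAP_CHARS = "-."  # assumed gap characters in the aligned sequences


def ungapped_length(aligned_seq):
    """Return the number of non-gap characters in one aligned sequence."""
    return sum(1 for char in aligned_seq if char not in GAP_CHARS)


def domain_length_stats(aligned_seqs):
    """Return (average, stddev) of the ungapped sequence lengths."""
    lengths = [ungapped_length(seq) for seq in aligned_seqs]
    if not lengths:
        raise ValueError("alignment contains no sequences")
    average = statistics.mean(lengths)
    # Population stddev: the alignment is treated as the whole population
    stddev = statistics.pstdev(lengths)
    return average, stddev
```

If the sequences should instead be treated as a sample, `statistics.stdev` would be the alternative, though it needs at least two sequences.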
Making changes
General approach to making changes:
- clone this repo
- create a new branch (e.g. called feature/new_alignment_metrics)
- add a test to check that your feature is working

    def test_alignment_summary_file(self):

        runner = AlignmentSummaryRunner(
            aln_file=self.merge_sto_file)
        entries = runner.run()
        self.assertEqual(len(entries), 1)
        summary = entries[0]
        self.assertEqual(summary.aln_length, 92)
        self.assertEqual(summary.dops, 88.422)
        self.assertEqual(summary.gap_count, 25228)
        self.assertEqual(summary.total_positions, 64492)
        self.assertEqual(summary.seq_count, 701)
        self.assertEqual(round(summary.gap_per, 2), round(39.12, 2))

- add the code to make your new test pass
- make sure your changes have not broken anything else (i.e. run pytest)
- commit your changes, push back to origin (GitHub)
- make sure your changes have not broken anything else
- create a pull request (PR)
- someone else (me) reviews the code
- code is merged into the master branch
- we add your name as an official contributor to cathpy :)
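For the "add a test" step, the new metrics could be checked in the same style as the existing test. The sketch below is self-contained, so it uses a small hypothetical stand-in (`make_summary`) and hand-computed values rather than AlignmentSummaryRunner and a real STOCKHOLM file; only the attribute names average_domain_length and stddev_domain_length come from the proposal above.

```python
import statistics
import unittest
from types import SimpleNamespace


def make_summary(aligned_seqs):
    """Hypothetical stand-in for the summary the real runner would return."""
    lengths = [len(seq.replace("-", "")) for seq in aligned_seqs]
    return SimpleNamespace(
        seq_count=len(lengths),
        average_domain_length=statistics.mean(lengths),
        stddev_domain_length=statistics.pstdev(lengths),
    )


class NewAlignmentMetricsTest(unittest.TestCase):

    def test_domain_length_metrics(self):
        # Ungapped lengths are 4, 2 and 6: mean 4, population stddev sqrt(8/3)
        summary = make_summary(["ACDE--", "AC----", "ACDEFG"])
        self.assertEqual(summary.seq_count, 3)
        self.assertAlmostEqual(summary.average_domain_length, 4.0)
        self.assertAlmostEqual(summary.stddev_domain_length, 1.633, places=3)
```

In the real test the expected values would come from the test alignment file; assertAlmostEqual is used because the new metrics are floats.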
Code referenced above (commit 9e24388):
- AlignmentSummary class: cathpy/cathpy/core/util.py, lines 527 to 536
- the point in AlignmentSummaryRunner where the summary object is built: cathpy/cathpy/core/util.py, line 612
- example test: cathpy/tests/util_test.py, lines 36 to 48