PatrickB-cpu/GeneFind-Python

GeneFind 🧬🔍 + ChatGPT Insights

Python 3.11+

Apache 2.0

CI status

Lightweight, open-source CLI that

  1. finds open reading frames (ORFs) in any FASTA-formatted DNA sequence
  2. translates each ORF to its protein sequence (configurable genetic codes)
  3. asks ChatGPT to generate plain-English biological insights about the ORFs
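
The first step can be illustrated with a minimal sketch of a six-frame scan. This is not the project's `orf.py`: the `ORF` record, the `find_orfs` signature, and the default codon sets are assumptions made for the example. Coordinates on the minus strand are reported relative to the reverse complement, and an ORF that runs off the end of the sequence without a stop codon is dropped.

```python
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class ORF(NamedTuple):
    """Hypothetical record type; the real project may store ORFs differently."""
    strand: str  # "+" or "-"
    frame: int   # 0, 1 or 2
    start: int   # 0-based offset on the scanned strand
    end: int     # exclusive offset, stop codon included
    seq: str


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, starts=frozenset({"ATG"}),
              stops=frozenset({"TAA", "TAG", "TGA"}), min_len=30):
    """Scan both strands and all three frames in one linear pass per frame."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            open_at = None  # position of the first start codon seen in this frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if open_at is None and codon in starts:
                    open_at = i
                elif open_at is not None and codon in stops:
                    end = i + 3
                    if end - open_at >= min_len:  # length filter, in nucleotides
                        orfs.append(ORF(strand, frame, open_at, end, s[open_at:end]))
                    open_at = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC", min_len=9):
        print(orf)
```

Keeping only the first start codon per open stretch yields the longest ORF ending at each stop; a scan that reports nested ORFs would need to track every start instead.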

The pipeline is validated on

  • 500 synthetic 5 kb strands (with known “planted” genes)
  • three reference genomes: λ-phage (48 kb), E. coli K-12 (4.6 Mb) and Influenza A (≈13 kb, 8 segments)
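
Validation against planted genes comes down to comparing predicted ORFs with the known ones. The sketch below shows one way to compute precision, recall and F₁, assuming each ORF is identified by a `(strand, start, end)` tuple and that only exact coordinate matches count as true positives. The `score` function name and the matching rule are assumptions, not the project's actual scoring code.

```python
def score(predicted, truth):
    """Return (precision, recall, f1) for exact-match ORF predictions."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    planted = {("+", 10, 100), ("-", 200, 320)}
    called = {("+", 10, 100), ("-", 200, 320), ("+", 400, 460), ("-", 500, 560)}
    print(score(called, planted))
```

Exact matching is strict: a prediction that picks an upstream start codon for the right gene counts as both a false positive and a false negative. A looser rule, such as matching on the stop position only, would change the numbers.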

✨ Features

| Module | Highlights |
| --- | --- |
| orf.py | Linear scan over both strands (+, –) and all 3 frames; configurable start/stop sets; ≥ min_len filter |
| translate.py | Fast DNA → AA translation; built-in tables for Standard (1), Bacterial (11), Mitochondrial (2) |
| llm.py | explain_orfs() wraps the OpenAI SDK – batch-prompts top N ORFs and returns a concise Markdown summary |
| simulate.py | Generates synthetic sequences with ground-truth ORFs and computes precision / recall / F₁ |
| cli.py | One-liner interface → TSV (ORFs), FASTA (proteins), and optional Markdown/HTML report with GPT text |
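
To make the translation step concrete, here is a minimal sketch of table-driven DNA → AA translation. The `CODON_TABLES` dictionary, the `translate_orf` signature, and the `to_stop` parameter are assumptions for illustration. The codon assignments follow the NCBI genetic codes: table 11 uses the same amino-acid assignments as table 1 (the two differ in their permitted start codons), and table 2 (vertebrate mitochondrial) reassigns four codons.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering: first base varies slowest
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def _build_table(aas):
    """Map each of the 64 codons to its one-letter amino acid ('*' = stop)."""
    return {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), aas)}


CODON_TABLES = {1: _build_table(STANDARD_AAS), 11: _build_table(STANDARD_AAS)}
# Table 2: AGA/AGG become stops, ATA codes for Met, TGA codes for Trp.
CODON_TABLES[2] = {**CODON_TABLES[1], "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate_orf(seq, table=1, to_stop=True):
    """Translate a nucleotide sequence codon by codon; unknown codons become 'X'."""
    codons = CODON_TABLES[table]
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = codons.get(seq[i:i + 3].upper(), "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_orf("ATGAAATTTGGGTAA"))       # standard code
    print(translate_orf("ATGTGAATA", table=2))    # mitochondrial code
```

This sketch translates every codon straight from the table. It does not force an alternative start codon such as GTG or TTG to methionine, which a bacterial workflow would usually want for the first codon of an ORF.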

🗂 Project Layout

| Week | Goal |
| --- | --- |
| Week 1 | Set up GitHub repo, create pyproject.toml, CI config, dev environment |
| Week 2 | Write find_orfs() in orf.py; test basic ORF detection logic |
| Week 3 | Build translate_orf() in translate.py; support multiple genetic codes |
| Week 4 | Integrate cli.py with CLI options for FASTA input, ORF length filter, output paths |
| Week 5 | Create simulate.py to generate 500 test strands with planted ORFs; write scoring function |
| Week 6 | Implement llm.py for GPT summary generation; create --explain CLI flag |
| Week 7 | Validate pipeline on λ-phage genome; refine start/stop codon edge cases |
| Week 8 | Benchmark on E. coli K-12 and Influenza A; track runtime, precision, and coverage |
| Week 9 | Build documentation site with MkDocs; add screenshots and examples |
| Week 10 | Publish v0.1 to TestPyPI; prepare demo video, blog post, or academic poster |

About

Python package for identifying open reading frames (ORFs) in DNA sequences, translating proteins, and generating biological insights from genomic data.
