
Add Urdu translation data, chunked output, and Devanagari transliteration of first 5 parts #30

Draft
Copilot wants to merge 4 commits into copilot/add-txt-files-to-output-directory from copilot/create-urdu-directory

Conversation

Copilot AI commented Feb 27, 2026

Contributor

Adds Fateh Muhammad Jalandhry's Urdu Quran translation (6,236 ayahs) as a new data source, splits it into 50-line chunks, and provides Devanagari-script transliterations of the first five chunks.

New data

  • data/ur.jalandhry.txt — 6,236-line raw source (one ayah per line), fetched from AlQuran Cloud API

Generation pipeline

  • src/gentxtforquran.py — added Urdu entry producing output/quran_urdu_jalandhry.txt with standard surah headers (Quran - Urdu Tarjuma / Mutarjim: Fateh Muhammad Jalandhry)
  • src/genurdu_chunks.py (new) — splits data/ur.jalandhry.txt into 50-line files under output/urdu/: 124 × 50 lines + 1 × 36 lines = 125 files
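
The chunking step described above can be sketched roughly as follows. This is not the contents of src/genurdu_chunks.py, only a minimal illustration of a fixed-size split; the function names `split_into_chunks` and `write_chunks` and the `part_NNN.txt` naming for the files under output/urdu/ are assumptions made for this example.

```python
from pathlib import Path


def split_into_chunks(lines, chunk_size=50):
    """Return consecutive slices of `lines`, each at most `chunk_size` long."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def write_chunks(source: Path, out_dir: Path, chunk_size: int = 50) -> int:
    """Split `source` (one ayah per line) into numbered files in `out_dir`.

    The part_NNN.txt file name pattern is an assumption for this sketch.
    Returns the number of files written.
    """
    with source.open(encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = split_into_chunks(lines, chunk_size)
    for number, chunk in enumerate(chunks, start=1):
        target = out_dir / f"part_{number:03d}.txt"
        target.write_text("\n".join(chunk) + "\n", encoding="utf-8")
    return len(chunks)
```

With 6,236 input lines and a chunk size of 50, this yields 124 full chunks plus a final chunk of 36 lines, which matches the 125 files listed above. A call such as `write_chunks(Path("data/ur.jalandhry.txt"), Path("output/urdu"))` would be the corresponding invocation under these assumptions.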

Devanagari transliteration

  • output/urdu/devanagari/part_001.txt through part_005.txt — character-level Urdu→Devanagari script transliteration of the first 250 lines using a direct phoneme mapping (ش→श, ک→क, م→म, ل→ल, ر→र, etc.), preserving Urdu phonetics in Devanagari without translating to Hindi
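
A character-level mapping of this kind might look like the sketch below. It includes only the five pairs named above; the table name `URDU_TO_DEVANAGARI` and the function `transliterate` are hypothetical, and the mapping actually used for the committed files is larger and is not reproduced here. Characters without an entry pass through unchanged, and the sketch makes no attempt to handle short vowels or context-dependent letters.

```python
# Illustrative subset only: the five pairs listed in the description.
URDU_TO_DEVANAGARI = str.maketrans({
    "\u0634": "\u0936",  # ش -> श
    "\u06a9": "\u0915",  # ک -> क
    "\u0645": "\u092e",  # م -> म
    "\u0644": "\u0932",  # ل -> ल
    "\u0631": "\u0930",  # ر -> र
})


def transliterate(text: str) -> str:
    """Replace each mapped Urdu character; leave everything else untouched."""
    return text.translate(URDU_TO_DEVANAGARI)
```

Because the substitution is one character at a time, line counts and line order are preserved, so each transliterated part stays aligned with its source chunk.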

Makefile

  • Added generate-urdu-chunks target; updated help text


Copilot AI and others added 2 commits February 27, 2026 10:44
Co-authored-by: druvx13 <131332414+druvx13@users.noreply.github.com>
Co-authored-by: druvx13 <131332414+druvx13@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Create /urdu directory and split text file into chunks" to "Add Urdu translation and output/urdu chunked files (125 × 50-line parts)" on Feb 27, 2026
…agari/

Co-authored-by: druvx13 <131332414+druvx13@users.noreply.github.com>
Copilot AI changed the title from "Add Urdu translation and output/urdu chunked files (125 × 50-line parts)" to "Add Urdu translation data, chunked output, and Devanagari transliteration of first 5 parts" on Feb 27, 2026