
Transliterate Urdu Quran text to Devanagari script #28

Draft
Copilot wants to merge 3 commits into copilot/add-ur-jalandhry-txt-file from copilot/transliterate-urdu-to-devanagri-again

Conversation

Copilot AI commented Feb 27, 2026

Contributor

Adds a complete Urdu-to-Devanagari transliteration of the Jalandhry Urdu Quran translation.

Changes Made

src/urdu_to_devanagari.py (new)

A character-level transliteration engine with:

  • Two-character aspirated combos processed first (بھ→भ, پھ→फ, کھ→ख, دھ→ध, etc.)
  • ~85 word-boundary-safe substitutions for the most frequent Urdu words (الله→अल्लाह, اور→और, نہیں→नहीं, خدا→ख़ुदा, تم→तुम, اس→उस, جس→जिस, پھر→फिर, کچھ→कुछ, نہ→न, ایک→एक, ایسا→ऐसा, شیطان→शैतान, پیدا→पैदा, بھی→भी, and 70+ more)
  • Context-sensitive ی/و: vowel matra (ī/o) after a consonant, consonant (य/व) at word-start
  • ا/آ → ा after consonant, अ/आ standalone; ں→ं; ے→े; standalone ھ→ह
  • Urdu punctuation mapped: ، → `,` ؟ → `?` ۔ → `।`
  • Honorific signs (ؑ, ؓ) and ﷺ preserved unchanged
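The rules listed above can be illustrated with a minimal sketch. It is not the code in src/urdu_to_devanagari.py, which is not shown in this conversation. The table names, the function names `transliterate_word` and `transliterate`, and the heavily abridged mappings are all assumptions made for illustration. Urdu keys are written as Unicode escapes to avoid bidirectional rendering problems in editors.

```python
import re

# Hypothetical, abridged tables; the real module reportedly has ~85 word entries.
ASPIRATED = {
    "\u0628\u06be": "भ",  # beh + do-chashmi heh
    "\u067e\u06be": "फ",  # peh + do-chashmi heh
    "\u06a9\u06be": "ख",  # keheh + do-chashmi heh
    "\u062f\u06be": "ध",  # dal + do-chashmi heh
}
WORDS = {
    "\u0627\u0648\u0631": "और",   # whole-word entry, matched only on full words
    "\u062a\u0645": "तुम",
    "\u0627\u06cc\u06a9": "एक",
}
CONSONANTS = {
    "\u0628": "ब", "\u067e": "प", "\u062a": "त", "\u062f": "द", "\u0631": "र",
    "\u06a9": "क", "\u0646": "न", "\u0645": "म", "\u0633": "स",
}
FIXED = {"\u06ba": "ं", "\u06d2": "े", "\u06be": "ह"}  # noon ghunna, barri yeh, lone heh
PUNCT = str.maketrans({"\u060c": ",", "\u061f": "?", "\u06d4": "।"})
YEH, WAW, ALEF, ALEF_MADDA = "\u06cc", "\u0648", "\u0627", "\u0622"

# Runs of Arabic-script letters; punctuation and honorific signs fall outside this range.
WORD_RE = re.compile(r"[\u0621-\u06d3]+")


def transliterate_word(word):
    """Transliterate one Urdu word: whole-word table first, then per character."""
    if word in WORDS:
        return WORDS[word]
    out, i, after_consonant = [], 0, False
    while i < len(word):
        pair, ch = word[i:i + 2], word[i]
        if pair in ASPIRATED:  # two-character combos take priority
            out.append(ASPIRATED[pair])
            i += 2
            after_consonant = True
            continue
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            after_consonant = True
        elif ch in (YEH, WAW):  # matra after a consonant, consonant otherwise
            matra, consonant = ("ी", "य") if ch == YEH else ("ो", "व")
            out.append(matra if after_consonant else consonant)
            after_consonant = not after_consonant
        elif ch in (ALEF, ALEF_MADDA):
            standalone = "अ" if ch == ALEF else "आ"
            out.append("ा" if after_consonant else standalone)
            after_consonant = False
        elif ch in FIXED:
            out.append(FIXED[ch])
            after_consonant = ch == "\u06be"  # only the lone heh yields a consonant
        else:
            out.append(ch)  # unknown characters pass through unchanged
            after_consonant = False
        i += 1
    return "".join(out)


def transliterate(text):
    """Replace each Urdu word in place, then map Urdu punctuation."""
    converted = WORD_RE.sub(lambda m: transliterate_word(m.group()), text)
    return converted.translate(PUNCT)
```

Matching whole words with a regular expression before any character-level work is one way to keep the word table boundary-safe: an entry can never fire inside a longer word. Honorific signs sit outside the matched range, so they are left untouched. Urdu normally omits short vowels, so a character-level fallback like this one cannot recover them; that is presumably why the PR relies on a word table for frequent words.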

output/quran_urdu_devanagari_jalandhry.txt (new)

  • 6,582 lines covering all 114 Surahs and all 6,236 verses
  • Verified in 100-line batches; no untransliterated Urdu words remain (only the intentionally preserved honorific signs)
  • Surah titles and verse number markers kept in original English/numeric format
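One way to keep titles and verse markers intact is to split each line before transliterating. The sketch below assumes the `[surah:verse]` marker format seen in the sample output later in this conversation; the function name `convert_line` and the `transliterate` callback parameter are hypothetical and not taken from the PR.

```python
import re

# Assumed marker format, e.g. "[1:1] ", followed by the verse text.
MARKER = re.compile(r"^(\[\d+:\d+\]\s*)(.*)$")


def convert_line(line, transliterate):
    """Transliterate only the verse text; leave markers and other lines alone."""
    match = MARKER.match(line)
    if match is None:
        return line  # Surah title, blank line, or anything without a marker
    return match.group(1) + transliterate(match.group(2))
```

For example, with `str.upper` standing in for the real transliteration function, `convert_line("[1:1] abc", str.upper)` returns `"[1:1] ABC"`, while a title line is returned unchanged.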

src/gentxtforquran.py (updated)

  • Imports moved to module level (PEP 8)
  • Calls transliterate_file() at end of run so the Devanagari file is regenerated automatically alongside all other output files

Testing

  • ✅ All 6,236 verse markers verified to match the source Urdu file exactly
  • ✅ No untransliterated Arabic/Urdu characters remain in verse lines (only 26 intentional honorific signs)
  • python3 src/urdu_to_devanagari.py and python3 src/gentxtforquran.py both run successfully
  • ✅ Output spot-checked across all 114 Surahs
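The first two checks could be automated along the following lines. This is a sketch under stated assumptions, not the verification actually run for this PR: the helper names `markers` and `leftovers` are hypothetical, the marker format is taken from the sample output in this conversation, and the Arabic-script ranges cover the main Arabic block, the Arabic Supplement, and the two presentation-form blocks.

```python
import re

MARKER = re.compile(r"^\[(\d+):(\d+)\]")
# Arabic, Arabic Supplement, Presentation Forms-A and -B.
ARABIC = re.compile(r"[\u0600-\u06ff\u0750-\u077f\ufb50-\ufdff\ufe70-\ufeff]")
# Honorific signs the PR preserves on purpose: U+0611, U+0613, U+FDFA.
ALLOWED = {"\u0611", "\u0613", "\ufdfa"}


def markers(lines):
    """Return the verse markers, in order, from lines that start with one."""
    return [m.group(0) for line in lines if (m := MARKER.match(line))]


def leftovers(lines):
    """Return (line number, character) for every unexpected Arabic-script character."""
    return [
        (number, ch)
        for number, line in enumerate(lines, 1)
        for ch in ARABIC.findall(line)
        if ch not in ALLOWED
    ]
```

Comparing `markers(source_lines) == markers(output_lines)` checks that the two files carry the same verses in the same order, and an empty `leftovers(output_lines)` means only the allowed honorific signs remain.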


Co-authored-by: druvx13 <131332414+druvx13@users.noreply.github.com>
@druvx13

druvx13 commented Feb 27, 2026

Owner

@copilot you can try again

…, fixed word-boundary substitutions

Co-authored-by: druvx13 <131332414+druvx13@users.noreply.github.com>

Copilot AI commented Feb 27, 2026

Contributor Author

@copilot you can try again

Done! The transliteration is now complete (commit 31599ee).

What was done:

  • src/urdu_to_devanagari.py: full character-level Urdu→Devanagari engine with ~85 word-boundary-safe substitutions for the most common Urdu words
  • output/quran_urdu_devanagari_jalandhry.txt: all 6,236 verses transliterated and verified in batches, with no untransliterated Urdu words remaining (only the honorific signs ؑ/ؓ/ﷺ are preserved as-is, per spec)
  • src/gentxtforquran.py updated to regenerate the Devanagari file automatically

Sample (Surah 1):

[1:1] शुरू अल्लाह का नाम ले कर जो बड़ा महरबान नहायत रहम वाला है
[1:2] सब तरह की तारीफ़ ख़ुदा ही को (सज़ावार) है जो तमाम मख़लोक़ात का परवरदगार है
[1:5] (ए परवरदगार) हम तेरी ही इबादत करते हैं और तुझ ही से मदद मानगते हैं

Copilot AI changed the title from "[WIP] Transliterate Urdu Quran text to Devanagri script" to "Transliterate Urdu Quran text to Devanagari script" on Feb 27, 2026
