From @gilgameshjw's run using GNDB data:
- `ara` is the source
- `ara_diacri` is the diacriticized Arabic produced with rababa
- `DEST_FULL_NAME_RO` is the manual transliteration provided in GNDB
- `ara_latinised` is the output of Interscript

| | ara | ara_diacri | ara_latinised | index | DEST_FULL_NAME_RO | dist_edit | dist_jaro_winkler |
|---|---|---|---|---|---|---|---|
| 0 | گرجان | گرِجانَ | grijna | 0 | girjān | 0.666667 | |
| 1 | چم كورك | چمَ كُوَرِكَ | chma kūarika | 1 | cham kūrik | 0.400000 | |
| 2 | وادي نوباندي | وَادِي نُوبَانْدِي | wādī nūbāndī | 2 | wādī nūbāndī | 0.000000 | |
| 3 | وادي خازيانلي | وَادِي خَازِيَانْلِيٍّ | wādī khāziyānlīyin | 3 | wādī khāzyānlī | 0.285714 | |
| 4 | وادي ام بطمة | وَادِي امْ بُطْمَةَ | wādī am buṭmata | 4 | wādī umm buţmah | 0.333333 | |
| ... | ... | ... | ... | ... | ... | ... | |
| 89 | القباقب | القَبَاقِبُ | al-qabāqibu | 89 | al qabāqib | 0.200000 | |
| 90 | العِقلة | العَقْلَةِ | al-‘aqlahi | 90 | al ‘iqlah | 0.333333 | |
| 91 | الظهرور | الظُّهْرُورُ | al-ẓẓuhrūru | 91 | az̧ z̧ahrūr | 0.636364 | |
| 92 | أم الدنانير | أَمْ الدَّنَانِيرَ | am al-ddanānīra | 92 | umm ad danānīr | 0.428571 | |
| 93 | أرض الرجوم | أَرْضِ الرُّجُومِ | arḍi al-rrujūmi | 93 | arḑ ar rujūm | 0.500000 | |

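
The distance value listed per row looks consistent with a Levenshtein edit distance divided by the length of the GNDB reference string (row 0, for instance, needs 4 edits against a 6-character reference, giving 0.666667). That normalisation is inferred from the numbers shown, not confirmed by the run itself. A minimal Python sketch under that assumption follows; the function names `levenshtein` and `normalised_edit_distance` are hypothetical, and the NFC normalisation step is an added precaution rather than something known to be in the original pipeline.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalised_edit_distance(hypothesis: str, reference: str) -> float:
    """Edit distance divided by the reference length.

    Assumption: the reference (GNDB) string is the normaliser. Both strings
    are NFC-normalised first so that a precomposed 'ā' and 'a' + combining
    macron count as the same single character.
    """
    hyp = unicodedata.normalize("NFC", hypothesis)
    ref = unicodedata.normalize("NFC", reference)
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(hyp, ref) / len(ref)


# Interscript output vs. GNDB romanisation for rows 0 and 89.
print(f"{normalised_edit_distance('grijna', 'girjān'):.6f}")           # 0.666667
print(f"{normalised_edit_distance('al-qabāqibu', 'al qabāqib'):.6f}")  # 0.200000
```

Both values match the table, which supports the reading that differences such as `al-` versus `al ` or a trailing case vowel each count as one edit.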
Some entries clearly differ. In rows 91 and 93, for example, the two outputs follow different transliteration systems: `al-ẓẓuhrūru` vs. `az̧ z̧ahrūr`, and `arḍi al-rrujūmi` vs. `arḑ ar rujūm`.
@gilgameshjw can you help confirm:
- which GNDB dataset are you using?
- which transliteration system are you using?
Also, is there an easy way to reproduce this output? 😉 Thanks!