Problem with Arabic transliteration #7

From @gilgameshjw's run using GNDB data.

* ara is the source
* ara_diacri is the diacriticized Arabic produced with rababa
* DEST_FULL_NAME_RO is the manual transliteration provided in GNDB
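* ara_latinised is the output of Interscript
* index repeats the row number; dist_edit and dist_jaro_winkler measure the distance between ara_latinised and DEST_FULL_NAME_RO (0 means identical)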

|   | ara | ara_diacri | ara_latinised | index | DEST_FULL_NAME_RO | dist_edit | dist_jaro_winkler |
|--|--|--|--|--|--|--|--|
| 0 | گرجان | گرِجانَ | grijna | 0 | girjān | 0.666667 | 0.177778 |
| 1 | چم كورك | چمَ كُوَرِكَ | chma kūarika | 1 | cham kūrik | 0.400000 | 0.088889 |
| 2 | وادي نوباندي | وَادِي نُوبَانْدِي | wādī nūbāndī | 2 | wādī nūbāndī | 0.000000 | 0.000000 |
| 3 | وادي خازيانلي | وَادِي خَازِيَانْلِيٍّ | wādī khāziyānlīyin | 3 | wādī khāzyānlī | 0.285714 | 0.074074 |
| 4 | وادي ام بطمة | وَادِي امْ بُطْمَةَ | wādī am buṭmata | 4 | wādī umm buţmah | 0.333333 | 0.238384 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 89 | القباقب | القَبَاقِبُ | al-qabāqibu | 89 | al qabāqib | 0.200000 | 0.093939 |
| 90 | العِقلة | العَقْلَةِ | al-‘aqlahi | 90 | al ‘iqlah | 0.333333 | 0.221693 |
| 91 | الظهرور | الظُّهْرُورُ | al-ẓẓuhrūru | 91 | az̧ z̧ahrūr | 0.636364 | 0.363636 |
| 92 | أم الدنانير | أَمْ الدَّنَانِيرَ | am al-ddanānīra | 92 | umm ad danānīr | 0.428571 | 0.220924 |
| 93 | أرض الرجوم | أَرْضِ الرُّجُومِ | arḍi al-rrujūmi | 93 | arḑ ar rujūm | 0.500000 | 0.166667 |
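As a sketch for anyone trying to reproduce the two distance columns: the values shown appear consistent with `dist_edit` being the Levenshtein distance between ara_latinised and DEST_FULL_NAME_RO divided by the length of DEST_FULL_NAME_RO, and `dist_jaro_winkler` being 1 minus the plain Jaro similarity, i.e. without the Winkler prefix bonus (with the standard bonus, row 0 would come out at 0.16 rather than 0.177778). These formulas are inferred from the numbers in the table, not taken from @gilgameshjw's script, and all function names here are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def normalized_edit_distance(latinised: str, reference: str) -> float:
    """Hypothetical reconstruction of dist_edit: divide by the reference length."""
    return levenshtein(latinised, reference) / len(reference)


def jaro(s1: str, s2: str) -> float:
    """Plain Jaro similarity (no Winkler prefix bonus), compared per code point."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    half_transpositions = 0
    k = 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_distance(latinised: str, reference: str) -> float:
    """Hypothetical reconstruction of dist_jaro_winkler: 1 - Jaro similarity."""
    return 1.0 - jaro(latinised, reference)


if __name__ == "__main__":
    # Row 0 of the table: ara_latinised vs DEST_FULL_NAME_RO
    print(round(normalized_edit_distance("grijna", "girjān"), 6))  # 0.666667
    print(round(jaro_distance("grijna", "girjān"), 6))             # 0.177778
```

Comparing per code point means a combining character such as the cedilla in z̧ counts as a separate character, so the Unicode normalization form of both columns affects the scores.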

Some entries clearly differ. Rows 91 and 93, for example, suggest that Interscript and GNDB follow different transliteration systems (`al-ẓẓuhrūru` vs `az̧ z̧ahrūr`, `arḍi al-rrujūmi` vs `arḑ ar rujūm`).
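The differences in rows 91 and 93 look systematic rather than random. One rough way to check this is to map the Interscript output onto the conventions visible in the GNDB column and see how much of the distance disappears. The sketch below handles only what the table shows: dot-below letters (ḍ, ẓ, ṭ) versus cedilla letters (ḑ, z̧, ţ), `al-` plus a doubled consonant versus an assimilated article (`ar rujūm`), and `al-` versus `al `. `to_gndb_style` is a hypothetical helper, not part of Interscript; it assumes the GNDB export uses these exact code points, only catches single-letter doubled consonants, and leaves vowels and case endings (`-u`, `-i`, `-a`) alone, since those come from the diacritization step.

```python
import re

# Dot-below letters (Interscript output) -> cedilla letters (GNDB column),
# limited to the pairs visible in the table. Escapes keep the code points
# unambiguous; z with cedilla has no precomposed form, so it is z + U+0327.
DOT_TO_CEDILLA = {
    "\u1e0d": "\u1e11",   # d with dot below -> d with cedilla (row 93)
    "\u1e93": "z\u0327",  # z with dot below -> z + combining cedilla (row 91)
    "\u1e6d": "\u0163",   # t with dot below -> t with cedilla (row 4)
}


def to_gndb_style(text: str) -> str:
    """Hypothetical normalizer: rewrite Interscript-style output in GNDB style."""
    # "al-" followed by a doubled consonant (shadda on a sun letter)
    # becomes an assimilated article, e.g. "al-rrujūmi" -> "ar rujūmi".
    text = re.sub(r"\bal-(\w)\1", r"a\1 \1", text)
    # Any remaining "al-" gets a space instead of a hyphen, as in "al qabāqib".
    text = re.sub(r"\bal-", "al ", text)
    for dotted, cedilla in DOT_TO_CEDILLA.items():
        text = text.replace(dotted, cedilla)
    return text


if __name__ == "__main__":
    print(to_gndb_style("arḍi al-rrujūmi"))  # arḑi ar rujūmi
```

Note that the assimilation step runs before the letter mapping, so the doubled ẓ in `al-ẓẓuhrūru` is still a single code point when the backreference is checked.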

@gilgameshjw can you help confirm:
* which GNDB dataset are you using?
* which transliteration system are you using?

Also, is there an easy way to reproduce this output? 😉 Thanks!

