Towards a Linguistic Annotation of Arabic Legal Texts: A Multilingual Electronic Dictionary for Arabic
ElFqih, Khadija Ait;Di Buono, Maria Pia;Monti, Johanna
2024-01-01
Abstract
Terminology translation plays a significant role in domain-specific machine translation. However, some knowledge domains and languages still lack high-quality machine translation because terminology is mistranslated. This is the case for the legal domain and the Arabic language. Most machine translation systems fail to produce exact equivalents for most legal terms when translating from Arabic into other languages, mainly English and French. This failure points to the lack of terminology resources for the legal domain, the unfamiliarity with the legal systems that is needed to render appropriate equivalents, and the linguistic characteristics of the terminology in this type of discourse. This difficulty underlines the need for more legal terminology resources. In fact, although many Arabic legal dictionaries exist, most of them are not machine-readable and cannot be used in machine translation or other Natural Language Processing applications. Our pipeline first extracts terms using NooJ grammars. We then build our dictionary using NooJ morpho-syntactic information (part of speech (POS), gender, number, etc.), syntactic information (transitive, intransitive, Naqis, etc.), and semantic tags that we create to describe our domain-knowledge terms, including legal, Juri-religion, etc., and geoUsage, which indicates where a given term is used to express a legal practice. Finally, we propose the translation. In this phase, we consult several sources, including EUR-Lex, EuroVoc and IATE, and the proposed translations are then validated by our legal expert. Our electronic dictionary should enable the automatic annotation of the majority of legal documents in Arabic.
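The kind of entry the abstract describes (a term with morpho-syntactic information, semantic tags, a geoUsage value and translations) can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the class name LegalEntry, the field names, the feature codes "m" and "s", the placeholder region code "XX" and the comma-and-plus line layout are all assumptions made for the example, loosely modelled on the general shape of a lemma followed by a category and features.

```python
from dataclasses import dataclass, field


@dataclass
class LegalEntry:
    """Hypothetical container for one dictionary entry (names are assumptions)."""
    lemma: str                                             # the Arabic term
    pos: str                                               # part of speech, e.g. "N"
    morph: list[str] = field(default_factory=list)         # gender, number, ...
    semantic: list[str] = field(default_factory=list)      # domain tags, e.g. "legal"
    geo_usage: list[str] = field(default_factory=list)     # where the term is used
    translations: dict[str, str] = field(default_factory=dict)  # language -> equivalent

    def to_line(self) -> str:
        """Serialise the entry as one 'lemma,CAT+feature+...' line (assumed layout)."""
        parts = [self.pos, *self.morph, *self.semantic]
        parts += [f"geoUsage={region}" for region in self.geo_usage]
        parts += [f'{lang}="{text}"' for lang, text in self.translations.items()]
        return f"{self.lemma}," + "+".join(parts)


# Illustrative entry: the common noun for "contract"; "XX" is a placeholder region.
entry = LegalEntry(
    lemma="عقد",
    pos="N",
    morph=["m", "s"],
    semantic=["legal"],
    geo_usage=["XX"],
    translations={"EN": "contract", "FR": "contrat"},
)
line = entry.to_line()
print(line)
```

Keeping the translations in a mapping keyed by language code means further target languages can be added without changing the structure, and the expert validation step mentioned in the abstract could be recorded as an extra field per translation.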