On the Evaluation of Terminology Translation Errors in NMT and PB-SMT in the Legal Domain: a Study on the Translation of Arabic Legal Documents into English and French

Khadija Ait ElFqih; Johanna Monti

2023-01-01

Abstract

Translators rely on terminological resources to resolve translation problems, so information about terminological equivalence is crucial for choosing the most appropriate translation equivalent. In machine translation, neural models have significantly advanced the state of the art in recent years, but they still underperform in domain-specific settings and for under-resourced languages. This deficiency is particularly evident in the translation of Arabic legal terminology, where current machine translation output fails to respect the contextual, linguistic, cultural, and terminological constraints that Arabic legal terms impose. In this paper, we conduct a comparative qualitative evaluation and a comprehensive error analysis of legal terminology translation by Phrase-Based Statistical Machine Translation (PB-SMT) and Neural Machine Translation (NMT) systems in two language pairs: Arabic-English and Arabic-French. We propose an error typology that accounts for the translation of legal terminology from Arabic. We present our findings, highlighting the strengths and weaknesses of both approaches for Arabic legal terminology. Additionally, we introduce a multilingual gold standard dataset developed from our Arabic legal corpus. This dataset serves as a reliable benchmark and reference during evaluation for determining the adequacy and fluency of the PB-SMT and NMT systems.

Year: 2023
ISBN: 978-954-452-090-8
Publication type: 4.1 Contribution in conference proceedings

Files in this record:

File	Access	Type	License	Size	Format
2023.contents-1.4.pdf	Open access	Post-print document	DRM not defined	548 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/223582
