ITALERT: Assessing the Quality of LLMs and NMT in Translating Italian Emergency Response Text

This paper presents the outcomes of an initial investigation into the performance of Large Language Models (LLMs) and Neural Machine Translation (NMT) systems in translating high-stakes messages. The research employed a novel bilingual corpus, ITALERT (Italian Emergency Response Text), and applied a human-centric, post-editing-based metric (HOPE) to assess translation quality systematically. The initial dataset contains eleven texts in Italian and their corresponding English translations, both extracted from the national communication campaign website of the Italian Civil Protection Department. The texts cover eight crisis scenarios: flooding, earthquake, forest fire, volcanic eruption, tsunami, industrial accident, nuclear risk, and dam failure. The dataset has been carefully compiled to ensure usability and clarity for evaluating machine translation (MT) systems in crisis settings. Our findings show that current LLMs and NMT models, such as ChatGPT (OpenAI’s GPT-4o model) and Google MT, face limitations in translating emergency texts, particularly in maintaining the appropriate register, resolving context ambiguities, and managing domain-specific terminology.

Maria Carmen Staiano (Writing – Original Draft Preparation); Johanna Monti (Supervision); Francesca Chiusaroli (Member of the Collaboration Group)

2025-01-01

Year	2025
Language(s)	English
Volume author(s)	Various authors (AA.VV.)
Volume editor(s)	Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, Sara Szoc
Volume title	Proceedings of Machine Translation Summit XX
Presentation type	Contribution
Conference title	MT Summit 2025
Volume	1
First page	566
Last page	577
Number of pages	12
ISBN	978-2-9701897-0-1
URL	https://aclanthology.org/2025.mtsummit-1.43/
Publisher	European Association for Machine Translation
Referees	Anonymous experts
Conference dates	June 2025
Conference location	Geneva, CH
Conference scope	International
Keywords	Large Language Models, Neural Machine Translation, Translation quality, Emergency Response Texts, evaluation
Number of authors	4
All authors	Staiano, Maria Carmen; Han, Lifeng; Monti, Johanna; Chiusaroli, Francesca
Full text	open
Faculty site type	273
Type	info:eu-repo/semantics/conferenceObject
Type	4 Contribution in conference proceedings (Proceeding)::4.1 Contribution in conference proceedings
Appears in types	4.1 Contribution in conference proceedings

Files in this item:

File	Access	Type	License	Size	Format
2025.mtsummit-1.43 (1).pdf	Open access	Post-print document	DRM not defined	249.6 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/248734
