Le projet Archaeo-Term: premiers résultats (The Archaeo-Term Project: Initial Results)

Andrea F. De Carlo; Johanna Monti; Maria Pia di Buono; Giulia Speranza; Maria Centrella
2022-01-01

Abstract

This article describes the objectives, the theoretical and methodological background, the development, and the first results of the Archaeo-Term project of the University of Naples "L'Orientale", Department of Literary, Linguistic and Comparative Studies. The Archaeo-Term project was developed within the YourTermCULT project, promoted by the Terminology Without Borders project of the Terminology Coordination Unit (TermCoord) of the European Parliament, Directorate-General for Translation (DGT), specifically to collect terminology on different aspects of culture. The aim of the Archaeo-Term project is to enhance access to archaeological data in several formats and languages. It is a joint effort to create linguistic and terminological resources for the domain of Cultural Heritage (CH) and, in particular, for the sub-domain of archaeology, which is notoriously complex and fragmented. One of the first results of the project is a multilingual terminological resource for the domain of archaeology that can be used in various Natural Language Processing (NLP) tasks, including Machine Translation (MT). The first version of the Archaeo-Term multilingual terminological resource is available in 5 languages (Italian, English, Spanish, German, and Dutch) and is publicly accessible online. To promote a common, shared termbase across languages, the resource is intended not only for a specialized audience, such as archaeology experts, but also as terminological support for translators and interpreters in their professional practice, and for a more general audience.
The terminological resource is the result of an extraction and aggregation process starting from two existing thesauri: the Italian "Thesaurus per la definizione dei reperti archeologici" (Thesaurus for the Definition of Archaeological Finds), developed by the Italian Central Institute for Catalogue and Documentation (Istituto Centrale per il Catalogo e la Documentazione, ICCD), and the multilingual Art and Architecture Thesaurus (AAT), developed by the Getty Research Institute, which is among the most trustworthy and accurate resources in the Cultural Heritage domain. Because both thesauri are published using Semantic Web formalisms, we were able to extract and merge information from them with SPARQL queries. Specifically, we ran several consecutive queries against SPARQL endpoints to enrich our multilingual resource with the following information about each terminological entry: the equivalent terms in the target languages, alternative terms and plural forms, domains and sub-domains, term definitions, and their sources. The extraction phase was followed by an evaluation step aimed at identifying missing information, verifying and correcting possible misalignments among entries, and identifying potential future developments. As the project is ongoing, we also plan to extend the resource with equivalent terms in other languages, such as French, Swedish, Polish, Russian, and Chinese, with the aim of extending coverage to non-European languages as well, which are usually under-represented and low-resourced.
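The extraction-and-merge step can be illustrated with a minimal Python sketch. It assumes that AAT concepts are exposed with language-tagged `skos:prefLabel` and `skos:altLabel` values under the `aat:` namespace `http://vocab.getty.edu/aat/`; the helpers `build_label_query` and `merge_bindings` are hypothetical and do not reproduce the project's actual queries. The sketch covers only equivalent and alternative terms, not plural forms, domains, definitions, or sources.

```python
# Hypothetical sketch: the namespace and SKOS label properties below are
# assumptions about how the Getty AAT is exposed as Linked Open Data.
PREFIXES = (
    "PREFIX aat: <http://vocab.getty.edu/aat/>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
)


def build_label_query(aat_id: str, languages: list[str]) -> str:
    """Build a SPARQL query for the preferred and alternative labels of one
    AAT concept, keeping only labels tagged with the requested languages."""
    if not aat_id.isdigit():
        # AAT identifiers are numeric; rejecting anything else avoids
        # injecting arbitrary text into the query.
        raise ValueError(f"not a numeric AAT id: {aat_id!r}")
    if not languages:
        raise ValueError("at least one language is required")
    # langMatches also accepts regional variants such as "en-us" for "en".
    lang_filter = " || ".join(
        f'langMatches(lang(?label), "{lang}")' for lang in languages
    )
    return PREFIXES + f"""SELECT ?kind ?label WHERE {{
  {{ aat:{aat_id} skos:prefLabel ?label . BIND("preferred" AS ?kind) }}
  UNION
  {{ aat:{aat_id} skos:altLabel ?label . BIND("alternative" AS ?kind) }}
  FILTER({lang_filter})
}}"""


def merge_bindings(results: dict) -> dict[str, dict[str, list[str]]]:
    """Aggregate SPARQL 1.1 JSON results into {language: {kind: [labels]}}."""
    entry: dict[str, dict[str, list[str]]] = {}
    for row in results["results"]["bindings"]:
        label = row["label"]
        # Language-tagged literals carry their tag under the "xml:lang" key.
        lang = label.get("xml:lang", "und")
        kind = row["kind"]["value"]
        labels = entry.setdefault(lang, {}).setdefault(kind, [])
        if label["value"] not in labels:  # drop duplicate labels
            labels.append(label["value"])
    return entry
```

Sending the query string to a SPARQL endpoint and requesting `application/sparql-results+json` returns results in the W3C JSON format that `merge_bindings` consumes; running one such query per concept and merging the outputs roughly corresponds to the consecutive-query approach described above.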
As a first implementation building on the first version of the terminological resource, we have so far collected 1,059 entries in Italian, 1,055 in Spanish, 1,053 in English, 843 in Russian, 600 in Polish, 460 in German, 193 in French, and 82 in Chinese. In conclusion, the Archaeo-Term project aims to promote the creation of high-quality, trustworthy multilingual terminological resources for the domain of archaeology, in collaboration with institutions and experts in terminology, linguistics, and cultural heritage.
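The evaluation of missing information can be sketched by comparing each language against a pivot. This minimal Python example assumes Italian as the pivot (it has the most entries, and the ICCD source thesaurus is Italian) and uses ISO 639-1 codes for the languages; `coverage` and `missing_languages` are hypothetical helpers, not part of the published resource.

```python
# Entry counts per language as reported above, keyed by ISO 639-1 code.
ENTRY_COUNTS = {
    "it": 1059, "es": 1055, "en": 1053, "ru": 843,
    "pl": 600, "de": 460, "fr": 193, "zh": 82,
}


def coverage(counts: dict[str, int], pivot: str = "it") -> dict[str, float]:
    """Entry count of each language as a percentage of the pivot language's
    count, rounded to one decimal place (assumed pivot: Italian)."""
    base = counts[pivot]
    return {lang: round(100 * n / base, 1) for lang, n in counts.items()}


def missing_languages(entry: dict[str, list[str]], required: list[str]) -> list[str]:
    """Return the required languages for which an entry has no term yet."""
    return [lang for lang in required if not entry.get(lang)]
```

With the counts above, German reaches about 43.4% and Chinese about 7.7% of the Italian entry count, which indicates where missing equivalents are concentrated.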
This article describes the objectives, developments, and first results of the Archaeo-Term project, promoted by the University of Naples "L'Orientale". The project was developed within the YourTermCULT project, in collaboration with the Terminology Without Borders programme of the Terminology Coordination Unit (TermCoord) of the European Parliament, Directorate-General for Translation (DG TRAD). The goal of the Archaeo-Term project is to create a multilingual terminological resource for Natural Language Processing (NLP) in the domain of archaeology, in order to improve access to archaeological data in various formats and languages. The first version of the multilingual resource, "Archaeo-Term Multilingual Glossary v1.0" (Speranza et al., 2020), contains terms in 5 languages: Italian, English, Spanish, German, and Dutch; the second version adds French, Swedish, Polish, Russian, and Chinese. To improve and promote the use of a common termbase across languages, the Archaeo-Term Multilingual Glossary is designed both for specialized users, such as researchers and experts in the field, and to support the work of translators and interpreters, as well as for a general non-specialist audience. The contribution presents the theoretical background of the project, the data and methodology used to develop Archaeo-Term, the results obtained and their evaluation, and, finally, the conclusions and future developments of the project.
Files in this item:
Le projet Archaeo-Term _ premiers résultats   _ The Project Archaeo-Term_ Initial Results .pdf (open access; type: post-print; license: public, with copyright; Adobe PDF, 937.07 kB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/209839