From Monolingual Multiword Expression Discovery to Multilingual Concept Enrichment: an Ontology-based approach
Nolano, Gennaro
2022-01-01
Abstract
In this paper, we present a methodology for the semantic enrichment of cultural heritage (CH) data, based on ontologies and Linked Data. The proposed method aims to develop domain-specific resources enriched with multilingual conceptual information, starting from monolingual RDF data. Our approach begins with a Multiword Expression (MWE) discovery process that selects an initial list of domain-specific candidate mentions. We then perform a concept discovery phase that links these mentions to closely matching DBpedia concepts using two similarity measures. The semantic information related to these concepts is used to further filter the candidates and to obtain representative mention-concept pairs, by reweighting the automatically computed scores with the help of a graph representation. We test our methodology on biographical information about authors extracted from the Europeana Data Collection. The final result is a resource of semantically enriched data, containing a list of domain-specific keywords and MWEs, the DBpedia concepts they strongly match, and the multilingual labels representing these concepts.

File | Access | License | Size | Format |
---|---|---|---|---|
Gennaro_Nolano_2022.europhras-1.24.pdf | Open access | PUBLIC - Public with Copyright | 508.98 kB | Adobe PDF |
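The concept discovery step described in the abstract (linking candidate mentions to closely matching concepts through two similarity measures) can be illustrated with a minimal sketch. The abstract does not name the two measures, so the ones used here (a character-level ratio and a token-level Jaccard overlap), the equal weighting, the threshold, and the helper names `char_similarity`, `token_jaccard`, and `link_mentions` are all assumptions made for illustration, not the author's implementation. The concept identifiers and labels are toy data, and the graph-based reweighting step is not covered.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure, not from the paper)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap in [0, 1] (assumed measure, not from the paper)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def link_mentions(mentions, concept_labels, threshold=0.7):
    """Pair each mention with its best-scoring concept, if the score is high enough.

    The score is an equally weighted average of the two measures above;
    both the weighting and the threshold are hypothetical choices.
    """
    pairs = {}
    for mention in mentions:
        best_concept, best_score = None, 0.0
        for concept, label in concept_labels.items():
            score = 0.5 * char_similarity(mention, label) + 0.5 * token_jaccard(mention, label)
            if score > best_score:
                best_concept, best_score = concept, score
        if best_concept is not None and best_score >= threshold:
            pairs[mention] = (best_concept, best_score)
    return pairs


# Toy data: illustrative concept identifiers mapped to English labels.
concept_labels = {
    "dbr:Oil_painting": "Oil painting",
    "dbr:Art_critic": "Art critic",
}
mentions = ["oil painting", "art critic", "born in"]

links = link_mentions(mentions, concept_labels)
for mention, (concept, score) in links.items():
    print(f"{mention} -> {concept} ({score:.2f})")
```

In this toy run, the candidate "born in" is discarded because no label scores above the threshold, which mirrors the filtering role the abstract assigns to the similarity scores. A fuller version would also attach the multilingual labels of each linked concept.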