Natural Language Processing and Big Data. An Ontology-Based Approach for Cross-Lingual Information Retrieval
Monti, Johanna; di Buono, Maria Pia
2013-01-01
Abstract
Extracting relevant information in a multilingual context from massive amounts of unstructured, structured and semi-structured data is a challenging task. Various theories have been developed and applied to ease access to multicultural and multilingual resources. This paper describes a methodology for the development of an ontology-based Cross-Language Information Retrieval (CLIR) application and shows how Natural Language (NL) queries in any language can be translated by means of a knowledge-driven approach that semi-automatically maps natural language to formal language, thereby simplifying and improving human-computer interaction and communication. The outlined research activities are based on Lexicon-Grammar (LG), a method devised for natural language formalization, automatic textual analysis and parsing. Thanks to its main characteristics, LG is independent of factors that are critical for other approaches, namely interaction type (voice or keyboard-based), length of sentences and propositions, type of vocabulary used and restrictions due to users' idiolects. The feasibility of our knowledge-based methodological framework, which allows both data and metadata to be mapped, will be tested for CLIR by implementing an early domain-specific prototype system.