The PARSEME multilingual corpus of verbal multiword expressions

Johanna Monti (Member of the Collaboration Group);
Federico Sangati (Member of the Collaboration Group)
2018-01-01

Abstract

Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behavior. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as “to take a decision”, “to break one’s heart” or “to turn off”, have rarely been modeled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modeling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs.
Year: 2018
ISBN: 978-3-96110-124-5
Files in this item:

File: 204-3-1319-1-10-20181105.pdf
Access: open access
Type: Post-print document
License: Creative Commons
Size: 922.05 kB
Format: Adobe PDF

File: parseme_copertina.pdf
Access: open access
Type: Post-print document
License: Creative Commons
Size: 290.95 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/183478