Italian corpus annotated with approx. 3000 verbal multiword expressions for the PARSEME shared task on automatic identification of verbal MWEs

Monti, Johanna; Sangati, Federico; di Buono, Maria Pia; Caruso, Valeria
2017-01-01

Abstract

Italian corpus annotated with verbal multiword expressions (VMWEs) for the PARSEME shared task on automatic identification of verbal MWEs. The corpus was built on the PAISÀ corpus and consists of:
- A training corpus, manually annotated according to common guidelines (http://parsemefr.lif.univ-mrs.fr/guidelines-hypertext/?page=080_Annotation_management/020_Annotation_platform_FLAT): 15721 sentences from it_blog, it_wikinews and it_wikipedia.
- An evaluation corpus, annotated according to the same guidelines (http://parsemefr.lif.univ-mrs.fr/guidelines-hypertext/?page=080_Annotation_management/020_Annotation_platform_FLAT): 1279 sentences from it_blog files.

Extra corpus data
The training and test PARSEME-TSV files are perfectly aligned with a CoNLL-U file.
- Lemmas (CoNLL-U): available (automatically annotated).
- POS tags (CoNLL-U): available (automatically annotated). The tagset is the ISST-TANL tagset.
- Morphological features (CoNLL-U): available (automatically annotated).
- Dependency relations (CoNLL-U): available (automatically annotated). The inventory is the Universal Dependency Relations.
- No-space information (PARSEME-TSV): available (automatically annotated).

Tokenisation: the tokenisation follows the original tokenisation of the PAISÀ corpus.
Annotation: VMWEs in this language are annotated for the following categories: LVC, ID, IReflV, VPC, OTH.
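As an illustration of how the aligned PARSEME-TSV and CoNLL-U files might be read together, here is a minimal Python sketch. It rests on assumptions that this record does not confirm: that each PARSEME-TSV row has four tab-separated columns (token rank, form, no-space flag, VMWE code such as `1:LVC` on the first token and `1` on later tokens), that sentences are separated by blank lines, and that both files contain exactly one row per token. The function names and the sample sentence are invented for the example and are not taken from the corpus.

```python
from collections import defaultdict


def read_sentences(lines):
    """Yield sentences as lists of tab-split rows; blank lines end a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):  # skip comment lines, if any
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def extract_vmwes(tsv_sentence):
    """Return {vmwe_id: (category, [forms])} for one PARSEME-TSV sentence."""
    categories = {}
    forms = defaultdict(list)
    for row in tsv_sentence:
        form = row[1]
        mwe_field = row[3] if len(row) > 3 else "_"  # assumed 4th column
        if mwe_field in ("_", ""):
            continue
        for code in mwe_field.split(";"):  # a token may belong to several VMWEs
            vmwe_id, _, category = code.partition(":")
            forms[vmwe_id].append(form)
            if category:  # assumed to appear only on the first token
                categories[vmwe_id] = category
    return {i: (categories.get(i), toks) for i, toks in forms.items()}


def align(tsv_sentence, conllu_sentence):
    """Pair rows by position; CoNLL-U column 3 (index 2) is the lemma."""
    if len(tsv_sentence) != len(conllu_sentence):
        raise ValueError("sentences differ in length; files are not aligned")
    return [
        (tsv_row[1], conllu_row[2], tsv_row[3])
        for tsv_row, conllu_row in zip(tsv_sentence, conllu_sentence)
    ]


# Invented sample data, for illustration only.
SAMPLE_TSV = (
    "1\tHa\t_\t_\n2\tfatto\t_\t1:LVC\n3\tuna\t_\t_\n"
    "4\tpasseggiata\tnsp\t1\n5\t.\t_\t_\n"
)
SAMPLE_CONLLU = (
    "1\tHa\tavere\t_\t_\t_\t_\t_\t_\t_\n2\tfatto\tfare\t_\t_\t_\t_\t_\t_\t_\n"
    "3\tuna\tuna\t_\t_\t_\t_\t_\t_\t_\n4\tpasseggiata\tpasseggiata\t_\t_\t_\t_\t_\t_\t_\n"
    "5\t.\t.\t_\t_\t_\t_\t_\t_\t_\n"
)

tsv_sentences = list(read_sentences(SAMPLE_TSV.splitlines()))
conllu_sentences = list(read_sentences(SAMPLE_CONLLU.splitlines()))
vmwes = extract_vmwes(tsv_sentences[0])
pairs = align(tsv_sentences[0], conllu_sentences[0])
```

Pairing rows by position only works if the two files really are aligned token by token, which is why `align` raises an error on a length mismatch instead of silently truncating.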
Files in this item:
Parseme Shared Task Italian corpus.zip (authorized users only)

Description: corpus annotated with verbal multiword expressions (XML format)
Type: Other attached material
License: PUBLIC - Public with copyright
Size: 1.84 MB
Format: XML

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/170896