Italian corpus annotated with approx. 3000 verbal multiword expressions for the PARSEME shared task on automatic identification of verbal MWEs

IRIS


Monti, Johanna; Sangati, Federico; de Santis, Anna; di Buono, Maria Pia; Caruso, Valeria

2017-01-01

Abstract

Italian corpus annotated with verbal multiword expressions (VMWEs) for the PARSEME shared task on automatic identification of verbal MWEs. The corpus was built on the PAISÀ corpus and consists of:

- A training corpus manually annotated according to the common guidelines (http://parsemefr.lif.univ-mrs.fr/guidelines-hypertext/?page=080_Annotation_management/020_Annotation_platform_FLAT): 15721 sentences from it_blog, it_wikinews and it_wikipedia.
- An evaluation corpus annotated according to the same guidelines: 1279 sentences from it_blog files.

Extra corpus data: the training and test PARSEME-TSV files are each perfectly aligned to a CoNLL-U file.

- Lemmas (CoNLL-U): available (automatically annotated).
- POS tags (CoNLL-U): available (automatically annotated). The tagset is ISST-TANL.
- Morphological features (CoNLL-U): available (automatically annotated).
- Dependency relations (CoNLL-U): available (automatically annotated). The inventory is Universal Dependency Relations.
- No-space information (PARSEME-TSV): available (automatically annotated).

Tokenisation: follows the original tokenisation of the PAISÀ corpus.

Annotation: VMWEs in this language are annotated for the following categories: LVC (light verb constructions), ID (idioms), IReflV (inherently reflexive verbs), VPC (verb-particle constructions) and OTH (other).
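As a rough illustration of how PARSEME-TSV annotations of the kind described above might be read, here is a minimal Python sketch. It assumes the four-column layout of the shared task's first edition (token rank, surface form, a no-space flag `nsp`, and VMWE codes such as `1:LVC` on the first token of an expression and `1` on its later tokens, with `_` for empty fields). The sample sentence and the helpers `read_parseme_tsv` and `extract_vmwes` are hypothetical: they are not taken from the corpus or from any PARSEME tooling. Note also that the file attached to this record is described as XML, so the sketch would apply only to a TSV release of the data.

```python
# Hypothetical sample in the assumed 4-column PARSEME-TSV layout:
# rank <TAB> form <TAB> no-space flag <TAB> VMWE codes
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "Maria", "_", "_"),
    ("2", "ha", "_", "_"),
    ("3", "fatto", "_", "1:LVC"),    # first token of VMWE 1, category LVC
    ("4", "una", "_", "_"),
    ("5", "passeggiata", "nsp", "1"),  # continuation of VMWE 1, no space after
    ("6", ".", "_", "_"),
]) + "\n"


def read_parseme_tsv(text):
    """Yield sentences as lists of (rank, form, no_space, codes) tuples.

    Sentences are assumed to be separated by blank lines; lines starting
    with '#' are treated as comments and skipped.
    """
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        rank, form, nsp, codes = line.split("\t")[:4]
        sentence.append((rank, form, nsp == "nsp", codes))
    if sentence:
        yield sentence


def extract_vmwes(sentence):
    """Group the tokens of one sentence by VMWE identifier."""
    vmwes = {}
    for _rank, form, _no_space, codes in sentence:
        if codes == "_":
            continue
        # A token may belong to several VMWEs, separated by ';'.
        for code in codes.split(";"):
            mwe_id, _, category = code.partition(":")
            entry = vmwes.setdefault(mwe_id, {"category": None, "tokens": []})
            if category:  # the category is given only on the first token
                entry["category"] = category
            entry["tokens"].append(form)
    return vmwes


sentences = list(read_parseme_tsv(SAMPLE))
vmwes = extract_vmwes(sentences[0])
```

Grouping by identifier rather than by adjacency keeps discontinuous expressions together, such as the verb and noun of a light verb construction separated by a determiner.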


Year

2017

Appears in the following categories:

5.10 Database

Files in this product:

File	Description	Type	License	Size	Format
Parseme Shared Task Italian corpus.zip (authorized users only)	Corpus annotated with verbal multiword expressions (XML format)	Other attached material	PUBLIC - Public with Copyright	1.84 MB	XML

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/170896
