Italian corpus annotated with approximately 3,000 verbal multiword expressions for the PARSEME shared task on automatic identification of verbal MWEs
Monti, Johanna; Sangati, Federico; di Buono, Maria Pia; Caruso, Valeria
2017-01-01
Abstract
Italian corpus annotated with verbal multiword expressions (VMWEs) for the PARSEME shared task on automatic identification of verbal MWEs. The corpus was built on the PAISÀ corpus and consists of:
- A training corpus manually annotated according to the common PARSEME guidelines (http://parsemefr.lif.univ-mrs.fr/guidelines-hypertext/?page=080_Annotation_management/020_Annotation_platform_FLAT): 15,721 sentences from it_blog, it_wikinews and it_wikipedia.
- An evaluation corpus annotated according to the same guidelines (http://parsemefr.lif.univ-mrs.fr/guidelines-hypertext/?page=080_Annotation_management/020_Annotation_platform_FLAT): 1,279 sentences from it_blog files.

Extra corpus data
Each training and test PARSEME-TSV file is perfectly aligned with a CoNLL-U file.
- Lemmas (CoNLL-U): available (automatically annotated).
- POS tags (CoNLL-U): available (automatically annotated). The tagset is ISST-TANL.
- Morphological features (CoNLL-U): available (automatically annotated).
- Dependency relations (CoNLL-U): available (automatically annotated). The inventory is the Universal Dependencies relation set.
- No-space information (PARSEME-TSV): available (automatically annotated).
- Tokenisation: follows the original tokenisation of the PAISÀ corpus.
- Annotation: VMWEs in this language are annotated for the following categories: LVC (light-verb constructions), ID (idioms), IReflV (inherently reflexive verbs), VPC (verb-particle constructions) and OTH (other).

File | Description | Type | License | Access | Size | Format
---|---|---|---|---|---|---
Parseme Shared Task Italian corpus.zip | Corpus annotated with verbal multiword expressions (XML format) | Other attached material | PUBLIC - Public with copyright | Authorized users only | 1.84 MB | XML
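The PARSEME-TSV annotation layer described in the abstract can be read with a short script. The sketch below is a minimal illustration, not the shared task's official tooling, and it rests on assumptions to check against the actual files (the archive is listed as XML, so its contents may differ): four tab-separated columns per token (rank, surface form, `nsp` no-space flag, VMWE codes), blank lines between sentences, `#` comment lines, `_` for tokens outside any VMWE, `;` separating codes when a token belongs to several VMWEs, and the category attached (e.g. `1:LVC`) only on the first token of each expression. The function names and the example sentence are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# One token row: (rank, surface form, no-space flag, VMWE codes). Layout assumed.
Row = Tuple[str, str, str, str]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield sentences: blank lines end a sentence, '#' lines are skipped as comments."""
    sentence: List[Row] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        nsp = cols[2] if len(cols) > 2 else "_"
        codes = cols[3] if len(cols) > 3 else "_"
        sentence.append((cols[0], cols[1], nsp, codes))
    if sentence:  # the file may not end with a blank line
        yield sentence


def extract_vmwes(sentence: List[Row]) -> List[Tuple[str, List[str]]]:
    """Group tokens into VMWEs; return (category, [forms]) in order of first mention."""
    categories: Dict[str, str] = {}
    members: Dict[str, List[str]] = {}
    for _rank, form, _nsp, codes in sentence:
        if codes == "_":
            continue
        for code in codes.split(";"):
            mwe_id, _, category = code.partition(":")
            if category:  # assumed: only the first token carries 'id:CATEGORY'
                categories[mwe_id] = category
            members.setdefault(mwe_id, []).append(form)
    return [(categories.get(mid, "?"), forms) for mid, forms in members.items()]


def count_categories(lines: Iterable[str]) -> Counter:
    """Count VMWEs per category (e.g. LVC, ID, IReflV, VPC, OTH) over a whole file."""
    counts: Counter = Counter()
    for sentence in read_sentences(lines):
        for category, _forms in extract_vmwes(sentence):
            counts[category] += 1
    return counts


# Hypothetical sentence, not taken from the corpus: "preso ... decisione" tagged as one LVC.
SAMPLE = (
    "# hypothetical example\n"
    "1\tGianni\t_\t_\n"
    "2\tha\t_\t_\n"
    "3\tpreso\t_\t1:LVC\n"
    "4\tuna\t_\t_\n"
    "5\tdecisione\tnsp\t1\n"
    "6\t.\t_\t_\n"
)

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines()):
        for category, forms in extract_vmwes(sent):
            print(category, " ".join(forms))  # prints: LVC preso decisione
    print(count_categories(SAMPLE.splitlines()))  # prints: Counter({'LVC': 1})
```

To count categories in a real file, pass an open file handle, e.g. `count_categories(open("train.parsemetsv", encoding="utf-8"))`, where the file name is a placeholder. Because the PARSEME-TSV files are aligned with CoNLL-U files, the same sentence and token order could then be used to attach the lemmas and POS tags listed above.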
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.