DIMMI - Drug InforMation Mining in Italian

Manna, Raffaele; Di Buono, Maria Pia; Giordano, Luca

DIMMI consists of 600 Italian drug package leaflets. The documents vary widely in length: the shortest contains 363 tokens and the longest 11,730 tokens. The DIMMI dataset is derived from the D-LeafIT Corpus, which comprises 1819 Italian drug package leaflets. The corpus was created by extracting the PILs made available by the Italian Medicines Agency (Agenzia Italiana del Farmaco, AIFA); of these, 1439 refer to generic drugs and 380 to class A drugs.