DIMMI - Drug InforMation Mining in Italian: A CALAMITA Challenge

Raffaele Manna; Maria Pia di Buono; Luca Giordano
2024-01-01

Abstract

Patients’ knowledge about drugs and medications is crucial because it allows them to take those medications safely. This knowledge frequently comes from written prescriptions, patient information leaflets (PILs), or drug Web pages. DIMMI (Drug InforMation Mining in Italian) is a challenge that evaluates the proficiency of Large Language Models (LLMs) in extracting drug-specific information from PILs. The challenge seeks to advance the understanding of how effectively LLMs process complex medical information in Italian, and to enhance drug information extraction and pharmacovigilance efforts. Participants are provided with a dataset of 600 Italian PILs, and the objective is to develop models capable of accurately answering specific questions about drug dosage, usage, side effects, and drug-drug interactions. The challenge should be approached as an information extraction task, either in zero-shot mode, relying purely on the model's pre-existing knowledge and understanding, or through in-context learning (Retrieval-Augmented Generation (RAG) or few-shot mode). The answers generated by the models will be compared against the gold standard (GS), created to establish a reliable, accurate, and comprehensive set of answers against which participant submissions can be evaluated. For each drug and each information category, the GS contains the correct information extracted from the leaflets through manual annotation.
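
The difference between the zero-shot and in-context settings described above can be illustrated with a minimal prompt-building sketch. It is not the challenge's official prompt format: the function name `build_prompt`, the prompt wording, the category labels, and the placeholder drug names are all assumptions made for illustration. It also assumes that a zero-shot prompt carries only the question, while in-context prompts add a leaflet excerpt (RAG-style) or worked examples (few-shot).

```python
from typing import Optional, Sequence, Tuple

# Hypothetical category labels, derived from the question types named in the abstract.
CATEGORIES = ("dosage", "usage", "side effects", "drug-drug interactions")


def build_prompt(
    drug: str,
    category: str,
    context: Optional[str] = None,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt for one drug and one information category.

    With no context and no examples, the result is a zero-shot prompt.
    `context` adds a retrieved leaflet excerpt (RAG-style);
    `examples` adds (drug, answer) pairs for the same category (few-shot).
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    parts = []
    # Few-shot demonstrations come first, in the same layout as the final query.
    for example_drug, answer in examples:
        parts.append(f"Drug: {example_drug}\nCategory: {category}\nAnswer: {answer}")
    # Optional retrieved leaflet text placed just before the query.
    if context:
        parts.append(f"Leaflet excerpt: {context}")
    # The query itself, left open for the model to complete.
    parts.append(f"Drug: {drug}\nCategory: {category}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder names only; no real drug data is implied.
    print(build_prompt("ExampleDrug", "dosage"))
    print("---")
    print(build_prompt("ExampleDrug", "dosage", examples=[("OtherDrug", "placeholder answer")]))
```

Keeping the query layout identical across modes means the same model output parser can be reused when answers are later compared against the gold standard.
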
Files in this item:
File: 126_calamita_long.pdf
Access: open access
Type: Post-print document
License: Creative Commons
Size: 735.59 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/237281