DIMMI - Drug InforMation Mining in Italian: A CALAMITA Challenge

IRIS

Patients’ knowledge about drugs and medications is crucial as it allows them to administer them safely. This knowledge frequently comes from written prescriptions, patient information leaflets (PILs), or from reading drug Web pages. DIMMI (Drug InforMation Mining in Italian) is a challenge aiming at evaluating the proficiency of Large Language Models in extracting drug-specific information from PILs. The challenge seeks to advance the understanding of effectiveness in processing complex medical information in Italian, and to enhance drug information extraction and pharmacovigilance efforts. Participants are provided with a dataset of 600 Italian PILs and the objective is to develop models capable of accurately answering specific questions related to drug dosage, usage, side effects, drug-drug interactions. The challenge should be approached as an information extraction task through a zero-shot mode, purely based on the model pre-existing knowledge and understanding or through in-context learning (Retrieval-Augmented Generation (RAG) or few-shot mode). The answers generated by the models will be compared against the gold standard (GS), created to establish a reliable, accurate, and a comprehensive set of answers against which participant submissions can be evaluated. For each drug and each information category, the GS contains the correct information extracted from the leaflets through a manual annotation.

DIMMI - Drug InforMation Mining in Italian: A CALAMITA Challenge

Raffaele Manna;Maria Pia di Buono;Luca Giordano

2024-01-01

Abstract

Patients’ knowledge about drugs and medications is crucial as it allows them to administer them safely. This knowledge frequently comes from written prescriptions, patient information leaflets (PILs), or from reading drug Web pages. DIMMI (Drug InforMation Mining in Italian) is a challenge aiming at evaluating the proficiency of Large Language Models in extracting drug-specific information from PILs. The challenge seeks to advance the understanding of effectiveness in processing complex medical information in Italian, and to enhance drug information extraction and pharmacovigilance efforts. Participants are provided with a dataset of 600 Italian PILs and the objective is to develop models capable of accurately answering specific questions related to drug dosage, usage, side effects, drug-drug interactions. The challenge should be approached as an information extraction task through a zero-shot mode, purely based on the model pre-existing knowledge and understanding or through in-context learning (Retrieval-Augmented Generation (RAG) or few-shot mode). The answers generated by the models will be compared against the gold standard (GS), created to establish a reliable, accurate, and a comprehensive set of answers against which participant submissions can be evaluated. For each drug and each information category, the GS contains the correct information extracted from the leaflets through a manual annotation.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2024

Appare nelle tipologie:

4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
126_calamita_long.pdf accesso aperto Tipologia: Documento in Post-print Licenza: Creative commons Dimensione 735.59 kB Formato Adobe PDF Visualizza/Apri	735.59 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11574/237281

Citazioni

ND

social impact