Evaluation results of the xMEN entity linking toolkit for multiple benchmark datasets
Data files
Dec 21, 2024 version files 8.77 KB
- bronco_diagnoses_fs.csv (209 B)
- bronco_diagnoses_ws.csv (216 B)
- bronco_medications_ws.csv (215 B)
- bronco_medications.csv (208 B)
- bronco_treatments_fs.csv (209 B)
- bronco_treatments_ws.csv (217 B)
- distemist_fs.csv (210 B)
- distemist_ws.csv (216 B)
- mantra_de.csv (613 B)
- mantra_en.csv (618 B)
- mantra_es.csv (428 B)
- mantra_fr.csv (618 B)
- mantra_nl.csv (421 B)
- medmentions_de.csv (219 B)
- medmentions_en.csv (218 B)
- medmentions_es.csv (219 B)
- medmentions_fr.csv (216 B)
- medmentions_nl.csv (219 B)
- quaero_fs.csv (419 B)
- quaero_ws.csv (426 B)
- README.md (2.44 KB)
Abstract
This dataset contains the benchmark results of the xMEN toolkit for cross-lingual medical entity linking on the following publicly available benchmark datasets:
- Mantra Gold Standard Corpus (multilingual)
- Quaero (French)
- BRONCO150 (German)
- DisTEMIST (Spanish)
- MedMentions (English + machine-translated multilingual versions)
For each dataset, we evaluate the default xMEN pipeline, reporting results for the individual candidate-generation steps as well as for weakly supervised and fully supervised re-ranking, on the test sets (or with 5-fold cross-validation for BRONCO150).
Users of xMEN can compare their own results on these benchmarks, loaded through the BigBIO library, against the current state-of-the-art performance reported here.
README: xMEN Benchmark Results
https://doi.org/10.5061/dryad.15dv41p6h
Description of the data and file structure
Evaluation of xMEN candidate generation + re-ranking (weakly and fully supervised) on various benchmark datasets.
Files and variables
Each file refers to a subset of a particular benchmark dataset.
For each subset, we run candidate generation followed by weakly supervised ([filename]_ws.csv) or fully supervised ([filename]_fs.csv) re-ranking.
| Benchmark | Subset | file_name |
|---|---|---|
| Mantra | German | mantra_de |
| | English | mantra_en |
| | Spanish | mantra_es |
| | French | mantra_fr |
| | Dutch | mantra_nl |
| Quaero | - | quaero |
| BRONCO | Diagnoses | bronco_diagnoses |
| | Medications | bronco_medications |
| | Treatments | bronco_treatments |
| DisTEMIST | - | distemist |
| MedMentions | German | medmentions_de |
| | English | medmentions_en |
| | Spanish | medmentions_es |
| | French | medmentions_fr |
| | Dutch | medmentions_nl |
#### Variables:
- key: step of the xMEN pipeline
  - ngram: TF-IDF over character n-grams
  - sapbert: Cross-lingual SapBERT
  - ensemble: Ensemble of ngram and sapbert
  - candidates: Final candidates, i.e., including semantic type filtering if applicable
  - cross_encoder: Candidates re-ranked with a (ws or fs) cross-encoder
- recall_64: Recall@64, the proportion of ground-truth concepts retrieved among the top-64 predictions
- precision_1: Precision@1, the proportion of true positives among the top-1 predictions (accounting for NIL)
- recall_1: Recall@1, the proportion of ground-truth concepts retrieved among the top-1 predictions
- fscore_1: F1-Score@1, harmonic mean of precision@1 and recall@1
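The @k metrics above can be sketched as follows. This is an illustrative re-implementation under the assumption that each mention has exactly one gold concept and a ranked candidate list (an empty list meaning a NIL prediction); it is not xMEN's actual evaluation code.

```python
def recall_at_k(gold, predictions, k):
    """Fraction of mentions whose gold concept appears among the top-k candidates."""
    return sum(1 for g, preds in zip(gold, predictions) if g in preds[:k]) / len(gold)

def scores_at_1(gold, predictions):
    """Precision@1, Recall@1, and F1@1 over the top-ranked candidates.

    Mentions with an empty candidate list count as NIL predictions: they are
    excluded from the precision denominator but still count for recall.
    """
    tp = sum(1 for g, preds in zip(gold, predictions) if preds and preds[0] == g)
    n_predicted = sum(1 for preds in predictions if preds)
    precision = tp / n_predicted if n_predicted else 0.0
    recall = tp / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold concepts `["C1", "C2", "C3"]` and ranked candidates `[["C1"], ["C9", "C2"], []]`, recall@2 is 2/3, while precision@1 is 1/2 and recall@1 is 1/3.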
Code/software
Results are generated using the xMEN toolkit (https://github.com/hpi-dhc/xmen). The output is provided as plain CSV.
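Because the results are plain CSV, they can be inspected with standard tools. The sketch below uses pandas with made-up placeholder numbers; the column names follow the variable list above, and the real values are in the shipped files (e.g. mantra_de.csv).

```python
import io

import pandas as pd

# Placeholder rows for illustration only; actual values ship in the CSV files.
sample = io.StringIO(
    "key,recall_64,precision_1,recall_1,fscore_1\n"
    "ngram,0.85,0.55,0.55,0.55\n"
    "cross_encoder,0.85,0.72,0.72,0.72\n"
)
df = pd.read_csv(sample, index_col="key")

# Pipeline step with the highest F1@1 in this (fictional) sample.
best_step = df["fscore_1"].idxmax()
```

Replacing the `StringIO` buffer with a path such as `"mantra_de.csv"` reads one of the shipped result files directly.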
Access information
Data was derived from the following sources:
Methods
Evaluation of xMEN on datasets loaded from BigBIO dataloaders.