Named entity normalization

Given a collection of plain-text clinical case reports, participating systems must return every species mention together with its corresponding NCBI Taxonomy concept identifier.

The National Center for Biotechnology Information (NCBI) Taxonomy contains the names of organisms, classified primarily according to a phylogenetic hierarchy. It is a universal database used by the International Nucleotide Sequence Database Collaboration (INSDC), which comprises GenBank, the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ), as a single source of taxonomic classification that keeps these databases consistent with one another. In the NCBI Taxonomy, each unique identifier denotes either a specific type of organism (e.g., Taxonomy ID 5476 for Candida albicans) or a group of organisms (e.g., Taxonomy ID 40674 for mammals).
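To make the expected input and output concrete, here is a minimal sketch of dictionary-based normalization in Python. It is not the method of any participating system: the `LEXICON` table, the `normalize_species` function, and the sample sentence are hypothetical, and only the two Taxonomy IDs come from the description above. A real system would need a far larger lexicon and a proper mention detector.

```python
import re

# Toy lexicon (an assumption for illustration): surface form -> NCBI Taxonomy ID.
# The two IDs are the ones quoted in the task description.
LEXICON = {
    "candida albicans": "5476",
    "mamíferos": "40674",  # Spanish for "mammals"
}


def normalize_species(text, lexicon=LEXICON):
    """Return sorted (start, end, mention, taxonomy_id) tuples for lexicon matches."""
    results = []
    for surface, tax_id in lexicon.items():
        # Whole-word, case-insensitive match of the surface form.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            results.append((match.start(), match.end(), match.group(0), tax_id))
    return sorted(results)


# Hypothetical sentence in the style of a Spanish clinical case report.
report = "Se aisló Candida albicans en el hemocultivo."
for start, end, mention, tax_id in normalize_species(report):
    print(start, end, mention, tax_id)
```

Each returned tuple pairs a character span with an identifier, which mirrors the task's requirement to report mentions together with their codes.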

Publication
Antonio Miranda-Escalada, Eulàlia Farré-Maduell, Salvador Lima-López, Darryl Estrada, Luis Gascó, Martin Krallinger (2022). Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources. Procesamiento del Lenguaje Natural, no. 69, September 2022, pp. 241-253.
Language
Spanish
Abstract task
Dataset
Year
2022
Ranking metric
Micro F1

Task results

System          Micro precision   Micro recall   Micro F1
Vicomtech NLP   0.9376            0.9234         0.9304
Clac            0.9495            0.8910         0.9193
plncmm          0.9139            0.9060         0.9099
IGES            0.8979            0.8512         0.8740
Pumas           0.9389            0.8075         0.8682
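
For readers unfamiliar with the ranking metric, the following Python sketch shows how micro-averaged precision, recall, and F1 are commonly computed by pooling annotations across all documents. It is an illustration under stated assumptions, not the official evaluation script: the `micro_prf` function, the annotation tuple layout `(doc_id, start, end, taxonomy_id)`, and the sample data are hypothetical, and a prediction is counted as correct only when all four fields match a gold annotation exactly.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over pooled annotations.

    Both arguments are iterables of hashable annotations, assumed here to be
    (doc_id, start, end, taxonomy_id) tuples. Matching is exact.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted annotations across two documents.
gold = [
    ("doc1", 0, 16, "5476"),
    ("doc1", 40, 49, "40674"),
    ("doc2", 9, 25, "5476"),
    ("doc2", 60, 69, "40674"),
]
predicted = [
    ("doc1", 0, 16, "5476"),    # correct span and code
    ("doc2", 9, 25, "5476"),    # correct span and code
    ("doc2", 60, 69, "5476"),   # correct span, wrong code: counts as an error
]

precision, recall, f1 = micro_prf(gold, predicted)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because counts are pooled before dividing, documents with many mentions weigh more heavily in the score than documents with few.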

If you have published a result that improves on those listed, please email odesia-comunicacion@lsi.uned.es with the result and the DOI of the article, and attach a copy of the article if it is not openly available.