GenoVarDis

The corpus has two parts: (i) documents with tmVar3 annotations (Wei et al., 2022), including PubMed abstracts, which were translated and manually curated and to which associated diseases and symptoms were added; and (ii) PubMed abstracts in Spanish that were annotated manually.

Language(s)
Spanish
Year
2024
Domain
Biology
Annotations
Each annotation has five fields: pmid (PubMed article ID), start and end (positions of the mention in the text), term (the exact text of the mention), and entity. The entity is one of Disease, Gene, Transcript (transcript variant), DNAMutation (DNA mutation), or OtherMutation (other mutations, such as exon-level or missense mutations).
Format
txt
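
The annotation fields described above could be read with a short script like the following. This is only a sketch under stated assumptions: the card gives the field names and the txt format, but not the delimiter, the column order, or whether a header row is present. The code assumes tab-separated columns in the order pmid, start, end, term, entity, with an optional header row. The Annotation class, the parse_annotations function, and the sample rows are hypothetical and are not taken from the corpus.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Entity labels as listed in the dataset card.
ENTITY_TYPES = {"Disease", "Gene", "Transcript", "DNAMutation", "OtherMutation"}


@dataclass(frozen=True)
class Annotation:
    pmid: str    # PubMed article ID
    start: int   # start position of the mention in the text
    end: int     # end position of the mention in the text
    term: str    # exact text of the mention
    entity: str  # one of ENTITY_TYPES


def parse_annotations(text: str) -> List[Annotation]:
    """Parse annotation rows from a string.

    Assumption (not confirmed by the card): tab-separated columns in the
    order pmid, start, end, term, entity, with an optional header row.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    annotations = []
    for row in reader:
        # Skip blank lines and a header row, if there is one.
        if not row or row[0] == "pmid":
            continue
        pmid, start, end, term, entity = row
        if entity not in ENTITY_TYPES:
            raise ValueError(f"unknown entity label: {entity!r}")
        annotations.append(Annotation(pmid, int(start), int(end), term, entity))
    return annotations


# Made-up rows for illustration only; they do not come from the corpus.
sample = (
    "pmid\tstart\tend\tterm\tentity\n"
    "12345678\t0\t5\tBRCA1\tGene\n"
    "12345678\t20\t34\tcáncer de mama\tDisease\n"
)

for annotation in parse_annotations(sample):
    print(annotation.pmid, annotation.entity, annotation.term)
```

Rejecting unknown labels early makes a wrong guess about the column order fail loudly instead of producing silently misaligned fields.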

Publication
Agüero-Torales et al. (2024). Overview of GenoVarDis at IberLEF 2024: NER of Genomic Variants and Related Diseases in Spanish. Procesamiento del Lenguaje Natural, 73: 421-434.
Number of units
633
Type of units
Documents
Training set size
427
Test set size
136
Development set size
70

If you have published a result that improves on those listed, send a message to odesia-comunicacion@lsi.uned.es with the result and the DOI of the article, and attach a copy if the article is not openly available.