(named) entity recognition

DIANN-2018-EN

The corpus is a collection of 500 abstracts from Elsevier journal papers in the biomedical domain, collected between 2017 and 2018. It is divided into two disjoint parts: a training set (80%) and a test set (20%). It is annotated with disabilities, negations, and the scope of the negations.

DIANN-2018-ES

MEDDOCAN

CAPITEL-NER

SpRadIE

DisTEMIST

MEDDOPLACE: Location Entity Recognition

NLP topic: (named) entity recognition
Dataset: MEDDOPLACE Corpus: Gold Standard annotations for Medical Documents Place-related Content Extraction
Language: Spanish
Year: 2023

MedProcNER: Medical Procedure Recognition

NLP topic: (named) entity recognition
Dataset: MedProcNER/ProcTEMIST corpus 2023
Language: Spanish
Year: 2023

Monolingual NER Spanish

NLP topic: (named) entity recognition
Dataset: MultiCoNER v2 ES
Language: Spanish, English
Year: 2023

MedProcNER/ProcTEMIST corpus 2023

A dataset of 1,000 clinical case reports manually annotated with clinical procedures by multiple clinical experts. The case reports were selected by clinical experts and span various medical specialties, including oncology, odontology, urology, and psychiatry, among others. They are the same documents used for the DisTEMIST corpus and shared task on diseases, building towards a collection of fully annotated texts for clinical concept recognition and normalization. In addition to the text annotations, the mentions in the corpus have been normalized to SNOMED CT.

MEDDOPLACE Corpus: Gold Standard annotations for Medical Documents Place-related Content Extraction

The MEDDOPLACE Gold Standard corpus is a collection of 1,000 clinical case reports in Spanish from various medical specialties, such as psychiatry, neurology, travel medicine, infectious diseases, cardiology, occupational medicine, and oncology. The corpus is annotated with places and locations, and with four location classes of clinical relevance: (a) birthplace, (b) residence, (c) movement, and (d) healthcare attention.

MultiCoNER v2 ES

MultiCoNER is a large multilingual dataset for named entity recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixing subsets. The dataset is designed to represent contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions.