The ClinAIS corpus is a randomly selected subset of the CodiEsp background corpus. It consists of 1038 distinct clinical notes, each annotated with seven types of medical sections.
Language(s)
Spanish
Dataset description link
Year
2023
Domain
Health
Text types
Clinical notes
Data access
Registration
Publication
I. de la Iglesia, M. Vivó, P. Chocrón, G. de Maeztu, K. Gojenola, A. Atutxa, An Open Source Corpus and Automatic Tool for Section Identification in Spanish Health Records, Journal of Biomedical Informatics 145 (2023) 104461
Publication link
NLP Topic
Number of units
1038
Type of units
Documents
Documents
1038
Training set size
781
Test set size
130
Development set size
127
Size - additional information
sections of clinical notes
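As a quick consistency check on the figures listed in this record, the sketch below confirms that the training, development, and test sets add up to the 1038 documents reported, and computes each split's share of the corpus. The variable names are illustrative only and are not part of any ClinAIS tooling.

```python
# Split sizes as reported in this record.
splits = {"train": 781, "development": 127, "test": 130}
reported_total = 1038

# The three splits should account for every document in the corpus.
total = sum(splits.values())
assert total == reported_total, f"splits sum to {total}, expected {reported_total}"

# Share of the corpus held by each split, in percent.
shares = {name: 100 * size / total for name, size in splits.items()}

for name, size in splits.items():
    print(f"{name:<12} {size:>5} documents  {shares[name]:.1f}%")
```

The split is roughly 75% training data, with the remaining quarter divided almost evenly between development and test.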