Language(s)
Spanish
Year
2021
Domain
Health
Text types
Clinical records from journals
Annotations
occupations, occupation holders, SNOMED-CT codes for occupations
Format
UTF-8 text files with the annotations as separate files in brat standoff format (.ann files)
Annotation guide link
Data access
Open access
Data link
NLP Topic
Number of units
1844
Type of units
Documents
Tokens
1291186
Sentences
58627
Documents
1844
Training set size
1500 documents / 49114 sentences / 1075655 tokens
Test set size
344 documents / 9513 sentences / 215531 tokens
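
Reading the annotations
The Format field above says the annotations ship as brat standoff .ann files alongside the UTF-8 text files. The sketch below shows one possible way to read the text-bound ("T") lines of such a file in Python. The names TextBound and parse_ann, the label OCCUPATION, and the sample sentence are all hypothetical and chosen for illustration only; the actual label set is defined in the corpus annotation guide, and other brat line types (notes, attributes, relations) are simply skipped here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBound:
    ann_id: str                   # annotation ID, e.g. "T1"
    label: str                    # entity type as written in the .ann file
    spans: List[Tuple[int, int]]  # character offsets; several if discontinuous
    text: str                     # text covered by the annotation


def parse_ann(content: str) -> List[TextBound]:
    """Parse the text-bound ("T") lines of a brat standoff .ann file."""
    entities = []
    for line in content.splitlines():
        if not line.startswith("T"):
            continue  # skip notes, attributes, relations, blank lines
        # Layout: ID <TAB> TYPE START END[;START END...] <TAB> TEXT
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(TextBound(ann_id, label, spans, text))
    return entities


# Hypothetical sample, NOT taken from the corpus.
sample_txt = "Paciente de 45 años, enfermera de profesión."
sample_ann = "T1\tOCCUPATION 21 30\tenfermera\n"

entities = parse_ann(sample_ann)
for ent in entities:
    start, end = ent.spans[0]
    # The offsets index into the companion .txt file.
    print(ent.ann_id, ent.label, repr(sample_txt[start:end]))
```

With real data, the content of each .ann file would be passed to parse_ann and the offsets checked against the .txt file of the same base name, opened with encoding="utf-8".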