The corpus comprises 500 abstracts from Elsevier journal papers in the biomedical domain, collected between 2017 and 2018. It is divided into two disjoint parts: a training set (80%) and a test set (20%). The abstracts are annotated with disabilities, negation triggers, and negation scopes.
Language(s)
English
Dataset description link
Year
2018
Domain
Health
Text types
Abstracts of scientific articles
Annotations
negation trigger, negation scope, disability
Format
XML
Annotation guide link
Data access
Public
Publication
Hermenegildo Fabregat, Juan Martínez-Romo, Lourdes Araujo (2018). Overview of the DIANN Task: Disability Annotation Task. Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018).
Publication link
License
CC-BY-4.0
NLP Topic
Number of units
500
Type of units
Documents
Tokens
89325
Sentences
6091
Documents
500
Training set size
400 abstracts
Test set size
100 abstracts
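The figures on this card can be cross-checked with a few lines of arithmetic. The following is a minimal sketch: the counts are copied from the fields above, while the dictionary keys and the `summarize` helper are hypothetical names introduced only for illustration. The averages it derives are simple ratios of the listed totals, not statistics reported by the corpus authors.

```python
# Counts copied from the card; the key names are illustrative assumptions.
stats = {
    "documents": 500,
    "tokens": 89325,
    "sentences": 6091,
    "train": 400,
    "test": 100,
}


def summarize(s):
    """Derive split shares and simple per-unit averages from the raw counts."""
    docs = s["documents"]
    return {
        "train_share": s["train"] / docs,
        "test_share": s["test"] / docs,
        "tokens_per_document": s["tokens"] / docs,
        "sentences_per_document": s["sentences"] / docs,
        "tokens_per_sentence": s["tokens"] / s["sentences"],
    }


# The two parts are disjoint, so their sizes should add up to the full corpus.
assert stats["train"] + stats["test"] == stats["documents"]

summary = summarize(stats)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The check confirms that the 400 and 100 abstracts match the stated 80% and 20% shares of the 500 documents.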