DIANN-2018-EN

The corpus is a collection of 500 abstracts from Elsevier journal papers in the biomedical domain, collected between 2017 and 2018. It is divided into two disjoint parts: a training set (80%) and a test set (20%). It is annotated with disabilities, negations, and the scope of each negation.

Language(s)
English
Year
2018
Domain
Health
Text types
Abstracts of scientific articles
Annotations
negation trigger, negation scope, disability
Format
XML
Data access
Public

Publication
Hermenegildo Fabregat, Juan Martínez-Romo, Lourdes Araujo (2018) Overview of the DIANN Task: Disability Annotation Task. Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018).
License
CC-BY-4.0
Number of units
500
Type of units
Documents
Tokens
89325
Sentences
6091
Documents
500
Training set size
400 abstracts
Test set size
100 abstracts
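
Since the corpus is distributed as XML with three annotation types, a reader may want to pull the annotated spans out of a document. The following is a minimal sketch only: the tag names `dis`, `neg`, and `scp`, the helper `extract_annotations`, and the sample sentence are all assumptions made for illustration and are not taken from this description, so check the corpus files for the actual markup before relying on it.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical tag names for the three annotation types.
# Replace them with the tags actually used in the corpus files.
DISABILITY_TAG = "dis"
NEGATION_TAG = "neg"
SCOPE_TAG = "scp"


def extract_annotations(annotated_text):
    """Collect the text of each annotated span, grouped by annotation type."""
    # Wrap the fragment in a root element so inline tags parse as XML.
    root = ET.fromstring(f"<doc>{annotated_text}</doc>")

    def texts(tag):
        # itertext() also gathers text from nested tags, e.g. a
        # disability mention that sits inside a negation scope.
        return ["".join(el.itertext()).strip() for el in root.iter(tag)]

    return {
        "disability": texts(DISABILITY_TAG),
        "negation_trigger": texts(NEGATION_TAG),
        "negation_scope": texts(SCOPE_TAG),
    }


# Made-up sentence, not taken from the corpus.
sample = (
    "Patients showed <scp><neg>no</neg> signs of "
    "<dis>hearing loss</dis></scp> after treatment."
)
annotations = extract_annotations(sample)
print(annotations)
```

Wrapping the text in a synthetic root element keeps the sketch within the standard library. It assumes the text is well-formed XML, so unescaped characters such as `&` would need to be handled first.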

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es indicating the result and the DOI of the article, along with a copy of the article if it is not openly available.