DIANN-2018-EN | Portal ODESIA

The corpus is a collection of 500 abstracts from Elsevier journal papers in the biomedical domain, collected between 2017 and 2018. It is divided into two disjoint parts: a training set (80%) and a test set (20%). The abstracts are annotated with disabilities, negation triggers, and the scope of each negation.
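
As a rough illustration of how the three annotation layers might be read from the XML files, the sketch below pulls tagged spans out of a sentence. The inline tag names (`dis`, `neg`, `scp`), the helper `extract_annotations`, and the sample sentence are assumptions made for this example and are not taken from this page; check them against the corpus files and the annotation guide before relying on them.

```python
import re

# Assumed inline tag names mapped to the annotation types listed on this card.
# These are illustrative guesses; verify them against the actual corpus files.
TAGS = {
    "dis": "disability",
    "neg": "negation trigger",
    "scp": "negation scope",
}


def extract_annotations(text, tags=TAGS):
    """Return (label, span_text) pairs for every tagged span in `text`.

    Tags nested inside a span are stripped from the returned span text.
    Spans are grouped by tag, in the order the tags appear in `tags`.
    """
    results = []
    for tag, label in tags.items():
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        for match in pattern.finditer(text):
            inner = re.sub(r"<[^>]+>", "", match.group(1))
            results.append((label, inner))
    return results


# Made-up sentence for demonstration only; it does not come from the corpus.
sample = "Patients showed <scp><neg>no</neg> signs of <dis>hearing loss</dis></scp>."

for label, span in extract_annotations(sample):
    print(f"{label}: {span}")
```

A regular expression keeps the sketch short, but it does not handle a tag nested inside another tag of the same name; a proper XML parser would be the safer choice for real processing.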

Language(s)

English

Dataset description link

http://ceur-ws.org/Vol-2150/overview-diann-task.pdf

Year

2018

Domain

Health

Text types

Abstracts of scientific articles

Annotations

negation trigger, negation scope, disability

Format

XML

Annotation guide link

http://ceur-ws.org/Vol-2150/overview-diann-task.pdf

Data access

Public

Data link

https://github.com/gildofabregat/DIANN-IBEREVAL-2018/tree/master/DIANN_CORPUS

Publication

Hermenegildo Fabregat, Juan Martínez-Romo, Lourdes Araujo (2018) Overview of the DIANN Task: Disability Annotation Task. Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018).

Publication link

http://ceur-ws.org/Vol-2150/overview-diann-task.pdf

License

CC-BY-4.0

NLP Topic

information extraction

Number of units

500

Type of units

Documents

Tokens

89325