Datasets

Below is information about Spanish textual datasets created for solving NLP tasks. Each dataset is a collection of texts, generally enriched with annotations.

SQUAD-SQAC 2024 ES
Domain: Diverse
Language: Spanish
Year: 2024
Size: 110
Text type: Scientific papers
NLP topic: question answering

CT–CWT–23-ES
Domain: News
Language: Spanish
Year: 2023
Size: 29,984
Text type: Tweets
NLP topic: fake news detection

Rest-Mex 2023 Clustering
Domain: News
Language: Spanish, Spanish (Mexico)
Year: 2023
Size: 114,550
Text type: News
NLP topic: topic modeling

GUA-SPA: Guarani Spanish corpus
Domain: News
Language: Spanish, Spanish (Paraguay), Guarani
Year: 2023
Size: 1,500
Text type: News
NLP topic: code-switching detection

AuTexTification 2023
Domain: General, Legal, News
Language: Spanish, English
Year: 2023
Size: 52,191
NLP topic: text generation

MultiCoNER-ES
Domain: Diverse
Language: Spanish
Year: 2022
Size: 233,987
Text type: Wikipedia, questions, search queries
NLP topic: (named) entity recognition

DETESTS
Domain: News
Language: Spanish
Year: 2022
Size: 5,629
Text type: News comments
NLP topic: hate detection

SQAC
Domain: General, News
Language: Spanish
Year: 2022
Size: 8,817
Text type: Encyclopedia entries, news
NLP topic: question answering

FakeDeS
Domain: COVID, others
Language: Spanish
Year: 2021
Size: 1,633
Text type: News
NLP topic: fake news detection

Spanish Fake News Corpus
Domain: Diverse
Language: Spanish (Mexico)
Year: 2020
Size: 971
Text type: News
NLP topic: fake news detection

MLDoc-ES
Domain: News
Language: Spanish
Year: 2018
Size: 14,458
Text type: News
NLP topic: text classification

MLDoc-EN
Domain: News
Language: English
Year: 2018
Size: 14,458
Text type: News
NLP topic: text classification

mSpRL
Domain: Diverse
Language: Spanish, English
Year: 2017
Size: 1,213
Text type: Images, textual descriptions
NLP topic: semantic role labeling

If you have published a result that improves on those listed, send a message to odesia-comunicacion@lsi.uned.es stating the result and the DOI of the article, and attach a copy of the article if it is not openly published.