SQUAD-SQAC 2024 ES

SQUAD/SQAC 2024 is an extension of the datasets SQuAD v1.1 (Stanford Question Answering Dataset) (Rajpurkar et al., 2016) for English and SQAC (Spanish Question Answering Corpus) (Gutiérrez-Fandiño et al., 2021) for Spanish. The dataset contains academic news from CSIC (Consejo Superior de Investigaciones Científicas) for Spanish and Cambridge University for English, with questions and extractive answers. The news items belong to different domains and are usually short, between 712 and 2760 tokens in English and between 514 and 2818 tokens in Spanish. Each text has at least 10 questions with their answers. The texts are addressed to the general public, so the language is not specialized. SQUAD/SQAC 2024 ES is the Spanish dataset.

Language(s)
Spanish
Year
2024
Domain
Diverse
Text types
Scientific papers
Annotations
Question-extractive answer pairs
Format
json
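
To illustrate what "question-extractive answer pairs" in JSON can look like, here is a minimal Python sketch. It assumes the file follows the SQuAD v1.1 layout (`data` > `paragraphs` > `context` / `qas` > `answers` with `text` and `answer_start`); the actual schema of SQUAD/SQAC 2024 ES is not stated here and may differ. The sample record and the helper names `iter_qa_pairs` and `is_extractive` are hypothetical, made up for illustration.

```python
import json


def iter_qa_pairs(dataset):
    """Yield (context, question, answer_text, answer_start) tuples.

    Assumes the SQuAD v1.1 nesting; adjust the keys if the real file differs.
    """
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield (context, qa["question"],
                           answer["text"], answer["answer_start"])


def is_extractive(context, answer_text, answer_start):
    """Return True if the answer is the exact span of context at answer_start."""
    end = answer_start + len(answer_text)
    return context[answer_start:end] == answer_text


# Made-up record, not taken from the dataset.
context = "The CSIC study was published in 2024."
sample = {
    "data": [{
        "title": "example",
        "paragraphs": [{
            "context": context,
            "qas": [{
                "id": "q1",
                "question": "When was the study published?",
                "answers": [{"text": "2024",
                             "answer_start": context.index("2024")}],
            }],
        }],
    }]
}

# Round-trip through JSON to mimic reading the data from a file.
dataset = json.loads(json.dumps(sample))
pairs = list(iter_qa_pairs(dataset))
all_extractive = all(is_extractive(c, a, s) for c, _, a, s in pairs)
print(len(pairs), all_extractive)
```

With a real file, the `sample` dictionary would be replaced by the result of `json.load` on the downloaded file, and the same span check would confirm that every answer is a literal substring of its text.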

NLP Topic
Number of units
110
Type of units
News
Tokens
962502
Documents
110
Test set size
110

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es indicating the result and the DOI of the article, along with a copy of it if it is not published openly.