MultiCoNER v2 ES

MultiCoNER is a large multilingual dataset for Named Entity Recognition (NER) that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixing subsets. The dataset is designed to represent contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions.

Language(s)
Spanish
Year
2023
Domain
General
Text types
Wiki sentences
Questions
Search queries
Data access
Public

Publication
Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko (2022). MultiCoNER: A Large-scale Multilingual Dataset for Complex Named Entity Recognition. Proceedings of the 29th International Conference on Computational Linguistics, pages 3798–3809, October 12–17, 2022.
License
CC-BY-4.0
Number of units
264207
Type of units
Sentences
Sentences
264207
Training set size
16453
Test set size
246900
Development set size
854
Size - additional information

named entities
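
The split sizes listed above can be cross-checked against the total number of units with a few lines of Python. This is only an illustrative sketch: the numbers are copied from this card, while the dictionary keys and the helper name `split_shares` are hypothetical and not part of any official loader.

```python
# Split sizes (in sentences) as listed on this card.
# The key names are illustrative labels, not official split identifiers.
splits = {"train": 16453, "dev": 854, "test": 246900}

# Total number of units reported on the card.
reported_total = 264207


def split_shares(sizes):
    """Return each split's share of the total as a fraction between 0 and 1."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(splits.values())
shares = split_shares(splits)

# Report whether the three splits add up to the reported total.
print(f"sum of splits: {total} (matches reported total: {total == reported_total})")
for name, share in shares.items():
    print(f"{name}: {splits[name]} sentences ({share:.1%} of all units)")
```

Running the sketch shows that the three splits add up to the reported total and that the test set is far larger than the training and development sets combined, which is worth keeping in mind when planning evaluation time.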

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es stating the result and the DOI of the article, and attach a copy of the article if it is not openly accessible.