MLDOC 2018: Document classification

Monolingual document classification task on the Spanish dataset of the Multilingual Document Classification Corpus (MLDoc) (Schwenk and Li, 2018), a cross-lingual document classification dataset covering eight languages. The corpus consists of 14,458 Reuters news articles classified into four categories: Corporate/Industrial, Economics, Government/Social, and Markets. The task consists of assigning each document to one of the four classes.

Publication

Holger Schwenk and Xian Li. 2018. A Corpus for Multilingual Document Classification in Eight Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

Language

Spanish

Task URL

https://github.com/facebookresearch/MLDoc

NLP topic

text classification

Abstract task

Classification

Dataset

MLDoc-ES

Year

2018

Publication link

https://aclanthology.org/L18-…

Ranking metric

F1
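
As an illustration of how an F1 score over the four categories could be computed, here is a minimal sketch in plain Python. The page states only "F1" as the ranking metric, so the macro-averaging used here is an assumption, and the function names `per_class_f1` and `macro_f1` are hypothetical helpers, not part of MLDoc or of the official evaluation.

```python
# The four MLDoc categories named in the task description.
LABELS = ["Corporate/Industrial", "Economics", "Government/Social", "Markets"]


def per_class_f1(gold, pred, label):
    """F1 for one class, treating it as the positive class (one-vs-rest)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (ASSUMPTION: macro-averaging)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(per_class_f1(gold, pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels for illustration only, not real MLDoc data.
    gold = ["Corporate/Industrial", "Economics", "Government/Social", "Markets"]
    pred = ["Corporate/Industrial", "Economics", "Government/Social", "Government/Social"]
    print(round(macro_f1(gold, pred), 4))
```

If the leaderboard uses a different averaging scheme (for example micro or weighted F1), only the aggregation in `macro_f1` would need to change; the per-class computation stays the same.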

Task results

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es stating the result and the DOI of the article, and attach a copy of the article if it is not openly available.