This dataset consists of fact-checks, social media posts, and pairings between them. It contains 205,751 fact-checks in 39 languages and 28,092 social media posts in 27 languages. All posts were previously reviewed by professional fact-checkers, who also assigned the appropriate fact-checks to them. There are 31,305 fact-check-to-post pairs, and each post is paired with at least one fact-check. Of these pairs, 26,774 are monolingual and 4,212 are crosslingual. The dataset introduces crosslingual previously fact-checked claim retrieval (PFCR) as a new task.
Language(s)
Spanish
English
Dataset description link
Year
2025
Domain
Social
Text type
Social media posts
Annotations
Social media post-claim pairs
Format
csv
Annotation guidelines link
Data access
Registration
Data access link
Publication
Matúš Pikuliak, Ivan Srba, Robert Moro, Timo Hromadka, Timotej Smoleň, Martin Melišek, Ivan Vykopal, Jakub Simko, Juraj Podroužek, and Maria Bielikova. 2023. Multilingual Previously Fact-Checked Claim Retrieval. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16477–16500, Singapore. Association for Computational Linguistics.
Publication link
NLP Topic
Number of units
7581
Training set size
6313
Evaluation set size
576
Development set size
692