PoliticES 2023

The dataset is an extension of the PoliCorpus 2020 dataset and of the corpus used for the PoliticES 2022 shared task. It was collected between 2020 and 2022 from the Twitter accounts of politicians, political journalists, and celebrities in Spain using the UMUCorpusClassifier. These users were selected because their political affiliation can be inferred from the party to which the politicians belong, the editorial line of the newspapers for which the journalists write, or the political parties that the celebrities support. The politicians' accounts were selected from: (1) members of the government of Spain, (2) members of the Congress and Senate of Spain, (3) mayors of some important cities in Spain, (4) presidents of the autonomous communities, (5) former politicians, and (6) collaborators affiliated with political parties. Journalists were selected from different Spanish news media, such as ABC, El País, El Diario, El Mundo, and La Razón, among others. The tweets in each cluster were selected to favor diversity, including texts from different dates and topics. Each cluster is labeled with self-assigned gender (male, female), profession (celebrity, politician, journalist), and political spectrum on two axes: binary (left, right) and multiclass (left, moderate left, moderate right, right). The final dataset consists of 2797 clusters of 80 tweets each.
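The labeling scheme described above can be sketched as a small Python record. This is only an illustration: the class name `Cluster`, its field names, and the `validate` method are hypothetical and do not reflect the actual file format of the distributed dataset; only the label values and the cluster size of 80 tweets come from the description.

```python
from dataclasses import dataclass
from typing import List

# Label vocabularies as listed in the dataset description.
GENDERS = {"male", "female"}
PROFESSIONS = {"celebrity", "politician", "journalist"}
IDEOLOGY_BINARY = {"left", "right"}
IDEOLOGY_MULTICLASS = {"left", "moderate left", "moderate right", "right"}
TWEETS_PER_CLUSTER = 80


@dataclass
class Cluster:
    """Hypothetical container for one cluster; field names are assumptions."""

    tweets: List[str]
    gender: str
    profession: str
    ideology_binary: str
    ideology_multiclass: str

    def validate(self) -> None:
        """Raise ValueError if the record does not match the described scheme."""
        if len(self.tweets) != TWEETS_PER_CLUSTER:
            raise ValueError(
                f"expected {TWEETS_PER_CLUSTER} tweets, got {len(self.tweets)}"
            )
        checks = [
            ("gender", self.gender, GENDERS),
            ("profession", self.profession, PROFESSIONS),
            ("ideology_binary", self.ideology_binary, IDEOLOGY_BINARY),
            ("ideology_multiclass", self.ideology_multiclass, IDEOLOGY_MULTICLASS),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"invalid {name}: {value!r}")


# Usage with placeholder text standing in for real tweets.
example = Cluster(
    tweets=["placeholder tweet"] * 80,
    gender="female",
    profession="journalist",
    ideology_binary="left",
    ideology_multiclass="moderate left",
)
example.validate()  # passes silently when the record is well formed
```

A check like this may be useful when loading the data into a custom pipeline, since it catches truncated clusters or misspelled label values early.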

Language(s)
Spanish
Year
2023
Domain
Social
Politics
Text types
Tweets
Annotations
self-assigned gender (male, female), profession (celebrity, politician, journalist), and political spectrum on two axes: binary (left, right) and multiclass (left, moderate left, moderate right, right)
Data access
Registration

Publication
José Antonio García-Díaz, Salud María Jiménez-Zafra, María-Teresa Martín-Valdivia, Francisco García-Sánchez, Luis Alfonso Ureña-López, Rafael Valencia-García (2023). Overview of PoliticES at IberLEF 2023: Political Ideology Detection in Spanish Texts. Procesamiento del Lenguaje Natural, no. 71, September 2023, pp. 409-416.
NLP Topic
Number of units
2797
Type of units
Clusters of 80 tweets
Training set size
2250
Test set size
547
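
The figures above can be cross-checked with a few lines of Python. This sketch assumes that the training and test sizes count clusters rather than individual tweets, and that every cluster holds exactly 80 tweets, as the description states; the constant names are illustrative.

```python
# Figures taken from the dataset card.
TRAIN_CLUSTERS = 2250
TEST_CLUSTERS = 547
TOTAL_CLUSTERS = 2797
TWEETS_PER_CLUSTER = 80  # per the description; assumed uniform

# The two splits should account for every cluster.
assert TRAIN_CLUSTERS + TEST_CLUSTERS == TOTAL_CLUSTERS

total_tweets = TOTAL_CLUSTERS * TWEETS_PER_CLUSTER
train_share = TRAIN_CLUSTERS / TOTAL_CLUSTERS
test_share = TEST_CLUSTERS / TOTAL_CLUSTERS

print(total_tweets)          # 223760
print(f"{train_share:.1%}")  # 80.4%
print(f"{test_share:.1%}")   # 19.6%
```

Under these assumptions the split is roughly 80/20 and the corpus contains 223,760 tweets in total.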

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es indicating the result and the DOI of the article, along with a copy of it if it is not published openly.