PoliticES 2023 | Portal ODESIA

The dataset extends the PoliCorpus 2020 dataset and the corpus used for the PoliticES 2022 shared task. It was collected between 2020 and 2022 from the Twitter accounts of politicians, political journalists, and celebrities in Spain using the UMUCorpusClassifier. These users were selected because their political affiliation can be inferred from the party to which the politicians belong, the editorial line of the newspapers for which the journalists write, or the kind of political parties the celebrities support. The politicians' accounts were selected from: (1) members of the government of Spain, (2) members of the Congress and Senate of Spain, (3) mayors of some important cities in Spain, (4) presidents of the autonomous communities, (5) former politicians, and (6) collaborators affiliated with political parties. Journalists were selected from different Spanish news media, such as ABC, El País, El Diario, El Mundo, and La Razón, among others. The tweets in each cluster are selected to favor diversity, including texts from different dates and topics. Each cluster is labeled with self-assigned gender (male, female), profession (celebrity, politician, journalist), and political spectrum on two axes: binary (left, right) and multiclass (left, moderate left, moderate right, right). The final dataset consists of 2797 clusters of 80 tweets each.

Language(s)

Spanish

Year

2023

Domain

Social

Politics

Text types

Tweets

Annotations

Self-assigned gender (male, female), profession (celebrity, politician, journalist), and political spectrum on two axes: binary (left, right) and multiclass (left, moderate left, moderate right, right)

Data access

Registration

Data link

https://codalab.lisn.upsaclay.fr/competitions/10173

Publication

José Antonio García-Díaz, Salud María Jiménez-Zafra, María-Teresa Martín-Valdivia, Francisco García-Sánchez, Luis Alfonso Ureña-López, Rafael Valencia-García (2023). Overview of PoliticES at IberLEF 2023: Political Ideology Detection in Spanish Texts. Procesamiento del Lenguaje Natural, no. 71, September 2023, pp. 409-416.

Publication link

http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6570/3970

NLP Topic

Profiling

Number of units

2797

Type of units

Clusters of 80 tweets each

Training set size

2250

Test set size

547
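
The label scheme and split sizes listed in this record can be sketched in Python as a quick sanity check. This is only an illustrative sketch: the class name `ClusterLabels`, the constant names, and the helper functions are hypothetical and are not part of the official dataset distribution, whose actual file format and column names are not described here. Only the label values and the counts are taken from the record.

```python
from dataclasses import dataclass

# Label values as listed under "Annotations" in this record.
GENDERS = ("male", "female")
PROFESSIONS = ("celebrity", "politician", "journalist")
IDEOLOGY_BINARY = ("left", "right")
IDEOLOGY_MULTICLASS = ("left", "moderate left", "moderate right", "right")

# Sizes as stated in this record (counted in clusters, not single tweets).
TRAIN_CLUSTERS = 2250
TEST_CLUSTERS = 547
TWEETS_PER_CLUSTER = 80


@dataclass(frozen=True)
class ClusterLabels:
    """Hypothetical container for the four labels attached to one cluster."""

    gender: str
    profession: str
    ideology_binary: str
    ideology_multiclass: str

    def __post_init__(self) -> None:
        # Reject any value outside the label sets documented above.
        allowed = {
            "gender": GENDERS,
            "profession": PROFESSIONS,
            "ideology_binary": IDEOLOGY_BINARY,
            "ideology_multiclass": IDEOLOGY_MULTICLASS,
        }
        for field_name, values in allowed.items():
            value = getattr(self, field_name)
            if value not in values:
                raise ValueError(f"{field_name}={value!r} is not one of {values}")


def total_clusters() -> int:
    """Training and test clusters combined."""
    return TRAIN_CLUSTERS + TEST_CLUSTERS


def total_tweets() -> int:
    """Number of individual tweets implied by the cluster count."""
    return total_clusters() * TWEETS_PER_CLUSTER
```

With the figures in this record, `total_clusters()` gives 2797, matching the stated number of units, and `total_tweets()` gives 223760 individual tweets.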