DA-VINCIS 2023

The DA-VINCIS 2023 corpus is an upgrade of the dataset used in the previous edition. It consists of Twitter data in Mexican Spanish associated with reports of violent incidents. Every tweet in the corpus has at least one associated image. The corpus also takes into account the time elapsed between the occurrence of an event and its posting on social media: a lapse of 24 hours was defined as the maximum for considering a tweet to be a report of this kind of event. The following categories of violent incidents are considered: accident, murder, robbery, and other.
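
The 24-hour criterion described above can be illustrated with a minimal sketch. It is not the organizers' actual filtering code: the function name, the argument names, and the assumptions that both timestamps are known, that the 24-hour bound is inclusive, and that a tweet cannot precede the event are all hypothetical.

```python
from datetime import datetime, timedelta

# Maximum lapse between event and post, as stated in the corpus description.
MAX_LAPSE = timedelta(hours=24)


def within_reporting_window(event_time, posted_time, max_lapse=MAX_LAPSE):
    """Return True if the tweet was posted no more than max_lapse after the event.

    Assumptions (not from the source): the bound is inclusive, and a tweet
    posted before the event does not count as a report of it.
    """
    lapse = posted_time - event_time
    return timedelta(0) <= lapse <= max_lapse


# Hypothetical timestamps, for illustration only.
event = datetime(2023, 1, 1, 10, 0)
same_day_post = datetime(2023, 1, 1, 20, 0)   # 10 hours after the event
late_post = datetime(2023, 1, 3, 10, 0)       # 48 hours after the event

print(within_reporting_window(event, same_day_post))
print(within_reporting_window(event, late_post))
```

Under these assumptions the first tweet would be kept as a candidate report and the second discarded.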

Language(s)
Spanish (Mexico)
Year
2023
Domain
Social
Text types
Tweets
Data access
Public

Publication
Horacio Jarquín-Vásquez, Delia Irazú Hernández-Farías, Luis Joaquín Arellano, Hugo Jair Escalante, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez, Fernando Sanchez-Vega (2023) Overview of DA-VINCIS at IberLEF 2023: Detection of Aggressive and Violent Incidents from Social Media in Spanish. Procesamiento del Lenguaje Natural, no. 71, September 2023, pp. 351-360.
NLP Topic
Number of units
4731
Type of units
Tweets
Training set size
2996
Test set size
1153
Development set size
582
Size - additional information

violent events and their type
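
As a quick consistency check on the sizes listed above, the following sketch sums the three partitions and computes each one's share of the corpus. The dictionary keys and variable names are illustrative; only the counts come from this page.

```python
# Partition sizes as listed on this page.
splits = {"train": 2996, "test": 1153, "dev": 582}

# The partitions should add up to the reported number of units (4731).
total = sum(splits.values())

# Share of each partition, as a percentage rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)
print(shares)
```

The three partitions add up to the 4731 tweets reported as the number of units.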
