The dataset is a collection of Spanish-language tweets in which the words that express each pun are annotated. A pun is a form of wordplay in which a word or phrase evokes the meaning of another word or phrase with a similar or identical pronunciation.
Language(s)
Spanish
English
Year
2023
Domain
Social
Text types
Tweets
Annotations
Binary label indicating whether the text contains a pun; the words that express the pun
Format
JSON
Data access
Registration
Data link
Publication
Ermakova, L., Miller, T., Bosser, AG., Palma Preciado, V.M., Sidorov, G., Jatowt, A. (2023). Overview of JOKER – CLEF-2023 Track on Automatic Wordplay Analysis. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_26
Publication link
NLP Topic
Number of units
4235
Type of units
Tweets
Training set size
1994
Test set size
2241
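Example
The sketch below illustrates how records with the two annotations listed above (a binary pun label and the words that express the pun) might be read from JSON, and checks that the stated training and test sizes add up to the stated number of units. The field names `id`, `text`, `is_pun` and `pun_words`, the sample records, and the helper functions are assumptions made for illustration only; they are not the dataset's documented schema.

```python
import json

# Hypothetical records: the field names "id", "text", "is_pun" and "pun_words"
# are assumptions for illustration, not the dataset's documented schema.
SAMPLE = """
[
  {"id": "ex_1", "text": "placeholder text with a pun", "is_pun": true, "pun_words": ["pun"]},
  {"id": "ex_2", "text": "placeholder text without wordplay", "is_pun": false, "pun_words": []}
]
"""


def load_records(raw):
    """Parse a JSON array of annotated tweets into a list of dicts."""
    records = json.loads(raw)
    for rec in records:
        # A text labelled as non-pun should not carry any pun words.
        if not rec["is_pun"] and rec["pun_words"]:
            raise ValueError(f"non-pun record {rec['id']} has pun words")
    return records


def pun_ratio(records):
    """Return the fraction of records labelled as containing a pun."""
    return sum(1 for r in records if r["is_pun"]) / len(records)


records = load_records(SAMPLE)

# Sizes as stated in this entry.
TRAIN_SIZE, TEST_SIZE, TOTAL = 1994, 2241, 4235
split_ok = TRAIN_SIZE + TEST_SIZE == TOTAL  # the two splits cover all units
train_share = TRAIN_SIZE / TOTAL            # fraction of units used for training
```

With the sizes stated in this entry, the training set is slightly smaller than the test set, at roughly 47% of all units.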