Multilingual Tweet Intimacy Analysis is a task to predict the intimacy of tweets written in different languages. The task focuses on perceived intimacy: annotators give their subjective judgment by answering "How intimate do you think the given tweet is?" on a 1-5 Likert scale. Tweets are annotated in 10 languages. The training data contains intimacy labels for six languages: English, French, Spanish, Italian, Portuguese, and Chinese. To encourage new studies on understanding intimacy in language, four other languages are included without training data: Dutch, Hindi, Korean, and Arabic. Participants are asked to build models that predict tweet intimacy on a scale from 1 (not intimate at all) to 5 (very intimate).
Publication
Jiaxin Pei, Vítor Silva, Maarten Bos, Yozen Liu, Leonardo Neves, David Jurgens, and Francesco Barbieri. 2023. SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 2235–2246, Toronto, Canada. Association for Computational Linguistics.
Language
Spanish
English
NLP topic
Abstract task
Dataset
Year
2023
Ranking metric
Pearson correlation
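As an illustration of the ranking metric, the sketch below computes the Pearson correlation between gold intimacy scores and model predictions. It also reports the correlation separately for the six languages with training data and the four without. The per-group breakdown, the `score_by_group` helper, and the `(language, gold, predicted)` row format are hypothetical choices made for this example and are not the official scorer. The toy rows are invented and are not task data.

```python
import math
from collections import defaultdict

# Languages with labeled training data, per the task description.
SEEN = {"English", "French", "Spanish", "Italian", "Portuguese", "Chinese"}
# Languages included without training data.
UNSEEN = {"Dutch", "Hindi", "Korean", "Arabic"}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def score_by_group(rows):
    """Hypothetical helper: rows are (language, gold, predicted) tuples.

    Returns Pearson r for all rows ("overall") and for the seen/unseen groups.
    """
    groups = defaultdict(lambda: ([], []))  # group name -> (gold list, pred list)
    for lang, gold, pred in rows:
        if lang in SEEN:
            group = "seen"
        elif lang in UNSEEN:
            group = "unseen"
        else:
            raise ValueError(f"unknown language: {lang}")
        for key in ("overall", group):
            groups[key][0].append(gold)
            groups[key][1].append(pred)
    return {key: pearson(g, p) for key, (g, p) in groups.items()}


if __name__ == "__main__":
    # Invented toy rows, for illustration only.
    rows = [
        ("English", 1.0, 1.5), ("English", 3.0, 2.5), ("Spanish", 5.0, 4.0),
        ("Hindi", 2.0, 2.2), ("Hindi", 4.0, 3.9), ("Korean", 1.0, 1.1),
    ]
    for name, r in sorted(score_by_group(rows).items()):
        print(f"{name}: r = {r:.3f}")
```

Because Pearson correlation is unchanged when predictions undergo a positive linear rescaling (for example `2 * p + 3`), a system is ranked on how well its scores track the gold ordering and spacing, not on whether it outputs values exactly within 1-5.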