MultiParaDetox

MultiParaDetox is a dataset for text detoxification, a style transfer task that converts toxic expressions into a neutral register. It extends the ParaDetox approach to multiple languages, enabling the automatic creation of parallel corpora for detoxification.

Language(s)
Spanish
Ukrainian
Year
2024
Domain
Diverse
Annotations
Parallel corpus
Data access
Public

Publication
Daryna Dementieva, Nikolay Babakov, and Alexander Panchenko. 2024. MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 124–140, Mexico City, Mexico. Association for Computational Linguistics.
License
OpenRAIL++
Number of units
1720
Type of units
Tweets
Training set size
720
Test set size
600
Development set size
400

If you have published a result that improves on those listed, send a message to odesia-comunicacion@lsi.uned.es with the result and the DOI of the article, and attach a copy if the article is not openly available.