This task aims to boost research on the detection of automatically generated text. Participants must develop models that exploit clues about linguistic form and meaning to distinguish machine-generated text (MGT) from human-written text. The subtask is framed as binary classification of human text (Hum) versus MGT (Gen): the training set covers texts from three domains, and submissions are evaluated on two unseen domains.
Publication
Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso (2023) Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains. Procesamiento del Lenguaje Natural, Revista nº 71, septiembre de 2023, pp. 275-288.
Competition
Language
Spanish
NLP topic
Abstract task
Dataset
Year
2023
Publication link
Ranking metric
Macro F1
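Macro F1 averages the per-class F1 scores with equal weight per class, so performance on the minority class counts as much as on the majority class. A minimal sketch of the metric (the gold/predicted label lists are illustrative, not taken from the task data):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Toy gold and predicted labels for the binary Hum/Gen task (hypothetical data).
gold = ["Hum", "Gen", "Gen", "Hum", "Gen", "Hum"]
pred = ["Hum", "Gen", "Hum", "Hum", "Gen", "Gen"]
print(round(macro_f1(gold, pred), 4))  # → 0.6667
```

This matches what libraries such as scikit-learn compute with `f1_score(..., average="macro")`.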
Task results
System | Macro F1
---|---
TALN-UPF | 0.7077
Ling UCM | 0.7060
Drocks | 0.6537
GLPSI | 0.6390
turing_testers | 0.6277
bucharest | 0.5649
ANLP | 0.5138
UAEMex | 0.3517
LKE_BUAP | 0.3160