The AuTexTification dataset consists of texts written by humans and by large language models (LLMs) in five domains: tweets, reviews, how-to articles, news, and legal documents.
Language(s)
Spanish
English
Dataset description link
Year
2023
Domain
General
Legal
News
Data access
Registration
Data link
Publication
Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso (2023). Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains. Procesamiento del Lenguaje Natural, No. 71, September 2023, pp. 275-288.
NLP Topic
Number of units
52191
Type of units
Samples of texts
Size - additional information
Labels: model-generated or not; attributed model