Introducing GPTZero's Multilingual AI Detection
Con nuestra última actualización, estamos ampliando nuestro alcance para abarcar francés y español, con el objetivo de brindar capacidades de detección confiables en diferentes dominios lingüísticos.
Avec notre dernière mise à jour, nous étendons notre portée au français et à l'espagnol, dans le but de fournir des capacités de détection fiables dans différents domaines linguistiques.
With our latest update, we're expanding our reach to encompass French and Spanish, aiming to provide reliable detection capabilities across different linguistic domains.
This marks a step towards covering a broader spectrum of languages, reflecting our mission to offer robust detection across diverse linguistic domains, emphasizing our commitment to transparency in machine learning and our vision of an AI-driven future that benefits all.
“We're proud to unveil our latest innovation: multi-lingual detection,” says Alex Adam, one of GPTZero’s Machine Learning Engineers. “As AI systems become adept at understanding and processing diverse languages, our mission is to promote greater accessibility and engagement for all.”
Recognizing the growing importance of multilingualism, we've expanded our reach to accommodate the increasing volume of submissions in Spanish, which accounted for nearly 10% of all recent submissions to GPTZero. It's essential for AI detection models not to adopt a one-size-fits-all approach but rather to be tailored to specific linguistic contexts. We're excited to announce that our journey in language diversity is just beginning, with plans to include more languages in the near future.
Performance Metrics
To ensure the efficacy of our multilingual model, we rigorously tested it on over ten thousand documents in French and Spanish, spanning various genres such as news articles, encyclopedia entries, essays, and open-domain Q&A. Our model's predictive capabilities are showcased in Figures 1 and 2, illustrating its ability to distinguish between AI-generated and human-written content across both languages. An ideal model predicts most AI documents (blue) with a high score, and most human documents (orange) with a low score.
Figure 1: Output distribution of our multilingual model on French data
Figure 2: Output distribution of our multilingual model on Spanish data
Table 1: Performance metrics of our multilingual model
Our model demonstrates impressive accuracy rates while maintaining an acceptable false positive rate, as highlighted in Table 1. These results underscore the effectiveness of our multilingual approach and represent a significant improvement over our previous English-centric model. We're excited to extend these benefits to our non-English-speaking users, enhancing their experience with GPTZero's detection capabilities.
Beyond expanding our coverage of documents that can be accurately scanned, exploring text from other languages can also help us identify common patterns that differentiate LLM–generated text from human text. This supports our goal of being not only accurate but also interpretable. LLMs learn in a way that results in a sort of average writing style of all the documents encountered during training. Given the disparate availability of text in different languages, how tone and vocabulary complexity varies when generating multilingual text is still being explored.
Multilingual detection challenges: While we celebrate our achievements, we're also mindful of the challenges inherent in multilingual text understanding. Limited data availability in certain languages poses unique obstacles, requiring innovative solutions to ensure robust detection across diverse linguistic landscapes. Despite these challenges, we remain dedicated to overcoming barriers and delivering reliable detection capabilities for all languages supported by GPTZero.
How to use multilingual detection
Head over to the GPTZero dashboard, copy & paste your text or upload your file and run your scans!