GPTZero is Officially The Most Accurate Commercial AI Detector

GPTZero confirms its title as the most accurate commercial AI detector, outperforming competitors on the massive independent RAID benchmark.

Alex Adam, Alex Cui, Edward Tian

Oct 15, 2025 · 3 min read

The GPTZero team is proud to announce that our detector was benchmarked on RAID, a comprehensive, independent, third-party dataset, and was shown to be the most accurate AI detector in North America. Our AI checker detects 95.7% of AI texts while incorrectly flagging only 1% of human texts as AI, a figure that rises to over 99% when discontinued LLMs such as GPT-3.5 are filtered out. RAID is composed of news articles, reviews, social media posts, and even books, making it a reliable and holistic way of comparing detection models. These results are consistent with our long history of outperforming competitors including Copyleaks, Originality, and Pangram on our own public benchmarks.

What is RAID?

RAID is a comprehensive benchmark created to evaluate AI detectors on 672,000 texts across:

11 Domains (including news articles, reviews, and social media posts)
12 Adversarial attacks (including swapping synonyms, misspelling, and paraphrasing)
12 LLMs (both closed and open source)
4 Decoding strategies (ways of sampling texts from a given model)

The scale and variety of this benchmark make it ideal for comparing detection models in an unprecedented, holistic way. Because we do not prioritize some of the domains and LLMs from RAID when training our detector, high accuracy on these texts speaks to the generalizability of our approach. Indeed, the release of Claude Sonnet 4.5 did not require us to update our detector, since our accuracy on this model is already very high (see results here).

RAID uses three metrics for evaluating models: AUROC, TPR@FPR=1%, and TPR@FPR=5%. The most important for us is TPR@FPR=1%, which measures the percentage of AI texts that are detected when the detection threshold is set so that just 1% of human texts are misclassified as AI. Table 1 (results on texts without adversarial attacks) below shows that GPTZero is well ahead of every competing method except SpeedAI, which focuses on serving AI detection in China. With adversarial attacks, every method suffers a decrease in TPR@FPR=1%, but GPTZero maintains its rank, as shown in Table 2, highlighting how robust our detector is.
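To make these metrics concrete, here is a minimal Python sketch of how TPR at a fixed FPR and AUROC can be computed from per-text detector scores. It is an illustration only, not RAID's official evaluation code or GPTZero's implementation. The function names and the convention that a higher score means "more likely AI" are assumptions made for this example.

```python
import math
from typing import Sequence


def threshold_at_fpr(human_scores: Sequence[float], target_fpr: float) -> float:
    """Pick a threshold so that at most `target_fpr` of human texts score above it.

    Assumption for this sketch: higher score = more likely AI, and a text is
    flagged as AI when its score is strictly greater than the threshold.
    """
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we are allowed to misclassify as AI.
    # The small epsilon guards against floating-point results like 28.999999.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every human text may be flagged
    # Only the `allowed` highest human scores (at most) exceed this value.
    return ranked[allowed]


def tpr_at_fpr(
    human_scores: Sequence[float],
    ai_scores: Sequence[float],
    target_fpr: float = 0.01,
) -> float:
    """Fraction of AI texts detected at the threshold fixed by the human texts."""
    threshold = threshold_at_fpr(human_scores, target_fpr)
    detected = sum(score > threshold for score in ai_scores)
    return detected / len(ai_scores)


def auroc(human_scores: Sequence[float], ai_scores: Sequence[float]) -> float:
    """Probability that a random AI text outscores a random human text.

    Ties count as half a win. This pairwise form is quadratic in the number
    of texts, so it is only suitable for small illustrative inputs.
    """
    wins = 0.0
    for ai in ai_scores:
        for human in human_scores:
            if ai > human:
                wins += 1.0
            elif ai == human:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))
```

The key design point is that the threshold is chosen from the human texts alone, and only then applied to the AI texts. This is what makes TPR@FPR=1% comparable across detectors whose raw scores are on different scales.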

Most notable of all, we achieve this high accuracy while also being the most interpretable AI detector available, thanks to our Advanced Scan and Natural Language Explanations features. Interpretability of this kind cannot be captured by benchmark metrics, but it is nevertheless important to users who want to understand how a given prediction was made.

A further limitation of RAID, shared by many other AI detection benchmarks, is that it considers only fully human or fully AI texts. GPTZero can instead detect far more complex human/AI editing interactions, such as human text polished by an LLM, interleaved human and AI text, and AI text modified by a dedicated bypasser service. Our users value these fine-grained classifications and the ability to explain our detector’s predictions, both of which reflect what we believe is important for AI detection as users become more sophisticated in their use of AI.

In the meantime, we continue to upgrade our detector, expanding the set of LLMs and human writing styles it can capture, in pursuit of our mission to make GPTZero the most accurate AI detector in the world.

Written by Alex Adam