GPTZero performance on Claude Sonnet 4.5

GPTZero achieved over 97% recall on Claude Sonnet 4.5 at a 1% false positive rate, surpassing its initial 95% recall on GPT-5. Detection remains strong on new models, and recall on GPT-5 is now over 99% after training on GPT-5-generated data.

Emily Napier
· 1 min read

On September 29th, Anthropic released Claude Sonnet 4.5. We benchmarked our detector that same day, before adding any Claude Sonnet 4.5 output to our training data, and measured recall of over 97% at a 1% false positive rate. That is higher than our initial 95% recall on GPT-5, which has since risen to over 99% on our latest benchmark after training on GPT-5-generated data.

Benchmark Details

We generated an evaluation set of 4,500 texts from Claude Sonnet 4.5, spanning a variety of domains including essays, academic research, restaurant reviews, news articles, and letters.
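For readers unfamiliar with the metric, here is a minimal sketch of how recall at a fixed false positive rate can be computed. It is not GPTZero's evaluation code. It assumes a detector that outputs a score where higher means "more likely AI", plus a held-out set of human-written texts (not described in this post) used to set the threshold. The function name `recall_at_fpr` and the toy scores are hypothetical.

```python
import math


def recall_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Return (threshold, fpr, recall) for a detector's scores.

    Assumption: a text is flagged as AI when its score is strictly
    greater than the threshold. The threshold is chosen from the
    human-written scores so that at most `target_fpr` of them are flagged.
    """
    if not human_scores or not ai_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0 <= target_fpr < 1:
        raise ValueError("target_fpr must be in [0, 1)")

    # Number of human texts we are allowed to flag by mistake.
    # The small epsilon guards against float rounding (e.g. 2.9999999).
    allowed = math.floor(target_fpr * len(human_scores) + 1e-9)

    # Use the (allowed + 1)-th highest human score as the threshold:
    # at most `allowed` human scores can be strictly above it.
    ranked = sorted(human_scores, reverse=True)
    threshold = ranked[allowed]

    false_positives = sum(s > threshold for s in human_scores)
    true_positives = sum(s > threshold for s in ai_scores)

    fpr = false_positives / len(human_scores)
    recall = true_positives / len(ai_scores)
    return threshold, fpr, recall


if __name__ == "__main__":
    # Toy, made-up scores purely for illustration.
    human = list(range(100))       # 100 human-written texts
    ai = [99.5, 99, 98.5, 50]      # 4 AI-generated texts
    threshold, fpr, recall = recall_at_fpr(human, ai, target_fpr=0.01)
    print(f"threshold={threshold} fpr={fpr:.2%} recall={recall:.2%}")
```

Fixing the false positive rate first and then reporting recall makes results comparable across models: the threshold is set once on human text, so a new model's AI texts can only change the recall figure.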

Impact on our AI Detector Performance

We’re seeing that our detector’s performance is robust to the latest LLM releases. There are a few reasons for this. One is that Claude Sonnet 4.5’s headline improvements are in coding and agentic tasks, which do not necessarily translate into more human-like writing. Another is that our training data covers a wide range of LLM providers and models, which creates potential for generalization across model families and sizes.