Ways to by-pass AI Detection?

Recently, several applications on the internet have proclaimed themselves to be tools for by-passing AI detectors. In this study, we explore the validity of these claims and test one of the 'by-passers' across multiple AI detectors, including GPTZero.

Summary of Key Findings:

  • The majority of these services claiming undetectability are applying paraphrasing tools such as the open-source T5 model for re-writing text. Research has demonstrated that paraphrasing detection is indeed possible; GPTZero is already implementing detection for specific paraphrasing models. Additionally, our collaborators at the Princeton NLP lab are releasing what will be the most comprehensive modern benchmark for paraphrasing detection research.
  • Many of these apps apply simple workarounds such as injection attacks (adding random spaces to text). In testing, these by-passers were successful in fooling a majority of AI detectors, including Originality, ZeroGPT, Copyleaks, Turnitin, and OpenAI's classifier.
  • The by-passer also worked for GPTZero, but unlike with other services, GPTZero maintains an updated 'greylist' of by-passer methods, and patched the by-pass within days.
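To make the spacing-injection idea concrete, here is a minimal Python sketch of the kind of clean-up a detector could run before scoring text. It is an assumption for illustration, not GPTZero's actual patch: the function name `normalize_spacing` and the list of invisible characters are hypothetical, and the list is not exhaustive.

```python
import re

# Invisible characters sometimes used as padding (illustrative, not exhaustive):
# zero-width space, zero-width non-joiner, zero-width joiner, word joiner, BOM.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_spacing(text: str) -> str:
    """Delete invisible characters, then collapse runs of whitespace to one space."""
    text = text.translate(INVISIBLE)  # a value of None deletes the character
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    padded = "AI  text\u200b  with   padding "
    print(repr(normalize_spacing(padded)))
```

This only undoes extra spacing between words and invisible padding. Spaces injected inside a word would need a different approach, such as checking tokens against a dictionary.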

In short, we'll guarantee an expedient and effective response to every by-passer except one: rewriting the text in one's own words, as a human.

Case Study: By-Passer Zero.

One of the most popular AI detector by-passers was popularized on TikTok and was literally called the ‘GPTZero By-passer’ program. The program modified essay text by replacing key letters with Cyrillic characters that look alike to humans but are completely different to the machine. It received significant traction among students on TikTok, with over 100K views and multiple individuals in the comments section reporting that they had used the by-passer.
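A look-alike substitution of this kind is easy to see once the text is inspected character by character. The sketch below is a hypothetical illustration rather than the by-passer's or GPTZero's code: it flags words whose letters come from more than one script, using the first word of each character's Unicode name as a coarse script label. The function names `scripts_in` and `mixed_script_words` are made up for this example.

```python
import unicodedata


def scripts_in(word: str) -> set:
    """Return coarse script labels (e.g. 'LATIN', 'CYRILLIC') for the letters in a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Unicode names look like 'LATIN SMALL LETTER O' or 'CYRILLIC SMALL LETTER O'.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def mixed_script_words(text: str) -> list:
    """Return the whitespace-separated words that mix letters from several scripts."""
    return [word for word in text.split() if len(scripts_in(word)) > 1]


if __name__ == "__main__":
    # \u043e is CYRILLIC SMALL LETTER O, which renders like a Latin 'o'.
    sample = "The m\u043edel wr\u043ete this"
    print(mixed_script_words(sample))
```

A word written entirely in Cyrillic is not flagged, so ordinary Russian or Ukrainian text passes through. Only words that mix scripts, which is rare in genuine writing, are reported.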

We decided to investigate the GPTZero by-passer program ourselves. First, we generated a few paragraphs of AI text using ChatGPT.

Asking ChatGPT to generate text from a given prompt. ChatGPT likely responded regarding GPT-3 because its training data is not recent enough to know of GPTZero.

After generating an initial text with ChatGPT, we ran the GPTZero by-passer program. The output of the GPTZero by-passer is a modified version of the input text that contains numerous irregular characters.

We then copied and pasted the modified text into GPTZero.

In our experiment, the GPTZero by-passer exploit was initially successful in duping GPTZero.

On Feb 2, as part of one of our regular model updates, the by-passer exploit was patched. The update invalidated the by-passer’s method of modifying character tokens. Likely as a response to our patch, the creators of the ‘GPTZero by-passer’ video deleted their tutorial from TikTok on Feb 4.

Even with the by-passer modification, GPTZero now detects the above text as AI generated. In the example, GPTZero also accurately highlighted exactly which portions of the essay were AI generated.
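One plausible way to neutralize a character-swap exploit is to fold look-alike characters back to Latin before the text reaches the detector. The sketch below assumes that approach purely for illustration; the actual patch is not described here, and the `HOMOGLYPHS` table and `fold_homoglyphs` function are hypothetical names covering only a handful of Cyrillic letters.

```python
# Small illustrative map from Cyrillic letters to look-alike Latin letters.
# Not exhaustive, and not taken from any real detector.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
})


def fold_homoglyphs(text: str) -> str:
    """Replace the mapped Cyrillic look-alikes with their Latin counterparts."""
    return text.translate(HOMOGLYPHS)


if __name__ == "__main__":
    disguised = "m\u043ed\u0435l \u043eutput"
    print(fold_homoglyphs(disguised))
```

Folding every mapped character would corrupt text that is genuinely written in Cyrillic, so a real pipeline would likely restrict the substitution to words that mix scripts.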

A Baseline

As a baseline comparison for our experiment, we inputted the same text into OpenAI’s AI detection classifier, released on January 31. Without the by-passer modification, the OpenAI detector classifies the AI generated text as ‘possibly AI generated’.

In comparison, we inputted the by-passer modified text into the OpenAI detector. This time, the OpenAI detector reports that it is ‘unclear’ whether the text is AI or human generated. In this example, the OpenAI detector, just like GPTZero before our February patch and model update, is duped by the GPTZero by-passer program.

An Arms Race?

Since GPTZero launched on January 2nd, our team has been constantly asked whether AI detection is entering an arms race with generative AI technology.

Technologically — the answer is TBD.

Un-technically and practically speaking: absolutely. 100%. Rather than AI detection vs AI advancement, however, the arms race will look much more like a race to respond to human-made exploits. Whether they are TikTok stars or far more organized adversaries like Russian bot farms, humans will absolutely and constantly develop new exploits against AI detection models.

As a result, it will not be enough to train and release a classifier by itself. For AI detection to be successful in practice, it will require humans iterating, monitoring, and constantly adapting detection models in response to exploits from other humans.

An Approach

In the past month, multiple organizations have trained AI detection classifiers.

Our approach at GPTZero is to migrate away from simply training classifiers. Instead, we’re building a pipeline to constantly improve our model from training data, and a feedback loop to constantly iterate on our product based on teacher suggestions and novel (sometimes adversarial) use-cases. We’re also excited to be entering into collaborations this week with some of the largest Learning Management Systems, to build the best AI detection solution for teachers.

Additional takeaways from this case-study.

  • Training a detection model and testing it in the lab is completely different from applying one in the real world. Adversarial use-cases emerge. In the real world, you need a team to constantly monitor for and respond to new exploits.
  • Training a classifier is not enough, especially not for the educational use-case. (We figured this one out early and migrated to GPTZeroX.) It turns out that for the education use case, you actually need a team constantly iterating and talking to teachers daily to build a product that works for educators.