For Machine Learning

Don’t Train Your Models on Bad Data

Whether you’re sourcing training data, developing a foundational LLM, or fine-tuning on your own data, you need to ensure AI-generated text does not make it into your training set. We can help.

As Seen In

New York Times, BBC, Washington Post, Forbes, CNN, Wired, NPR, Wall Street Journal

Trusted by the industry

“We're using GPTZero to augment our in-house signals to ensure data we create is free of any third-party LLM usage. We trust it so much that anything deemed by GPTZero to be LLM-generated is blocked from submission, and we retroactively use its scores to better tune our in-house ML quality signals and cheat-detection models.”

- Senior Product Manager, at Unicorn Data Labeler

Identify AI-written content across the internet

We developed the most comprehensive and accurate AI checker solution for Machine Learning engineers, which we also use to filter the data sets we source.


Improve interpretability with our premium API

Rather than produce a binary evaluation of whether text is generated by an LLM, our premium model combines recent breakthroughs in AI detection to offer granular details and interpretability only available on GPTZero.

Fast and easy to implement

Our public REST API is the same one that powers our web application, so you know we’re eating our own dogfood. Our response time averages 0.4s for a 700-word document.

curl --request POST \
  --url https://api.gptzero.me/v2/predict/text \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
    "document": "string",
    "version": "string"
  }'
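
For teams calling the API from a data pipeline rather than a shell, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, the endpoint and headers are copied from the curl call above, and the accepted values for "version" are not listed on this page, so that field is only sent when you supply one.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
GPTZERO_URL = "https://api.gptzero.me/v2/predict/text"


def build_predict_request(document, api_key, version=None):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {"document": document}
    if version is not None:
        # "version" appears in the curl body; its accepted values are an
        # open question here, so it is only included when the caller sets it.
        payload["version"] = version
    return urllib.request.Request(
        GPTZERO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
        method="POST",
    )


def predict_text(document, api_key, version=None, timeout=10.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(document, api_key, version)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling predict_text("Text to check goes here.", your_api_key) performs the network request. Building the request in a separate function makes it possible to inspect or test it without contacting the API.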

Don’t worry about security with our on-prem solution

We understand some customers prefer to keep their data on premises. Run our API on your own infrastructure so your data never leaves your servers.

GPTZero reviews

“The best AI checker” according to our customers and community

FAQs about GPTZero

Everything you need to know about GPTZero and our ChatGPT detector. Can’t find an answer? You can talk to our customer service team.

What is GPTZero?

GPTZero is the leading AI detector for checking whether a document was written by a large language model such as ChatGPT. GPTZero detects AI at the sentence, paragraph, and document levels. Our model was trained on a large, diverse corpus of human-written and AI-generated text, with a focus on English prose. To date, GPTZero has served over 2.5 million users around the world, and works with over 100 organizations in education, hiring, publishing, legal, and more.

How do I use GPTZero?

Simply paste in the text you want to check, or upload your file, and we'll return an overall detection for your document, as well as sentence-by-sentence highlighting of sentences where we've detected AI. Unlike other detectors, we help you interpret the results with a description of the result, instead of just returning a number.

To get the power of our AI detector for larger texts, or a batch of files, sign up for a free account on our Dashboard.

If you want to run the AI detector as you browse, you can download our Chrome Extension, Origin, which allows you to scan the entire page in one click.

When should I use GPTZero?

Our users have seen the use of AI-generated text proliferate into education, certification, hiring and recruitment, social writing platforms, disinformation, and beyond. We've created GPTZero as a tool to highlight the possible use of AI in writing text. In particular, we focus on classifying AI use in prose.

Overall, our classifier is intended to be used to flag situations in which a conversation can be started (for example, between educators and students) to drive further inquiry and spread awareness of the risks of using AI in written work.

Does GPTZero only detect ChatGPT outputs?

No, GPTZero works robustly across a range of AI language models, including but not limited to ChatGPT, GPT-4, GPT-3, GPT-2, LLaMA, and AI services based on those models.

Why GPTZero over other detection models?

  • GPTZero is the most accurate AI detector across use-cases, verified by multiple independent sources, including TechCrunch, which called us the best and most reliable AI detector after testing seven others.
  • GPTZero builds and constantly improves our own technology. In our competitor analysis, we found that not only does GPTZero perform better, but some competitor services are actually just forwarding the outputs of free, open-source models without additional training.
  • In contrast to many other models, GPTZero is fine-tuned for student writing and academic prose. By doing so, we've seen large improvements in accuracy for this use-case.
Lastly, many of our users - especially educators - have told us they trust GPTZero because we have only one mission: provide every human with the tools to detect and safely adopt AI technologies. Unlike many providers who recently released detectors as a side product, this mission will always be our number one priority.

What are the limitations of the classifier?

The nature of AI-generated content is changing constantly, and there will always be edge cases where AI text is classified as human and human text is classified as AI. As such, these results should not be used to punish students. Instead, we recommend that educators use our behind-the-scenes Writing Reports as part of a holistic assessment of student work, give students the opportunity to demonstrate their understanding in a controlled environment, and craft assignments that cannot be solved with AI.

The accuracy of our model increases as more text is submitted to the model. As such, the accuracy of the model on the document-level classification will be greater than the accuracy on the paragraph-level, which is greater than the accuracy on the sentence level.

The accuracy of our model also increases for text similar in nature to our dataset. While we train on a highly diverse set of human and AI-generated text, the majority of our dataset is in English prose, written by adults.

Our classifier is not trained to identify AI-generated text that has been heavily modified after generation (although we estimate this is a minority of the uses for AI generation at the moment).

Currently, our classifier can sometimes flag other machine-generated or highly procedural text as AI-generated, and as such, should be used on more descriptive portions of text.

What can I do as an educator to reduce the risk of AI misuse?

We believe that the best outcome for educators and students arrives by working together proactively to understand the problem of AI misuse and find strategies that home in on the human value of education. For example, educators can:
  1. Help students understand the risks of using AI in their work (to learn more, see this article), and the value of learning to express themselves. For example, in real-life, real-time collaboration, pitching, and debate, how does your class improve their ability to communicate when AI is not available?
  2. Create an assessment that cannot be answered by ChatGPT or other AI. For example:
    • Ask students to write about personal experiences and how they relate to the text, or reflect on their learning experience in your class.
    • Ask students to critique the default answer given by ChatGPT to your question.
    • Require that students cite real, primary sources of information to back up their specific claims, or ask them to write about recent events.
    • Assess students based on a live discussion with their peers, and use peer assessment tools (such as the one provided by our partner, Peerceptiv).
    • Ask students to complete their assignments in class or in an interactive way, and shift lectures to be take-home.
  3. Ask students to produce multiple drafts of their work that they can revise as peers or through the educator, to help students understand that assignments are meant to teach a learning process.
  4. Ask students to produce work in a medium that is difficult to generate, such as PowerPoint presentations, visual displays, videos, or audio recordings.
  5. Set expectations for your students that you will be checking the work through an AI detector like GPTZero, to deter misuse of AI.

I'm an educator who has found AI-generated text by my students. What do I do?

Firstly, at GPTZero, we don't believe that any AI detector is perfect. There will always be edge cases where AI text is classified as human and human text is classified as AI. Nonetheless, we recommend that educators do the following when they get a positive detection:

  1. Ask students to demonstrate their understanding in a controlled environment, whether that is through an in-person assessment, or through an editor that can track their edit history (for instance, using our Writing Reports through Google Docs). Check out our list of several recommendations on types of assignments that are difficult to solve with AI.
  2. Ask the student if they can produce artifacts of their writing process, whether it is drafts, revision histories, or brainstorming notes. For example, if the editor they used to write the text has an edit history (such as Google Docs), and it was typed out with several edits over a reasonable period of time, it is likely the student work is authentic. You can use GPTZero's Writing Reports to replay the student's writing process, and view signals that indicate the authenticity of the work.
  3. See if there is a history of AI-generated text in the student's work. We recommend looking for a long-term pattern of AI use, as opposed to a single instance, in order to determine whether the student is using AI.

What data did you train your model on?

Our model is trained on millions of documents spanning various domains of writing including creative writing, scientific writing, blogs, news articles, and more. We test our models on a never-before-seen set of human and AI articles from a section of our large-scale dataset, in addition to a smaller set of challenging articles that are outside its training distribution.

How do I use and interpret the results from your API?

To see the full schema and try examples yourself, check out our API documentation.

Our API returns a document_classification field which indicates the most likely classification of the document. The possible values are HUMAN_ONLY, MIXED, and AI_ONLY. We also provide a probability for each classification, which is returned in the class_probabilities field. The keys for this field are human, ai, and mixed. To get the probability for the most likely classification, use the predicted_class field. The class probability corresponding to the predicted class can be interpreted as the chance that the detector is correct in its classification. For example, 90% means that our detector is correct in its prediction 90% of the time on similar documents. Lastly, each prediction comes with a confidence_category field, which can be high, medium, or low. Confidence categories are tuned such that when the confidence_category field is high, 99.1% of human articles are classified as human, and 98.4% of AI articles are classified as AI.
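
As an illustration of how these fields might be read together, here is a minimal Python sketch. It assumes you already have the per-document result as a dictionary, and that the value of predicted_class is one of the class_probabilities keys. Where that object sits inside the full response is not described here, so check the API documentation. The helper names and the 0.9 threshold are hypothetical, and the sample is hand-written rather than real API output.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    classification: str  # HUMAN_ONLY, MIXED, or AI_ONLY
    probability: float   # probability of the predicted class
    confidence: str      # high, medium, or low


def summarize(result: dict) -> Verdict:
    """Collapse one document's prediction into a small record."""
    # Assumption: predicted_class is "human", "ai", or "mixed", matching
    # the keys of class_probabilities.
    predicted = result["predicted_class"]
    return Verdict(
        classification=result["document_classification"],
        probability=result["class_probabilities"][predicted],
        confidence=result["confidence_category"],
    )


def is_confidently_human(result: dict, min_probability: float = 0.9) -> bool:
    """Hypothetical training-set filter; the threshold is illustrative only."""
    verdict = summarize(result)
    return (
        verdict.classification == "HUMAN_ONLY"
        and verdict.confidence == "high"
        and verdict.probability >= min_probability
    )


# Hand-written sample in the assumed shape, not real API output.
sample = {
    "document_classification": "HUMAN_ONLY",
    "predicted_class": "human",
    "class_probabilities": {"human": 0.93, "ai": 0.02, "mixed": 0.05},
    "confidence_category": "high",
}

print(summarize(sample))
print(is_confidently_human(sample))
```

A stricter or looser rule may suit your data. The point of the sketch is only that the classification, its probability, and the confidence category are three separate signals that can be combined.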

Additionally, we highlight sentences that have been detected to be written by AI. API users can access this highlighting through the highlight_sentence_for_ai field. The sentence-level classification should not be used on its own to indicate that an essay contains AI (such as ChatGPT plagiarism). Rather, when a document gets a MIXED or AI_ONLY classification, the highlighted sentences indicate where in the document we believe this occurred.
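
The guidance above can be sketched in Python as follows: highlights are only surfaced when the document-level classification is MIXED or AI_ONLY. The shape of the per-sentence data is an assumption. This sketch supposes a "sentences" list whose entries carry a "sentence" string and a boolean highlight_sentence_for_ai; only the last of those names comes from this page, so confirm the real layout in the API documentation.

```python
from typing import List


def highlighted_sentences(result: dict) -> List[str]:
    """Return highlighted sentences, gated on the document-level label."""
    # Sentence-level flags are not meant to be used on their own, so
    # documents classified HUMAN_ONLY yield no highlights here.
    if result.get("document_classification") not in ("MIXED", "AI_ONLY"):
        return []
    return [
        entry["sentence"]  # "sentences" and "sentence" are assumed names
        for entry in result.get("sentences", [])
        if entry.get("highlight_sentence_for_ai")
    ]


# Hand-written sample in the assumed shape, not real API output.
mixed_doc = {
    "document_classification": "MIXED",
    "sentences": [
        {"sentence": "I wrote this part myself.", "highlight_sentence_for_ai": False},
        {"sentence": "This part was generated.", "highlight_sentence_for_ai": True},
    ],
}

print(highlighted_sentences(mixed_doc))
```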

Are you storing data from API calls?

No. We do not store or collect the documents passed into any calls to our API. We wanted to err on the side of caution when it comes to storing data from organizations using our API.

However, we do store inputs from calls made from our dashboard. This data is only used in aggregate by GPTZero to further improve the service for our users. You can refer to our privacy policy for more details.

How do I cite GPTZero for an academic paper?

You can use the following BibTeX citation:

@misc{tian2024gptzero,
    publisher = {GPTZero},
    url       = {https://gptzero.me},
    year      = {2024},
    author    = {Tian, Edward and Cui, Alexander},
    title     = {GPTZero: Towards detection of AI-generated text using zero-shot and supervised methods}
}

What is GPTZero's source finder?

GPTZero's source finder detects potentially misleading claims in text and recommends sources that support or contradict those claims. Our tool allows you to find any arguments or “claims” in a document that may require more scrutiny, and then links to helpful sources so you can dive deeper into your analysis and add context. You can pull these into your own research, or share your results to improve someone else's.

How do I use GPTZero's source finder?

Simply paste in the text you want to check, or upload your file. Source finder will detect as many checkable, objective claims in your text as it can and match them to sources from online, academic, and publicly available data coming from AI-powered search engines. You may find sources that directly support or contradict these claims. You can cite the relevant snippet from the source, and also get citations in MLA, Chicago, APA, BibTeX, and IEEE formats.

While this feature is in beta, we may make changes and improvements that alter its functionality before final release.

Does GPTZero's source finder use AI?

We use AI to detect claims and arguments and pair them with potentially relevant sources. We actively do NOT recommend AI-generated content, due to its unreliability, and filter out sources that are potentially AI-generated.

Does GPTZero's source finder work as a fact checker?

Our tool currently does not take a stance on whether your claims or the claims in the sources cited are true “facts” or false. We only indicate whether there is online evidence that contradicts, debates, or supports a claim. We strive to only include high-quality sources from reputable online, academic, and public sources, but do not endorse the viewpoints in any specific source.

How does GPTZero select the sources in its database?

We allow you to search for claims from a dataset of over 220M scholarly articles, preprints, and real-time news using next-generation recommendation AI and large language models.