How Many Substack Authors Rely on AI-generated Content?
We investigated 100 popular Substack authors and found how many are likely using AI to help write their posts.
An increasing percentage of the internet’s content is AI-generated. As the leading AI content detector, we’re on a mission to track just how much and where.
We examined 100 top Substack authors and scanned their posts for AI-generated content. Our model detects text from most major LLMs, including ChatGPT, Gemini, Bard, Claude and more. This model has a 99% accuracy rate when detecting AI versus human text with a ~1% false positive rate, and more than 96% accuracy at identifying “mixed” samples in which AI sections are interspersed with human content.
Why we looked at Substack
Substack offers writers a platform to monetize their writing differently from previous iterations of blogs and newsletters, primarily through paid subscriptions and gated content. Substack has built a reputation as the next big platform for readers to get high-quality content from top authors and writers in their inbox. And while it’s subjective what makes writing popular or good, Substack’s ethos relies on the unstated principle that humans are compensating other humans for their good (human) writing and ideas.
Here is a breakdown of GPTZero’s findings:
- 10% of popular Substacks use some form of detectable AI.
- 7% of popular Substacks rely significantly on AI, with AI detected in more than 1 out of 10 posts. These newsletters cover a wide range of topics, including sports, financial advice, and business.
- Curation seems to help, but without a means of independent detection, AI-driven content may still gain hundreds of thousands of subscribers.
How did we choose which authors to investigate?
We ran our AI detector on the top Substack writers by reported subscriber count and on some of the top Substacks by paid subscribers. We also pulled writers from Substack’s curated “Top” lists on the topic landing pages linked in its website footer, which cover most major topics: business, culture, food, tech, sports, politics, health, fashion, science, news, literature, and more.
For each author, we pulled the latest 25-30 posts from their feed (for those who had fewer than 25 posts, we scraped all they had) and ran them through our AI detection model at scale. For authors with primarily paid posts with truncated content, we subscribed to as many of them as we could afford in order to access their full paid posts.
To earn the Certified Human badge, at least 90% of an author’s posts had to come out as Human Written. (This means an author with 1-2 AI Likely posts may still pass if all their other posts were human.)
We skipped any post that was too short (fewer than 300 words) because our accuracy rate tends to drop for short posts. We also omitted podcast-forward Substacks whose posts are primarily summaries.
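The selection and badge rules above can be sketched in a few lines of Python. This is a minimal illustration, not GPTZero's actual pipeline: the `Post` class, its `label` values, and the function names are hypothetical, and the sketch assumes the word-count filter is applied to the sampled posts and that an author with no scannable posts gets no badge.

```python
from dataclasses import dataclass

MIN_WORDS = 300        # posts with fewer words than this were not scanned
MAX_POSTS = 30         # the article sampled the latest 25-30 posts per author
REQUIRED_PERCENT = 90  # share of scanned posts that must be Human Written


@dataclass
class Post:
    text: str
    label: str  # hypothetical detector output: "human" or "ai_likely"


def scannable(posts):
    """Take the most recent posts, then drop those too short to scan.

    Assumes `posts` is ordered newest first.
    """
    recent = posts[:MAX_POSTS]
    return [p for p in recent if len(p.text.split()) >= MIN_WORDS]


def is_certified_human(posts):
    """Return True if at least 90% of the scanned posts are labeled human."""
    scanned = scannable(posts)
    if not scanned:
        return False  # assumption: nothing to scan means no badge
    human = sum(1 for p in scanned if p.label == "human")
    # Integer comparison avoids floating-point edge cases at exactly 90%.
    return human * 100 >= REQUIRED_PERCENT * len(scanned)


if __name__ == "__main__":
    long_text = "word " * MIN_WORDS
    feed = [Post(long_text, "ai_likely")] * 2 + [Post(long_text, "human")] * 23
    print(is_certified_human(feed))  # 23 of 25 posts are human, i.e. 92%
```

With 25 scanned posts, two AI Likely results still leave 92% human and the author passes, while three drop the share to 88% and the author does not, which matches the "1-2 AI Likely posts" allowance described above.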
Some disclaimers:
- AI detection, unlike plagiarism detection, is based on probability. We only mark a post as AI Likely when our model has a moderate to high degree of confidence that it is AI-generated. When our model has a low degree of confidence, we conservatively mark the post as human.
- Getting a “Certified Human” badge does not mean we endorse or agree with the author’s content. It doesn’t mean we think the content is “good” or high-quality, only that we think it’s, well, human-grade.
- Our AI detector also does not check whether the author’s writing is truthful or accurate (yet). The posts could be full of lies, but for this investigation, we can only tell if they are AI-generated lies or old-fashioned human lies.
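Because detection is probabilistic, a fully human author can occasionally pick up a flagged post, which is one way to read the 90% threshold. The sketch below is a rough back-of-the-envelope illustration rather than part of the investigation: it assumes the ~1% false positive rate quoted earlier applies to each post independently, which is a simplification, and the function name is hypothetical.

```python
from math import comb

FALSE_POSITIVE_RATE = 0.01  # the ~1% rate quoted earlier, assumed to hold per post


def prob_at_least(k, n, p=FALSE_POSITIVE_RATE):
    """P(X >= k) for X ~ Binomial(n, p): at least k of n human posts flagged.

    Assumes each post is flagged independently with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_posts = 25
    # Chance that an all-human author sees at least one flagged post.
    print(f"at least 1 of {n_posts} flagged: {prob_at_least(1, n_posts):.3f}")
    # Chance of three or more flags, which would fall below 90% human.
    print(f"at least 3 of {n_posts} flagged: {prob_at_least(3, n_posts):.4f}")
```

Under these assumptions, roughly 22% of all-human authors with 25 scanned posts would see at least one false flag, while well under 1% would see three or more. A rule that tolerated zero flagged posts would therefore penalize human writers far more often than the 90% rule does.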
Why does it matter if a writer is using AI-generated content?
Our goal isn’t to pass moral condemnation on writers who use AI. (Our team has definitely thrown a subject line or two into ChatGPT.) Not everyone cares whether their favorite Substack author or newsletter is using AI to help them write.
We do want to raise awareness about the prevalence of AI-generated content, especially as the amount of AI content grows unchecked. (We recently helped researchers find out that a growing number of new Wikipedia articles contain AI-generated content.)
A lack of transparency around AI-generated content poses a problem, not merely for its tendency to be more error-prone factually, but also for its weird voice that tends to overuse certain phrases and sound… artificial. If Substack is meant to curate quality content, unchecked AI blogs pose a pretty large brand risk. (See: the amount of AI slop on Medium.) AI-driven output can push out good-quality human writers, a problem even curation can stem for only so long.
Fundamentally, we believe it is important to humanity’s future to maintain a level of transparency around what content is human and what is AI. Knowing what is human and what is artificial helps us preserve the truth, something even the most gung-ho AI tech enthusiasts should care about too.
We believe in rewarding real human writers who are putting in the effort to create meaningful, original content.
To get listed, sign up for GPTZero, scan your blog and send us your results.
Full list of Top Substack Writers
Pulled November 2024