Home > Tech

OpenAI launches webcrawler GPTBot, and instructions on how to block it

Websites can choose to opt out.

By

Meera Navlakha

on August 8, 2023

Credit: akub Porzycki/NurPhoto via Getty Images.

OpenAI has launched a web crawler to improve artificial intelligence models like GPT-4.

Called GPTBot, the system combs through the Internet to train and enhance AI's capabilities. Using GPTBot has the potential to improve existing AI models when it comes to aspects like accuracy and safety, according to a blog post by OpenAI.

"Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies," reads the post.

Websites can choose to restrict access to the web crawler, however, and prevent GPTBot from accessing their sites, either partially or by opting out entirely. OpenAI said that website operators can disallow the crawler by blocking its IP address or on a site's Robots.txt file.

Mashable Light Speed

Want more out-of-this world tech, space and science stories?

Sign up for Mashable's weekly Light Speed newsletter.

By clicking Sign Me Up, you confirm you are 16+ and agree to our Terms of Use and Privacy Policy.

Thanks for signing up!

SEE ALSO: Google's Bard AI chatbot is vulnerable to use by hackers. So is ChatGPT.

Previously, OpenAI has landed in hot water for how it collects data and for things like copyright infringement and privacy breaches. This past June, the AI platform was sued for "stealing" personal data to train ChatGPT.

Its opt-out functions were only recently implemented, with features like disabling chat history allowing users more control over what personal data can be accessed.

ChatGPT 3.5 and 4 were trained on online data and text dating up to Sept. 2021. There is currently no way to remove content from that dataset.

How to prevent GPTBot from using your website's content

According to OpenAI, you can disallow GPTBot by adding it to your site's Robots.txt, which is essentially a text file that instructs web crawlers on what they can or cannot access from a website.

The code for disallowing GPTBot from your site.

Credit: Screenshot / OpenAI.

You can also customize what parts a web crawler can use, allowing certain pages and disallowing others.

The code for disallowing or allowing GPTBot from your site's pagess.

Credit: Screenshot / OpenAI.

Topics Artificial Intelligence ChatGPT

Meera Navlakha

Culture Reporter

Meera is a Culture Reporter at Mashable, joining the UK team in 2021. She writes about digital culture, mental health, big tech, entertainment, and more. Her work has also been published in The New York Times, Vice, Vogue India, and others.

Recommended For You

OpenAI, Microsoft, Trump admin claim DeepSeek trained AI off stolen data

Did DeepSeek take OpenAI's data…which OpenAI took from all of us?

01/29/2025

By Matt Binder

DeepSeek and OpenAI logos

DeepSeek and OpenAI logos

Elon Musk makes $97 billion bid for OpenAI, gets rejected

OpenAI CEO Sam Altman appears to have rejected the offer and instead made his own bid for Twitter.

02/10/2025

By Matt Binder

Elon Musk and Sam Altman in 2015

Elon Musk and Sam Altman in 2015

DeepSeek could dethrone OpenAI's ChatGPT. Here's why

With an actual open source model, China's AI leader just whupped America's AI leader. Can Sam Altman fight back?

01/28/2025

By Chris Taylor

OpenAI CEO Sam Altman looks dismayed on a TV interview

OpenAI CEO Sam Altman looks dismayed on a TV interview

Elon Musk says he'll stop trying to buy OpenAI if it stays a nonprofit

The Musk v. Altman drama continues.

02/13/2025

By Cecily Mauran

OpenAI responds to criticism of ChatGPT's Studio Ghibli-style images

There's a very specific reason why OpenAI says it's okay.

03/28/2025

By Cecily Mauran

studio ghibli exhibition in singapore showing sculptures of 'my neighbor totoro'

studio ghibli exhibition in singapore showing sculptures of 'my neighbor totoro'

Trending on Mashable

NYT Connections hints today: Clues, answers for April 16, 2025

Everything you need to solve 'Connections' #675

3 hours ago

By Mashable Team

Connections game on a smartphone

Connections game on a smartphone

Wordle today: Answer, hints for April 16, 2025

Here are some tips and tricks to help you find the answer to "Wordle" #1397.

3 hours ago

By Mashable Team

Wordle game on a smartphone

Wordle game on a smartphone

Deep sea craft filmed unprecedented footage of a colossal squid

Never before filmed in its natural environs.

17 hours ago

By Mark Kaufman

An image from the first-ever confirmed footage of a colossal squid.

An image from the first-ever confirmed footage of a colossal squid.

Lego is giving away Grogu models for free to celebrate Star Wars Day. Here’s how to get yours.

Get your hands on an exclusive model of Grogu in a hover pram.

9 hours ago

By Joseph Green

Lego Bricks in child's hands

Lego Bricks in child's hands

NYT Strands hints, answers for April 16

Every hint, nudge and outright answer you need to complete today's NYT Strands puzzle.

6 hours ago

By Mashable Team

A game being played on a smartphone.

A game being played on a smartphone.

The biggest stories of the day delivered to your inbox.

These newsletters may contain advertising, deals, or affiliate links. By clicking Subscribe, you confirm you are 16+ and agree to our Terms of Use and Privacy Policy.

Thanks for signing up. See you at your inbox!

zproxy.org