Michael Gearon

Should we block AI scraper bots?


The BBC said last week that it is blocking OpenAI's web crawler from scraping its content. It follows other content providers like Reuters and Getty Images. Rhodri Talfan Davies, the BBC's director of nations, said:

We do not believe the current ‘scraping’ of BBC data without our permission . . . to train ‘gen AI’ models is in the public interest and we want to agree a more structured and sustainable approach with technology companies.

So how do we prevent OpenAI and other AI companies from scraping our content? OpenAI says that if you want to discourage GPTBot (the name of the bot it uses to crawl websites), you should add this to your robots.txt file:

User-agent: GPTBot
Disallow: /
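If you want to check how a crawler that honours robots.txt would read these rules, Python's standard-library `urllib.robotparser` can parse them. A quick sketch (example.com is just a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# Parse the same rules we added to robots.txt
rp = RobotFileParser()
rp.parse([
    "User-agent: GPTBot",
    "Disallow: /",
])

# GPTBot is blocked from every path; agents without a matching rule are not
print(rp.can_fetch("GPTBot", "https://example.com/any-page"))       # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/any-page")) # True
```

This is also a handy way to test your live file: `RobotFileParser` can fetch it directly with `set_url()` and `read()`.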

What about other AI scraping website bots?

Although OpenAI's ChatGPT is the most well-known AI product right now, other companies like Google and Facebook also use bots to scrape content from the web. If you want to try to block them all, add these rules to your robots.txt file:

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Amazonbot
Disallow: /
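Maintaining that list by hand gets repetitive as new crawlers appear, so one option is to generate the rules from a list of bot names. A small sketch (the list simply mirrors the user agents above):

```python
# User agents of the AI crawlers listed above
AI_BOTS = [
    "CCBot", "ChatGPT-User", "GPTBot", "Google-Extended",
    "Omgilibot", "FacebookBot", "Amazonbot",
]

def robots_rules(bots):
    """Return robots.txt text disallowing every path for each bot."""
    return "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in bots)

print(robots_rules(AI_BOTS))
```

Adding a new bot then only means appending one name to the list and regenerating the file.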

But why should we block them?

I’m not trying to convince you that you should go and block all of these bots from scraping your content today. Instead, it raises the question: should all of these bots be able to scrape your content without your permission, and without referencing you as the source? Chris Coyier recently blogged:

If a huge company sent a robot to your door to ask for a lock of your hair, would you give it to them? If they asked for one square inch of your land, would you sign it over? If they asked you to run on a treadmill for one minute a day for them, would you hop to it? What if they didn’t ask?

We must also remember that disallowing these bots doesn’t mean they will stop scraping your content. robots.txt is advisory: well-behaved crawlers honour it, but there are lots of bots out there that don’t, and scraping content is widespread at the moment.
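If you want to go beyond the advisory robots.txt, you can refuse requests at the server instead, based on the User-Agent header. This is a minimal sketch of that idea, not something OpenAI or the BBC prescribe; the bot list here is an assumption for illustration:

```python
# Sketch: decide whether to reject a request based on its User-Agent header.
# In a real setup this check would live in server config or middleware,
# returning an HTTP 403 when it matches.
BLOCKED_BOTS = {"GPTBot", "CCBot", "Amazonbot"}  # assumed list for illustration

def should_block(user_agent: str) -> bool:
    """True if the User-Agent header mentions any blocked bot name."""
    ua = (user_agent or "").lower()
    return any(bot.lower() in ua for bot in BLOCKED_BOTS)

print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # True
print(should_block("Mozilla/5.0 (Windows NT 10.0)"))         # False
```

Of course, a determined scraper can fake its User-Agent, so even this is discouragement rather than a guarantee.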


Written by

Michael Gearon

Senior Interaction Designer and co-author of Tiny CSS Projects