By Nicholas Brown – Follow me on X.
You can instruct Google to stop indexing certain pages on your website in several ways. The methods described here work for other search engines as well.
You can block search crawlers with a ‘noindex’ meta tag, a ‘noindex’ HTTP response header (see the sketch after the next paragraph), or a ‘Disallow’ directive for each page in your robots.txt file. This is useful for keeping test pages and other content that isn’t intended for the public from cluttering the search results and dragging down your website’s rank. Note that robots.txt blocks crawling rather than indexing, so a disallowed page can still show up in results if other sites link to it; ‘noindex’ is the more reliable way to keep a page out of the index.
The methods below are permanent. If you instead use Google Search Console to request removal, the removal is temporary and the page can be reindexed within about six months.
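For the HTTP response header method, the header is ‘X-Robots-Tag: noindex’. Below is a minimal sketch for an Apache server, assuming mod_headers is enabled; ‘private-page.html’ is a placeholder filename. Unlike the meta tag, this header also works for non-HTML resources such as PDFs.

<Files "private-page.html">
  # Tell crawlers not to index this file (requires mod_headers)
  Header set X-Robots-Tag "noindex"
</Files>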
How To Use The ‘noindex’ Meta Tag To Block Search Crawlers From A Page
To block search indexing on a per-page basis, use the ‘noindex’ meta tag as shown in the example below. The tag is also useful when the page you don’t want indexed is on a subdomain or a different domain altogether, where your main site’s robots.txt has no effect. Place the tag in the head of the HTML page, and don’t block the same page in robots.txt, or the crawler will never fetch the page and see the tag.
<meta name="robots" content="noindex">
The code above will instruct all search engines that support the ‘noindex’ rule to stop indexing that page. If you want to apply the rule to Google specifically, change the name attribute from ‘robots’ to ‘googlebot’, as shown below.
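The Google-specific version of the tag looks like this:

<meta name="googlebot" content="noindex">

Other search engines have their own crawler names (Bing’s crawler is ‘bingbot’, for example), and a page can carry several of these tags at once, one per crawler.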
How To Block A Large Number Of Pages From Being Indexed Using Robots.txt
If you have a large number of pages to block, it’s quicker to list them in your ‘robots.txt’ file with the ‘Disallow’ directive than to tag each page individually. Just add the path of every page you want kept out of the search results, as shown below. You may also need this option if you use WordPress or another content management system (CMS) that generates HTML dynamically, since there may be no static HTML files to edit.
User-agent: *
Disallow: /pathtoblock
Disallow: /secondpagetoblock
Disallow: /thirdpagetoblock
‘User-agent: *’ refers to all search crawlers, not just Google. If you want the rules to apply to a specific crawler, replace the asterisk with the name of that crawler. For example, if you want to block the ChatGPT crawler GPTBot from accessing your entire website, use the following code:
User-agent: GPTBot
Disallow: /
The ‘/’ on the ‘Disallow’ line means the root, i.e., all pages under that host. Subdomains (if any) are treated as separate sites, so the rule has to be added to each one’s own robots.txt file.
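For instance, a subdomain would serve its own copy of the file; ‘blog.example.com’ below is a hypothetical placeholder for your actual subdomain:

# Served at https://blog.example.com/robots.txt
User-agent: GPTBot
Disallow: /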
Learn more about Google’s crawlers and how they work.
Further Reading
How To Create A Sitemap And Submit It
How To Send A POST Request With JavaScript
How To Set Meta Tags Using JavaScript