How do I use a canonical link?

Canonical Tags: A Simple Beginner's Guide

Would you like to learn what canonical tags are and how to use them to avoid unwanted duplicate content issues?

Canonical tags are nothing new. They've been around since 2009.

Google, Microsoft, and Yahoo teamed up to develop them. Their goal? To give website owners a quick and easy way to resolve duplicate content issues.

Do they work? Yes, very well... but only if you know how to use them!

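In this guide, you will learn:

  • What a canonical tag is and what it looks like
  • Why canonical tags matter for SEO
  • The basics of implementing canonical tags
  • How to implement canonicals
  • Common canonicalization mistakes to avoid
  • How to find and fix canonicalization issues on your site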

What is a canonical tag?

A canonical tag (rel="canonical") is a snippet of HTML code that defines the main version of duplicate, near-duplicate, and similar pages. In other words, if you have the same or similar content available at different URLs, you can use canonical tags to specify which version is the main one and should therefore be indexed.

What does a canonical tag look like?

Canonical tags use a simple, consistent syntax and are placed within the <head> section of a web page:

<link rel="canonical" href="https://beispiel.de/beispiel-seite/" />

Below is an explanation of what each part of the code means in plain English:

  1. link rel="canonical": The link in this tag is the master (canonical) version of this page.
  2. href="https://beispiel.de/beispiel-seite/": The canonical version can be found at this URL.

Why are canonical tags important to SEO?

Google doesn't like duplicate content. It makes it harder for them to choose:

  1. Which version of a page to index (they only index one!)
  2. Which version of a page to rank for relevant queries
  3. Whether to consolidate "link equity" on one page or split it across multiple versions

Too much duplicate content can also affect the "crawl budget". This means that Google may be wasting time crawling multiple versions of the same page instead of discovering other important content on the site.

The truth about the crawl budget

Forcing Google to waste time crawling duplicate content is, of course, something that should be avoided whenever possible. However, Google states that this will not be a problem for most websites.

If new pages tend to be crawled on the day they are published, crawl budget is not something webmasters need to focus on. Likewise, if a website has fewer than a few thousand URLs, most of the time it will be crawled efficiently.

Canonical tags solve all of these problems. They let you tell Google which version of a page should be indexed and ranked, and where "link equity" should be consolidated.

If you don't specify a canonical URL, Google will take matters into its own hands.

If you don't provide a canonical URL, we'll determine what we think is the best version or URL.

Relying on Google like that is not a good idea. Google might pick a version of your page that you don't want to be the canonical.

Google states that they usually respect the canonical URL you set, but not always. That's because canonical tags are hints, not directives. As long as your canonical is respected, signals such as links should be consolidated to the canonical URL.

Following canonical tag best practices also reduces the risk of Google selecting an unwanted version of the page as canonical.

But I don't have duplicate content, do I?

You have probably not published the same posts and pages multiple times, so you may assume that your website has no duplicate content.

But search engines crawl URLs, not pages.

This means they see example.de/product and example.de/product?color=red as separate pages, even though they are the same page with identical or similar content.

These are called parameterized URLs and are a common cause of duplicate content, especially on e-commerce sites with faceted / filtered navigation.

For example, Brown Bag Clothing sells shirts. This is the URL of their main category page:

https://www.bbclothing.co.uk/en-gb/clothing/shirts.html

If you filter for XL shirts only, a parameter is appended to the URL:

https://www.bbclothing.co.uk/en-gb/clothing/shirts.html?size=XL

If you then continue filtering for blue shirts, another parameter is added:

https://www.bbclothing.co.uk/en-gb/clothing/shirts.html?size=XL&color=Blue

In the eyes of Google, these are all separate pages, even if the content differs only slightly.

But it's not just e-commerce sites that fall victim to duplicate content.

Here are some other common causes of duplicate content that apply to all types of websites:

  • Parameterized URLs for search parameters (e.g., example.de?q=search-term)
  • Parameterized URLs for session IDs (e.g., https://beispiel.de?sessionid=3)
  • Separate printable versions of pages (e.g., example.de/seite or example.de/print/seite)
  • Unique URLs for articles in different categories (e.g., example.de/services/SEO/ or example.de/special-offers/SEO/)
  • Pages for different device types (e.g., example.de or m.example.de)
  • AMP and non-AMP versions of a page (e.g., example.de/seite or amp.example.de/seite)
  • The same content on non-www and www variants (e.g., http://beispiel.de or http://www.beispiel.de)
  • The same content on non-HTTPS and HTTPS variants (e.g., http://www.beispiel.de or https://www.beispiel.de)
  • The same content with and without trailing slashes (e.g., https://beispiel.de/page/ or https://beispiel.de/page)
  • The same content on default versions of a page, such as index pages (e.g., https://www.beispiel.de/, https://www.beispiel.de/index.htm, https://www.beispiel.de/index.html, https://www.beispiel.de/index.php, https://www.beispiel.de/default.htm, etc.)
  • The same content with and without capital letters (e.g., https://beispiel.de/page/ or https://beispiel.de/Page/)
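To make the list concrete, here is a minimal Python sketch that collapses several of these variants (scheme, www, letter case, trailing slash, index files, session IDs) into one preferred URL. The policy it encodes (HTTPS, non-www, lowercase paths, trailing slashes on directory-style paths) and the `SESSION_PARAMS` and `INDEX_FILES` sets are assumptions for illustration, not rules from Google; lowercasing paths is only safe if your server treats them case-insensitively.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site policy; adjust to your own preferred URL format.
SESSION_PARAMS = {"sessionid", "sid"}
INDEX_FILES = {"index.htm", "index.html", "index.php", "default.htm"}


def normalize(url):
    """Collapse common duplicate-URL variants into one preferred form."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname and drops the port
    if host.startswith("www."):
        host = host[4:]
    # Assumption: the server treats paths case-insensitively.
    path = parts.path.lower() or "/"
    last = path.rsplit("/", 1)[-1]
    if last in INDEX_FILES:          # /index.php -> /
        path = path[: -len(last)]
    elif last and "." not in last:   # /page -> /page/ (directory-style URLs only)
        path += "/"
    # Drop session IDs, keep other parameters, and sort them for a stable order.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k.lower() not in SESSION_PARAMS
    )
    return urlunsplit(("https", host, path, urlencode(query), ""))


# normalize("http://www.beispiel.de/Page")     -> "https://beispiel.de/page/"
# normalize("https://beispiel.de/index.php")   -> "https://beispiel.de/"
# normalize("https://beispiel.de?sessionid=3") -> "https://beispiel.de/"
```

Filter parameters such as size=XL are kept, because whether a filtered view deserves its own canonical is a decision each site has to make.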

Proper use of canonical tags is critical in these situations.

Duplicate content across domains is also a challenge. When you syndicate content, it's best to use a self-referencing canonical tag on your article and have the syndicated copy point to your original as the canonical version with a cross-domain canonical tag.

While this doesn't always prevent the syndicated content from appearing in search results, it does reduce the risk of it ranking above the original.

If people deliberately choose to syndicate their content, it makes it difficult to determine the originating source. That's why we recommend the use of canonical or blocking. The publishers syndicating can require this.

- Danny Sullivan (@dannysullivan) September 18, 2019

The basics of implementing the canonical tag

Canonical tags are easy to implement. We're going to discuss five different ways to implement them. But no matter which method you choose, there are five golden rules you should always keep in mind.

Rule #1: Use absolute URLs

Google's John Mueller explains that best practice is not to use relative paths with the rel="canonical" link element.

You can use either, but I’d recommend using absolute URLs so that you’re sure they’re interpreted correctly.

- 🍌 John 🍌 (@JohnMu) October 24, 2018

Hence, you should use the following structure:

<link rel="canonical" href="https://beispiel.de/beispiel-seite/" />

In contrast to this:

<link rel="canonical" href="/example-page/" />
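Why absolute URLs are safer becomes clear when you resolve a relative href yourself. This sketch uses Python's standard `urljoin`, which applies the generic RFC 3986 resolution rules; it illustrates how a relative canonical is interpreted in principle, not how Google's systems process it.

```python
from urllib.parse import urljoin, urlsplit


def resolve_canonical(page_url, href):
    """Resolve a canonical href against the URL of the page it appears on."""
    return urljoin(page_url, href)


def is_absolute(href):
    """True if the href includes its own scheme and host, as recommended."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


# resolve_canonical("https://beispiel.de/blog/post/", "/example-page/")
#   -> "https://beispiel.de/example-page/"
# resolve_canonical("https://beispiel.de/blog/post/", "example-page/")
#   -> "https://beispiel.de/blog/post/example-page/"  (missing leading slash)
# resolve_canonical("http://beispiel.de/blog/post/", "/example-page/")
#   -> "http://beispiel.de/example-page/"  (inherits HTTP from the page)
```

The last two cases show the risk: a missing leading slash nests the canonical under the current path, and a copy of the page reached over HTTP yields an HTTP canonical.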

Rule #2: Use lowercase URLs

Since Google may treat uppercase and lowercase URLs as two different URLs, you should first make sure that you enforce lowercase URLs on your server, and then use lowercase URLs for your canonical tags.

Rule #3: Use the correct domain version (HTTPS vs. HTTP)

If you've switched to SSL, make sure you don't declare any non-SSL (i.e. HTTP) URLs in your canonical tags. In theory, this can lead to confusion and unexpected results. If you're on a secure domain, make sure you're using the following version of your URL:

<link rel="canonical" href="https://example.de/example-page/" />

In contrast to:

<link rel="canonical" href="http://example.de/example-page/" />

Rule #4: Use self-referencing canonical tags

Google's John Mueller says self-referencing canonical tags are recommended, even if they aren't required.

I recommend [using a] self-referencing canonical because it really makes it clear to us which page you want indexed, or what the URL should be when it is indexed.

Even if you have one page, sometimes there are different variations of the URL that can lead to that page. For example, with parameters at the end, maybe with upper and lower case, or www and non-www. All of these things can be cleaned up somewhat with a rel canonical tag.

John Mueller, Webmaster Trends Analyst, Google

In case you're not sure how a self-referencing canonical tag works, it's simply a canonical tag on a page that points to that same page. For example, if the URL is https://beispiel.de/beispiel-seite, a self-referencing canonical tag on this page would be:

<link rel="canonical" href="https://beispiel.de/beispiel-seite" />

Most popular modern CMSs add self-referencing canonical tags automatically, but if you use a custom CMS, you'll need to have your developer implement them in code.
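For a custom CMS, generating the tag can be as simple as the following Python sketch. The function name is hypothetical, and it assumes query strings and fragments never change a page's content, so it drops them; if your pages rely on meaningful parameters, keep those instead.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def self_canonical_tag(request_url):
    """Build a self-referencing canonical tag for the page at request_url.

    Assumption: query strings and fragments never change the content,
    so they are removed from the canonical URL.
    """
    parts = urlsplit(request_url)
    clean = urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", "", ""))
    # escape() guards against quotes or ampersands breaking the attribute.
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'


# self_canonical_tag("https://beispiel.de/beispiel-seite?ref=newsletter")
#   -> '<link rel="canonical" href="https://beispiel.de/beispiel-seite" />'
```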

Rule #5: Use one canonical tag per page

If a page has multiple canonical tags, Google will ignore all of them.

In the case of multiple rel=canonical declarations, Google will likely ignore all rel=canonical hints.
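A quick way to catch this is to count the canonical declarations in a page's HTML. The sketch below uses Python's built-in `html.parser`; it only sees the raw HTML you pass in, so canonicals injected later by JavaScript are not counted.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, so split before matching.
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def canonical_hrefs(html):
    """Return every canonical href found in the HTML, in document order."""
    parser = CanonicalCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def has_single_canonical(html):
    """True only if the page declares exactly one canonical URL."""
    return len(canonical_hrefs(html)) == 1
```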

How to implement canonicals

There are five known ways to specify canonical URLs. These are called canonicalization signals:

  1. HTML tag (rel=canonical)
  2. HTTP headers
  3. Sitemap
  4. 301 redirect *
  5. Internal links

You can find the advantages and disadvantages of each method in the official Google documentation.

1. Setting canonicals with rel="canonical" HTML tags

Using a rel=canonical tag is the easiest and most obvious way to specify a canonical URL.

Just paste the following code into the <head> section of any duplicate page:

<link rel="canonical" href="https://beispiel.de/canonical-seite/" />

Example

Let's say you have an e-commerce website that sells t-shirts. You want https://deinstore.de/tshirts/black-tshirts/ to be the canonical URL, even though the content of this page is also accessible at other URLs (e.g., https://deinstore.de/angebote/black-tshirts/).

Just add the following canonical tag to all duplicate pages:

<link rel="canonical" href="https://deinstore.de/tshirts/black-tshirts/" />

Note that if you use a CMS, you don't need to touch your site's code. There's an easier way.

Setting canonical tags in WordPress:

Install Yoast SEO and self-referencing canonical tags will be added automatically. To set custom canonical tags, use the “Advanced” section in any post or page.

Setting canonical tags in Shopify:

Shopify adds self-referencing canonical URLs for products and blog posts by default. To set custom canonical URLs, you'll need to edit the template (.liquid) files directly.

This thread has some information on how to do that.

Setting canonical tags in Squarespace:

Squarespace adds self-referencing canonical URLs by default. But, as with Shopify, if you want to set a custom canonical URL, you'll have to edit the code directly.

2. Setting canonicals in HTTP headers

For documents like PDFs, there's no way to place canonical tags in the page's <head> because there is no <head> section. In such cases, you'll need to use HTTP headers to set canonicals. You can also set canonicals via HTTP headers on regular web pages.

Example

Imagine we are creating a PDF version of this blog post and hosting it in our blog subfolder (ahrefs.com/blog/*).

Our HTTP header for this file could look like this:

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <…>; rel="canonical"
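As a sketch of how a server might send this header, here is a minimal Python WSGI app. The PDF path, the canonical URL in `CANONICAL_FOR`, and the placeholder response body are hypothetical; the header value follows the `<URL>; rel="canonical"` syntax of the HTTP Link header (RFC 8288).

```python
# Hypothetical mapping: PDF path -> HTML page that should be treated as canonical.
CANONICAL_FOR = {
    "/downloads/beispiel-seite.pdf": "https://beispiel.de/beispiel-seite/",
}


def canonical_link_header(url):
    """Format a Link header value that declares url as canonical."""
    return f'<{url}>; rel="canonical"'


def app(environ, start_response):
    """Minimal WSGI app that serves PDFs with a canonical Link header."""
    canonical = CANONICAL_FOR.get(environ.get("PATH_INFO", ""))
    if canonical is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Link", canonical_link_header(canonical)),
    ])
    # Placeholder bytes; a real app would stream the PDF file from disk.
    return [b"%PDF-1.4 placeholder"]
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server` and inspect the response headers with curl -I.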

Reading recommendation: How to add the canonical tag to HTTP headers

3. Setting canonicals in sitemaps

Google states that non-canonical pages should not be included in sitemaps. Only canonical URLs should be listed. That's because Google sees the pages listed in a sitemap as suggested canonical pages.

However, they will not always select the URLs in the sitemaps as canonical URLs.

We don't guarantee that we'll consider the sitemap URLs canonical, but it's an easy way to define canonical URLs for a large website, and sitemaps are a useful way to tell Google which pages on your site you consider most important.
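If you generate sitemaps yourself, one way to honor this is to list only URLs whose declared canonical is the URL itself. This Python sketch assumes you already have a crawl result mapping each URL to its declared canonical (the `pages` dictionary is hypothetical input) and writes a bare-bones sitemap with `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages maps each crawled URL to the canonical URL it declares.
    Only self-canonical URLs (url == canonical) end up in the sitemap."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, canonical in sorted(pages.items()):
        if url == canonical:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


pages = {
    "https://deinstore.de/tshirts/black-tshirts/": "https://deinstore.de/tshirts/black-tshirts/",
    "https://deinstore.de/angebote/black-tshirts/": "https://deinstore.de/tshirts/black-tshirts/",
}
sitemap_xml = build_sitemap(pages)  # lists only the /tshirts/ URL
```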

4. Setting canonicals with 301 redirects

Use a 301 redirect if you want to redirect traffic away from a duplicate URL and to the canonical version.

Example

Assuming your page can be reached under these URLs:

  • example.de
  • example.de/index.php
  • example.de/home/

Choose one URL as the canonical and redirect the other URLs to it.

You should do the same for the HTTPS/HTTP and www/non-www versions of your site. Choose one canonical version and redirect the others to it.

For example, the canonical version of ahrefs.com is the HTTPS non-www URL (https://ahrefs.com). All of the following URLs redirect there:

  • http://ahrefs.com/
  • http://www.ahrefs.com/
  • https://www.ahrefs.com/
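Here is a minimal Python WSGI sketch of that idea for a hypothetical site at beispiel.de that prefers HTTPS, no www, and a single homepage URL. The host name and the `HOMEPAGE_ALIASES` set are assumptions; in practice these redirects usually live in the web server or CDN configuration rather than in application code. For brevity, it also drops query strings when redirecting.

```python
PREFERRED_HOST = "beispiel.de"               # hypothetical canonical host (HTTPS, non-www)
HOMEPAGE_ALIASES = {"/index.php", "/home/"}  # hypothetical duplicates of "/"


def redirect_target(scheme, host, path):
    """Return the canonical URL to 301 to, or None if the request is already canonical."""
    host = host.split(":")[0].lower()        # ignore any port in the Host header
    new_path = "/" if path in HOMEPAGE_ALIASES else path
    if scheme == "https" and host == PREFERRED_HOST and new_path == path:
        return None
    return f"https://{PREFERRED_HOST}{new_path}"


def app(environ, start_response):
    """WSGI app that 301-redirects every non-canonical variant before serving."""
    target = redirect_target(
        environ.get("wsgi.url_scheme", "http"),
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO") or "/",
    )
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>canonical page</p>"]
```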

Read our complete guide to implementing 301 redirects.

5. Internal links

The way you link from one page to another across your entire website is a canonicalization signal.

Google Webmaster Trends Analyst John Mueller covers the signals used to determine canonical URLs in this #AskGoogleWebmasters video:

https://youtube.com/watch?v=8j_hxBw5B4E

The more consistent you are with all of these signals, the easier it will be for the search engines to determine your preferred canonical URL. As mentioned by John in the video, Google also has a preference for HTTPS over HTTP URLs and for prettier URLs.

Common canonicalization mistakes to avoid

Canonicalization is a somewhat complex subject, so there are many misunderstandings and misconceptions about how to do it properly.

Here are some common mistakes made when implementing canonicals:

Mistake #1: Blocking the canonicalized URL via robots.txt

Blocking a URL in robots.txt prevents Google from crawling it, which means Google can't see any canonical tags on that page. This in turn prevents "link equity" from being transferred from the non-canonical page to the canonical one.
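To audit this, you can check whether robots.txt blocks any page that carries a canonical tag pointing elsewhere. This sketch uses Python's `urllib.robotparser`, which implements the classic prefix-matching rules and may not match every detail of how Googlebot interprets robots.txt (such as wildcard patterns); the robots.txt content and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the /angebote/ section.
ROBOTS_TXT = """
User-agent: *
Disallow: /angebote/
""".splitlines()


def blocked_canonicalized_pages(robots_lines, pages, agent="Googlebot"):
    """pages maps URL -> declared canonical. Returns non-canonical pages that
    robots.txt blocks, i.e. pages whose canonical tag crawlers can't see."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return [
        url for url, canonical in pages.items()
        if url != canonical and not rp.can_fetch(agent, url)
    ]
```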

Mistake #2: Setting the canonicalized URL to "noindex"

Never mix noindex and rel=canonical. They are contradictory instructions.

Google will normally prioritize the canonical tag over the "noindex" tag, as John Mueller states here. But it's bad practice nonetheless. If you want to both noindex and canonicalize a URL, use a 301 redirect. Otherwise, use rel=canonical.

Mistake #3: Setting a 4XX HTTP status code for the canonicalized URL

Setting a 4XX HTTP status code for a canonicalized URL has the same effect as using the "noindex" tag: Google won't be able to see the canonical tag and won't transfer any "link equity" to the canonical version.

Mistake #4: Canonicalizing all paginated pages to the root page

Paginated pages should not be canonicalized to the first page in the series. Instead, use self-referencing canonical tags on all paginated pages.

Why? As Google's John Mueller explained on Reddit, this is an improper use of rel=canonical.

The most important thing to avoid, as this post is about canonicalization, is to use a rel = canonical on page 2 that points to page 1. Page 2 is not equivalent to Page 1, so such a rel = canonical would be wrong.
John Mueller, Webmaster Trends Analyst, Google

You should also use rel="prev"/rel="next" tags for pagination. Google no longer uses them, but Bing still does.
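Here is a sketch of the head tags for one page in a paginated series, following the advice above: a self-referencing canonical plus rel="prev"/rel="next". The URL scheme (page 1 at the base URL, later pages at `?page=N`) is a hypothetical example and assumes the base URL has no query string of its own.

```python
from html import escape


def pagination_head(base_url, page, last_page):
    """Head tags for `page` of a series (hypothetical ?page=N URL scheme)."""

    def url(n):
        return base_url if n == 1 else f"{base_url}?page={n}"

    # Each page canonicalizes to itself, never to page 1.
    tags = [f'<link rel="canonical" href="{escape(url(page))}" />']
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(url(page - 1))}" />')
    if page < last_page:
        tags.append(f'<link rel="next" href="{escape(url(page + 1))}" />')
    return "\n".join(tags)


# print(pagination_head("https://beispiel.de/blog/", 2, 3)) prints:
# <link rel="canonical" href="https://beispiel.de/blog/?page=2" />
# <link rel="prev" href="https://beispiel.de/blog/" />
# <link rel="next" href="https://beispiel.de/blog/?page=3" />
```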

Mistake #5: Not using canonical tags with hreflang

Hreflang tags are used to specify the language and geographic targeting of a web page.

Google states that when using hreflang one should "specify a canonical page in the same language, or the best possible substitute language if a canonical page for the same language does not exist".
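As an illustration, this Python sketch builds the tags for one language version of a page: a canonical pointing at the same-language URL, plus hreflang alternates for every version. The two URLs in `VERSIONS` are hypothetical.

```python
from html import escape

# Hypothetical language versions of one page; each one is its own canonical.
VERSIONS = {
    "de": "https://beispiel.de/seite/",
    "en": "https://beispiel.de/en/page/",
}


def head_tags(lang, versions=VERSIONS):
    """Tags for the `lang` version: a same-language self-referencing canonical
    plus hreflang alternates for every version, including this one."""
    tags = [f'<link rel="canonical" href="{escape(versions[lang])}" />']
    for code, url in sorted(versions.items()):
        tags.append(f'<link rel="alternate" hreflang="{code}" href="{escape(url)}" />')
    return "\n".join(tags)
```

The key point is that the English page does not canonicalize to the German one (or vice versa); each language version names itself as canonical.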

Mistake #6: Having multiple rel=canonical tags

Multiple rel=canonical tags are likely to be ignored by Google. This often happens because tags are inserted in different places in a system, e.g., by the CMS, the theme, and plugin(s). That's why many plugins have an override option to ensure that they are the only source of canonical tags.

Another area where this can be a problem is with canonicals added via JavaScript. If you don't specify a canonical URL in the HTML response and then add a rel=canonical tag with JavaScript, Google should take it into account when it renders the page. However, if you specify a canonical URL in the HTML and then swap the preferred version with JavaScript, you're sending mixed signals to Google.

Mistake #7: rel=canonical in the <body>

Rel=canonical should only appear in the <head> of a document. A canonical tag in the <body> of a page is ignored.

Where this can become a problem is in how a document is parsed. A page's source code may have the rel=canonical tag in the <head>, but when the page is actually constructed in a browser or rendered by a search engine, many different things, such as unclosed tags, injected JavaScript, or elements that don't belong in the <head>, can cause the <head> to end prematurely during rendering. In these cases, a canonical tag can accidentally end up in the <body> of the rendered page, where it is ignored.
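The effect can be reproduced with a simplified model of HTML parsing: inside the <head>, an element that doesn't belong there (or stray visible text) ends the <head> early. The Python sketch below applies that rule with the standard `html.parser`; it is a rough approximation of the HTML specification (it assumes an explicit <head> tag and ignores many edge cases), not a substitute for checking the rendered DOM.

```python
from html.parser import HTMLParser

# Simplified rule: inside <head>, any start tag not in this set (or any
# visible text) implicitly ends the <head>, as browser parsers do.
HEAD_ELEMENTS = {"base", "link", "meta", "noscript", "script", "style", "template", "title"}
TEXT_CONTAINERS = {"noscript", "script", "style", "template", "title"}


class HeadTracker(HTMLParser):
    """Records each canonical link and whether it lands in <head> or <body>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_closed = False
        self.open_containers = 0   # depth inside <title>, <script>, ... within <head>
        self.canonicals = []       # list of (href, "head" or "body")

    def _close_head(self):
        self.in_head, self.head_closed, self.open_containers = False, True, 0

    def handle_starttag(self, tag, attrs):
        if tag == "head" and not self.head_closed:
            self.in_head = True
        elif self.in_head and tag not in HEAD_ELEMENTS:
            self._close_head()     # e.g. an <img> or <div> ends <head> early
        elif self.in_head and tag in TEXT_CONTAINERS:
            self.open_containers += 1
        if tag == "link":
            a = dict(attrs)
            if "canonical" in (a.get("rel") or "").lower().split():
                self.canonicals.append((a.get("href"), "head" if self.in_head else "body"))

    def handle_endtag(self, tag):
        if tag == "head":
            self._close_head()
        elif self.in_head and tag in TEXT_CONTAINERS and self.open_containers:
            self.open_containers -= 1

    def handle_data(self, data):
        # Visible text in <head> (outside <title>, <script>, ...) also ends it.
        if self.in_head and not self.open_containers and data.strip():
            self._close_head()


def canonical_locations(html):
    parser = HeadTracker()
    parser.feed(html)
    parser.close()
    return parser.canonicals
```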

How to find and fix canonicalization issues on your site

It is easy to make mistakes with canonicalization, so it pays to regularly check your site for issues related to canonical tags and fix them as soon as possible.

You can use Ahrefs' Site Audit tool for this.

https://www.youtube.com/watch?v=LjinWqfGyVE

Site Audit scans your website for over 100 SEO issues, including those related to canonical tags.

Here are the twelve canonical tag-related problems Site Audit can find and how to fix them:

1. Canonical points to 4XX

This warning is triggered when one or more pages are canonicalized to a 4XX URL.

Why is it a problem?

Search engines don't index 4XX pages because they are broken. As a result, they ignore any canonical tags that point to such pages and often index the wrong (non-canonical) version of the page.

How to fix it

Check the affected pages and replace the dead (4XX) canonical links with links to working (200) pages to be indexed.

2. Canonical points to 5XX

This warning is triggered when one or more pages are canonicalized to a 5XX URL.

Why is it a problem?

5XX HTTP status codes indicate server problems that make the canonical page inaccessible. Google is unlikely to index inaccessible pages and may ignore the canonical tag.

How to fix it

Replace any broken canonical URLs with valid ones. Check for server misconfigurations to make sure the specified canonical URL loads correctly. Note that this can be a temporary problem if the crawl happened while your website was down for maintenance or your server was overloaded.

3. Canonical points to a redirect

This warning is triggered when one or more pages are canonicalized to a redirected URL.

Why is it a problem?

Canonicals should always point to the authoritative version of a page, and a redirected URL is not that version. This can cause search engines to misinterpret or ignore the canonical.

How to fix it

Replace the canonical links with direct links to the authoritative version of the page (i.e. one that returns a 200 HTTP status code and does not redirect).

4. Duplicate pages without a canonical

This warning is triggered when there are one or more duplicate or very similar pages that do not specify a canonical version.

Why is it a problem?

Since no canonical version is specified, Google will try to identify the most appropriate version to show in search results. This may not be the version you want indexed.

How to fix it

Review the groups of duplicates. Select the canonical version you want indexed in search results. Specify it as the canonical across all duplicates (and add a self-referencing canonical tag to the canonical version itself).

5. Hreflang to non-canonical

This warning is triggered when one or more pages specify a non-canonical URL in their hreflang markup.
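Exact duplicates can be found by hashing page content, as in this Python sketch. It only catches content that is identical after whitespace normalization, not near-duplicates, and the tie-break in `pick_canonical` (prefer HTTPS, then the shortest URL) is an arbitrary assumption; the canonical you choose should be the version you actually want indexed.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(pages):
    """pages maps URL -> extracted main content. Returns sorted groups of URLs
    whose content is identical after collapsing whitespace."""
    groups = defaultdict(list)
    for url, text in pages.items():
        key = hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()
        groups[key].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def pick_canonical(urls):
    """Hypothetical tie-break: prefer HTTPS, then the shortest URL."""
    return min(urls, key=lambda u: (not u.startswith("https://"), len(u), u))
```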

Why is it a problem?

Links in hreflang tags should always point to canonical pages. Linking to a non-canonical version of a page from hreflang markup can confuse and mislead search engines.

How to fix it

Replace the links in the hreflang markup of the affected pages with their canonical versions.

6. Canonical URL has no inbound internal links

This warning is triggered when one or more of the specified canonical URLs have no internal inbound links.

Why is it a problem?

Canonical URLs without internal links can't be reached by visitors navigating the website. Instead, internal links elsewhere on the site send them to a non-canonical version of the page.

How to fix it

Replace internal links that point to non-canonical pages with direct links to the canonical URL.

7. Non-canonical page in the sitemap

This warning is triggered when one or more non-canonical pages are listed in the sitemap.

Why is it a problem?

Google says you shouldn't include non-canonical URLs in your sitemap. This is because they see pages in sitemaps as suggested canonical pages. You should therefore only list pages in sitemaps that are to be indexed.

How to fix it

Remove non-canonical URLs from your sitemap.

8. Non-canonical page specified as canonical page

This warning is triggered when one or more pages specify a canonical URL that is itself canonicalized to another page. This creates a "canonical chain" where page A is canonicalized to page B, which in turn is canonicalized to page C.

Why is it a problem?

Canonical chains can confuse and mislead search engines. As a result, they may misinterpret or ignore the specified canonical.

How to fix it

Replace non-canonical links in the canonical tags of the affected pages with direct links to the final canonical. For example, if page A is canonicalized to page B, which is in turn canonicalized to page C, replace the canonical link on page A with a link to page C.

9. Open Graph URL inconsistent with canonical
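Flattening chains is a matter of following each canonical until you reach a URL that points to itself. This Python sketch does that for a hypothetical `canonical_of` mapping (page URL to declared canonical) and raises an error on loops, which would need fixing by hand.

```python
def resolve_chain(start, canonical_of):
    """Follow declared canonicals from `start` until a URL points to itself
    (or declares no canonical). Raises ValueError if the chain loops."""
    seen = [start]
    url = start
    while True:
        nxt = canonical_of.get(url, url)   # no declaration: treat as self-canonical
        if nxt == url:
            return url
        if nxt in seen:
            raise ValueError("canonical loop: " + " -> ".join(seen + [nxt]))
        seen.append(nxt)
        url = nxt


def flatten(canonical_of):
    """Point every page's canonical directly at the end of its chain."""
    return {url: resolve_chain(url, canonical_of) for url in canonical_of}
```

With page A pointing to B and B pointing to C, `flatten` maps both A and B directly to C, which is exactly the fix described above.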

This warning is triggered when there is a discrepancy between the specified canonical and the Open Graph URL on one or more pages.

Why is it a problem?

If the Open Graph URL does not match the canonical, then a non-canonical version of a page is shared on social networks.

How to fix it

Replace the Open Graph URL on affected pages with the canonical URL. Make sure the two URLs are the same.

10. Canonical from HTTPS to HTTP

This warning is triggered when one or more secure (HTTPS) pages specify a non-secure (HTTP) version as canonical.

Why is it a problem?

HTTPS is a ranking factor, so it makes sense to specify secure versions of pages as canonical whenever possible.

How to fix it

Redirect the HTTP page to its HTTPS equivalent. If that isn't possible, add a rel="canonical" link from the HTTP version of the page to the HTTPS version.

11. Canonical from HTTP to HTTPS

This warning is triggered when one or more non-secure (HTTP) pages specify a secure (HTTPS) version as canonical.

Why is it a problem?

HTTPS is preferred over HTTP. Having an HTTP version of a page and then calling the HTTPS version canonical is illogical.

How to fix it

Implement a 301 redirect from HTTP to HTTPS. You should also replace any internal links to the HTTP version of the page with links directly to the HTTPS version.

12. Non-canonical page receives organic traffic

This warning is triggered when one or more non-canonical pages appear in search results and receive organic search traffic (which shouldn't be happening).

Why is it a problem?

Either your canonical tags are set up incorrectly or Google chose to ignore the canonical tags you provided.

How to fix it

Check whether the rel=canonical tags are set up correctly on all reported pages. If that's not the problem, use the URL Inspection tool in Google Search Console to see whether the specified canonical URL is treated as canonical. If there's a discrepancy, investigate why.

Final thoughts

Canonical tags aren't that complicated. They are just difficult to understand at first.

Just remember that canonical tags are not a directive but a hint to search engines. In other words, search engines can choose a different canonical than the one you declared.

You can use the URL Inspection tool in Google Search Console to see both the user-declared and the Google-selected canonical URLs.

These are the statuses Google uses in the Index Coverage report in Google Search Console that relate to canonical URLs:

  • Alternate page with proper canonical tag. This shows pages where you specified an alternate page with a canonical tag and it was respected. Basically, everything works as intended, and signals are consolidated to the URL of your choosing.
  • Duplicate without user-selected canonical. There are duplicate pages, and none of them specifies a canonical page. In this case, Google picked one for you. If it's not the one you prefer, you should add a rel=canonical tag.
  • Duplicate, Google chose different canonical than user. This shows cases where Google ignored your suggestion and chose a different version to show in the index.
  • Duplicate, submitted URL not selected as canonical. This is another case where a canonicalization signal (a URL submitted in a sitemap) was ignored. There is no explicit canonical URL in this set of duplicate pages, and Google believes a different URL than the one you submitted should be shown in the index.

Questions? Let me know in the comments or on Twitter.

Translated by sehrausch.de: Search engine & conversion optimization, online marketing & paid advertising. A perfect fit from a single source.