rel=canonical: the ultimate guide

A canonical URL lets you tell search engines that certain similar URLs are actually the same. That matters because products or content can sometimes be found on multiple URLs, or even on multiple websites. By using canonical URLs (HTML link tags with the attribute rel=canonical), you can have these on your site without harming your rankings. In this ultimate guide, we will discuss what canonical URLs are, when to use them, and how to prevent or fix a few common mistakes!

The rel=canonical element, often called the “canonical link”, is an HTML element that helps webmasters prevent duplicate content issues. It does so by specifying the “canonical URL”, the “preferred” version of a web page – the original source, even. And this improves your site’s SEO.

The idea is simple. If you have several versions of the same content, you pick one “canonical” version and point the search engines at it. This solves the duplicate content problem where search engines don’t know which version to show in their results.

The SEO benefit of rel=canonical

Choosing a proper canonical URL for every set of similar URLs improves the SEO of your site. This is because the search engine knows which version is canonical, and can count all the links pointing at the different versions as links to the canonical version. In concept, setting a canonical is similar to a 301 redirect, only without the actual redirecting.

The history of rel=canonical

The canonical link element was introduced by Google, Bing, and Yahoo! in February 2009. If you’re interested in its history, we would recommend Matt Cutts’ post from 2009, which gives you some background and links to several interesting articles. You can also watch the video of Matt introducing the canonical link element. Although the idea is simple, the specifics of how to use it are often a bit more complex.

The process of canonicalization

When you have several choices for a product’s URL, canonicalization is the process of picking one of them. Luckily, it will be obvious in many cases: one URL will be a better choice than others. But in some cases, it might not be as obvious. This is nothing to worry about. Even then it’s still pretty simple: just pick one! Not canonicalizing your URLs is always worse than canonicalizing your URLs.

How to set canonical URLs

Let’s assume you have two versions of the same page, each with exactly – 100% – the same content. The only difference is that they’re in separate sections of your site. And because of that the background color and the active menu item are different – but that’s it. Both versions have been linked to from other sites, so the content itself is clearly valuable. So which version should search engines show in results?

For example, these could be their URLs:

https://example.com/wordpress/seo-plugin/
https://example.com/wordpress/plugins/seo/

A correct example of using rel=canonical

The situation described above occurs fairly often, especially in a lot of e-commerce systems. A product can have several different URLs depending on how you got there. But this is exactly what rel=canonical was invented for. In this case, you would apply rel=canonical as follows:

1. Pick one of your two pages as the canonical version. This should be the version you think is the most important. If you don’t care, pick the one with the most links or visitors. When all these factors are equal, flip a coin. You just need to choose.

2. Add a rel=canonical link from the non-canonical page to the canonical one. So if we picked the shorter URL as our canonical URL, the other URL would link to the shorter URL in the <head> section of the page, like this:

<link rel="canonical" href="https://example.com/wordpress/seo-plugin/" />

It’s as easy as that! Nothing more, nothing less.

What this does is “merge” the two pages into one from a search engine’s perspective. It’s a “soft redirect”, without actually redirecting the user. Links to either URL now count towards the single, canonical version of the URL.
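If your templates are generated by code, the tag from step 2 can be rendered by a small helper. The following is only a minimal sketch in Python: the function name canonical_link_tag is our own invention, and we assume you already know the canonical URL you picked in step 1.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link_tag(canonical_url: str) -> str:
    """Render a rel=canonical link element for the <head> of a page."""
    parts = urlsplit(canonical_url)
    # Require an absolute URL with an explicit scheme and host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {canonical_url!r}")
    # Escape characters such as & and " so the attribute stays valid HTML.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


# The canonical chosen in step 1; the non-canonical page prints this tag.
CANONICAL = "https://example.com/wordpress/seo-plugin/"
print(canonical_link_tag(CANONICAL))
```

Rejecting URLs without a scheme or host is a design choice on our part, so that a half-specified canonical fails loudly instead of ending up in the page.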

When should you use canonical URLs?

301 redirect or canonical

If you are unsure whether to do a 301 redirect or set a canonical, what should you do? The answer is simple: you should always do a redirect, unless there are technical reasons not to. If you can’t redirect because that would harm the user experience or be otherwise problematic, then set a canonical URL.

Should a page have a self-referencing canonical URL?

In the example above, we link the non-canonical page to the canonical version. But should a page set a rel=canonical for itself? This question is a much-debated topic amongst SEOs. We strongly recommend having a canonical link element on every page, and Google has confirmed that’s best. That’s because most CMSs will allow URL parameters without changing the content. So all of these URLs would show the same content:

https://example.com/wordpress/seo-plugin/
https://example.com/wordpress/seo-plugin/?isnt=it-awesome
https://example.com/wordpress/seo-plugin/?cmpgn=twitter
https://example.com/wordpress/seo-plugin/?cmpgn=facebook

The issue is that if you don’t have a self-referencing canonical on the page that points to the cleanest version of the URL, you risk search engines treating each of these parameterized URLs as a separate page with duplicate content. And if you don’t create such URLs yourself, someone else could do it to you by linking to your page with made-up parameters, causing a duplicate content issue. So adding a self-referencing canonical to URLs across your site is a good “defensive” SEO move.
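To get a feel for how many parameterized variants a self-referencing canonical would consolidate, you could group a list of crawled URLs by their parameter-free form. The sketch below is a hypothetical audit helper in Python (strip_query and group_variants are names we made up), and it assumes that no query parameter on your site changes the content. It is meant for inspecting a crawl, not for generating canonicals from the request URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_variants(urls):
    """Group URLs that only differ by query string or fragment."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_query(url)].append(url)
    return dict(groups)


crawled = [
    "https://example.com/wordpress/seo-plugin/",
    "https://example.com/wordpress/seo-plugin/?isnt=it-awesome",
    "https://example.com/wordpress/seo-plugin/?cmpgn=twitter",
    "https://example.com/wordpress/seo-plugin/?cmpgn=facebook",
]
for clean_url, variants in group_variants(crawled).items():
    print(clean_url, "has", len(variants), "variants")
```

If some parameters on your site do change the content, they would need to be kept when grouping; this sketch deliberately ignores that case.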

Cross-domain canonical URLs

Perhaps you have the same piece of content on several domains. There are sites or blogs that republish articles from other websites on their own, as they feel the content is relevant for their users.

When other sites have republished our articles this way, you could look at the HTML of every one of those articles and find a rel=canonical link pointing right back to our original article. This means all the links pointing to their version of the article count towards the ranking of our canonical version. They get to use our content to please their audience, and we get a clear benefit from it too. This way everybody wins!

Faulty canonical URLs: common issues

There are many examples out there of how a wrong rel=canonical implementation can lead to huge issues. We’ve seen several sites where the canonical on their homepage pointed at an article, only to see their homepage disappear from search results. But that’s not all. There are other things you should watch out for with rel=canonical. Here are the most important ones:

1. Don’t canonicalize a paginated archive to page 1. The rel=canonical on page 2 should point to page 2. If you point it to page 1, search engines will actually not index the links on those deeper archive pages.

2. Make them 100% specific. For various reasons, many sites use protocol-relative links, meaning they leave the http / https bit off their URLs. Don’t do this for your canonicals. You have a preference, so show it.

3. Don’t base your canonical on the request URL. If you use variables like the domain or request URL used to access the current page while generating your canonical, you’re doing it wrong. Your content should be aware of its own URLs. Otherwise, you could still have the same piece of content on, for instance, example.com and www.example.com and have each of them canonicalize to itself.

4. Multiple rel=canonical links on a page cause havoc. When we encounter this in WordPress plugins, we try to reach out to the developer doing it and teach them not to, but it still happens. And when it does, the results are wholly unpredictable.
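Two of the issues above, multiple canonicals and canonicals that are not fully specified, are easy to check for automatically. Here is a rough sketch in Python using only the standard library; the class and function names are hypothetical, and the checks are a small subset of what a real audit would cover.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.hrefs.append(attr.get("href") or "")


def audit_canonicals(html: str) -> list:
    """Return a list of human-readable problems found in the page's canonicals."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if not collector.hrefs:
        problems.append("no canonical link found")
    if len(collector.hrefs) > 1:
        problems.append(f"{len(collector.hrefs)} canonical links found")
    for href in collector.hrefs:
        parts = urlsplit(href)
        # Protocol-relative and relative URLs lack a scheme or a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not a fully qualified URL: {href!r}")
    return problems


page = (
    '<head><link rel="canonical" href="//example.com/a/" />'
    '<link rel="canonical" href="https://example.com/a/" /></head>'
)
for problem in audit_canonicals(page):
    print(problem)
```

The sketch cannot tell whether a canonical was derived from the request URL or whether a paginated archive points at page 1; those need knowledge of how your site generates its pages.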

rel=canonical and social networks

Facebook and Twitter honor rel=canonical too, and this might lead to weird situations. If you share a URL on Facebook that has a canonical pointing elsewhere, Facebook will share the details from the canonical URL. In fact, if you add a ‘like’ button on a page that has a canonical pointing elsewhere, it will show the like count for the canonical URL, not for the current URL. Twitter works in the same way. So be aware of this when sharing URLs or when using these buttons.

Advanced uses of rel=canonical

Google also supports a canonical link HTTP header. The header looks like this:

Link: <https://www.example.com/white-paper.pdf>; rel="canonical"

Canonical link HTTP headers can be very useful when canonicalizing files like PDFs, so it’s good to know that the option exists.
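How you send this header depends entirely on your server setup. As one illustration, here is a minimal sketch of a Python WSGI application that serves a PDF with a canonical Link header. The names canonical_link_header and pdf_app are our own, and the response body is a placeholder rather than a real PDF.

```python
def canonical_link_header(canonical_url: str) -> tuple:
    """Return a (name, value) pair for a rel=canonical Link response header."""
    # Angle brackets delimit the URL in the header, so reject them inside it.
    if "<" in canonical_url or ">" in canonical_url:
        raise ValueError("URL must not contain angle brackets")
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def pdf_app(environ, start_response):
    """Minimal WSGI app: serve placeholder PDF bytes with a canonical header."""
    body = b"%PDF-1.4 placeholder"  # stand-in for the real file contents
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_link_header("https://www.example.com/white-paper.pdf"),
    ]
    start_response("200 OK", headers)
    return [body]
```

On most sites you would set this header in the web server or CDN configuration instead; the point here is only to show where the header sits in a response.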

Using rel=canonical on not so similar pages

While we wouldn’t recommend this, you can use rel=canonical very aggressively. Google honors it to an almost ridiculous extent, where you can canonicalize a very different piece of content to another piece of content. However, if Google does catch you doing this, it will stop trusting your site’s canonicals and thus cause you more harm…

Using rel=canonical in combination with hreflang

We also talk about canonical in our ultimate guide to hreflang. That’s because it’s very important that when you use hreflang, each language’s canonical points to itself. Make sure that you understand how to use canonical well when you’re implementing hreflang, as otherwise, you might kill your entire hreflang implementation.

Conclusion: rel=canonical is a power tool

Rel=canonical is a powerful tool in an SEO’s toolbox. Especially for larger sites, the process of canonicalization can be very important and lead to major SEO improvements. But as with any power tool, you should use it wisely, as it’s easy to cut yourself. We hope this guide has helped you gain an understanding of this powerful tool and how (and when) you can use it.
