Search engines introduced canonical links to tackle the problem of duplicate entries in their search results. The basic problem is that multiple links to the same content can vary in form, and each variant can show up as a separate entry in the results pages of a search. The variation can be really simple, like these:
All of these links produce the same content, but a search engine would consider them different URLs and could, therefore, produce duplicate results. Consider a more complex example:
Again, all of these links essentially lead to the same content. The variations introduced by the query string parameters only alter the way the content is displayed, not the intrinsic nature of the content itself. We therefore want to avoid having all four examples show up in Google, if possible.
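To make the duplication concrete, here is a minimal Python sketch of how display-only variants collapse to a single URL. The parameter names (`sort`, `view`, `page_size`) and the variant URLs are hypothetical, chosen only for illustration; they are not the examples referred to above, and a real crawler's logic is certainly more involved.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only change presentation, not content.
DISPLAY_ONLY_PARAMS = {"sort", "view", "page_size"}


def strip_display_params(url: str) -> str:
    """Return the URL with display-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DISPLAY_ONLY_PARAMS
    ]
    # Host names are case-insensitive, so lower-case the host as well.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Illustrative variants of one page (assumed, not from the original post).
variants = [
    "http://www.fishing.ca/shop/rods.aspx",
    "http://www.fishing.ca/shop/rods.aspx?sort=price",
    "http://www.fishing.ca/shop/rods.aspx?view=grid&sort=name",
    "http://WWW.fishing.ca/shop/rods.aspx?page_size=50",
]

unique = {strip_display_params(url) for url in variants}
print(unique)
```

All four strings reduce to one URL once the presentation parameters are dropped, which is the grouping we would like the search engine to arrive at.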
There are two ways to avoid these duplicates:
The first is to always use absolute URLs in links - a consistent approach to linking reduces duplicates on the site and reduces the variety of links found by the search engine's crawler.
The second is to use a canonical link - this would be added to the head element of a page and contain a URL that is considered to be the 'true' path to the content. In the e-commerce style example above, the link element would look like this:
<link rel="canonical" href="http://www.fishing.ca/shop/rods.aspx" />
This means that, however a crawler reached the page, it would know that the content is the same as that of the page found at http://www.fishing.ca/shop/rods.aspx and would therefore not index it separately.
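As a rough sketch of how a page template might emit that element, the helper below builds the tag from a canonical URL. The function name `canonical_link_tag` is made up for this example, and rejecting relative URLs is our own assumption about sensible behaviour rather than anything the search engines require.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link_tag(canonical_url: str) -> str:
    """Build a <link rel="canonical"> element for the given absolute URL."""
    parts = urlsplit(canonical_url)
    # Assumption: we insist on an absolute URL (scheme and host present).
    if not (parts.scheme and parts.netloc):
        raise ValueError(f"canonical URL must be absolute: {canonical_url!r}")
    # Escape the URL so characters such as & are valid in an HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


tag = canonical_link_tag("http://www.fishing.ca/shop/rods.aspx")
print(tag)
```

Every variant of the rods page would emit the same tag in its head element, whatever query string it was requested with.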
There are a number of caveats:
- A canonical link is used as a strong hint as to the 'actual' content of the page - it is not guaranteed to override the crawl results.
- It is strongly recommended to use absolute URLs in the canonical link.
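- When canonical links were first introduced, they could not cross domains, although they could cross sub-domains. Google has since added support for cross-domain canonical links.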
- If the canonical URL returns a 404, or a page that is considered to have different or non-similar content, the canonical link will be ignored.
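Some of these caveats can be checked mechanically before a page goes live. The sketch below uses Python's standard `html.parser` to find the canonical link in a page and flag a relative URL or a change of host. The names `CanonicalFinder` and `check_canonical` are hypothetical, and the check is deliberately limited: it makes no network request, so it cannot detect a 404 or dissimilar content, and it cannot tell a sub-domain from a different domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.hrefs.append(attr["href"])


def check_canonical(page_url: str, html: str) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.hrefs:
        return ["no canonical link found"]
    problems = []
    if len(finder.hrefs) > 1:
        problems.append("more than one canonical link")
    target = urlsplit(finder.hrefs[0])
    if not (target.scheme and target.netloc):
        problems.append("canonical URL is not absolute")
    elif target.hostname != urlsplit(page_url).hostname:
        # Could be a sub-domain or another domain; needs a manual look.
        problems.append("canonical URL points at a different host")
    return problems


page = "http://www.fishing.ca/shop/rods.aspx?sort=price"
good = '<head><link rel="canonical" href="http://www.fishing.ca/shop/rods.aspx" /></head>'
print(check_canonical(page, good))
```

A check like this only catches mistakes in our own markup; whether the search engine honours the hint is still up to the search engine.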
In summary, using canonical links will help us build better search-engine-optimized sites, but we shouldn't stop doing the basics, such as using proper absolute URLs and building good site maps. There's still more research to do on all of this!