We had a great Tea Time SEO session about Technical SEO last week with Serena Pearson, Paul Lovell and Franco Valentin. Read the insights on SlideShare, including the points about duplication from Serena Pearson, which she elaborates on below:
1. Create an automated system to identify pre-determined technical issues.
Some duplicate URLs follow predictable patterns and can be identified systematically. Examples include URL fragments (URLs with a # at the end which may not do anything), URL parameters (sometimes unavoidable, especially in ecommerce), uppercase letters that aren’t redirected to their lowercase equivalents, trailing-slash and non-trailing-slash versions both existing for the same URL, HTTP and HTTPS duplicates, and www. and non-www. duplicates. Building an automated system to identify these duplication issues makes optimising for crawl budget far more efficient.
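These pattern-based duplicates can all be caught by normalising every crawled URL to one canonical form and grouping the results. A minimal sketch in Python, using only the standard library (the canonical choices here, such as preferring HTTPS and dropping www., are assumptions you would adjust to your own site):

```python
from urllib.parse import urlsplit, urlunsplit

def normalise(url: str) -> str:
    """Reduce a URL to a canonical form so duplicate variants share one key."""
    parts = urlsplit(url.strip())
    scheme = "https"                                # fold HTTP/HTTPS duplicates
    host = parts.netloc.lower()
    if host.startswith("www."):                     # fold www./non-www. duplicates
        host = host[4:]
    path = parts.path.lower().rstrip("/") or "/"    # fold case + trailing-slash variants
    # Drop the fragment entirely; keep the query string, since some
    # parameters (e.g. in ecommerce) are unavoidable and meaningful.
    return urlunsplit((scheme, host, path, parts.query, ""))

def find_duplicates(urls):
    """Group crawled URLs by normalised form; any group larger than 1 is a duplicate set."""
    groups = {}
    for url in urls:
        groups.setdefault(normalise(url), []).append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}

# Example crawl list (hypothetical URLs):
crawl = [
    "http://example.com/Shoes/",
    "https://www.example.com/shoes",
    "https://example.com/shoes#reviews",
    "https://example.com/bags",
]
print(find_duplicates(crawl))
```

Running this over a full crawl export gives you the duplicate groups ready to hand to a developer, and the same script can be re-run after each fix.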
2. While some duplicate URLs can be automatically identified, there are also other duplicate issues to look out for.
Due to the way some websites are built, when URLs are uploaded, an equivalent may also be created with a different URL format. This includes the URL tail (or slug) being duplicated at the root domain or on other subdomains, such as a help subdomain. Another example is when /node URLs are created as well and potentially linked from the XML sitemap, meaning they remain discoverable by search engines. Other common issues include duplicate content being uploaded with slightly different spellings, or the blog creating duplicate tags and folders hosting the same content. One way to quickly identify these issues is to check for duplicate meta titles, H1s, and meta descriptions.
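The duplicate-metadata check can be sketched as a small grouping step over a crawl export. This is an assumption-laden sketch: the `url`/`title` column names and the example rows are hypothetical, and in practice you would load the rows from your crawler's CSV export and repeat the check for H1s and meta descriptions:

```python
from collections import defaultdict

def duplicate_field(rows, field):
    """Map each duplicated value of `field` (e.g. 'title', 'h1') to the URLs sharing it."""
    seen = defaultdict(list)
    for row in rows:
        value = row.get(field, "").strip().lower()  # case-insensitive comparison
        if value:
            seen[value].append(row["url"])
    return {value: urls for value, urls in seen.items() if len(urls) > 1}

# Hypothetical crawl export rows; note the /node duplicate sharing a title.
rows = [
    {"url": "https://example.com/guide", "title": "Beginner's Guide"},
    {"url": "https://example.com/node/123", "title": "Beginner's Guide"},
    {"url": "https://example.com/pricing", "title": "Pricing"},
]
print(duplicate_field(rows, "title"))
```

Any group the function returns is a candidate duplicate-content cluster worth inspecting by hand, since identical titles sometimes belong to genuinely different pages.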
3. Fully crawl the website, or sections of a large website, enabling JS rendering to gather all potential duplicate content issues.
If you’re crawling a particularly large website, consider setting a crawl limit. You can use this to check for duplicate content issues which may not be picked up otherwise. Then, when you’re making recommendations and implementing them, you are making full use of your developer resource, as you’ve got a comprehensive view of all the issues. Once this is complete, it should be easier to re-crawl the website and repeat the process to see whether any duplicate issues remain.
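The re-crawl step above is essentially a set comparison between two crawls. A minimal sketch, assuming you have already extracted the duplicate URLs from each crawl into sets (the example paths are hypothetical):

```python
def remaining_issues(before: set, after: set) -> dict:
    """Compare duplicate-URL sets from two crawls of the same site."""
    return {
        "fixed": sorted(before - after),      # issues present before, gone now
        "remaining": sorted(before & after),  # issues still present after the fix
        "new": sorted(after - before),        # issues introduced since the first crawl
    }

first_crawl = {"/Shoes", "/shoes/", "/node/7"}
second_crawl = {"/node/7", "/sale#"}
print(remaining_issues(first_crawl, second_crawl))
```

The "new" bucket is worth watching: fixes deployed between crawls can themselves create fresh duplicate URLs.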
4. If you can’t find the source of a duplicate link, check the rendered code, and mobile/tablet view.
5. Properly remove the duplicate URLs.
Nofollow may have worked in the past, but it’s now treated as a ‘hint’, so it’s not a foolproof method for optimising crawl budget. It’s also important to consider that URL fragments and parameters may serve a legitimate purpose on a website, such as UX: some pages are particularly long, and fragments help with usability and shareability. If the duplicate URL issues exist on only a few pages and are small, it may not be worth fixing them for the time invested. Larger-scale issues can often be resolved by updating the .htaccess file if you’re using Apache, or the equivalent for your system.
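For the Apache case, the common host-level duplicates can be folded into one canonical version with mod_rewrite 301 redirects. A sketch of the relevant .htaccess rules, assuming the canonical version is `https://www.example.com` and that directory URLs keep their trailing slash (adjust to your own site’s convention before deploying):

```apache
RewriteEngine On

# Redirect HTTP and non-www. variants to the canonical https://www. host
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L]

# Strip the trailing slash from URLs that are not directories
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [R=301,L]
```

Permanent (301) redirects consolidate the duplicate variants for search engines; test rules on a staging environment first, as a misplaced condition can create redirect loops.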
Thank you Serena for sharing your tips about resolving duplication issues on your website. If you missed the session, join us on Tea Time SEO by signing up here for regular updates.