Last week we had Tom Pool, Faisal Anderson and Julien Deneuville joining us on Tea Time SEO talking about Log file Analysis – What you Should Be Looking For. If you missed the talk you can watch the live-stream, look at the slideshow or read Tom’s tips.
There are a whole bunch of insights that can be gleaned from performing a server log file analysis. It can be confusing when you first see those wonderful rows of data that you’ve fought for, so check out these tips!
Tip 1 – Crawl Budget Wastes
There are a whole lot of potential sources of crawl budget waste that can be identified within a log file analysis. Duplication of content can be a big problem, perhaps where Google (or Bing) is crawling many ‘versions’ of pages due to odd parameters, forms, calendars or other weird things. Spotting these can help you identify what Google ‘sees’ as a link, and can help reduce the number of unnecessary pages that Google crawls.
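To make this a bit more concrete, here's a minimal Python sketch. It groups the URLs Googlebot requested by path and ignores the query string, so that parameter-driven ‘versions’ of the same page stand out. It assumes your logs are in the Combined Log Format and simply matches "Googlebot" in the user agent. The regex, the helper name `crawled_variants` and the sample lines are all illustrative, so adjust them to your own setup. You could also swap `bot_token` for "bingbot" to look at Bing.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumes the Combined Log Format; adjust the pattern to your own log config.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<url>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def crawled_variants(lines, bot_token="Googlebot"):
    """Map each path to the distinct URLs (incl. query strings) the bot requested."""
    variants = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m and bot_token in m["agent"]:
            variants[urlsplit(m["url"]).path].add(m["url"])
    # Only keep paths that were crawled in more than one 'version'
    return {path: urls for path, urls in variants.items() if len(urls) > 1}

# Hypothetical sample lines, purely for illustration
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET {url} HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"'
    for url in ["/shoes?colour=red", "/shoes?colour=red&sort=price", "/about"]
]
for path, urls in crawled_variants(sample).items():
    print(path, len(urls))  # /shoes 2
```

Paths with lots of variants are good candidates for a closer look at how those parameters are being generated and linked.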
You’ll want to look into the most and least crawled URLs, and work out why. Is there a page that you thought was really helpful that Google isn’t crawling, or vice versa?
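A quick way to build those two lists is to count Googlebot requests per URL. The sketch below makes the same assumptions as before: Combined Log Format lines and a simple user-agent match. It uses `collections.Counter`, and `crawl_counts` and `most_and_least_crawled` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumes the Combined Log Format; tweak the pattern for your server's log config.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<url>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def crawl_counts(lines, bot_token="Googlebot"):
    """Count requests per URL where the user agent contains bot_token."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and bot_token in m["agent"]:
            counts[m["url"]] += 1
    return counts

def most_and_least_crawled(counts, n=10):
    """Return the n most and n least requested URLs (only URLs seen at least once)."""
    most = counts.most_common(n)
    least = sorted(counts.items(), key=lambda item: item[1])[:n]
    return most, least
```

Bear in mind that URLs Googlebot never requested won't appear in these counts at all. Comparing the logs against crawl data helps to catch those.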
Consider adding more links from your most crawled pages to those that see little or no crawl activity. Do make sure that the links make sense from a user’s perspective, or Google might not see any value in them.
Tip 2 – Real Googlebots / Pretend Googlebots
With the amount of data that you could potentially be working with, you’ll want to make sure that it’s real! Utilise a reverse DNS lookup on the requesting IP address (followed by a forward lookup on the hostname it returns) to verify that any Googlebot traffic in your data is the real deal, and not a fake. I’ve personally crawled a number of sites using a Googlebot user agent, so requests like that may well show up in your logs too.
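Google's published guidance for verifying Googlebot runs in three steps. First, do a reverse DNS lookup on the IP. Second, check that the hostname ends in googlebot.com or google.com. Third, forward-resolve that hostname and confirm it maps back to the same IP. Here's a minimal sketch of that using only the standard library. The `reverse` and `forward` parameters are my own addition so the lookups can be swapped out for testing, and the domain list should be re-checked against Google's current documentation.

```python
import ipaddress
import socket

# Hostname suffixes Google documents for Googlebot; re-check their current docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def forward_lookup(host):
    """Every IPv4/IPv6 address a hostname resolves to."""
    return {ipaddress.ip_address(info[4][0]) for info in socket.getaddrinfo(host, None)}

def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=forward_lookup):
    """Reverse DNS, domain check, then forward DNS back to the same IP."""
    try:
        addr = ipaddress.ip_address(ip)
        host = reverse(ip)[0].rstrip(".").lower()  # e.g. crawl-66-249-66-1.googlebot.com
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return addr in forward(host)
    except (ValueError, OSError):  # bad IP string, or no PTR record / DNS failure
        return False
```

DNS lookups are slow, so in practice you'd check each unique IP once (for example with `functools.lru_cache`) rather than once per log line.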
Tip 3 – Combining Crawl Data with Logs can really supercharge insights
Combining log files with crawl data can also provide a lot of invaluable information. Are there URLs found in one dataset that aren’t found in the other? Is Google crawling URLs that aren’t linked anywhere in the crawl data that you have? Are there URLs in your crawl that Google doesn’t know or care about?
These are all areas that you’ll want to explore further, making recommendations to ensure that all the pages you care about are being crawled, and the ones you don’t care about aren’t.
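At its core this is a set comparison. The sketch below assumes that your crawler exports absolute URLs while your logs record paths, and that only one host is involved. So it normalises both to path plus query string before comparing. `to_path` and `compare_datasets` are hypothetical names.

```python
from urllib.parse import urlsplit

def to_path(url):
    """Normalise absolute or relative URLs to path (+ query) so both sources line up."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path

def compare_datasets(crawled_urls, logged_urls):
    crawl = {to_path(u) for u in crawled_urls}
    logs = {to_path(u) for u in logged_urls}
    return {
        # Googlebot requested these, but your crawler never found a link to them
        "logs_only": logs - crawl,
        # Linked in your crawl, but Googlebot didn't request them in this log window
        "crawl_only": crawl - logs,
        "both": crawl & logs,
    }
```

The "logs_only" bucket often points at orphaned or legacy URLs. The "crawl_only" bucket is worth checking against the pages you actually want crawled.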
It’s also worth checking whether internal linking matches up with the data shown in the log files. I’ve personally seen cases where a site’s information architecture (IA) has almost exactly reflected the most popular URLs seen by Googlebot. This can be a powerful motivator for stakeholders if you’re struggling to get a new page or section added to the site’s overall IA.
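One rough way to put a number on this is a rank (Spearman) correlation. It compares each URL's internal inlink count from your crawl with its Googlebot hits from your logs. Here's a minimal standard-library sketch. The two lists are assumed to be aligned by URL, and the figures in the example are hypothetical. A value near 1 suggests the two rankings largely agree, but it's a sanity check rather than proof that the links are driving the crawling.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical figures: (internal inlinks from the crawl, Googlebot hits from the logs)
data = {"/": (120, 950), "/shoes": (85, 610), "/about": (12, 40), "/old-promo": (2, 55)}
inlinks, hits = zip(*data.values())
print(round(spearman(inlinks, hits), 2))  # 0.8
```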
Tip 4 – Referrer data can be a goldmine – if logs are set up to capture this!
Referrer data is absolutely awesome; however, a lot of logging solutions don’t capture it by default. If possible, set your logs up to capture it! You can then see where requests have come from, identify popular entry and exit pages, and see which sites or pages send the most referral traffic. You can also capture user data and get a better understanding of the user funnel. Match this up with analytics data to get the most insight.
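As an example of what becomes possible, here's a sketch that counts external referring hosts and the pages they send visitors to. It assumes the Combined Log Format, which includes the referrer field. Apache's predefined `combined` format logs it via `%{Referer}i`, while the plain common format does not. The function name and the `own_hosts` parameter are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumes the Combined Log Format, where the referrer is the second-to-last quoted field.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<url>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)

def referral_summary(lines, own_hosts):
    """Count external referring hosts and the pages they land visitors on."""
    own = {h.lower() for h in own_hosts}
    sources, entry_pages = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["referrer"] in ("", "-"):
            continue  # unparseable line, or no referrer recorded
        ref_host = urlsplit(m["referrer"]).hostname  # lowercased, or None
        if ref_host is None or ref_host in own:
            continue  # internal navigation or a malformed referrer
        sources[ref_host] += 1
        entry_pages[m["url"]] += 1
    return sources, entry_pages
```

Exit pages and full funnels need more than single requests (typically stitching requests together per visitor). That's where matching up with analytics data helps.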
Tip 5 – Bonus – Learn Pandas with Python for ease of data manipulation
Following on from something that Faisal mentioned: use all the (relevant) tools at your disposal to get the best insights. A personal favourite of mine for large-scale data manipulation and analysis is Python, using the Pandas library. This lets you manipulate massive amounts of data super easily, and can really help speed up the log file analysis process.
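To give a flavour of what that looks like, here's a minimal sketch. It loads Combined Log Format lines into a DataFrame with `Series.str.extract` and then summarises Googlebot hits and error responses per URL. The regex, the function names and the `access.log` path are placeholders rather than a definitive setup.

```python
import pandas as pd

# Assumes the Combined Log Format; each named group becomes a DataFrame column.
PATTERN = (
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def load_log(lines):
    """Parse raw log lines into a DataFrame, dropping lines that don't match."""
    df = pd.Series(lines).str.extract(PATTERN)
    df = df.dropna(subset=["url"]).copy()
    df["time"] = pd.to_datetime(df["time"], format="%d/%b/%Y:%H:%M:%S %z", utc=True)
    df["status"] = pd.to_numeric(df["status"])
    return df

def googlebot_summary(df):
    """Hits and 4xx/5xx responses per URL for requests claiming to be Googlebot."""
    bots = df[df["agent"].str.contains("Googlebot", na=False)]
    return (
        bots.groupby("url")
        .agg(hits=("status", "size"), errors=("status", lambda s: int((s >= 400).sum())))
        .sort_values("hits", ascending=False)
    )

# Typical usage with a real file (the path is a placeholder):
# with open("access.log", encoding="utf-8", errors="replace") as f:
#     print(googlebot_summary(load_log(f.read().splitlines())).head(20))
```

For very large files you'd probably read in chunks rather than loading everything into memory at once. The shape of the analysis stays the same.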
If you’re interested, there’s an absolutely awesome guide to help get you started on this, which can be seen here.