The Steady Growth of (Not Provided) Traffic

Some of you may recall that shortly after Google officially launched SSL search query support in October 2011, we dug into our backend databases (as part of our Data Research Programme) to try and ascertain the effect it was having on the sites we were monitoring.

The results were provided (no pun intended) on our blog, and the initial dataset suggested an overall percentage of around 3% of all organic queries. However, there were two caveats to this research: i) it was very early days, and ii) we only analysed a small number of sites. In fact, at that time, we only noticed a very small percentage of sites picking up (not provided) traffic at all. However, we expected it to grow and so, almost a year on to the day, we decided to go through a similar process to see what our data could tell us. Before I go on, I want to add a few disclaimers. Firstly, this is not me:

(If you don’t understand the reference, no, I haven’t got hold of a private picture of Matt Cutts in his PJs; this chap was called "Statto" and appeared on a very popular British football (soccer) TV programme in the 90s.)

In other words, I am not a statistics geek and don’t claim to be, so I’m going to keep this analysis pretty simple. Still, I hope you find it helpful and meaningful.

Secondly, the analysis is not based on every site we have ever monitored or are currently monitoring (it’s based on analysing anonymised Google Analytics data, and not all sites on our platform are set up with Google Analytics data).

Thirdly, there was a specific process I wanted to go through.

I ended up with data for several hundred sites over this specific period (the original list had day by day organic and NP data for several thousand sites and I could have possibly included more sites, but I was being fussy about the data I used).

The overall Not Provided average was ~20.5% in September 2012!

The overall number of (not provided) queries, judged simply as a percentage of overall organic traffic, had grown significantly and was averaging out to 20.5% by September (across this sample set). This was more than double the level Matt Cutts had originally suggested would be affected (he used the phrase "single digits" at the time, hence the reference in the graphic at the top of this post).
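For anyone wanting to run the same kind of check on their own sites, here is a minimal Python sketch of the calculation. The site names and visit counts are entirely hypothetical, and it assumes the overall figure is a simple average of each site's own NP percentage (it also shows the pooled alternative, which can give a noticeably different number when site sizes vary).

```python
def np_share(np_visits, organic_visits):
    """Not Provided visits as a percentage of all organic visits for one site."""
    if organic_visits == 0:
        return 0.0
    return np_visits / organic_visits * 100

# Hypothetical monthly totals per site: (not provided visits, organic visits).
# These values are made up for illustration and are not from our dataset.
sites = {
    "site_a": (50, 1000),
    "site_b": (300, 1200),
    "site_c": (900, 3000),
}

# Assumption: average the per-site percentages, so every site counts equally.
shares = {name: np_share(np_visits, organic) for name, (np_visits, organic) in sites.items()}
average_share = sum(shares.values()) / len(shares)

# Alternative: pool all visits first, so larger sites carry more weight.
total_np = sum(np_visits for np_visits, _ in sites.values())
total_organic = sum(organic for _, organic in sites.values())
pooled_share = np_share(total_np, total_organic)

print(f"Average of per-site NP shares: {average_share:.1f}%")
print(f"Pooled NP share: {pooled_share:.1f}%")
```

Whichever method you pick, it is worth sticking to it from month to month so that the trend is comparable.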

Steady Month on Month Growth:

Month-on-month (MoM) growth was around +1.5 percentage points, notwithstanding the initial jump in November from virtually nothing (0.3%) to 4.1%, the large jump in March 2012 (+5.7 percentage points, which was effectively a 105.5% increase over February’s number) and the odd reduction in June of 0.4 percentage points.
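The two ways of quoting the March jump are easy to confuse, so here is a short Python sketch of the arithmetic. February's figure isn't stated directly above; the sketch back-calculates it from the +5.7 jump and the 105.5% relative increase, so treat it as an approximation.

```python
def relative_change(previous, current):
    """Relative change from previous to current, expressed as a percentage."""
    return (current - previous) / previous * 100

# Figures quoted in the post.
march_jump_points = 5.7     # rise in March 2012, in percentage points
march_relative_pct = 105.5  # the same rise, relative to February's number

# Assumption: February's NP share is implied by the two figures above.
feb_share = march_jump_points / (march_relative_pct / 100)
march_share = feb_share + march_jump_points

# The November jump (0.3% to 4.1%) expressed both ways.
nov_jump_points = 4.1 - 0.3
nov_relative_pct = relative_change(0.3, 4.1)

print(f"Implied February NP share: {feb_share:.1f}%")
print(f"Implied March NP share: {march_share:.1f}%")
print(f"March relative increase: {relative_change(feb_share, march_share):.1f}%")
print(f"November jump: +{nov_jump_points:.1f} points, or {nov_relative_pct:.0f}% relative")
```

A small rise from a tiny base looks enormous in relative terms, which is why the November jump is best quoted in points.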

The Outlook for 2012-2013:

If Not Provided traffic carries on growing in this way, then by September 2013 SEOs might be trying to deal with an overall Not Provided percentage of ~40% of their overall organic traffic.
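As a rough check on that figure, here is a minimal Python sketch of a straight-line projection. It assumes the ~1.5 point monthly growth simply carries on unchanged for another twelve months, which is a simplification rather than a proper forecast.

```python
def project_share(current_share, monthly_growth_points, months):
    """Straight-line projection of NP share in percent, capped at 100%."""
    projected = current_share + monthly_growth_points * months
    return min(projected, 100.0)

sept_2012_share = 20.5  # average NP share in September 2012
monthly_growth = 1.5    # assumed constant growth, in percentage points per month

sept_2013_share = project_share(sept_2012_share, monthly_growth, 12)
print(f"Projected September 2013 NP share: {sept_2013_share:.1f}%")
```

On those assumptions the projection comes out at 38.5%, which is where the ~40% figure comes from. Another jump like March's would push it higher.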

As you can see from the distribution analysis below, there was a spike in the early months as most sites did indeed show low single figure averages for Not Provided traffic. However, in the early part of this year, most sites moved into double digit ranges and that growth has continued (more or less) unabated.

 

The widening of the swell from March through to September clearly illustrates how the effect of Not Provided has spread and that most sites are now showing double digit numbers which are consistently increasing month on month. By the second anniversary in September 2013, I’d expect the same illustration to show the wave much nearer the shoreline (and by shoreline, I obviously mean the 100% mark).

Potential Causes of the Growth:

The reason for this growth? Well, increased browser support for SSL, growth in Google+ signups and the growing number of Gmail users (+~12.5% per month in 2012) are all fairly obvious potential causes.

But there were (perhaps) a few other things that stood out.

What caused the NP traffic to double in March?

Firstly, let’s take that jump in March which effectively meant that the NP data had doubled over the previous month’s average. What caused this? I had thought initially that this might be explained by Firefox switching on SSL by default, but that appears to have happened in July. I’m at a loss to explain it at the moment, so if any of you reading this have any bright ideas as to what could have caused this, by all means, add some comments below.

Statistical Blips?

Secondly, can we ignore the drop in June and dismiss it as a statistical blip? Probably, as it was the only month which saw a month on month decline in the NP average. Still, I’ll keep an eye out for any further drops, but I don’t expect the trend to go down. After all, what could cause the NP trend to suddenly reverse? Would users really start to switch SSL off in their browsers? Would most searchers even know how to do this? For example, I’d imagine that since Firefox started having SSL search enabled as a default option, most users would simply leave it like that (Firefox has around a 20-30% share of the market).

Examining anomalies

Thirdly, were there any stand out websites which bucked the trend one way or another? Yes, there certainly were. Whilst the average NP percentage over the 12 month period actually worked out to be just 10.4%, it was perhaps more meaningful to take the latest monthly average (c.20%) as a benchmark and see if I could spot any websites that were far off this mark. Well, the range in the above dataset (just looking at September 2012) went from 0.0% all the way up to 65.0%.

I decided to pick through the data in a little more detail. I looked at the sites with little or no NP traffic. Were there any common obvious traits I could spot? Were they all brochureware sites, B2B or B2C sites, for example? Were they from a common vertical?

One in particular was a very odd niche industry site and it showed consistently low NP traffic all year (hovering ~0.1%). Could this simply be due to the nature of its users?

Similarly, were all the sites with a lot of NP traffic, B2C sites, retailers, e-commerce sites, etc? Certainly, one site stood out even though it was one I’d excluded from the overall analysis above (as at the time we ran the report we only had three months’ worth of GA data for it). I’m not going to state exactly which site it is, as we keep all site details confidential, but I will say that it is a very well-known B2C brand which, by September, had an NP rate hovering around the 60% mark.

Another site (also excluded from the overall analysis because I only had ten months’ worth of data for it) had caught my eye because I’d noticed that its Not Provided traffic had peaked at 98.4% (~1.1k) on one particular day and then dropped right off a cliff face the following day. It appeared to be a reasonably sized website (~800-1000 organic visitors per day, so ~300k organic visitors p.a.) so I decided to plot its NP traffic over the year and this was the result:

 

 

Weird, huh? Nothing for the first week (which was pretty consistent with October in general – a very slow start in picking up Not Provided traffic for most sites) and then it just went haywire. What was going on with this site’s NP traffic up until March this year? It was all over the place! Was Google presenting the data correctly for the first few months in Google Analytics or were there some technical glitches? Maybe it was just an isolated case. It was odd, nevertheless. Could you actually gauge anything at all from looking at raw data like this? Could we come up with some possible suggestions for patterns in Not Provided data across multiple websites?

Looking for Patterns – is there a common demographic cause for Not Provided data?

I went back to the large B2C retail site I’ve already mentioned and another high NP traffic site. When I actually went beyond the raw data and took a look at each site, one thing did strike me: both sites would probably have the same target demographic (i.e. 25-35 year old women). So I decided to run a few quick checks (on www.quantcast.com). These sites had a markedly different user demographic to the ‘low NP niche site’ I mentioned earlier, whose main user group was predominantly 45-65 year old males.

Okay, these were only isolated cases and perhaps you could read too much into it, but at first glance, the difference in the userbase demographic between these sites was pretty obvious.

 

Userbase demographic: Site with High NP average:

 

Userbase demographic: Site with Low NP average:

Credit: www.quantcast.com

 

And yes, I’m looking at different markets here and there were a whole host of other factors I could have potentially accounted for. However, it simply made me ask whether certain demographics were more likely to be logged into Google+/Gmail and therefore more likely to be contributing to your site’s Not Provided traffic. I imagined it was the case, so I simply took a quick look at the demographic data for Gmail itself (Google+ demographic data being unavailable on Quantcast):

 

 

Could there be a pattern? Could this age group be the most likely to be logged into their Gmail/Google+ accounts and therefore the ones most likely to be the source of the Not Provided data your site is getting?

If a pattern were to emerge, it could at least help forewarn an SEO that if they are taking on a particular type of client, they may or may not have an issue with Not Provided data, depending upon the type of site and that site’s demographic catchment area. And could an SEO then leverage that knowledge and adapt their social media strategy to suit? For instance, were your site to be below the norm in terms of Not Provided averages, you could assume that your userbase might be less likely to have Google+/Gmail accounts and therefore less likely to take any notice of the work you planned to do on your client’s Google+ page.

It’s probably too soon to draw any reasonable conclusion without further investigation, a bigger data set and more time, so what can we say?

Well, let’s be practical. Not Provided isn’t going to go away (as you can see from that tidal wave), so if you are an SEO facing this issue, what can you do? Well, there’s plenty of good advice out there from SEOs, such as:

  1. Look at the landing pages for your NP traffic

  2. Use the Keyword data you do have

  3. Avoid making the wrong assumptions

I’m not going to list a summary of all the best advice here, but if you’re interested in reading up on possible strategies for dealing with the Not Provided issue, then here are some good blog posts on the subject:

Recovering Not Provided Keyword Data

Overcoming-not-provided-keyword

Keyword analysis in a world of no provided

What might be next for not provided

Our US President, Dennis Hart, gave a short presentation at SMX East last month on Not Provided. Our initial analysis suggested an average NP rate of 23% which, as you can see, I’ve now revised down to 20.5%. (Incidentally, SEO Clarity were apparently also giving a talk at SMX East on Not Provided and had come to a figure of 24%, so we’re all in the same ballpark.)

We’ll keep monitoring this data and plan to delve into it again shortly in more detail to hopefully provide a few more insights. If you have any suggestions as to what we should be doing with this data, then please drop us a line or add some comments below. Please also let us know what your own analysis shows! We’d love to know.

By: Matt OToole
