Whose Search Data is it Anyway? – Firefox to encrypt referring search strings
One of the key reporting metrics returned by website analytics programs such as Google Analytics and Comscore is referral data.
What is Referral Data?
Referral data enables website owners and marketers to determine how traffic found their site. Combined with goal or ecommerce tracking this data can be used to report upon the success of particular marketing campaigns.
Most websites will have three main visitor sources – search, referral and direct traffic. Search traffic encompasses all clicks from the results pages of a search engine, whether the click is on an organic or a paid search listing.
Referral traffic is a click from another website, for example a blog or social media platform, whilst direct traffic accounts for visitors who have typed your URL directly into their address bar.
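As a rough illustration of how an analytics package might sort a visit into one of these three buckets, here is a minimal Python sketch. The function name `classify_visit` and the short list of search engine names are assumptions made for this example, and treating an empty referrer as direct traffic is a simplification rather than how any particular package works.

```python
from urllib.parse import urlparse

# Assumption: a short, hand-picked set of search engine names. A real
# analytics package maintains a much longer list.
SEARCH_ENGINE_NAMES = {"google", "bing", "yahoo"}


def classify_visit(referrer):
    """Return 'direct', 'search' or 'referral' for a referrer URL."""
    if not referrer:
        # No referrer at all: treat as a typed-in or bookmarked visit.
        return "direct"
    host = urlparse(referrer).hostname or ""
    labels = host.lower().split(".")
    # Crude check: any part of the hostname matches a known engine name.
    if SEARCH_ENGINE_NAMES.intersection(labels):
        return "search"
    return "referral"


print(classify_visit("http://www.google.co.uk/search?q=shoes"))
print(classify_visit("http://blog.example.com/some-post"))
print(classify_visit(""))
```

The hostname check is deliberately crude: it would also count non-search pages on a search engine's domain as search traffic, which is one reason real packages use more detailed rules.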
The exact proportions of traffic from each source will differ on a site-by-site basis but a healthy strategy should see visitors, and conversions, coming from across these different segments.
Analytics packages can drill down into this data to provide more details about how people found particular pages on your site. For search traffic the most important piece of data is the search query or “keyword” which was entered into a search engine before your listing was clicked.
This data is presented in an aggregated format and, on its own, is not personally identifiable in most cases. Webmasters can use it to establish the kind of content people are looking for and the questions visitors have about their site or products, and then improve their site content accordingly.
Who else has referral data?
It isn’t just the websites you click through to that get to discover how you, and the rest of the Internet, came to access them. Numerous other organisations can see the websites you visit and how you found them.
The large search engines monitor every search you make and result you click in order to use this data to help improve their rankings and customise the results and advertisements you see.
Who could forget the embarrassing release of AOL search data!
The web browser you are using to read this very blog post may very well be passing information on to its creator about how you found this article.
For many this was brought to attention when Google accused Microsoft’s Bing of copying its search results following a honeypot operation. Users had granted permission for Microsoft to use their clickstream data by ticking a box which they were told would “help Microsoft improve your online experience”.
Indeed, it isn’t just search activity that browsers are able to access. 2% of all text entered into Chrome’s Omnibox is logged by Google if you have set Google as the default search provider, regardless of whether or not you actually press enter to trigger the search results page.
Browser plugins and addons know which pages you visit – how else would plugins such as Google Toolbar be able to show you page-specific information as you browse the web? Services such as Alexa make use of such toolbars to generate their traffic rank statistics.
It’s easy to extract search queries from URLs visited.
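To show just how easy, here is a minimal Python sketch using only the standard library. It assumes the search term travels in a URL parameter called `q`, as it does on Google's results pages; other engines may use a different parameter name, and the helper name `extract_search_query` is made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def extract_search_query(url, param="q"):
    """Return the search term carried in a URL, or None if there isn't one."""
    params = parse_qs(urlparse(url).query)
    values = params.get(param)
    return values[0] if values else None


print(extract_search_query("http://www.google.com/search?q=red+shoes&hl=en"))
```

Anything that can see the full URL of a results page, whether a toolbar, a browser or a destination site reading its referrer, can recover the search term in this way.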
Internet Service Providers (ISPs)
ISPs keep a record of every webpage you visit and have, in the past, sold this data to advertising providers such as Phorm. Web statistics companies such as Hitwise use such data to build up their click-stream and competitive research platforms.
The Government now wants to be able to get its hands on more of this data from ISPs too!
Google Switches to HTTPS
In October last year Google implemented a change to “make search more secure”. From this date users who performed a search whilst logged into a Google account would have their search query encrypted over the HTTPS protocol.
Whilst the user experience has remained pretty much the same when searching, except that queries are directed through https://www.google.com instead of http://www.google.com, the owners of sites visited from the search results are no longer able to obtain information on the keyword which was searched.
Very quickly a new keyword entered the Analytics reports of webmasters – ‘not provided’. As the number of registered Google accounts has increased, helped no doubt by the Google+ social network, the proportion of searches returning ‘not provided’ as the referring keyword has increased.
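The effect on reporting can be sketched in a few lines of Python. This is a simplified model rather than how any analytics package is actually implemented: it assumes the referrer still arrives from Google but with the `q` parameter empty or missing, and the function name `reported_keyword` is made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def reported_keyword(referrer, param="q"):
    """Return the keyword a report would show for a search referrer."""
    # parse_qs drops blank values by default, so "q=" behaves like no q at all.
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else "(not provided)"


print(reported_keyword("http://www.google.com/search?q=seo+tools"))
print(reported_keyword("https://www.google.com/url?q=&source=web"))
```

The visit is still counted as Google organic traffic in this model; only the keyword is lost.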
At the start of March 2012 Google defaulted to encrypted search for logged-in users searching on localised versions of its site as well, which for UK-based sites had a far more significant effect on reporting than the initial .com introduction.
Google does, however, still return this keyword data if you pay for the click through AdWords!
Proportion of Searches Affected
Prior to roll-out Search Engine Land’s Danny Sullivan reported Google’s Matt Cutts as saying that only “single-digit percentages” of searches would be affected by (not provided).
We have seen the proportions of organic Google traffic affected by the change vary significantly across sites and industries with many in the SEO community believing that single-digit percentages have been exceeded.
Surveys by SEOmoz, HubSpot and Search Engine Land in November of last year, after the initial Google.com roll-out of HTTPS, all reported that around 12% of organic search referrals were then showing as (not provided).
Three months after the expanded March roll-out we’re generally seeing (not provided) account for around 7-8% of Google organic referrals across our client sites, increasing to 11-12% for sites which have a more “techy” theme.
SEO and online marketing industry sites have been far more noticeably affected, given that their readership are more likely than the average web user to be logged into their own personal Google accounts when searching. Towards the end of April SEOmoz’s Casey Henry tweeted that the SEO software company was seeing over 50% (not provided) traffic! For sites which receive traffic from a wide spread of keywords, the total of all these (not provided) visits can easily push the term to the top of a website’s referring keywords.
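The arithmetic behind that last point is easy to check for yourself. The Python sketch below uses a made-up list of ten visits purely for illustration (these are not real client figures) and a hypothetical helper, `keyword_summary`, that returns the top referring keyword along with the share of visits showing as (not provided).

```python
from collections import Counter


def keyword_summary(keywords):
    """Return (top keyword, share of visits that are '(not provided)')."""
    counts = Counter(keywords)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    top_keyword = counts.most_common(1)[0][0]
    return top_keyword, counts["(not provided)"] / total


# Illustrative, invented data: a long tail of keywords plus hidden ones.
visits = (
    ["(not provided)"] * 3
    + ["red shoes"] * 2
    + ["blue shoes"] * 2
    + ["green shoes"] * 2
    + ["seo tools"]
)
top, share = keyword_summary(visits)
print(top, share)
```

In this invented sample (not provided) accounts for only 30% of visits, yet it still ranks first because no single real keyword brings in more than two.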
Firefox to Block Search Referral Data – Other Browsers to Follow?
Whatever the percentage of ‘not provided’ traffic you are seeing in your analytics package – the number is set to increase significantly soon. The leaking of search keyword data from SERPs to visited sites was reported to Firefox as a “bug” back in February 2011 by privacy researcher Christopher Soghoian.
The browser creator, Mozilla, has implemented a fix which will make up one of the main additions to Firefox 14. This version of Firefox, dubbed Aurora, will by default perform Google searches over the HTTPS protocol – so search query strings will show as ‘not provided’.
Aurora moved into beta on June 5th and, according to the Firefox release calendar, will be pushed out as a full release on 17th July. With Firefox browser share currently at about 25% worldwide (closer to 20% in the UK), the impact of this switch will likely be greater than that of the initial switch for logged-in users.
The other major browsers may soon be following Firefox’s lead in preventing keyword data reaching your analytics account. The BBC reported that Microsoft’s Internet Explorer 10 “will be the first version of the browser with ‘do not track’ turned on by default”. The remaining popular browser, Google’s Chrome, is of course owned by the very people who last year started to flick the switch to HTTPS for their logged-in users.
As is the case with the recent EU Cookie directive, the debate of usability versus end-user privacy has once again started!