A few days ago, Danny Sullivan at Search Engine Land revealed the results of an internal Google “sting” to see if Microsoft’s Bing search engine uses Google search result data as a factor in Bing’s search results.
Short answer: they do, albeit indirectly. Microsoft admitted as much, though they obviously couched it more delicately.
In the case of this “sting,” Google was able to show that if all other factors were isolated, Bing would sometimes copy bogus Google results. Matt Cutts of Google likened this to how map manufacturers sometimes insert fabricated streets onto maps and then monitor the competition to see if they copy these non-existent streets onto their maps.
What’s happening here?
Microsoft is collecting data through the Internet Explorer browser and/or the Bing Toolbar add-on. That data includes the URLs you visit. In order for Microsoft to know which URLs you click on Google, they either need to directly capture clicks, or they need to capture HTTP referrer information along with every URL. Either way, the data they are receiving lets them know what you searched for in Google, and which Google search result you clicked on.
Bing is then using that data as a factor in their algorithm. With all other factors removed, as Google was able to do in its sting, Bing will in some cases use that as the only factor influencing the search results. That’s not direct copying of Google search results data, but it is certainly indirect copying. Microsoft is essentially using IE users as a proxy for this information.
Microsoft doesn’t have to scrape Google’s search results pages (SERPs) and directly copy them. By receiving URL and click/referrer information, they can recreate the results of scraping Google’s SERPs. All they have to do is extract the Google search query (it’s right in the URL), and then group together the clicks that come from that Google search results page. The most popular one is statistically very likely to be the first result. The second most popular one is statistically very likely to be the second result, and so forth. The popularity of clicks almost always trends inexorably downward as you go down the list of results on a SERP. This data is used in Bing’s algorithm. We don’t know to what extent it is used, but it was used enough for Google to become suspicious, and in some cases it is weighted so strongly that all other data is discounted.
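To make that concrete, here is a minimal sketch in Python of how a ranking could be inferred from click data alone. It is not Microsoft’s actual pipeline, which nobody outside Microsoft has seen. The function names, the (referrer, clicked URL) record shape, and the example URLs are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse, parse_qs


def google_query(referrer):
    """Return the search query from a Google results URL, or None."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    if not (host == "google.com" or host.endswith(".google.com")):
        return None
    if parsed.path != "/search":
        return None
    # The query is right in the URL, in the "q" parameter.
    terms = parse_qs(parsed.query).get("q")
    return terms[0] if terms else None


def infer_rankings(clicks):
    """Group (referrer, clicked_url) pairs by query, then order URLs by clicks.

    Hypothetical record shape: each click is a (referrer, clicked_url) tuple.
    """
    counts = defaultdict(Counter)
    for referrer, clicked_url in clicks:
        query = google_query(referrer)
        if query is not None:
            counts[query][clicked_url] += 1
    # Most-clicked URL first: a statistical guess at Google's own ordering.
    return {q: [url for url, _ in c.most_common()] for q, c in counts.items()}


# Made-up sample data: three clicks on one result, two on another, one on a third.
serp = "https://www.google.com/search?q=example+query"
clicks = (
    [(serp, "https://a.example/")] * 3
    + [(serp, "https://b.example/")] * 2
    + [(serp, "https://c.example/")]
    + [("https://news.example/story", "https://d.example/")]  # not a Google referrer
)
rankings = infer_rankings(clicks)
```

With enough users behind it, the ordering this produces approximates the original results page without a single request ever being sent to Google.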
Why this is lame
My comment on Twitter was the following:
Illegal? Probably not. Unethical? Maybe. Lame? VERY.
Bing uses a ton of factors to craft their search results pages. If all of those factors are saying that there are no results for the current query, there should be no results for the current query. Sometimes “no results” is the most correct result of all.
Instead, Bing seems to be second-guessing itself. Sure, it found no results, but Google did, so they just use that data without even checking it.
So now we get to my point: the most disturbing aspect here isn’t the indirect capturing of data about Google’s search results pages; it is Bing’s lack of confidence in its own data and algorithms. It is tremendously disheartening that the second-largest search engine on the web would discount its own data, use its users to obtain data from the competition, and use that competitor data blindly.
I would absolutely expect Bing to be comparing their results to Google’s, just like I’d absolutely expect Google to be comparing their results to Bing’s. Microsoft could even use automated capture of user data to do that. Say the number one result on Google is below the fifth result on Bing: they could flag it for review. Engineers could study the discrepancy, determine which result set is preferable, and tweak their algorithm to obtain a better result next time. But to directly incorporate the competition’s data into their algorithm crosses the line from “comparing results” to “copying answers.” It may not be technically illegal, but it’s certainly worthy of criticism and a certain amount of shaming.
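The “flag it for review” idea is simple enough to sketch. This is only an illustration of the comparison described above: the function name, the list-of-URLs inputs, and the choice to also flag a result that Bing doesn’t list at all are my assumptions.

```python
def flag_for_review(google_results, bing_results, threshold=5):
    """Return True if Google's number one result ranks below `threshold` on Bing.

    Both arguments are hypothetical ordered lists of result URLs for one query.
    """
    if not google_results:
        return False  # nothing to compare against
    top = google_results[0]
    try:
        bing_position = bing_results.index(top) + 1  # 1-based rank on Bing
    except ValueError:
        return True  # assumption: a result missing from Bing is also worth a look
    return bing_position > threshold


# Made-up example: Google's top result is sixth on Bing, so it gets flagged.
bing = [f"https://site{i}.example/" for i in range(1, 7)]
needs_review = flag_for_review(["https://site6.example/"], bing)
```

The important part is where the output goes: to an engineer’s review queue, not back into the ranking itself.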
The web needs search engine competition. Google should not be allowed to rest on its laurels. Competing with Google will require genuine innovation, and Bing is undoubtedly innovating a great deal. They’ve made great strides. But incorporating Google search result data into their algorithm will slow their progress. The more Bing’s algorithm leans on the competition, the harder it will be to evolve their own algorithm. Just as there is more to being a student than getting the right answers on a test, there is more to being a search engine than presenting good results. Your process for getting those results speaks to your value to web users and acts as a prediction of your ability to continue providing good results. Bing shouldn’t be trying to be as good as Google. They should aim higher.