By Scott M. Fulton, III, Betanews
Yesterday, without much explanation or instructions, Google opened the floodgates on what it's describing as the next generation of its search engine, most likely to test its efficiency and performance using real-world traffic. Testers are being invited to sample the new engine that Google is calling "Caffeine," although, perhaps intentionally, it isn't yet explaining just what the differences are.
In Betanews' initial tests Tuesday morning comparing Caffeine to Google's current stable release, we noticed that for nearly every simple and complex search query we tried, the top three non-paid search results were always the same. But the order of results starting as high as #4, sometimes #6, changed. Usually Caffeine retrieved the same pages as the stable version, but shuffled them in a different order.
For instance, with a query that used to stump search engines that couldn't make sense of special punctuation, "virtual function" C++ C#, we would expect to find entries that compared the use of virtual functions in two programming languages: traditional C++ and Microsoft's C#. For this query, the first four results retrieved were the same for Caffeine as for the stable version (which I'll call "Stable" for short). But Caffeine swapped the order of entries #5 and #6: an independent article by Jordan Leverington from devarticles.com, and an MSDN article by Microsoft support engineer Rakki Muthukumar, respectively. Caffeine placed the independent article higher.
Caffeine also included a separate grouping of entries taken from Google Groups (posts on Usenet forums), which Stable omitted. Caffeine then rated the text of a discussion forum on the subject much higher -- #7 rather than #10.
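Betanews made these comparisons by eye, but the same side-by-side check can be sketched in a few lines of Python. This is only an illustration, not anything Google or Betanews used: the function name rank_shifts and the placeholder labels "result 1" through "result 4" are assumptions, and the only real data is the swap of the devarticles.com and MSDN articles at #5 and #6.

```python
def rank_shifts(stable, caffeine):
    """Compare two ordered result lists.

    Returns a dict mapping each result to a tuple
    (stable_rank, caffeine_rank, shift), with 1-based ranks.
    A rank is None when the result is absent from that list,
    and shift is None when either rank is missing.
    A negative shift means Caffeine ranked the result higher.
    """
    stable_rank = {item: i for i, item in enumerate(stable, start=1)}
    caffeine_rank = {item: i for i, item in enumerate(caffeine, start=1)}

    shifts = {}
    for item in stable_rank.keys() | caffeine_rank.keys():
        s = stable_rank.get(item)
        c = caffeine_rank.get(item)
        shift = None if s is None or c is None else c - s
        shifts[item] = (s, c, shift)
    return shifts


# Placeholder labels for the four identical top results (hypothetical names),
# followed by the two articles whose order the engines disagreed on.
stable = ["result 1", "result 2", "result 3", "result 4",
          "MSDN article", "devarticles.com article"]
caffeine = ["result 1", "result 2", "result 3", "result 4",
            "devarticles.com article", "MSDN article"]

shifts = rank_shifts(stable, caffeine)
for item in stable:
    print(item, shifts[item])
```

With lists long enough to cover the first several pages of results, the same function would also flag entries that one engine returns and the other drops entirely.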
Unless you believe in conspiracy theories (and I tend not to), the logic of this organizational shuffling isn't yet self-evident. So for our next trial, we tried an intentionally vague query related to something that's been in the news lately: folks who show up at political town hall meetings and try to out-shout the speaker. Without any punctuation and without much specificity, our test query is shouting town hall.
For both engines, the retrieved Google News entries at the top were identical -- evidently Caffeine's new algorithms do not extend to Google News (or to any other Google department). And again, the top three entries retrieved by Caffeine and Stable were identical, with a Fox News story showing up as #1, and a CBS News story as #3.
The #4 items were different, although they came from the same source: the political blog TalkingPointsMemo.com. Both were about the strange trend of congressmen being shouted down by onlookers at public events. But surprisingly, of the two stories the engines pulled up, it was Caffeine that pulled up the older story (August 3 versus August 5 from Stable); and it was the Stable version that included an expansion box enabling more results from the same blog.
The remainder of Caffeine's Page 1 entries included a YouTube video that didn't appear on Stable's Page 1. Stable, meanwhile, pulled up a story from a St. Louis Fox affiliate about a shouted-down town hall meeting conducted last week by Rep. Russ Carnahan (D - Mo.), headlined "Carnahan Town Hall Turns Into Political Shouting Match" -- certainly fitting the criteria -- and rated it #8. Caffeine rated the same story #32.
This one is something of a puzzle, because all three of our search query words appear squarely in the story's headline (although "shouting" was not in the URL). And since it fit the subject matter, Caffeine should have had good reason to rate it highly. In an effort to discover why it didn't, we changed the order of the terms to town hall shouting, so that Google's interpreter would pair the first two terms rather than the second and third. As expected, we received different results. The second time around, with Caffeine, a different TalkingPointsMemo.com story (but still from August 3) appeared as #4, and the one that had been #4 a few minutes earlier had been bumped down to #7. The order of stories appearing from #5 on had changed. Meanwhile, the same reorganized query in the Stable version bumped the Carnahan story down one spot to #9, whereas Caffeine bumped it down to #39.
In other words, promoting the pairing of "town hall" in our query demoted a story that should have been a natural match for that very context, and did so far more severely in the test version of the search engine than in the stable version.
Next, the tie-breaker: How both engines handle a misspelled query…
After our initial tests of Google's experimental Caffeine search engine versus its existing stable one, we're still in something of a fog as to what the differences mean. So for our third heat, we decided to try a purposefully botched query:
Joshua Schachter, the founder of social bookmarking site Delicious who sold that site to Yahoo a few years ago, is in the news today for a remark he made on a public forum about regretting that move. His name is a difficult one for some Americans to remember, let alone spell, so we're going to use a query that confuses the poor fellow with IndyCar driver Tomas Scheckter. And just to show how stupid we can act when we're getting paid to, we'll misspell poor Scheckter's name while we're at it.
So our botched query is Thomas Schacter Delicious Yahoo, as might be written by someone wanting some information on whatever it was the guy who founded Delicious said about Yahoo. We expect a lot of misses with this one, but how soon will either engine pull up an entry that corrects our spelling? Both Caffeine and the current Stable version of the Google engine offered to correct the last name to that of the right Mr. Schachter, apparently judging from its position next to Delicious. And both engines pulled up as their #1 entry a blog post by Thomas Hawk from a 2006 meeting of social network leaders that Schachter attended.
But the citation that Caffeine pulled up from that same #1 entry contained the full and correct spelling of "Joshua Schachter," which anyone botching his name should see right away. That's a plus for Caffeine. But the Stable release included as its #4 entry a directory listing for a psychologist whose name very nearly matched our botched spelling -- more closely than either Mr. Schachter's or Mr. Scheckter's. In Caffeine, that entry was completely missing from its first 100 search results.
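One rough way to put a number on how "nearly" a name matches a misspelling is edit distance: the count of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below is a textbook Levenshtein calculation offered purely as an illustration. It is an assumption that anything like it figures in either engine's spelling suggestions, since Google hasn't said how they work.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b.
    """
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i chars of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


botched = "Schacter"  # the last name as typed in our test query
for name in ("Schachter", "Scheckter"):
    print(name, edit_distance(botched, name))
```

By this measure our misspelling sits one edit away from "Schachter" and two away from "Scheckter," which squares with both engines suggesting the Delicious founder's spelling.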
So from what we can surmise thus far, we believe Google is experimenting with improving search relevancy by demoting or throwing out entries that may not apply to the context of the query. In the good Dr. Schachter's case, yes, his entry should perhaps be thrown out because it really had no bearing on Delicious or Yahoo, even though his name was close to our misspelling. But Caffeine's steep demotion of the story on Rep. Carnahan's town hall meeting, when it clearly did fit the context of our query, seems unreasonable by comparison.
As Google software engineer Sitaram Iyer blogged yesterday, "Right now, we only want feedback on the differences between Google's current search results and our new system." That feedback, Iyer said, may be delivered by clicking on the Dissatisfied? Help us improve link at the bottom of the company's sandbox page, although in some circumstances, arguably, testers may not be dissatisfied... though perhaps Dr. Schachter has cause for complaint.
Copyright Betanews, Inc. 2009