Is Search Broken?

By Tom Foremski - March 3, 2007

Search engines say they use complex algorithms to help users find exactly what they want. Google's "I'm feeling lucky" button (btw, does anybody use it?), right below the search box, implies that very thing.

The legions of top Ph.D.s working for the search engines publish oodles of scientific papers on complex mathematical concepts related to search.

 Recent Papers Written by Googlers

It all looks very impressive, but it seems to have more to do with contributing to the mythology surrounding search--that it is very complex and scientific--than with the actual reality of how search is done.

From my vantage point as an online publisher, it is clear that search is increasingly "people-powered" rather than machine-powered. There are millions of people helping the searchbots find information.

Here are some examples and gripes:

- There are many publishers that try to make sure their headlines catch the attention of the search engines rather than the attention of readers. The same is true for content: editors increasingly optimize it for the search engines rather than the readers.

- Why should I have to tag my content, and tag it according to the specific formats that Technorati, and other search engines recommend?  Aren't they supposed to do that?

- Google relies on a tremendous amount of user-helped search. Websites are encouraged to create site maps and leave the XML file on their server so that the GOOGbot can find its way around.

- The search engines ask web site owners to mask off parts of their sites that are not relevant, such as the comment sections, with nofollow and noindex tags.

- Web sites are encouraged to upload their content into the Google Base database. Nice--Google doesn't even need to send out a robot to index the site.

- Every time I publish something, I send out notification "pings" to dozens of search engines and aggregators. Again, they don't have to send out their robots to check if there is new content.

- Google asks users to create collections of sites within specific topics so that other users can use them to find specific types of information.

- The popularity of blogs is partly based on the fact that they find lots of relevant links around a particular subject. Blogs are clear examples of people-powered search services.
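To make the site map gripe above concrete, here is a minimal sketch of the kind of XML file a publisher ends up generating by hand or by script. It assumes the public sitemaps.org 0.9 format; the `build_sitemap` helper and the example.com URLs are hypothetical and are not taken from any search engine's tooling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string for (url, last_modified) pairs.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages; a real site would list its own URLs and dates.
example_pages = [
    ("https://example.com/", "2007-03-03"),
    ("https://example.com/is-search-broken", "2007-03-03"),
]
sitemap_xml = build_sitemap(example_pages)
print(sitemap_xml)
```

The point of the sketch is how little is in the file: a list of URLs and dates that a crawler could, in principle, discover on its own.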

And there are many more examples. If the search engines are so great at doing what they do, then how come we have to do all of the above?

I resent the fact that I have to create all this content describing my content--the search engines should be creating this "metadata."

I just want to write stuff,  and leave it up to the search engines to find it, classify it, index it, and do all the other things their mythology suggests that they do.

In the world of enterprise search, companies such as FAST, Vivisimo, Autonomy, etc., have to find information without the benefit of aids. Corporate documents have no PageRank or tags or much metadata of any kind.

Yet in consumer search it seems as if nothing would be found without a huge amount of help from millions of people every day. Why is it that we have to help the search engines do a job they are supposed to be doing by themselves?

I wonder about the productivity cost to society from all this human labor--work that is supposed to be done by robots.

It's as if these searchbots are blind, and we have to lead them patiently along the street and point things out to them, while they tap away at the world with white canes.

...

Part 2: Search seems to be broken...

Category: Search Watch

Comments (13)

Because it's not technology sufficiently advanced to be indistinguishable from magic.

It's a HARD PROBLEM.

There's an enormous haystack to dig through to find a few needles. Anything helps.


> There are many publishers that try to make sure their headlines catch the attention of the search engines rather than catch the attention of readers. The same is true for content, editors increasingly optimize it for the search engines rather than the readers.

Um, yeah -- and you have to do that for humans as well, describe fully what the content is about, because your witty undescriptive headline among many others may tell them nothing.

> Why should I have to tag my content, and tag it according to the specific formats that Technorati, and other search engines recommend? Aren't they supposed to do that?

You don't, with the major search engines. They do do that -- every word on your page is a tag.

> Google relies on a tremendous amount of user-helped search. Websites are encouraged to create site maps and leave the XML file on their server so that the GOOGbot can find its way around.

They don't rely on this at all. Don't put up a sitemap, and they find your content just fine. That's an extra step you can take, and it may evolve into being a more important one, but it's not necessary.

> The search engines ask web site owners to mask-off parts of their sites that are not relevant, such as the comment sections, with no-follow and no-index tags.

No, they don't. They provide these so that site owners who don't want content in the search engines can keep that stuff out, since they automatically include everything by default. It's something site owners want.

> Web sites are encouraged to upload their content into the Googlebase database. Nice--it doesn't even need to send out a robot to index the site.

By default, they crawl your content. Only the very specific Google Base -- an area used for some very specific content -- and a few other verticals ask for uploading.

> Every time I publish something, I send out notification "pings" to dozens of search engines and aggregators. Again, they don't have to send out their robots to check if there is new content.

Google, Yahoo, Microsoft and Ask all do, automatically, notification or not.

> Google asks users to create collections of sites within specific topics so that other users can use them to find specific types of information.

It provides a feature so people who want to do this can. Many search engines do the same.

> And there are many more examples. If the search engines are so great at doing what they do, then how come we have to do all of the above?

Most of what you've written honestly isn't exactly correct. But search engines aren't perfect, and humans can and do have a role to play.

> I resent the fact that I have to create all this content describing my content--the search engines should be creating this "metadata."

You don't -- they do, they have been.

> In the world of enterprise search, companies such as FAST, Vivisimo, Autonomy, etc, have to find information without the benefit of aids. Corporate documents have no pagerank or tags or much metadata of any kind.

Corporate documents can and do have metadata.


(Disclosure: I'm an engineer at Google.)

Tom, if I squint my eyes a bit, it seems like the crux of your argument is this line:

"I resent the fact that I have to create all this content describing my content--the search engines should be creating this 'metadata.' "

Here's my advice: take a month off from making all those metadata keywords. At the end of the month, see how your traffic looks. My guess is that your Google traffic won't go down at all. If your (say) Technorati traffic went down, then you can decide whether it's worth the hassle of writing those extra meta keywords.

But just as an example for this article, you selected the tags "search, GOOG, Technorati, search is broken, Foremski". If you look back over your article, every single one of those keywords (except GOOG) is well-represented in your post already. So in this case, you could have saved yourself the trouble of adding that metadata and you still would have been fine.

I completely understand not wanting to select tags for each post; I don't bother with them on my blog either. But on the other hand, many of the things that you mention are optional stuff that other people enjoy taking advantage of (e.g. Google's custom search engines). But search will work just fine for you if you don't want to participate in those extra options.

I think it would be neat if you took a month off from adding tags, just to see if it affected your traffic at all.
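The tag check described in the comment above (are the hand-picked tags already present in the post's own text?) can be sketched in a few lines. This is only an illustration under stated assumptions: `tags_missing_from_text` is a hypothetical helper, the sample sentence is made up for the example, and matching whole words case-insensitively is a simplification, not a description of how any search engine indexes pages.

```python
import re


def tags_missing_from_text(tags, text):
    """Return the tags that never appear in the text.

    Matching is case-insensitive and on whole words or phrases.
    Hypothetical helper for illustration only.
    """
    # Reduce the text to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    missing = []
    for tag in tags:
        phrase = " ".join(re.findall(r"[a-z0-9]+", tag.lower()))
        if not phrase:
            continue  # skip tags with no letters or digits
        if not re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
            missing.append(tag)
    return missing


# Made-up sample sentence and a subset of the tags named in the comment.
sample_text = "Why should I tag my content for Technorati and other search engines?"
sample_tags = ["search", "Technorati", "GOOG"]
print(tags_missing_from_text(sample_tags, sample_text))
```

Any tag the function returns is one the text does not already supply by itself; the rest duplicate words that are on the page anyway.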


Yes, broken, but not for the reasons suggested here. Seems to me the problems rest with the onslaught of spamming and bogus sites, combined with the search engines' response to this, which often leaves many fair to good content pages indexed poorly.


Hals:

I agree with Joe Duck. Lots of auto-generated spam sites now pollute the top ranked search results. Some VC should fund a search engine which filters out the spam sites by avoiding pages with prominent AdSense or other ads. (Don't worry -- Google wouldn't copy this idea!) Such a search engine would find original, unbiased content and could be the "Consumer Reports" portal to the Web.


Very good points. But it is a fact that enterprise search deals with less complexity, so it requires less help. In consumer search, this need for help is a result of high competition and of the value that SEO can add to your site.


This article reminded me of a book I read recently called Ambient Findability, by Peter Morville. Part of the book describes why we’ll probably never know just how accurate search engines like Google are due to the large data sets involved, and yet we trust it with our day-to-day searching all the time.


"Lucky" button is very useful, if I've got a search term that I know will bring up my desired page as the top hit. Just type it into the Location bar in Firefox and you get an automatic redirect. (Not the Search bar, the Location bar.) Depends on the search term.

For the individual points, it sounds like you're trying to counter the machinations of other people. I'm not sure whether you're trying to be found on specific terms, or general terms, though... could make a big difference. (eg, "flowers" vs "roses delivery francisco" or whatever.)


> Yet in consumer search it seems as if nothing would be found without a huge amount of help from millions of people every day. Why is it that we have to help the search engines do a job they are supposed to be doing by themselves?

Because of spammers and the like trying to game the system. Did you just get here?


Google very often regards articles with good content as over-optimized, because they take money away from Google AdWords, and gives them a very hard "end of results penalty". Instead Google promotes Goobage, such as old or irrelevant eBay links, fooling users into clicking on the AdWords sponsored links in desperation.


Tom Foremski:

Jim: If Google were to do that, it would be sabotaging the Google News service. It must be due to a poorly designed algorithm (which is also sabotaging the service...!)

Your point does bring up a possible scenario: Would GOOG tweak its search to favor Google AdSense sites in order to meet a shortfall in financial quarter expectations? At the expense of the user experience?


For commercial keywords, more than 60% of the visible info on a search result page, at 1280x1024 resolution, comes from about 11 AdWords sponsored links. The remaining 6-7 visible natural links often come from stores cooperating with Google. Google is already heavily tweaked against user experience.


I wonder what MyLiveSearch is going to add to the search world.

