How much Internet bandwidth is used by spiderbots from GOOG, YHOO, MSFT et al...?

By Tom Foremski - July 10, 2006

Google, Yahoo, MSN and the many other search sites and aggregators wandering the Internet with their spiderbots could be in trouble, if their version of net neutrality doesn't survive.

That's because the spiderbots eat up a huge amount of bandwidth, and if bandwidth gets more expensive, the spiderbots are going to suffer. I get 5 per cent of my traffic from more than 18 spiderbots, as they scour the Internet copying everything in their path. They use up about one-third of my bandwidth.
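
For anyone who wants to measure this on their own site, here is a minimal Python sketch that estimates what share of requests and what share of bytes go to spiderbots. It assumes the server writes the common "combined" access-log format and that crawlers identify themselves in the User-Agent string. The keyword list BOT_HINTS and the function name bot_share are illustrative assumptions, not a description of how the figures above were obtained.

```python
import re

# Tail of a "combined" log line: ... "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"$')

# Assumption: substrings that commonly appear in crawler User-Agent strings.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def bot_share(lines):
    """Return (share of requests, share of bytes) attributed to spiderbots."""
    total_hits = bot_hits = total_bytes = bot_bytes = 0
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        size = 0 if match["bytes"] == "-" else int(match["bytes"])
        is_bot = any(hint in match["agent"].lower() for hint in BOT_HINTS)
        total_hits += 1
        total_bytes += size
        if is_bot:
            bot_hits += 1
            bot_bytes += size
    hit_share = bot_hits / total_hits if total_hits else 0.0
    byte_share = bot_bytes / total_bytes if total_bytes else 0.0
    return hit_share, byte_share
```

Calling bot_share(open("access.log")) on a real log (the filename is hypothetical) gives the two ratios. Crawlers that disguise their User-Agent are not counted, so the result is a lower bound.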

That's a key reason why Google, Yahoo and others are arguing for everyone to have equal access to bandwidth, at least on the last-mile pipe to the home, which is the most important pipe.

If companies are going to have to pay extra to the telcos or cable companies for bandwidth to reach their users, they might not be so pleased to be paying for the bandwidth of the swarms of spiderbots.

I'm fortunate that more than 92 per cent of my readers come directly through bookmarks or RSS, so they know where I live. Many sites depend on 30 per cent to 60 plus per cent of their traffic from the search engines.

And they spend a lot of money to optimize their sites to attract more search engine traffic. But often this is not quality traffic; it is fly-by-night web surfers.

Web sites should optimize for their readers, not the spiderbots. Let the search engines optimize themselves; that's their job.

If the telcos/cable companies get away with raising fees from the many online companies, to guarantee they have the bandwidth to reach their potential customers, then the spiderbots will be in trouble.

When audience numbers stabilize for a web site, and very few new readers come in from the search sites, yet the spiderbots suck up one-third of the bandwidth, then things will change. More and more web sites will be posting a robots.txt file that tells the spiderbots to go away. They will change because the overall visitor experience is slowed down by the bandwidth-hungry packs of spiderbots.
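
As a rough illustration of what such a file looks like and how a well-behaved crawler reads it, the sketch below uses Python's standard urllib.robotparser module. The two sample files, the crawler name ExampleBot, the example.com URL and the helper is_allowed are made up for illustration. Note that robots.txt is only a request: it keeps out crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Tells every crawler to stay away from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Turns away one (hypothetical) crawler and leaves the site open to the rest.
BLOCK_ONE = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, is_allowed(BLOCK_ONE, "ExampleBot/1.0", "http://example.com/archive.html") is False, while the same call with an ordinary browser's User-Agent is True.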

We used to have estimates of how much bandwidth is consumed by email, by spam, etc. How about spiderbots? Does anybody have access to such data?

I would love to know: how much Internet bandwidth is used up by the legions of spiderbots in their constant search to find and copy new content?



Comments (5)

Hm... Interesting question. If Google does suck so much into its cache, how much bandwidth does it consume? Should Google be charged extra for bandwidth used by Web Accelerator?


Andrew, I think using cache tends to lessen bandwidth load...but GOOG can afford to pay for the bandwidth. It is the smaller Web 2.0 competitors that won't be able to pay, at least on the scale of Google. Will a chunk of VC investments go to the telco/cable companies to make sure young companies get the bandwidth they need? It'll be interesting...


Hendrik Rood:

You seem to be quite confused in your reasoning, as far as I can see. You mix up the existing charging model for traffic between datacenters over the Internet backbone [charging all backbone customers by volume sent/received] with a different model currently very popular on the retail consumer market: "flat rate access".

Volume-based pricing is the norm for accessing Internet backbones. Both your website hoster and companies like Yahoo, Google and Amazon.com pay per volume (actually they pay for the 95th-percentile level of the bandwidth they send/receive per month; the highest 5% of the 5-minute measurement intervals are discarded).
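
The 95th-percentile rule described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function name billable_rate is made up, the samples are taken to be one bandwidth reading per 5-minute interval, and real providers may differ in details such as rounding or whether inbound and outbound traffic are measured separately.

```python
def billable_rate(samples):
    """Return the 95th-percentile value of a month's 5-minute bandwidth samples.

    The samples are sorted, the highest 5% are discarded, and the largest
    remaining sample is the rate the customer is billed for.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Number of samples kept, i.e. ceil(0.95 * n), done in integer arithmetic.
    kept = (len(ordered) * 95 + 99) // 100
    return ordered[kept - 1]
```

With 100 samples of which 95 read 10 Mbit/s and 5 read 1000 Mbit/s, billable_rate returns 10: short bursts that fit inside the discarded 5% do not raise the bill. A crawler that fetches pages steadily all month, on the other hand, raises the level that is billed.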

In consumer retail a "flat rate access" model has proven to be more popular than "pay-as-you-go" schemes, which typically cater for the very cost conscious and the lowest income levels.

Experimental economic research led by Hal Varian of UC Berkeley (the INDEX project), published among others by the American Enterprise Institute / Brookings Joint Center for Regulatory Studies, demonstrates that consumers, given a choice in paying schemes, prefer "flat rate access" above "volume based pay-as-you-go", even when pay-as-you-go would result in a lower price than "flat rate access".
[http://www.sims.berkeley.edu/~hal/Papers/brookings/brookings.html or http://books.google.com/books?vid=ISBN0815715919]


The simple gist of net neutrality is that companies operating access networks have been used to a model where they charge metered rates to businesses and flat rates to consumers, while collecting metered (volume based) wholesale termination fees from long distance operators.

The RBOCs just want to restore that business model for Internet traffic and turn the tables on the current situation, in which they have to pay out per volume to Tier 1 backbone operators.

Allowing RBOCs to collect volume fees is just a shift in the model, as this practice is already dominant on the Tier 1 backbone. They could also reach that situation by acquiring Tier 1 backbones to the point that they have gained sufficient market power to raise the prices (or, more realistically, to stop lowering them and collect a doubling of revenue due to the annual doubling of traffic volume).

Of course, such a takeover run would bring steep stock price increases for the last independent Tier 1 backbones, so it is an expensive strategy. It would probably also run into all kinds of M&A approval investigations.

Getting Congress to approve that they are allowed to charge for bandwidth consumption or create separate pipes per service is a way to avoid the potential antitrust risks of the market-based solution.

This is it, nothing more and nothing less. Net neutrality is simply an antitrust debate in disguise. The US Congress has to answer just one question: is it smart to shift the locus in the telecommunications networks of 'operators who collect from both sides' from the Internet backbone owners to the access network owners, in particular when entry on the market for Internet backbones is easier today?

The rest is fuzz that keeps laymen from understanding what it is all about.


Thanks Hendrik, for an interesting perspective on the net neutrality debate...


Richard Koman:

Actually, I think it's quite simple. NN is an antitrust play. The FCC redefined telcos and cable companies as "information providers", not common carriers. Now that they are free, they are ready to build out next-gen infrastructure, and they mean to do it with the premium service model for that infrastructure.

NN says: this is what happens when you don't have common carrier rules. NN rules would effectively put the genie partway back in the bottle. Telcos scream "regulation," because they have only just escaped the "regulation", that is, some 70+ years of responsibility under common carrier rules.

