[olug] Google on Linux

Mike Hostetler thehaas at binary.net
Fri Jun 20 20:18:07 UTC 2003


I'll answer with what I know.

On Fri, Jun 20, 2003 at 03:11:58PM -0500, Joe Catanzaro wrote:
> This isn't very related to Linux, but you've probably noticed that when you 
> search for something on Google there's a link for a cached copy of the 
> returned link? So my question is, when the Google spiders/robots crawl the 
> web, do they make a copy of the result to Google's server farm, and if so, 
> does that mean that Google probably has a backup copy of the entire web? 
> And how many terabytes is that?

I dunno the size, but Google does have a copy of the text part of all
sites that their crawler hits -- but not the images (the images are scaled
down and cached via images.google.com, but they do not appear under the
cached page).  I think cached pages are one of Google's best features.
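
If you want to poke at the cache from a script, here's a quick, untested
Python sketch.  It just leans on the plain "cache:" search syntax, so it
builds the search URL and hands it off to a browser rather than trying to
scrape Google directly (the www.olug.org address is only an example):

    import urllib.parse
    import webbrowser

    def google_cache_url(page_url):
        # Build the Google search URL that asks for the cached copy of page_url.
        query = "cache:" + page_url
        return "http://www.google.com/search?" + urllib.parse.urlencode({"q": query})

    url = google_cache_url("www.olug.org")  # example target, swap in any page
    print(url)
    webbrowser.open(url)  # opens the cached copy in your default browser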

> Also, has anyone bought/read O'reilly's Google Hacks? Any good?

I've read it via my Safari subscription (which I would highly recommend).
I learned quite a bit about the special syntaxes, etc.  Some of it is
elementary, some of it is good.  I would skim through it first before
buying the dead tree version.

> Several years ago (I think) Google had one of the largest Linux clusters at 
> around 8000 boxes. Do they still have the largest? If not, who does?

That was true, and they probably have a bigger farm than that now, but I
dunno if it's still the biggest.

-- mikeh
