
All of the WWW Available **Forever**

  True to my handle, I'd like to alert y'all to the truly
  Xanadudlian mission of two start-ups, the Internet Archive and
  Alexa, the former a non-profit effort to continuously

      s t o r e  ALL OF (unrestricted-access) WWW pages FOREVER ;

  the latter a commercial outfit developing tools to browse and
  reuse the contents of such a cumulative, multi-generation archive.

  According to their founder Brewster Kahle (formerly of Thinking
  Machines Corp., and a father of WAIS), one of the target functions
  of Alexa-derived software is to be a "reliability service" that
  "will resurrect dead links. Give the URL and an approximate date
  to the Archive, and it will dig up the document." Rings a
  bell, doesn't it?
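  The "reliability service" amounts to a lookup keyed on URL plus
  date. Below is a minimal sketch, assuming each sweep stores one
  dated copy per URL and that "approximate date" means "the nearest
  sweep". The class SnapshotArchive, its methods, and the sample
  dates are hypothetical illustrations, not Alexa's or the
  Archive's actual software.

```python
from __future__ import annotations

from bisect import bisect_left
from datetime import date
from typing import Optional


class SnapshotArchive:
    """Hypothetical store: one dated copy of each URL per Web sweep."""

    def __init__(self) -> None:
        # url -> list of (sweep_date, content), kept sorted by date
        self._copies: dict[str, list[tuple[date, str]]] = {}

    def store(self, url: str, when: date, content: str) -> None:
        copies = self._copies.setdefault(url, [])
        copies.append((when, content))
        copies.sort(key=lambda c: c[0])

    def resurrect(self, url: str, approx: date) -> Optional[str]:
        """Return the stored copy whose sweep date is nearest approx."""
        copies = self._copies.get(url)
        if not copies:
            return None  # the URL was never captured by any sweep
        dates = [d for d, _ in copies]
        i = bisect_left(dates, approx)
        # Only the sweeps just before and just after approx can be nearest.
        candidates = copies[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda c: abs((c[0] - approx).days))
        return best[1]


# Illustrative usage with two made-up sweeps of one page.
archive = SnapshotArchive()
archive.store("http://example.com/", date(1996, 10, 1), "first sweep copy")
archive.store("http://example.com/", date(1997, 2, 1), "second sweep copy")
print(archive.resurrect("http://example.com/", date(1997, 1, 15)))
```

  Binary search over the sorted sweep dates keeps each lookup cheap
  even as the number of "frozen Webs" grows; a real service would
  also have to decide how to report how far the returned copy is
  from the requested date.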

  The Alexa archives are built from successive sweep-n-suck (BIIIG
  sucks, too) sessions over the entire WWW dataspace, resulting in
  consecutive "frozen Webs" stored at one location: currently
  a warehouse in SF, ultimately the digital storage facility of
  the US National Archives in Washington, D.C.  Treating an entire
  docuverse as a collection of "barts" (or "stamps", I keep mixing
  them up) may sound like a bit of overkill, but whoever said that
  the (yellow brick) road to Xanadu must be straight and narrow?


Based on Paul Bissex' article at:

>           [...] whereas keyword search engines [AltaVista etc]
>           store an index to the Web, the Archive consists of a 
>           copy of the Web itself. Kahle estimates the current 
>           size of the Web at about two terabytes (that's two
>           million megabytes). Having completed two full sweeps 
>           of the Web, the Archive now contains about four 
>           terabytes of data. A recent upgrade of the Archive's 
>           connection from two T1 lines to a full T3 brings 
>           a welcome 15-fold increase in bandwidth, meaning 
>           that future Web "snapshots" will be conducted much 
>           faster than the first two. With some researchers 
>           estimating the average life of a Web page at 75 days, 
>           speed matters.
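  The bandwidth figures in the quote can be checked with a little
  arithmetic. This sketch assumes the standard line rates (T1/DS1 =
  1.544 Mbit/s, T3/DS3 = 44.736 Mbit/s), decimal units, and a link
  saturated 100% of the time with no protocol overhead, so the sweep
  times it prints are idealized lower bounds, not measurements.

```python
# Idealized transfer-time arithmetic for one full Web sweep.
# Assumptions: standard DS1/DS3 line rates, decimal units, and a link
# running flat out with zero overhead (real sweeps would take longer).

T1_BPS = 1.544e6     # DS1 line rate, bits per second
T3_BPS = 44.736e6    # DS3 line rate, bits per second
WEB_BYTES = 2e12     # "about two terabytes", per the article

SECONDS_PER_DAY = 86_400


def sweep_days(link_bps: float, size_bytes: float = WEB_BYTES) -> float:
    """Days needed to move size_bytes over a link running at link_bps."""
    return size_bytes * 8 / link_bps / SECONDS_PER_DAY


old_link = 2 * T1_BPS   # the original connection: two T1 lines
new_link = T3_BPS       # the upgraded connection: one full T3

speedup = new_link / old_link
old_days = sweep_days(old_link)
new_days = sweep_days(new_link)

print(f"speedup: {speedup:.1f}x")
print(f"two T1 lines: {old_days:.0f} days per sweep")
print(f"one T3 line:  {new_days:.1f} days per sweep")
```

  The ratio comes out at about 14.5, consistent with the article's
  "15-fold". Under these assumptions a two-T1 sweep needs roughly 60
  days, uncomfortably close to the 75-day average page lifetime,
  while a T3 cuts that to about 4 days.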