Google 1 Yahoo 0

Many of the obituaries for Yahoo have contrasted its demise with the flourishing of Google, another Web pioneer. Why was Google’s attempt to “organize all the world’s information” vastly more successful than Yahoo’s? The short answer: Because Google did not organize the world’s information. Google got the true spirit of the Web, as it was invented by Tim Berners-Lee.

In his book Weaving the Web, Tim Berners-Lee writes:

I was excited about escaping from the straightjacket of hierarchical documentation systems…. By being able to reference everything with equal ease, the web could also represent associations between things that might seem unrelated but for some reason did actually share a relationship. This is something the brain can do easily, spontaneously. … The research community has used links between paper documents for ages: Tables of content, indexes, bibliographies and reference sections… On the Web… scientists could escape from the sequential organization of each paper and bibliography, to pick and choose a path of references that served their own interest.

With this one imaginative leap, Berners-Lee moved beyond a major stumbling block for all previous information retrieval systems: the pre-defined classification system at their core. This insight was so counter-intuitive that even during the early years of the Web, attempts were still made to do exactly what it had made unnecessary: to classify all the information on the Web and organize it in pre-defined taxonomies.

Google’s founders were the first to seize on Berners-Lee’s insight. They built their information retrieval business on closely tracking cross-references (i.e., links between pages) as they were happening and correlating relevance with the quantity of cross-references (i.e., the popularity of pages as judged by how many other pages linked to them). This was what set Google apart from its competitors, including Yahoo. Yahoo had a so-called “first-mover advantage” (its fate is yet another example that there are no universal “business laws”), and it worked hard and employed many people to organize the rapidly growing content of the Web in a neat taxonomy. It even had a Chief Ontologist on staff.
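The idea of judging a page by how many other pages link to it can be illustrated with a few lines of Python. This is only a minimal sketch of the simplified description above, not Google's actual ranking algorithm (PageRank also weighs each link by the importance of the page it comes from). The toy link graph, the page names, and the function names are all invented for illustration.

```python
def count_inbound_links(links):
    """Count, for every page, how many *other* pages link to it.

    `links` maps each page to the list of pages it links to
    (a hypothetical toy representation of the Web).
    """
    inbound = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):      # a page's repeated links count once
            if target != source:         # ignore links from a page to itself
                inbound[target] = inbound.get(target, 0) + 1
    return inbound


def rank_pages(links):
    """Order pages from most to least linked-to (ties broken by name)."""
    inbound = count_inbound_links(links)
    return sorted(inbound, key=lambda page: (-inbound[page], page))


# A made-up four-page "Web": no taxonomy, only cross-references.
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}

for page in rank_pages(toy_web):
    print(page, count_inbound_links(toy_web)[page])
```

Nothing in the sketch assigns a page to a category. The ordering emerges entirely from the links that page authors chose to make, which is the contrast with a hand-built taxonomy.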

Danny Sullivan in 2010:

Google’s ranking system gave you the best of both worlds. Yahoo was a card-catalog of the web, letting you effectively search for the right “books” based on what they were titled. Google’s system let you search through all the pages of all the books in the entire library. It was far more comprehensive, plus it still managed to get good stuff to the top of the list.

Berners-Lee’s insight is frequently linked to Vannevar Bush, who wrote in 1945, “Our ineptitude at getting at the record is largely caused by the artificiality of systems of indexing… Selection [i.e., information retrieval] by association, rather than by indexing may yet be mechanized.” But I prefer to start the history of the Web (and of organizing information) with what was, to my knowledge, the earliest use of cross-references.

This was Ephraim Chambers’ Cyclopaedia, published in London in 1728. While lacking the worldwide platform for “crowd-sourcing” references that Berners-Lee invented, Chambers shared with him (and Bush) a dislike for hierarchical, alphabetical indexing systems. Here’s how Chambers explained his innovative system of cross-references in the Preface:

Former lexicographers have not attempted anything like Structure in their Works; nor seem to have been aware that a dictionary was in some measure capable of the Advantages of a continued Discourse. Accordingly, we see nothing like a Whole in what they have done…. This we endeavoured to attain, by considering the several Matters [i.e., topics] not only absolutely and independently, as to what they are in themselves; but also relatively, or as they respect each other. They are both treated as so many Wholes, and so many Parts of some greater Whole; their Connexion with which is pointed out by a Reference. So that by a Course of References, from Generals to Particulars; from Premises to Conclusions; from a Cause to Effect; and vice versa, i.e., in one word, from more to less complex, and from less to more: A Communication is opened between the several parts of the Work; and the several Articles are in some measure replaced in their natural Order of Science, out of which the Technical or Alphabetical one had remov’d them.

Chambers’ Cyclopaedia was the earliest attempt to link by association all the articles in an encyclopedia or, in more general terms, everything we know at a given point in time. And like the World Wide Web, it moved some people to voice concerns much like today’s worries about what Google is doing to our brains. The supplement to the 1758 edition of the Cyclopaedia says:

Some few however condemn the use of all such dictionaries, on the first pretence, that, by lessening the difficulties of attaining knowledge, they abate our diligence in the pursuit of it; and by dazzling our eyes with superficial shew, seduce us from digging solid riches in the mine itself.

The fear of what tools for organizing information could do to our thinking (and livelihood) was renewed many-fold with the advent of modern computers. “They can’t build a machine to do our job; there are too many cross-references in this place,” says the head librarian (Katharine Hepburn) to her anxious colleagues in the research department when a “methods engineer” (Spencer Tracy) is hired to “improve workman-hour relationship” in a large corporation. By the end of the film, Desk Set (released in 1957), she proves her point by winning not only the engineer’s heart but also a contest with the ominous-looking “Electronic Brain” (aka Computer).

Automation, replacing librarians and their card catalogues, has been at the heart of Google’s success and its obsession with “scale” (and “at scale” has become an obsession for Silicon Valley). But this automation has led to augmentation: it supports our thinking by creating a new way to organize the world’s information, one that is more in line with our thought process and better suited to the current, impossible-to-catalogue volume of (valuable and useless) information. As Vannevar Bush wrote:

The human mind… operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain … One cannot hope to equal the speed and flexibility with which the mind follows an associative trail, but it should be possible to beat the mind decisively in regard to the permanence and clarity of the items resurrected from storage.

Originally published on Forbes.com
