How did Google build up such a thorough sense of what a person is looking for when he or she types a short phrase into a Google web search? It is especially impressive when you consider how little Google knows about that specific person. The searcher is largely anonymous: about the only thing that is leaked is the computer’s IP address, and this isn’t exactly the most revealing piece of information. It might give some clues about the computer’s location, and that’s about it.
The answer is an algorithm, more specifically the PageRank algorithm. In case you don’t know what an algorithm is, it’s a sequence of steps or a formula for solving a problem. In web search, Google was and still is solving a very big problem, possibly one of the biggest ever tackled: what exactly are people looking for when they type a short phrase into a search engine?
The main reason Google blew away its competition in the search engine wars with the other giants (Yahoo, Ask Jeeves, MSN, or anyone else for that matter) was this clever algorithm for working out what people are looking for when they type in a phrase. Before Google, search engines like Yahoo would have an army of “web content experts” scour the web for what they viewed as first-rate content and plug it into the top of their search results, assuming that this qualitative, soft approach to the problem would deliver the highest-quality content to their searchers. Anything outside of their “best of the web” categories was treated as lower grade and pushed to the bottom of the results, in order of hit counts. This method became impractical simply because of the rate at which the web started to grow: the amount of information out there was growing exponentially. No team of people could sift through the web quickly enough to find the best of everything, so it was essential for the system to be automated. The failure of these first search engines to create an effective web search left the door open for anyone who wished to tackle the problem more effectively, and this is where Google came in.
The Google engine starts out with a “spider” or “web crawler”, which scours the web automatically, visiting websites, storing key information about each page and indexing it accordingly. The more information it collects, the more effective the indexing system becomes. The crawler stores all of this on Google’s servers as a copy of the useful information, and Google then uses that copy to determine which pages deserve the highest “page ranking” in a web search.
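To make the crawl-and-index idea a little more concrete, here is a minimal sketch in Python. It is not Google’s crawler: the “web” is a small made-up dictionary (the `TOY_WEB` name and the `.example` addresses are invented for illustration), nothing is fetched over the network, and the index is just a mapping from each word to the pages that contain it.

```python
from collections import deque

# Hypothetical toy "web": address -> (page text, outgoing links).
# Invented purely for illustration; a real crawler would fetch pages over HTTP.
TOY_WEB = {
    "a.example": ("cheap flights to paris", ["b.example", "c.example"]),
    "b.example": ("paris travel guide", ["c.example"]),
    "c.example": ("guide to cheap hotels", ["a.example"]),
    "d.example": ("unreachable page", []),  # nobody links here
}


def crawl(seed, web):
    """Follow links breadth-first from `seed`, building a word index
    and a record of which page links to which."""
    index = {}       # word -> set of pages containing that word
    link_graph = {}  # page -> list of pages it links to
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        text, out_links = web[url]
        link_graph[url] = list(out_links)
        # Index every word on the page.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Queue up any linked page we have not visited yet.
        for target in out_links:
            if target in web and target not in seen:
                seen.add(target)
                queue.append(target)
    return index, link_graph


index, link_graph = crawl("a.example", TOY_WEB)
print(sorted(index["paris"]))
print(sorted(link_graph))
```

Notice that the page nobody links to is never found, since a crawler can only discover pages by following links. The `link_graph` it records along the way is the kind of information a link-based ranking needs.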
The key factor in this page ranking is not hit counts, or a team of Google employees choosing their own “best of the web”, but a system of links. If someone else out there on the web provides a link to your website, they are in effect recommending your site. If a very powerful website is the one providing this link, or recommendation, then it improves your ranking substantially more. The opposite is true as well: if a rarely viewed and insignificant site decides to link to your site, it will have very little effect on your page ranking. This chain of links keeps going, too, because in order for a website to carry powerful link authority, it must itself have been linked to by other powerful websites, which were in turn linked to by other powerful sites, and so on.
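The “links as recommendations” idea can be sketched in a few lines of Python. This is a simplified illustration rather than Google’s production code: the site names are made up, and the damping factor of 0.85 and the fixed number of iterations are assumptions on my part (0.85 is the value commonly quoted for PageRank). Every page starts with an equal score and repeatedly passes its score along to the pages it links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to.
    `damping` and `iterations` are illustrative assumptions."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Everyone starts with an equal share of the total score.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Small baseline score that every page receives regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical sites: "your_site" is linked by a powerful portal,
# "other_site" is linked only by a blog that nobody links to.
links = {
    "big_portal": ["your_site", "news"],
    "news": ["big_portal"],
    "your_site": ["big_portal"],
    "other_site": ["big_portal"],
    "tiny_blog": ["other_site"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

Both `your_site` and `other_site` have exactly one incoming link, but `your_site` ends up with the higher score because its link comes from the heavily linked `big_portal`, while `other_site` is recommended only by a page that nobody else recommends.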
The PageRank algorithm was not enough on its own to shoot Google to the top of the web search world. Their goal from the very start was “to organize all the world’s information.” To do this, they had to store incredible amounts of data on their servers, so they built server farms: multiple facilities, each the size of an airplane hangar. All of this was in the name of creating more and more effective web searches. The reasoning behind all of this collection? Simply that the more information they collected, the more effective their PageRank indexing became, the more people would use their search to find what they were looking for, and the more information they could collect about web searchers to make their indexing even more effective. It really was quite the favourable cycle for Google, especially when you consider that the algorithm Google used was fully automated, with no humans picking results by hand. In terms of scalability, the only thing stopping them from taking over the world of web searching was the need for more and faster machines.
So this is how Google’s web search works. You may be wondering how to improve the ranking of your very own website, which is known as Search Engine Optimization, or SEO. I’ll discuss this in a future post.
You may also be wondering how Google profits from all of this. How can they afford to build these server farms and collect all of this information? It’s not like we pay Google to use their service. I will also discuss this in a future post.

