Below, I’ll cover the varying ways these come together, and how a website ends up at the top of the results when you perform a search.
The Basics of Crawling
A company such as Google runs programs that continually “crawl” the web, visiting websites. Such a program is usually referred to as a spider or crawler. The crawler finds a website and examines everything on it in the time it takes a computer to load a page. Incredibly fast, in other words.
The website is then analyzed for keywords and phrases, links to other websites, and, nowadays, where those links are placed and how visible they are. The crawler also tends to store a copy of the website, then moves outward onto the links it found. This process repeats for each link found, over and over again. The never-ending process constantly refines which websites have the most links pointing at them for a given combination of words and phrases.
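The crawl-store-follow loop above can be sketched as a simple breadth-first traversal. The pages and links below are invented purely for illustration; a real crawler fetches over the network and handles far more (politeness rules, retries, scale), but the core loop looks like this:

```python
from collections import deque

# A toy, in-memory "web": page -> (text, outgoing links).
# These sites and their contents are hypothetical.
WEB = {
    "a.com": ("search engines crawl the web", ["b.com", "c.com"]),
    "b.com": ("crawlers index pages",         ["c.com"]),
    "c.com": ("pages link to other pages",    ["a.com"]),
}

def crawl(start):
    """Breadth-first crawl: visit a page, store a copy, follow its links."""
    seen, queue, copies = set(), deque([start]), {}
    while queue:
        page = queue.popleft()
        if page in seen or page not in WEB:
            continue                 # skip pages already visited or unreachable
        seen.add(page)
        text, links = WEB[page]
        copies[page] = text          # store a copy of the page
        queue.extend(links)          # move outward onto the links found
    return copies

copies = crawl("a.com")              # visits all three pages
```

The `seen` set is what keeps the "over and over again" from literally never ending: each page is fetched once per crawl, even though the toy web contains a cycle.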
As the amount of information on the internet continues to expand, these crawlers’ capabilities do as well. At this point they even take into account the layout of pages and the placement of every link, and have become fairly good at flagging attempts to manipulate the crawler. One such trick is hiding links and keywords in a website’s metadata, which used to make a crawler think your website was laden with more links and words than we mortals actually saw.
Google’s (and the others’) second step is indexing all of the information the crawlers obtain. Imagine a giant contact list filled out with a myriad of information gleaned by these crawlers.
Honestly, this is one of the stranger parts of search engines. All of that internet data you are googling is stored on physical hard drives in physical locations. Essentially, the data sits in datacenters that hold on to it until a crawler drops by with updated tidbits, at which point the index is updated. What’s stranger is that these very large data centers each hold the same copy of the data. A person’s proximity to one of them can minutely affect the speed at which your chosen search engine spews out results.
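The “giant contact list” is usually called an inverted index: instead of mapping pages to their words, it maps each word to the pages containing it, so a lookup is instant rather than a scan of every stored copy. A minimal sketch, using made-up page contents:

```python
# Hypothetical stored copies from a crawl: url -> page text.
pages = {
    "a.com": "search engines crawl the web",
    "b.com": "crawlers index pages on the web",
}

# Invert it: word -> set of urls containing that word.
index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

# Looking up a word is now a single dictionary read.
index["web"]   # -> {"a.com", "b.com"}
```

When a crawler drops by with an updated copy of a page, the engine re-runs this step for that page, which is why the index stays roughly in sync with the live web.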
These indexes are what is accessed when a person performs a search. The data has been sorted and stored at this point, and a query on Google is similar to asking a librarian for a certain book, or a friend for directions.
The keyword or phrase you entered is matched against a variety of indexed websites. In turn, those websites are ranked based on the number of reputable websites pointing at pages containing the same (or similar) words you searched for. If a website has a lot of links coming in from other reputable sites, links out to similarly reputable ones, and contains the keywords you asked for, it will be ranked very high on the list.
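Stripped of all the secret sauce, that ranking idea can be sketched in a few lines: find the pages matching the query word, then order them by how many other pages link to them. The link graph and page texts below are invented, and real engines weigh far more signals (link reputation, placement, freshness), but this is the skeleton:

```python
# Hypothetical crawl results: who links to whom, and each page's text.
links = {
    "a.com": ["c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
}
texts = {
    "a.com": "web search basics",
    "b.com": "cooking recipes",
    "c.com": "how web search works",
}

def search(word):
    # Count inbound links for every page.
    inbound = {page: 0 for page in texts}
    for src, outs in links.items():
        for dst in outs:
            inbound[dst] += 1
    # Keep only pages containing the query word, best-linked first.
    matches = [p for p, t in texts.items() if word in t.split()]
    return sorted(matches, key=lambda p: inbound[p], reverse=True)

search("web")   # c.com (two inbound links) ranks above a.com (one)
```

A refinement in the same spirit, famously, is to weight each inbound link by the reputation of the page it comes from rather than counting all links equally.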
This is the part you see when page after page of results pops up mere seconds after you search. Each website has been carefully examined by a crawler at some point in the recent past, indexed at one of these data centers, and then served to you from billions of options based on the algorithms the search engine uses to judge each link’s quality.
Depending on which search engine is used, this is where results differ. None of these algorithms are known in their entirety by any of us mortals not involved in creating them. Each search engine therefore has a different set of criteria for ranking a website. For the most part they use the same formula described earlier, but sometimes the rankings are skewed in different ways depending on what qualifies as a good, reputable site versus one that isn’t.
All in all, these search engines are decently simple. To keep up with the web’s growth, Google and the other companies are constantly building better, more sophisticated crawlers and larger datacenters, all to give you faster and more accurate results on anything you search for.