How search algorithms power search engines
Search engines like Google, Bing, and Brave Search put the entire Web at your fingertips. Whether you’re researching a new topic, looking for a local business, or just can’t remember the name of that one movie, chances are you start seeking your answer with a search engine.
The key innovation of search engines is that they help you find a webpage without knowing its URL. But how exactly do search engines work? In this article, we’ll cover the basics of indexing, algorithms, and more.
Before search engines: the early Web and Web directories
Today there are billions of websites, so having a tool to easily sift through them is a must. But in the Jurassic days of the Web (i.e. the early 1990s), things were different. There were far fewer sites, and to visit them you simply needed to know the page’s address (or URL). Back then, sites were mostly accessed via links from other webpages, and gained popularity from word of mouth.
Eventually, people created directories to make the Web easier to navigate. Early directories—like Yahoo! and the Open Directory Project (DMOZ)—aimed to list, categorize, and quality-check the information on the Web. For example, if you wanted to research sea turtles you’d visit a directory like Yahoo!, find its “science” category (and maybe a subcategory like “marine biology”), and then look for websites about oceans or marine life. You just had to hope the site you arrived on had information about sea turtles.
Amazingly, these directories were maintained manually…by real people.
The origin of search engines and keyword matching
As the number of websites grew from hundreds to thousands to millions, directories became impossible to maintain. Search engines arose in the 1990s to automate the work of Web directories.
The earliest search engines mainly used basic keyword matching to produce results; they didn’t use advanced algorithms like today’s search engines. A search for “sea turtles” would produce results for webpages that contained the phrase “sea turtles.” Most search engines would rank results based on keyword prevalence, which isn’t always a good indicator of page quality.
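To make this concrete, here is a minimal sketch of keyword matching with ranking by raw phrase count. The page texts, the `.example` addresses, and the function names are all invented for illustration; this is not how any particular early search engine was implemented.

```python
import re

# Hypothetical mini-corpus: URL -> page text (all invented for illustration).
PAGES = {
    "turtle-facts.example": "Sea turtles are reptiles. Most sea turtles migrate long distances.",
    "spam.example": "sea turtles sea turtles sea turtles buy cheap sea turtles posters",
    "oceans.example": "The ocean is home to whales, sharks, and coral reefs.",
}


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def keyword_search(pages: dict[str, str], phrase: str) -> list[tuple[str, int]]:
    """Return (url, count) pairs for pages containing the phrase, highest count first."""
    hits = [(url, count_phrase(text, phrase)) for url, text in pages.items()]
    hits = [(url, n) for url, n in hits if n > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


results = keyword_search(PAGES, "sea turtles")
print(results)
```

In this toy corpus the page that simply repeats the phrase outranks the informative one, which is exactly why keyword prevalence is a weak proxy for quality.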
Over time, it became clear that keyword matching alone wasn’t enough to produce high-quality search results—so search engines needed new ways to rank pages.
How search engines work today: crawling, indexing, and ranking
Today, most search engines work roughly the same way. They:
- Find pages on the Web by “crawling” a huge number of websites.
- Process (or “index”) those pages in a kind of massive list.
- Use a “ranking” system to assign the pages relevance or authority.
Crawling: finding new pages on the Web
Crawling is how search engines find new pages on the Web. Search engines use programs called Web crawlers (also known as “crawlers” or “spiders”) to scan the Web for new sites, or new pages on sites they already know about. A crawler follows links from site to site, downloading the HTML code and other page content (e.g. links and metadata) of every page it finds.
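The link-following loop can be sketched in a few lines. To keep the example self-contained, it crawls a hypothetical in-memory “Web” (the `FAKE_WEB` dictionary) instead of making network requests; a real crawler would fetch pages over HTTP and also handle politeness rules, errors, and relative links, all of which are left out here.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "Web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "https://a.example/": '<a href="https://b.example/">B</a> <a href="https://c.example/">C</a>',
    "https://b.example/": '<a href="https://a.example/">A</a> <a href="https://d.example/">D</a>',
    "https://c.example/": "<p>No outgoing links here.</p>",
    "https://d.example/": '<a href="https://c.example/">C</a>',
}


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, web: dict[str, str]) -> dict[str, str]:
    """Breadth-first crawl from a seed URL; returns URL -> downloaded HTML."""
    downloaded: dict[str, str] = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in downloaded or url not in web:
            continue  # already visited, or the page cannot be fetched
        html = web[url]  # stand-in for an HTTP GET
        downloaded[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        queue.extend(extractor.links)  # newly discovered links to visit later
    return downloaded


crawled = crawl("https://a.example/", FAKE_WEB)
print(list(crawled))
```

Starting from a single seed page, the crawler discovers every page reachable by links, and the `downloaded` dictionary keeps it from fetching the same page twice.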
Indexing: analyzing page content
Once a page is crawled it gets indexed—this is how search engines process and store page content. A search index is essentially a massive database of all the pages a search engine has found, some info about those pages (known as “metadata”), and the content of those pages. Pages are categorized according to their topic and purpose, and added to the index so the search engine can quickly retrieve the page in response to search queries.
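One common data structure behind fast retrieval is an inverted index, which maps each word to the pages that contain it. The sketch below is a simplified illustration under that assumption; the documents and function names are hypothetical, and production indexes store much more (word positions, metadata, and so on).

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> extracted text.
DOCS = {
    "turtles.example": "Sea turtles nest on sandy beaches",
    "beaches.example": "Sandy beaches attract tourists",
    "reefs.example": "Coral reefs shelter sea life",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


index = build_index(DOCS)
print(sorted(lookup(index, "sea")))
print(sorted(lookup(index, "sandy beaches")))
```

Because the index is built ahead of time, answering a query only requires looking up a few words and intersecting the resulting sets, rather than rescanning every page.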
Ranking: determining page relevance
Finally, the search engine will rank indexed pages by their relevance and significance. While crawling and indexing are generally the same no matter which engine you use, modern search engines vary widely in how they rank pages. This proprietary (even secretive) code that determines page ranking is called the ranking algorithm.
It’s this ranking system that determines what pages appear—and in what order—in response to your search query.
Note: Search engines like Google, Bing, and Brave Search do not crawl or index every page or even every site. For example, pages on the so-called “dark web” cannot be found in a traditional search engine.
The evolution of search engine ranking algorithms
The earliest ranking algorithms were based on keyword frequency (i.e. how many times a given word or phrase appeared in the page’s content). But this led to an obvious problem known as “keyword stuffing,” where sites would flood their pages with repeated words in an effort to improve their ranking. Search engine makers quickly caught on to this ploy, and their ranking methods got more sophisticated in response.
Google, for example, began using link analysis to help inform its ranking. If a website is linked to by lots of other sites, then its ranking improves. Google’s algorithm also considers the quality of links: If a site is commonly linked to by “authoritative” sources, or cited in academic research, then that site’s influence is weighted more heavily.
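Google’s original link analysis method is widely known as PageRank. The sketch below is a simplified, PageRank-style calculation on a tiny hypothetical link graph; it is not Google’s production algorithm, and the damping value and iteration count are conventional illustrative choices. It assumes every linked-to page also appears as a key in the graph.

```python
# Hypothetical link graph: page -> pages it links to.
# Assumption: every link target also appears as a key.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Iteratively estimate a rank for each page from its inbound links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal ranks
    for _ in range(iterations):
        # Every page receives a small baseline share...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # ...plus an equal slice of the rank of each page linking to it.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(value, 3))
```

In this graph, page `c` ends up ranked highest because three pages link to it, while `d`, which nothing links to, ranks lowest. A link from a highly ranked page also passes on more weight than a link from an obscure one, which mirrors the “quality of links” idea described above.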
Today, search engine ranking relies on much more than keyword matching and link analysis. Search engines now use far more sophisticated (and potentially problematic) algorithms, and their results have become more “personalized” thanks to the user data they collect.
The technical aspects of search ranking algorithms
Search results are now often ranked according to many technical factors, like:
- Relevance and quality of content: keyword usage, titles, section headers, and so-called “meta” tags of information about the page itself
- Recency: the date a page was published or last updated
- Link structure: the number and quality of links into a page from other sites
- Page layout, design, loading speed, and user experience: how easy a site is to use, especially on different devices
- Domain authority: the age of a website’s domain name (its address) and its overall reputation
- Social signals: the number of likes, shares, and comments a page receives on social platforms
Search algorithms use (and strategically weight) these factors to provide high-quality search results. But the algorithms are also generally kept private, because knowing the criteria that impact ranking would open the door for manipulation.
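One simple way to picture “weighting” is a weighted sum of per-factor scores. Since real ranking algorithms are kept private, everything in the sketch below is hypothetical: the factor names, the 0-to-1 signal values, and the weights are invented purely to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical per-page signals, each already normalized to 0..1."""
    url: str
    relevance: float
    recency: float
    links: float
    experience: float


# Hypothetical weights; real search engines do not publish theirs.
WEIGHTS = {"relevance": 0.5, "recency": 0.1, "links": 0.3, "experience": 0.1}


def score(page: PageSignals, weights: dict[str, float] = WEIGHTS) -> float:
    """Combine a page's signals into one number using a weighted sum."""
    return sum(weight * getattr(page, name) for name, weight in weights.items())


def rank(pages: list[PageSignals]) -> list[PageSignals]:
    """Order pages from highest to lowest combined score."""
    return sorted(pages, key=score, reverse=True)


candidates = [
    PageSignals("stuffed.example", relevance=1.0, recency=0.5, links=0.0, experience=0.2),
    PageSignals("news.example", relevance=0.6, recency=1.0, links=0.3, experience=0.9),
    PageSignals("guide.example", relevance=0.9, recency=0.4, links=0.8, experience=0.7),
]

for page in rank(candidates):
    print(page.url, round(score(page), 2))
```

Here the keyword-heavy page scores highest on relevance alone, but falls to last place once link structure and user experience are weighed in. Changing the weights changes the order, which is part of why the exact values are guarded so closely.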
The impact of SEO on search ranking
The digital marketing field known as search engine optimization (SEO) covers strategies for improving a site’s search ranking. Many SEO strategies are well-intentioned attempts to make a website easier to use (like using clear headings, and making pages load faster), but others, like stuffing a site full of keywords, are seen as scams.
Search engines ultimately aim to offer the best results without undue influence from sites, so they continuously tweak their algorithms. This makes it hard to know which factors contribute to search result rankings, and how each is weighted. Generally, a search engine is a “black box” where your queries generate search results, but you don’t actually know exactly how.
Personalization: how search engines use your data to rank results just for you
Aside from the technical factors that inform rankings, most search engines also track and collect user data (via cookies, trackers, and other means) to personalize search results for each individual. Some “personalization” factors that can affect ranking include:
- Device information (like language, location, and device type)
- Search history (what you search)
- Click history (what results you click on)
- Browsing history (the sites you visit)
Your data is used to create a profile of you, and this profile is then used to customize your search results. That means two people with the exact same search query can see very different results.
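As a rough illustration of how a profile could change rankings, the sketch below re-ranks the same base results for two different users. The results, topics, profiles, and boost value are all hypothetical; actual personalization systems are not public and are likely far more complex.

```python
# Hypothetical base results for the query "jaguar": (url, base_score, topic).
BASE_RESULTS = [
    ("jaguar-cars.example", 0.80, "cars"),
    ("jaguar-animal.example", 0.78, "wildlife"),
    ("jaguar-team.example", 0.60, "sports"),
]


def personalize(results, profile: dict[str, int], boost: float = 0.1) -> list[str]:
    """Re-rank results, boosting each one by the user's past clicks on its topic."""

    def adjusted(result):
        url, base_score, topic = result
        return base_score + boost * profile.get(topic, 0)

    return [url for url, _, _ in sorted(results, key=adjusted, reverse=True)]


# Hypothetical profiles: topic -> number of past clicks on that topic.
car_fan = {"cars": 3}
hiker = {"wildlife": 2, "sports": 1}

print(personalize(BASE_RESULTS, car_fan))
print(personalize(BASE_RESULTS, hiker))
```

The two users typed the same query, yet the first sees the car site on top while the second sees the wildlife page first, and neither can see the profile that caused the difference.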
While some accept this tradeoff—data collection (and its associated privacy and security risks) in exchange for “better” (or more personalized) search results—it introduces problems most people don’t consider. Personalization happens behind the scenes, and no one knows exactly what data is collected, how securely (or not) it’s stored, how it’s used (or even sold), or how it’s applied to change search results.
While personalizing a relatively innocuous query like “best pizza near me” can seem low-risk and even convenient (no wading through results of pizza places in other cities), others can be more problematic. Political queries can open the door for algorithmic bias. Personal queries (about a medical issue, for example) can create privacy concerns. And so on.
Search engines turned ad platforms, and the impact on results ranking
The introduction of advertising is another major change to how modern search engines work. Early on, search engines were ad-free; today, a provider like Google is as much an ad platform as a search engine. This advertising, and its corresponding ad revenue, can also impact results ranking.
Google, for example, offers some of the most highly targeted ad space available. Want to advertise your product to women between the ages of 25–30 who live in the Pacific Northwest, have a pet, and enjoy camping? That’s easy with Big Tech search ad platforms. And it’s all made possible by the massive amounts of user data they collect.
And while search ads can help cover the costs of running a Web index, some search engines are conflicted by their ad business. Big Tech players like Google are incentivized to display ads above organic results. Their search ads also most often rely on the collection of personal data, which is made available to advertisers to better target users.
There are, however, some alternatives. Brave Search, for example, is an ad-supported search engine that does not track users, their queries, or their clicks. Brave Search ads are matched to the query, not user data. And these ads don’t crowd out organic search results. It’s a user-first advertising system that respects your privacy, and a stark contrast to the Big Tech search engines powered by personal data.
A return to the original purpose of search engines: surfacing information
Long before search engines became de facto ad platforms, and before tracking or profiling users became the norm, the sole goal was to connect users to the information they sought as efficiently, effectively, and objectively as possible. For Google, those days are long gone.
But a new breed of privacy-first search engines is ushering in a new vision for private search. And while many of these private options still rely on Big Tech for their indexes and results ranking, there’s one alternative that’s both private and independent: Brave Search.
Brave Search delivers results from its own index of the Web. And it doesn’t track users, their queries, or their clicks. That makes Brave Search free from behind-the-scenes personalization of results. And while Brave Search is ad supported, those ads are anonymous by design, and consistent with Brave’s commitment to ethical and transparent advertising practices.
And there’s clearly demand: Brave Search is the fastest-growing search engine since Bing. Try Brave Search today.