HUGE Google Search document leak reveals inner workings of ranking algorithm (2024)

A trove of leaked Google documents has given us an unprecedented look inside Google Search and revealed some of the most important elements Google uses to rank content.

What happened. Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on GitHub by an automated bot called yoshi-code-bot. These documents were shared with Rand Fishkin, SparkToro co-founder, earlier this month.

  • Read on to discover what we’ve learned from Fishkin, as well as Michael King, iPullRank CEO, who also reviewed and analyzed the documents (and plans to provide further analysis for Search Engine Land soon).

Why we care. We have been given a glimpse into how Google’s ranking algorithm may work, which is invaluable for SEOs who can understand what it all means. In 2023, we got an unprecedented look at Yandex Search ranking factors via a leak, which was one of the biggest stories of that year.

This Google document leak? It will likely be one of the biggest stories in the history of SEO and Google Search.

What’s inside. Here’s what we know about the internal documents, thanks to Fishkin and King:

  • Current: The documentation indicates this information is accurate as of March.
  • Ranking features: 2,596 modules are represented in the API documentation with 14,014 attributes.
  • Weighting: The documents did not specify how any of the ranking features are weighted – just that they exist.
  • Twiddlers: These are re-ranking functions that “can adjust the information retrieval score of a document or change the ranking of a document,” according to King.
  • Demotions: Content can be demoted for a variety of reasons, such as:
    • A link doesn’t match the target site.
    • SERP signals indicate user dissatisfaction.
    • Product reviews.
    • Location.
    • Exact match domains.
    • p*rn
  • Change history: Google apparently keeps a copy of every version of every page it has ever indexed. Meaning, Google can “remember” every change ever made to a page. However, Google only uses the last 20 changes of a URL when analyzing links.
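
To make the Twiddler idea concrete, here is a minimal Python sketch of a re-ranking pass over already-scored results. Everything in it is hypothetical: the `Result` type, the `demote_exact_match_domain` rule, the 0.5 multiplier and the `apply_twiddlers` helper are illustrative assumptions, not names or values from the leaked documentation, which does not say how any feature is weighted.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    url: str
    score: float  # stand-in for an information retrieval score
    exact_match_domain: bool = False


# In this sketch, a twiddler is any function that takes scored results
# and returns adjusted results.
Twiddler = Callable[[List[Result]], List[Result]]


def demote_exact_match_domain(results: List[Result]) -> List[Result]:
    """Hypothetical demotion: halve the score of exact match domains."""
    return [
        replace(r, score=r.score * 0.5) if r.exact_match_domain else r
        for r in results
    ]


def apply_twiddlers(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Run each twiddler in order, then re-sort by the adjusted score."""
    for twiddle in twiddlers:
        results = twiddle(results)
    return sorted(results, key=lambda r: r.score, reverse=True)


initial = [
    Result("https://best-cheap-widgets.example", 0.9, exact_match_domain=True),
    Result("https://widgets.example/guide", 0.7),
]
reranked = apply_twiddlers(initial, [demote_exact_match_domain])
print([r.url for r in reranked])
```

In this toy run, the exact match domain starts in first place and drops below the guide page once the demotion is applied. Real twiddlers, if they work as King describes, would operate on signals we cannot see.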

Links matter. Shocking, I know. Link diversity and relevance remain key, the documents show. And PageRank is still very much alive within Google’s ranking features. PageRank for a website’s homepage is considered for every document.

  • This doesn’t prove Google spokespeople have lied about links not being a “top 3 ranking factor” or links mattering less for ranking. Two things can be true at once. Again, we don’t know how any of these features are weighted.

Successful clicks matter. This should not be a shocker, but if you want to rank well, you need to keep creating great content and user experiences, based on the documents. Google uses a variety of measurements, including badClicks, goodClicks, lastLongestClicks and unsquashedClicks.

Also, longer documents may get truncated, while shorter content gets a score (from 0-512) based on originality. Scores are also given to Your Money Your Life content, like health and news.

What does it all mean? According to King:

  • “[Y]ou need to drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank. Conceptually, it makes sense because a very strong piece of content will do that. A focus on driving more qualified traffic to a better user experience will send signals to Google that your page deserves to rank.”

Documents and testimony from the U.S. vs. Google antitrust trial confirmed that Google uses clicks in ranking – especially with its Navboost system, “one of the important signals” Google uses for ranking. See more from our coverage:

  • 7 must-see Google Search ranking documents in antitrust trial exhibits
  • How Google Search and ranking works, according to Google’s Pandu Nayak

Brand matters. Fishkin’s big takeaway? Brand matters more than anything else:

  • “If there was one universal piece of advice I had for marketers seeking to broadly improve their organic search rankings and traffic, it would be: ‘Build a notable, popular, well-recognized brand in your space, outside of Google search.'”

Entities matter. Authorship lives. Google stores author information associated with content and tries to determine whether an entity is the author of the document.

SiteAuthority: Google uses something called “siteAuthority”.

  • Google told us something like this existed in 2011, after the Panda update launched, stating publicly that “low quality content on part of a site can impact a site’s ranking as a whole.”
  • However, Google has denied having a website authority score in the years since then.

Chrome data. A module called ChromeInTotal indicates that Google uses data from its Chrome browser for ranking.

Whitelists. A couple of modules indicate Google whitelists certain domains related to elections and COVID – isElectionAuthority and isCovidLocalAuthority. That said, we’ve long known Google (and Bing) have “exception lists” when “specific algorithms inadvertently impact websites.”

Small sites. Another feature is smallPersonalSite – for a small personal site or blog. King speculated that Google could boost or demote such sites via a Twiddler. However, that remains an open question. Again, we don’t know for certain how much these features are weighted.

Other interesting findings. According to Google’s internal documents:

  • Freshness matters – Google looks at dates in the byline (bylineDate), URL (syntacticDate) and on-page content (semanticDate).
  • To determine whether a document is or isn’t a core topic of the website, Google vectorizes pages and sites, then compares the page embeddings (siteRadius) to the site embeddings (siteFocusScore).
  • Google stores domain registration information (RegistrationInfo).
  • Page titles still matter. Google has a feature called titlematchScore that is believed to measure how well a page title matches a query.
  • Google measures the average weighted font size of terms in documents (avgTermWeight) and anchor text.
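
The embedding comparison in the list above can be illustrated with a short Python sketch. This is only one plausible reading, not Google's method: the centroid-based site embedding, the `site_focus` and `page_distance` helpers, and the two-dimensional toy vectors are all assumptions made for illustration. The leaked attribute names (siteFocusScore, siteRadius) appear only as loose inspiration.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Vector]) -> Vector:
    """Assumed site embedding: the mean of the site's page embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def site_focus(page_embeddings: List[Vector]) -> float:
    """Hypothetical focus measure: mean similarity of pages to the site centroid."""
    site = centroid(page_embeddings)
    total = sum(cosine_similarity(p, site) for p in page_embeddings)
    return total / len(page_embeddings)


def page_distance(page: Vector, page_embeddings: List[Vector]) -> float:
    """Hypothetical per-page deviation: cosine distance from the site centroid."""
    return 1.0 - cosine_similarity(page, centroid(page_embeddings))


# Toy embeddings: two pages on one topic, one page on an unrelated topic.
pages = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
on_topic = page_distance(pages[0], pages)
off_topic = page_distance(pages[2], pages)
print(round(site_focus(pages), 3), round(on_topic, 3), round(off_topic, 3))
```

In this toy setup, the unrelated page sits farther from the site centroid than the on-topic pages, which is the kind of signal that could flag content outside a site's core topic.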

The articles.

Update, May 29. Google provided a statement to Search Engine Land. Read our follow-up: Google responds to leak: Documentation lacks context.

Update, May 30. King has written a follow-up article for Search Engine Land:

  • How SEO moves forward with the Google Content Warehouse API leak
  • Join Mike King and me at SMX Advanced for a late-breaking session exploring the leak and its implications.

Dig deeper. Unpacking Google’s massive search documentation leak

Quick clarification. There is some dispute as to whether these documents were “leaked” or “discovered.” I’ve been told it’s likely the internal documents were accidentally included in a code review and pushed live from Google’s internal code base, where they were then discovered.

The source. Erfan Azimi, CEO and director of SEO for digital marketing agency EA Eagle Digital, posted a video, claiming responsibility for sharing the documents with Fishkin. Azimi is not employed by Google.

Dixon Jones, CEO of Inlinks, made the 14,000 Google Search variables searchable. Jones said this tool will tell you what things Google stores and what they are used for.


FAQs

What is Google's algorithm for ranking search results called?

PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results.

What are the 3 key ranking factors that Google uses in their algorithm?

The list below names five key factors, rather than three, that help determine which results are returned for your query:
  • Meaning.
  • Relevance.
  • Quality.
  • Usability.
  • Context.

How does Google determine the ranking of content?

Beyond looking at keywords, our systems also analyze if content is relevant to a query in other ways. We also use aggregated and anonymized interaction data to assess whether search results are relevant to queries. We transform that data into signals that help our machine-learned systems better estimate relevance.

What is the algorithm behind Google search engine?

Google's algorithms are complex mechanisms used to retrieve information from its search index and present it in response to a given query. Algorithms sift through billions of pieces of content in Google's index, looking for phrases and keywords that match the query.

Which Google algorithm is its system for ranking websites?

Google's algorithm assesses various aspects, such as keyword usage, content quality, and user intent, to determine a page's relevance to a particular search query. Websites that effectively match user search intent are more likely to rank higher in Google's search results.

What is the Google algorithm update 2024?

Google's March 2024 core update finished rolling out on April 19, although Google did not announce its completion until a week later, on April 26. In total, the update took 45 days to complete and caused tremendous volatility across the open web during its rollout.

How does Google decide what search results you really want?

The Google Search algorithm is a complex system Google uses to decide how pages will rank in the search results. The algorithm is believed to consider hundreds of factors. Content relevance, quality, and the user experience (UX) are among the most important ones.

What is the No. 1 most important Google ranking factor?

Quality Content: The most important SEO factor. Google wants to show users high-quality, informative, and relevant content. Backlinks: Links from other websites to your website. They act like votes of confidence.

What is the most well known algorithm that Google has ever utilized?

The most famous Google algorithm is PageRank, a pre-query value that has no relationship to the search query.
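
Because PageRank depends only on the link graph, it can be computed before any query arrives. The following Python sketch shows the textbook power-iteration form of the algorithm on a three-page toy graph. It illustrates the published idea only and makes no claim about how Google computes or uses PageRank today; the `pagerank` function name, the example graph and the iteration count are assumptions, and the sketch assumes every link target also appears as a key in the graph.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Textbook power iteration over a small link graph."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank across all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: each key links to the pages in its list.
graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
scores = pagerank(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

No search query appears anywhere in the computation, which is what "pre-query value" means in practice.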

What relies on manipulating Google's algorithm to improve ranking?

Black hat SEO relies on manipulating Google's algorithm to improve rankings.

What is the name of Google's algorithm?

Following Panda's introduction in 2011, Google issued the following algorithm updates:
  • 2012 - Venice. The Venice update launched local SEO. ...
  • 2012 - Penguin. ...
  • 2012 - Pirate. ...
  • 2013 - Hummingbird. ...
  • 2014 - HTTPS/SSL. ...
  • 2015 - Mobilegeddon. ...
  • 2015 - RankBrain. ...
  • 2016 - Possum.

What improves Google ranking?

How to Improve Your Google Search Ranking in 10 Steps
  • Improve your website's user experience. ...
  • Write great content optimized for SEO. ...
  • Get more backlinks. ...
  • Improve your page speed. ...
  • Fix broken links. ...
  • Optimize your images. ...
  • Use H1 and H2 header tags. ...
  • Optimize for local search.

Is the Google algorithm secret?

Google's internal documents have been leaked on GitHub, revealing secret details about the company's search engine algorithms. The leaked documents contain data about factors influencing search results, which are key to digital marketing and search engine optimization efforts.

Who owns the Google search algorithm?

Who owns the patent to Google's original search algorithm? To find the answer, we used Google's search algorithm, and the answer is Stanford University. According to Quora user Tom McFarlane, "The invention was made by Larry Page while he was a graduate student at Stanford University."

What was the former name for Google?

They called this search engine Backrub. Soon after, Backrub was renamed Google (phew).

What is the Google ranking tool?

With Google rank checker you don't need to go up and down to locate your site in google search results. A free extension that provides real-time insights into your website's ranking on Google. After installing the extension, add your favorite website to the list.

Does Google still use RankBrain?

Google RankBrain is definitely still relevant in 2024. In fact, it's arguably more important than ever for anyone concerned with SEO and online visibility.

What's the most popular search engine in China?

Baidu. Baidu is the most popular search engine in China and can be compared to Google in the western world.

What is the Google Scholar ranking algorithm?

The most relevant results for the searched keywords are listed first, based on the author's ranking, the number of references linked to the work and their relevance to other scholarly literature, and the ranking of the publication in which the article appears.

What is the ranking algorithm for search?

The ranking algorithm uses the input data, such as the number of links to the webpage from other websites and the number of times the keyword appears on the page, to calculate the page's relevance score. The higher the relevance score, the higher the page is ranked in the search results.
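
As a rough illustration of that description, the Python sketch below combines a keyword count with a dampened inbound-link count into a single score. It is a toy model only: the `relevance_score` function, the equal weights, the logarithmic dampening of links and the example pages are all assumptions, and no real search engine is claimed to rank this way.

```python
import math


def relevance_score(
    page_text: str,
    keyword: str,
    inbound_links: int,
    keyword_weight: float = 1.0,
    link_weight: float = 1.0,
) -> float:
    """Toy score: keyword occurrences plus a log-dampened link count."""
    keyword_count = page_text.lower().split().count(keyword.lower())
    return keyword_weight * keyword_count + link_weight * math.log1p(inbound_links)


# Hypothetical pages: (visible text, number of inbound links).
pages = {
    "a.example": ("coffee grinder reviews and coffee tips", 3),
    "b.example": ("coffee beans for sale", 50),
}
ranked = sorted(
    pages,
    key=lambda url: relevance_score(pages[url][0], "coffee", pages[url][1]),
    reverse=True,
)
print(ranked)
```

With these weights, the page with fewer keyword mentions but many more inbound links ranks first. Changing `keyword_weight` or `link_weight` changes the order, which is why unknown weightings make any list of ranking features hard to interpret.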

What is a Google search result called?

A search engine results page, or SERP, is the page you see after entering a query into Google, Yahoo, or any other search engine. Each search engine's SERP design is different, but since Google is the most popular—holding over 80% of the market share—we'll focus on their features and algorithms.

What is Google rank predictor?

Google Page Rank Prediction

This tool attempts to determine a future Google PageRank value of a particular site. In other words, it tries to determine the PageRank value of a URL you enter, as it would be after the next Google PageRank update. The tool does several calculations to come to this figure.

Article information

Author: Virgilio Hermann JD
