Leaked Documents Provide Glimpse Into Google's Search Secrets
'Over a decade we've been lied to,' says source
Leaked documents have provided a window into one of the web's most closely guarded secrets, Google's search ranking algorithm.
For well over a decade, content publishers have tried to second-guess how Google Search ranks articles. Positioning in search results is crucial, since the vast majority of searchers only look at the first page. Top-ranked articles on Google Search, which enjoys a near-monopoly in many regions, are vastly more likely to be viewed than those lower in the pecking order.
The black-box nature of the algorithms has spawned a whole industry of SEO experts, who purport to be a step ahead of the rest in understanding Google's secret recipe, based on announcements from its employees as well as their own research.
Now a leak of 2,500 pages of internal documents by a Google Search insider appears to provide more information about how it works, suggesting Google has been less than truthful in some of its public pronouncements.
The documents were sent by an anonymous source to Rand Fishkin, a former SEO analyst and current tech executive and blogger. The source later outed himself, via a video on YouTube, as Erfan Azimi, now founder and CEO of digital marketing agency EA Eagle Digital.
The documents, which Azimi claims originate from Google's API Content Warehouse on GitHub, were confirmed as looking "legit" by ex-Google employees contacted by Fishkin, and by Mike King, SEO expert and founder of iPullRank, who put together his own analysis.
"Google spokespeople have gone out of their way to misdirect and mislead us on a variety of aspects of how their systems operate in an effort to control how we behave as SEOs," said King.
The documents reveal in some detail the sort of information Google collects. While they do not describe exactly how this information feeds into search rankings, the collection of this data and some of the context provided by the leaked documents seems to contradict what some Google employees have stated.
"Over a decade we've been lied to," said Azimi. In his video, he recalls internal conversations where he was told that click data (information about all sites visited by the browser), data from Chrome, clickthrough rate (CTR) and bounce data are not used to rank results, whereas in fact they do play a part; they are also vulnerable to manipulation by bots.
Azimi insists he is not sharing the leaked documents (the primary source is not known) for his own gain, but because he is annoyed at what he sees as deception on Google's part, and so that "webmasters and business owners can make better decisions on optimizing for business search."
The leak comes as the U.S. government's antitrust case against Google is working its way through the courts, in the process revealing more information about the workings of Google Search.
AI's Growing Role In Misinformation
Another Google-related document, this one public, has shown how AI has rapidly become a major source of misinformation and disinformation, including deepfakes and false war footage.
According to a report by Google researchers, academics and fact-checking organizations, AI-generated images and videos are growing rapidly as a means of spreading falsehoods.
In part, that's because information shared online has become "more media-heavy" over the years, with purveyors of misinformation taking advantage, thanks to the emergence of easy-to-use genAI.
"The rise of generative AI-based tools, which provide widely accessible methods for synthesizing realistic audio, images, video and human-like text, have amplified these concerns," the researchers write.
Images are the most prominent source of misinformation.
"Images are ... regarded as highly persuasive and effective means of messaging, including when the communication is effectively a lie," the researchers note. From virtually none before 2023, AI-generated images are now almost as prevalent in fact-check enquiries as pictures edited by traditional means.
However, video is advancing at a similar rate, as AI-generated clips become easier to create and disseminate.
Rather than displacing other forms of misinformation, AI is adding to it, the researchers conclude.
"The sudden prominence of AI-generated content in fact checked misinformation claims suggests a rapidly changing landscape. However, this is occurring alongside already-existing forms of media-based misinformation: the existence of AI-generated content does not imply other forms are less effective or less widely used."
This article originally appeared on our sister site Computing.