“Instant Indexing” Could be on The Horizon at Google, But Not With IndexNow
On June 26, I caught Googlebot fetching two pages immediately after I updated them, and the odds of this occurring by random chance are slim to none. Assuming this was not random, the only thing that would explain it is if Google is testing a change to its indexing systems. The following day I posted about my observation on LinkedIn, and in the comments, Kristine Schachinger reminded me that Google has been testing the IndexNow protocol.
What makes the situation more suspicious is that the requests lined up with what I saw in my RankMath Instant Indexing history. RankMath’s Instant Indexing feature uses the IndexNow protocol to automatically notify search engines when you publish, update, or remove content. Below is a screenshot of Googlebot requests in my log file and the matching entries in my IndexNow API history.
If this was intentional on Google’s end, and considering that they are not using IndexNow, what’s the notification mechanism? The only thing I can think of off the top of my head is RSS, but these were not posts. I also cannot find any record of these specific requests in my Search Console Crawl Stats report.
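For anyone who wants to run the same kind of cross-check on their own site, here is a minimal sketch. It assumes a server writing the common "combined" access log format and a list of ping timestamps exported by hand from the plugin’s history. The sample log lines, the five-minute window, and the function names are all hypothetical. Keep in mind that a user agent string can be spoofed, so a match here is only a starting point.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical sample lines in the common "combined" access log format.
SAMPLE_LOG = [
    '66.249.66.1 - - [01/Jan/2024:14:03:10 +0000] "GET /example-page/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Jan/2024:14:04:00 +0000] "GET /example-page/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

# Captures the timestamp, requested path, and user agent of one log entry.
LOG_PATTERN = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_hits(log_lines):
    """Yield (timestamp, path) for entries whose user agent claims to be Googlebot."""
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("ua"):
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, match.group("path")


def match_pings(hits, pings, window=timedelta(minutes=5)):
    """Pair each ping with Googlebot hits on the same path within `window` after it."""
    hits = list(hits)
    matches = []
    for ping_time, ping_path in pings:
        for hit_time, hit_path in hits:
            if hit_path == ping_path and timedelta(0) <= hit_time - ping_time <= window:
                matches.append((ping_path, ping_time, hit_time))
    return matches


hits = list(googlebot_hits(SAMPLE_LOG))
# Hypothetical ping time copied from the plugin's submission history.
pings = [(datetime(2024, 1, 1, 14, 3, 0, tzinfo=timezone.utc), "/example-page/")]
for path, pinged, fetched in match_pings(hits, pings):
    print(path, (fetched - pinged).total_seconds(), "seconds after the ping")
```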
Google Has Been Silent Regarding IndexNow Since 2021
Google has been “testing” the IndexNow protocol for a while now: exactly two years and eight months as of the time I’m writing this. Frankly, I had forgotten that Google was even testing it. The last significant coverage of this topic that included a statement from Google was by Barry Schwartz in Search Engine Land on November 9, 2021. Barry reported that Google, following in the footsteps of Microsoft Bing and Yandex, was testing the IndexNow protocol to enhance its sustainability efforts.
“We’re encouraged by work to make web crawling more efficient, and we will be testing the potential benefits of this protocol”
So why haven’t we heard anything about it since? Adopting IndexNow would likely require Google to make significant changes to its crawling and indexing systems. Especially at Google’s scale, that would require rigorous testing over a prolonged period of time. So while it has been almost three years, I’m not at all surprised that we haven’t heard any updates.
The Pros and Cons of Google Adopting IndexNow
Arguments for adopting the protocol include crawl efficiency, faster content discovery, and better energy sustainability. However, such gains would not come without challenges. The arguments against adoption do not refute the arguments for it. Rather, they are that it could create spam vulnerabilities and that scaling it could require significant changes to Google’s core crawling and indexing systems.
IndexNow would create a lot of noise
If you have an IndexNow API key, plugins like RankMath will ping search engines whenever you hit publish, update, or trash a page or post. To cut through the noise you’d have to be selective.
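To make the ping concrete, here is a minimal sketch of what such a submission looks like, based on the JSON format described in the public IndexNow documentation (a POST containing `host`, `key`, an optional `keyLocation`, and a `urlList`). The `select_urls` filter and its 200-character threshold are purely hypothetical. They only illustrate one way a sender could be selective, and are not how RankMath or any search engine actually behaves. The request is built but not sent.

```python
import json
import urllib.request

# Shared endpoint listed in the IndexNow documentation.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def select_urls(changes, min_changed_chars=200):
    """Hypothetical filter: always keep publishes and removals, skip trivial edits."""
    selected = []
    for change in changes:
        if change["action"] in ("publish", "remove"):
            selected.append(change["url"])
        elif change["changed_chars"] >= min_changed_chars:
            selected.append(change["url"])
    return selected


def build_indexnow_request(host, key, urls, key_location=None):
    """Build (but do not send) an IndexNow bulk submission request."""
    payload = {"host": host, "key": key, "urlList": list(urls)}
    if key_location:
        payload["keyLocation"] = key_location
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical change log for one editing session.
changes = [
    {"url": "https://www.example.com/new-post/", "action": "publish", "changed_chars": 0},
    {"url": "https://www.example.com/typo-fix/", "action": "update", "changed_chars": 3},
    {"url": "https://www.example.com/rewrite/", "action": "update", "changed_chars": 900},
]
request = build_indexnow_request("www.example.com", "hypothetical-key", select_urls(changes))
print(request.data.decode("utf-8"))
# To actually submit: urllib.request.urlopen(request)
```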
For Google, I think it would make sense for some areas of the web, but not all. So something like IndexNow likely wouldn’t replace any existing crawling and indexing systems. Instead, I’d see it functioning as a complementary tool, particularly for prioritizing updates in freshness-sensitive sections.
Spam and Abuse Prevention
Google would have to verify that the protocol doesn’t open up new avenues for spam and manipulation of search results. Links, traffic, and site category will likely affect how Google responds to pings. In addition, I could see Google associating something to the effect of a trust score with IndexNow keys. Ping abuse would lead to lower trust and they’d eventually disregard pings from abusive domains.
A robust means of spam filtering would also be essential. IndexNow currently works with API keys, which should adequately control for unauthorized submissions as long as they’re secured. Other likely measures could include rate limiting, quality checks, cross-referencing with regular crawl data, and even a tailored machine learning model. Careful testing and monitoring of IndexNow’s impact on spam levels is likely why it’s taken them so long.
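As a rough illustration of how rate limiting and a per-key trust score could work together, here is a minimal sketch. It uses a token bucket for rate limiting and a simple additive trust adjustment. Every name, threshold, and penalty value is an assumption made up for this example. Nothing here reflects how Google or any IndexNow participant actually scores pings.

```python
from dataclasses import dataclass


@dataclass
class KeyState:
    tokens: float          # remaining ping allowance for this key
    last_seen: float       # time of the previous ping, in seconds
    trust: float = 1.0     # hypothetical scale: 0.0 (ignored) to 1.0 (fully trusted)


class PingGate:
    """Hypothetical gate combining a token bucket with a per-key trust score."""

    def __init__(self, capacity=10.0, refill_per_sec=0.1, min_trust=0.2):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.min_trust = min_trust
        self.keys = {}

    def _state(self, key, now):
        return self.keys.setdefault(key, KeyState(tokens=self.capacity, last_seen=now))

    def accept(self, key, now):
        """Return True if a ping from `key` at time `now` should trigger a crawl."""
        state = self._state(key, now)
        elapsed = now - state.last_seen
        state.tokens = min(self.capacity, state.tokens + elapsed * self.refill_per_sec)
        state.last_seen = now
        if state.trust < self.min_trust or state.tokens < 1.0:
            return False
        state.tokens -= 1.0
        return True

    def record_outcome(self, key, now, content_changed):
        """After crawling, reward honest pings and penalize pings for unchanged pages."""
        state = self._state(key, now)
        if content_changed:
            state.trust = min(1.0, state.trust + 0.05)
        else:
            state.trust = max(0.0, state.trust - 0.25)


gate = PingGate(capacity=2.0, refill_per_sec=0.0)
print([gate.accept("site-a", now=t) for t in (0, 1, 2)])
```

The split between `accept` and `record_outcome` reflects the fact that a search engine would only learn whether a ping was honest after fetching the page.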
Scalability
Other search engines that have implemented IndexNow, such as Bing and Yandex, face significantly fewer challenges in scaling this technology because they do not handle nearly as many queries as Google. IndexNow might present unique challenges for Google depending on the design of its existing systems, which are already quite good at understanding what to crawl and when.
Depending on the technical configuration of their systems, a significant increase in write operations from adopting IndexNow at Google’s scale could create bottlenecks and become very expensive quickly. It would take time to adjust systems to accommodate this without hindering search performance.
How Exactly Does IndexNow Benefit Crawling and Indexing?
You can basically think of it like a notification layer for a search engine’s indexing system. This allows a search engine to maintain a “real-time index” across a distributed system without the need for constant, resource-intensive crawling of the entire web. Here’s an example of how it would fit in the crawling and indexing process:
- Notification Layer:
  - IndexNow acts as a notification layer, informing the distributed system of changes in content (e.g. new pages, updates, removals).
  - It sends HTTP requests to specified endpoints indicating that URL(s) have changed, which then triggers the re-crawling and updating process.
- Crawling and Fetching:
  - After receiving notifications from IndexNow, the search engine’s crawling system is prompted to fetch the updated content from the notified URLs.
  - This reduces the need for continuous polling and enables more efficient and timely updates to the index.
- Indexing:
  - Once the new content is fetched, the system processes and updates the index accordingly.
  - What happens next depends on the inner workings of the search engine. Typically, this would consist of document analysis. Terms are extracted from the new or updated documents and used to build index entries.
  - After this is done, sharding and distribution eventually occur. This is where the updated index entries are distributed across shards to ensure balanced and efficient querying.
    - Sharding is like dividing a library into smaller, manageable sections. Instead of one huge pile, you now have many smaller collections, e.g. fiction, non-fiction, science, and history.
    - Distribution involves spreading these sections across multiple machines at different locations. It’s like having many niche libraries that specialize in certain types of books.
What makes IndexNow special is that it allows sharding and distribution to become more dynamic and responsive. The ability to know exactly which shards need updating makes crawling and indexing significantly more sustainable because it saves processing power, and in turn, energy.
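To illustrate the idea of knowing which shards need updating, here is a minimal sketch that maps notified URLs to shards. It assumes hash-based sharding across a hypothetical eight shards, which differs from the topic-based library analogy above and is used only because it is simple to demonstrate. Real search engines shard their indexes in ways that are not public.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(url, num_shards=NUM_SHARDS):
    """Map a URL to a shard using a stable hash, so the same URL always lands on the same shard."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def plan_updates(notified_urls, num_shards=NUM_SHARDS):
    """Group notified URLs by shard; shards missing from the plan need no work."""
    plan = defaultdict(list)
    for url in notified_urls:
        plan[shard_for(url, num_shards)].append(url)
    return dict(plan)


# Hypothetical batch of URLs received through the notification layer.
notified = [
    "https://www.example.com/flights/nyc-to-lax",
    "https://www.example.com/flights/sfo-to-sea",
    "https://www.example.com/hotels/boston",
]
plan = plan_updates(notified)
print(f"{len(plan)} of {NUM_SHARDS} shards need updating")
```

With three notifications, at most three of the eight shards are touched, and the rest are left alone. That is the processing saving described above.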
Why Google Probably Won’t Adopt IndexNow
I doubt that Google will wind up joining Bing, Yandex, and the smaller search engines that have adopted the protocol. First, search engines that adopt the IndexNow protocol all agree to automatically share IndexNow submitted URLs with all other participating search engines. This is a requirement listed at the bottom of IndexNow’s Documentation page. Google is usually not one to share.
Second, to control for the many challenges that rolling out a system like IndexNow would present, it would make the most sense for Google to build something tailored to work with its existing crawling and indexing system.
Third, Google has a well-established history of developing their own unique technologies rather than adopting existing solutions.
When Google entered the mobile market, they chose to develop Android rather than adopt existing mobile operating systems, thereby creating an open-source platform that they could still heavily influence.
Rather than adopting existing machine learning platforms, they developed and open-sourced TensorFlow. This allowed them to shape the direction of machine learning development while ensuring it was optimized for their needs.
The Indexing API Could Be a Test Ground
Google’s Indexing API has been around since 2018, notably three years before IndexNow existed. Its initial purpose was to index job postings more quickly. Today, the API is officially intended for pages with JobPosting or BroadcastEvent schema markup.
However, there have been instances of people (mostly spammers and black hats) successfully getting it to work for other types of content. Many people that I’ve seen post about this on various forums had experienced drops thereafter, and were questioning whether it could be a penalty.
Moreover, the Indexing API makes a lot of sense as a test ground because (1) using a limited API as a testbed aligns with Google’s history of developing their own solutions, and (2) it would also explain their prolonged “testing” of IndexNow.
A controlled environment would facilitate scalability testing, quality control, seamless integration with existing systems, user behavior analysis, and measurement of sustainability impact.
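For comparison with an IndexNow ping, here is a minimal sketch of a notification to the Indexing API discussed in this section. The endpoint and the `URL_UPDATED` / `URL_DELETED` types come from Google’s public Indexing API documentation. The access token is a placeholder, since real calls require OAuth credentials for a verified site owner, and the helper name is hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Publish endpoint from Google's Indexing API documentation.
INDEXING_API_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def build_indexing_api_request(url, removed=False, access_token="<oauth-access-token>"):
    """Build (but do not send) a single URL notification for the Indexing API."""
    body = {"url": url, "type": "URL_DELETED" if removed else "URL_UPDATED"}
    return urllib.request.Request(
        INDEXING_API_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder token: a real one comes from an OAuth flow.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


request = build_indexing_api_request("https://www.example.com/jobs/hypothetical-role/")
print(request.data.decode("utf-8"))
```

Unlike IndexNow, where one key file on the site authorizes bulk submissions, this is one authenticated notification per URL, which is part of what makes it a more controlled environment.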
What Would Prompt Google to Ramp Up Testing Such a System?
Environmental Sustainability Goals (Not likely)
Currently, Google’s Environmental Risk Score is 1.6. That’s very low. Further investment in this area would be like spending money to make your site faster when it passes Core Web Vitals and has a “Good” Page Experience.
To Cut Costs (Not likely)
Given the layoffs and consolidation efforts we’ve seen from Google recently, I wouldn’t be surprised if this were the case, but there would likely be a larger motive behind such a significant change to a core system.
The Anti-trust Lawsuit (Plausible)
IndexNow effectively levels the playing field for smaller search engines. The court in the USA v. Google LLC antitrust case could potentially mandate either the adoption of IndexNow or the release of an equivalent alternative as a remedy. If put in this situation, Google would undoubtedly choose the latter option.
Travel/Flight Search (Most likely)
Travel is one area of the internet where, for a long time, Google has not been the most dominant search engine. Given recent events, it has become very clear that Google has its sights set on capturing travel market share.
- The overall transportation aggregators market was valued at $31.58 billion USD in 2023 and is expected to grow to $55.76 billion USD by 2030, with a CAGR of 8.46%
- The itinerary aggregators market, which includes flight aggregators, was estimated at $27 million USD in 2023 and is projected to reach $55.6 million USD by 2033, growing at a CAGR of 7.5%
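As a quick sanity check on the figures above, the growth rates can be recomputed from the start and end values using the standard compound annual growth rate formula. The sketch below uses only the numbers quoted in the two bullets. The function name is my own.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate linking start to end."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures as quoted above (USD billions and USD millions respectively).
transportation = cagr(31.58, 55.76, 2030 - 2023)
itinerary = cagr(27.0, 55.6, 2033 - 2023)

print(f"Transportation aggregators: {transportation:.2%}")
print(f"Itinerary aggregators: {itinerary:.2%}")
```

Both results should land close to the quoted rates, which suggests the projections are at least internally consistent.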
In order to win, Google needs to be hands down the best at it. Against Expedia, Kayak, Booking.com, and other online travel brokers, it has to be no contest. For this to happen, Google needs to create a completely seamless experience in one of the most dynamic verticals on the internet.
Travel sites, with frequently changing content and time-sensitive information, experience some of the heaviest crawl loads out there. An IndexNow-like “notification layer” would allow Google to provide the most accurate and up-to-date travel and flight information to users without the crawling burden.