Google has never stopped evolving. From its early days as a straightforward link-following algorithm to today’s sophisticated, multi-product crawling infrastructure, Google has undergone a series of dramatic transformations over the past two decades.
Algorithm updates like Panda, Penguin, Hummingbird, BERT, and the Helpful Content Update have each reshaped how websites are discovered, evaluated, and ranked. For digital marketing professionals, keeping pace with these changes isn’t optional — it’s survival.
“As someone who has been running digital marketing teams for years, I’ve always made it a priority to stay current — not just with what Google does, but with how it actually works under the hood. One of the areas I consistently point my teams toward is something most marketers gloss over: Google’s crawling infrastructure. It sounds technical, and it is, but ignoring it has real consequences for your search visibility.”
What Are Crawling and Fetching?
In the context of a search engine like Google, the terms “crawling” and “fetching” refer to two distinct but connected stages of getting a webpage into a search index. Think of it like a library: crawling is the librarian wandering the aisles to find new books, and fetching is the act of actually grabbing a book off the shelf to read it.
- The Crawler (The Discovery Phase)
The crawler (like Googlebot) is the automated “spider” that navigates the web. Its main job is discovery.
- Pathfinding: It starts with a list of known URLs and sitemaps. As it visits those pages, it looks for links (<a> tags) to new pages it hasn’t seen before.
- The Queue: Every time it finds a new link, it adds that URL to a massive “Crawl Queue.”
- Priority: It doesn’t crawl everything at once. It uses algorithms to decide which pages are most important to visit first, based on things like how often the site updates and its “authority.”
- The Fetcher (The Retrieval Phase)
Once a URL is at the top of the queue, the Fetcher takes over. This is the “request” part of the process: the fetcher sends an HTTP request for that URL and downloads the bytes the server returns.
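To make the discovery-then-retrieval loop concrete, here is a minimal sketch in Python. It is a toy model, not Google’s implementation: the names `CrawlQueue`, `LinkExtractor`, `crawl`, `fake_fetch`, and `depth_score` are all hypothetical, the “web” is a small in-memory dictionary, and the priority rule (shallower URLs first) is a stand-in assumption for whatever signals a real crawler uses.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


class CrawlQueue:
    """Priority queue of URLs. A lower score is crawled sooner."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores keep insertion order

    def add(self, url, score):
        if url in self._seen:
            return False  # already queued or crawled
        self._seen.add(url)
        heapq.heappush(self._heap, (score, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def crawl(seed_urls, fetch, score_for, max_pages=10):
    """Discovery loop: pop a URL, fetch it, queue any new links it contains."""
    queue = CrawlQueue()
    for url in seed_urls:
        queue.add(url, score_for(url))
    visited = []
    while len(queue) and len(visited) < max_pages:
        url = queue.pop()
        html = fetch(url)  # the "fetcher" step
        visited.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            queue.add(link, score_for(link))
    return visited


# Hypothetical in-memory site standing in for real HTTP responses.
PAGES = {
    "https://example.com/": '<a href="/blog">Blog</a> <a href="/about">About</a>',
    "https://example.com/blog": '<a href="/blog/post-1">Post</a> <a href="/">Home</a>',
    "https://example.com/about": "",
    "https://example.com/blog/post-1": "",
}


def fake_fetch(url):
    return PAGES.get(url, "")


def depth_score(url):
    # Assumed priority rule: fewer slashes means a shallower, more important page.
    return url.rstrip("/").count("/")


order = crawl(["https://example.com/"], fake_fetch, depth_score)
print(order)
```

With this toy site, the home page is crawled first, then the two top-level pages in the order they were discovered, and the deeper blog post last. The link back to the home page is ignored because that URL has already been seen.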
For years, “Googlebot” has been treated as a single, monolithic crawler — but that picture is outdated. Google’s Gary Illyes recently pulled back the curtain on how the search giant’s crawling infrastructure actually operates, and the reality is far more nuanced.
Googlebot Is Not One Bot
The name “Googlebot” is a legacy label from the early 2000s, when Google had just one product and one crawler. Today, dozens of Google products — Search, Shopping, AdSense, and more — all route their requests through a centralized crawling platform under different crawler names. When you see “Googlebot” in your server logs, you’re only seeing Google Search. Many other crawlers are quietly doing their own work alongside it.
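One way to see this in your own logs is to tally requests by user-agent token. The sketch below is a rough illustration: the token list is a small, non-exhaustive sample that should be checked against Google’s published crawler list, the function names are hypothetical, and the sample user-agent strings are made up for the demonstration. User-agent strings can also be spoofed, so a match here is a hint, not verification.

```python
from collections import Counter

# Illustrative, non-exhaustive sample of Google user-agent tokens.
# Longer tokens come first so "Googlebot-Image" is not counted as "Googlebot".
GOOGLE_TOKENS = [
    "Googlebot-Image",
    "Googlebot-Video",
    "Storebot-Google",
    "Mediapartners-Google",
    "AdsBot-Google",
    "GoogleOther",
    "Googlebot",
]


def classify_user_agent(user_agent):
    """Return the first matching Google token, or None for other traffic."""
    lowered = user_agent.lower()
    for token in GOOGLE_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def tally_google_crawlers(user_agents):
    """Count requests per Google crawler token, skipping unmatched traffic."""
    counts = Counter()
    for user_agent in user_agents:
        token = classify_user_agent(user_agent)
        if token is not None:
            counts[token] += 1
    return counts


# Made-up sample of user-agent strings as they might appear in an access log.
sample_user_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot-Image/1.0",
    "Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36",
    "Mediapartners-Google",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
]

counts = tally_google_crawlers(sample_user_agents)
print(counts)
```

In this sample, four of the five requests come from Google, and only one of those four is the Search crawler.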
The 2MB Limit: What It Means for Your Pages
Every crawler operating within Google’s infrastructure sets a byte limit per URL — and this is where site owners need to pay close attention. Googlebot fetches a maximum of 2MB per URL (including HTTP headers), with a separate 64MB limit for PDFs. For any crawler that doesn’t specify a limit, the default is 15MB regardless of content type.
What happens when a page exceeds that threshold? Googlebot doesn’t reject it — it simply stops downloading at the 2MB cutoff and passes whatever it retrieved to Google’s indexing systems and Web Rendering Service (WRS), treating that partial file as if it were complete. Any content beyond the cutoff is never fetched, never rendered, and never indexed. It simply doesn’t exist as far as Google is concerned.
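The cutoff behavior is easy to model. The following sketch simulates it under two assumptions: that “2MB” means 2 × 1024 × 1024 bytes, and that header bytes are simply subtracted from the budget. The function names and the sample page are hypothetical.

```python
# Assumption: "2MB" is taken as 2 * 1024 * 1024 bytes.
GOOGLEBOT_LIMIT = 2 * 1024 * 1024


def simulate_truncated_fetch(header_bytes, body, limit=GOOGLEBOT_LIMIT):
    """Return the part of the body that fits once headers are counted."""
    body_budget = max(limit - header_bytes, 0)
    return body[:body_budget]


def survives_cutoff(marker, header_bytes, body, limit=GOOGLEBOT_LIMIT):
    """True if the marker bytes appear in the portion that would be kept."""
    return marker in simulate_truncated_fetch(header_bytes, body, limit)


# Hypothetical oversized page: a small head, 3MB of filler, then late content.
body = (
    b"<html><head><title>Early</title></head><body>"
    + b"x" * (3 * 1024 * 1024)
    + b"<p>late content</p></body></html>"
)

kept = simulate_truncated_fetch(500, body)
print(len(kept))
print(survives_cutoff(b"<title>Early</title>", 500, body))
print(survives_cutoff(b"late content", 500, body))
```

Here the title survives because it sits in the first few bytes, while the paragraph after the filler falls past the cutoff and is dropped.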
How Google Renders What It Fetches
Once the bytes are retrieved, the WRS takes over — processing JavaScript and CSS, executing client-side code, and handling XHR requests much like a modern browser would. Importantly, the WRS operates statelessly, clearing local storage and session data between requests. The same 2MB limit applies to every external resource it pulls in during rendering.
What This Means for Site Owners
The practical takeaways are straightforward. Keep your HTML lean by moving heavy CSS and JavaScript to external files, which each get their own separate byte budget. Put your most critical elements — title tags, canonicals, structured data — near the top of the document so they’re safely within the 2MB window. And monitor your server response times: slow servers cause Google’s crawlers to back off automatically, reducing how often your pages get crawled.
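A quick audit can report where those critical elements sit in the raw HTML. This is a rough sketch with several simplifications: it uses regular expressions in place of a real HTML parser, it checks only the opening tag of each element, it ignores HTTP header bytes, and it assumes 2MB means 2 × 1024 × 1024 bytes. The names `CRITICAL_PATTERNS` and `audit_critical_offsets` are hypothetical.

```python
import re

# Assumption: "2MB" is taken as 2 * 1024 * 1024 bytes.
LIMIT_BYTES = 2 * 1024 * 1024

# Byte patterns for the opening tags of the elements named in the text.
CRITICAL_PATTERNS = {
    "title": re.compile(rb"<title[\s>]", re.IGNORECASE),
    "canonical": re.compile(rb"<link[^>]*rel=[\"']canonical[\"']", re.IGNORECASE),
    "structured_data": re.compile(rb"<script[^>]*application/ld\+json", re.IGNORECASE),
}


def audit_critical_offsets(html_bytes, limit=LIMIT_BYTES):
    """Report the byte offset of each element and whether it fits in the limit."""
    report = {}
    for name, pattern in CRITICAL_PATTERNS.items():
        match = pattern.search(html_bytes)
        if match is None:
            report[name] = {"offset": None, "within_limit": False}
        else:
            report[name] = {
                "offset": match.start(),
                "within_limit": match.end() <= limit,
            }
    return report


# Hypothetical page: 3MB of inline CSS pushes the JSON-LD block past the limit.
sample_page = (
    b'<html><head><title>Shop</title>'
    b'<link rel="canonical" href="https://example.com/shop">'
    b"<style>" + b" " * (3 * 1024 * 1024) + b"</style>"
    b'<script type="application/ld+json">{}</script></head></html>'
)

report = audit_critical_offsets(sample_page)
for name, result in report.items():
    print(name, result)
```

In this example the title and canonical are safely inside the window, but the structured data is flagged because the inline stylesheet pushed it past the limit. That is the case for moving heavy CSS to an external file.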
Understanding these byte-level mechanics isn’t just a technical curiosity — it directly determines what Google sees, renders, and ultimately ranks.
Why Richard Uzelac’s Digital Marketing Team Can’t Afford to Ignore This
For digital marketing professionals, the crawler vs. fetcher distinction isn’t a backend concern — it sits at the heart of nearly every campaign, audit, and content strategy decision.
Search visibility depends on crawlability. When you publish a new landing page, a blog post, or a product update, it doesn’t automatically appear in search results. A crawler has to discover the URL first, and then the page has to be fetched and indexed. If your site has crawl or index blockers, such as a misconfigured robots.txt, a noindex tag placed by mistake, or pages buried too deep in the site architecture, your content simply won’t rank, no matter how good your copy or backlinks are.
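A misconfigured robots.txt is the easiest of these to catch before launch. The sketch below uses Python’s standard `urllib.robotparser` module to check a list of URLs against a set of rules. The robots.txt content, the URLs, and the `blocked_urls` helper are hypothetical, and Python’s parser may not match Google’s own rule-matching in every edge case, so treat it as a first-pass check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that accidentally blocks the landing-page directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /landing/
Allow: /
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given user agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


campaign_urls = [
    "https://example.com/landing/spring-sale",
    "https://example.com/blog/new-post",
]

blocked = blocked_urls(ROBOTS_TXT, campaign_urls)
print(blocked)
```

Here the landing page is reported as blocked while the blog post is not. A check like this can run before a campaign goes live.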
Marketers who understand crawling know to check crawl coverage in Google Search Console and treat crawl errors as urgent, not cosmetic.



