Schema markup is losing weight across search surfaces

Google deprecated FAQ rich results on 7 May 2026, completing a three-year wind-down and bringing the 2026 count to eight schema-driven rich result categories pulled from Search. Google cited low quality and widespread misuse.

Four days later, Ahrefs published a controlled study of 1,885 pages that added JSON-LD schema markup over eight months. AI Overview citations barely moved. The difference from a 4,000-page control set was statistically significant, with roughly a 1-in-2,500 probability of arising by chance (p ≈ 0.0004).

Publisher-declared structured data is becoming a secondary signal across search surfaces, on traditional Search and in AI alike.

What Google has actually been removing

The January 2026 batch took rich results away from seven schema types: Course Info, Claim Review, Estimated Salary, Learning Video, Special Announcement, Vehicle Listing, and Practice Problems. The May 2026 batch added FAQ. All eight shared the same pattern: publishers could add the markup freely, with nothing outside the page to check it against, and the rich result was the prize.

Google's reason for the FAQ deprecation was that publishers had been adding FAQ schema to pages without genuine question-and-answer content, claiming SERP real estate the content didn't earn. The same gaming pattern shaped the January batch. Where the rich result generated traffic and the markup was easy to fabricate, the gaming caught up with the format.
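For readers who haven't handled it directly, FAQ markup is a JSON-LD block built from schema.org's FAQPage, Question and Answer types. The sketch below generates one from question-and-answer pairs. The function names and the sample pair are illustrative. The pairs are assumed to mirror Q&A content that is actually visible on the page, which is exactly the condition the gamed markup skipped.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs.

    Assumes the pairs mirror Q&A content visible on the page; marking up
    questions the page doesn't answer is the misuse Google cited.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

def script_tag(data):
    """Wrap a JSON-LD object in the script element pages embed it in."""
    # Escape "</" so answer text containing "</script>" can't close the tag early;
    # "<\/" is still valid JSON and parses back to "</".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"

# Hypothetical example content.
print(script_tag(faq_jsonld([("Do you ship internationally?", "Yes, to most countries.")])))
```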

What's expanding alongside the deprecations is the opposite category. MemberProgram structured data, launched in late 2025, lets merchants describe loyalty programmes in a way Google's product surfaces can render. Merchant listings have gained new sub-attributes for loyalty pricing and shipping benefits. Both extend into AI Mode and Gemini results.

Where Google has an independent source of truth to verify against, the markup amplifies that verified data. Where the only signal is the publisher's own assertion, the markup is being deprioritised or removed.
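The distinction can be illustrated with a toy cross-check. This is a hypothetical sketch, not Google's logic. It compares the price a page declares in Product markup with an independently held record, standing in for something like a Merchant Center feed, and treats the markup as corroborated only when the two agree. The field layout and the one-penny tolerance are assumptions.

```python
from decimal import Decimal

def markup_is_corroborated(page_offer, feed_offer, tolerance=Decimal("0.01")):
    """Return True only if page-declared offer data matches an independent record.

    Both arguments are dicts with hypothetical keys 'sku', 'price' (string)
    and 'priceCurrency'; real feeds and schema.org Offer objects differ in detail.
    """
    if feed_offer is None:
        # No independent source of truth: the markup is only the publisher's assertion.
        return False
    if page_offer["sku"] != feed_offer["sku"]:
        return False
    if page_offer["priceCurrency"] != feed_offer["priceCurrency"]:
        return False
    # Decimal avoids float rounding surprises when comparing prices.
    diff = abs(Decimal(page_offer["price"]) - Decimal(feed_offer["price"]))
    return diff <= tolerance
```

A page offer of `{"sku": "A1", "price": "19.99", "priceCurrency": "GBP"}` is corroborated by a feed record with the same values, and not by a missing record or one priced at 24.99.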

What AI search is doing with schema

The Ahrefs study tested whether adding JSON-LD to a page lifts citations in AI Overviews. Across 1,885 pages tracked against 4,000 controls, it doesn't.

A separate controlled experiment, published in February 2026, traced the mechanism. LLMs tokenise JSON-LD blocks as raw text, so the semantic structure publishers think they're providing is destroyed during ingestion. The same experiment confirmed that during direct retrieval, when an AI system fetches a single URL to populate an answer, every major system tested extracted only visible HTML content. JSON-LD, hidden Microdata, and hidden RDFa were all ignored. Both ChatGPT and Perplexity also produced answers from invalid, fabricated schema that was logically incoherent. They weren't reading it as schema; they were reading it as more text.
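One rough way to see what "only visible HTML content" means in practice is to extract text while skipping non-rendered elements. The sketch below uses Python's standard-library HTML parser under a simplified rule: drop the contents of script, style and template elements, which covers JSON-LD. It ignores CSS and the hidden attribute, and it is not how any particular AI system's pipeline is built. The sample page is hypothetical.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text a reader would see, skipping elements that never render.

    Rough approximation: CSS visibility and the hidden attribute are ignored.
    """

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script bodies (including JSON-LD) arrive here too; drop them.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = """<html><head>
<script type="application/ld+json">{"@type": "Product", "offers": {"price": "19.99"}}</script>
</head><body><h1>Trail shoe</h1><p>Price: £19.99</p></body></html>"""

print(visible_text(page))  # Trail shoe Price: £19.99
```

The price declared only in the JSON-LD disappears; the price stated in prose survives, which is the asymmetry the experiment describes.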

LLMs can extract product names, prices, ratings, dates, authors, and other facts from well-written prose with high enough reliability that the schema layer becomes redundant. The cheap inference path beats the expensive structured-extraction path when the model is already strong enough.

Google withdraws schema-driven rich results when gaming makes them noisy. AI systems skip the schema layer altogether, because reading the prose costs less than verifying the markup.

What AI agents actually do when they fetch a page

What AI agents respond to is fetch reliability: whether they can retrieve and read the page at all. Two categories of agent are visiting pages in 2026, and their mechanics differ.

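Retrieval crawlers (GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot) operate over HTTP without executing JavaScript. They read first-response HTML. Google-Extended, often listed alongside them, is a robots.txt token rather than a separate crawler: it controls whether content Google fetches with its existing crawlers can be used for training and grounding Gemini models. Retrieval crawlers generate the bulk of AI crawler traffic, more than 50 billion requests per day by January 2026, with GPTBot the most active at around 4,200 hits per site per day. About 89% of this traffic is training or mixed-purpose. Only 2.2% responds to live user queries.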
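To see which of these crawlers actually reach a site, one starting point is counting user-agent tokens in access logs. The following is a minimal sketch. It assumes log lines have already been reduced to user-agent strings. The token list comes from the crawlers named above, minus Google-Extended, which is a robots.txt control token rather than a user-agent that appears in requests. User-agent strings can be spoofed, so treat the counts as indicative.

```python
from collections import Counter

# Crawler tokens named in the text. Google-Extended is deliberately absent:
# it is a robots.txt token, not a user-agent string seen in request logs.
AI_CRAWLER_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

def classify_user_agent(user_agent):
    """Return the first AI crawler token found in a user-agent string, or None."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None

def count_ai_crawler_hits(user_agents):
    """Count requests per AI crawler token across an iterable of user-agent strings."""
    counts = Counter()
    for ua in user_agents:
        token = classify_user_agent(ua)
        if token is not None:
            counts[token] += 1
    return counts
```

Fed a day's user-agent column, `count_ai_crawler_hits` gives a per-crawler tally that can be compared against the per-site figures above.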

Browser-mode agents are different. ChatGPT agent, Atlas, Perplexity Comet, Anthropic Computer Use, and the headless-browser stacks underneath them render layout and execute JavaScript. They behave like human users. They see what humans see, and they fail in human-like ways, at CAPTCHAs, at Cloudflare challenges, at session redirects, and at consent banners that block the content until they're dismissed.

Both categories share the same failure modes when the fetch goes wrong. A 403 or 429 response gets retried with exponential backoff, then dropped from the queue, and the citation never happens. A 5xx error gets retried, then dropped. A 3xx redirect gets followed up to a hop limit. A soft 404, where the server returns 200 but the body is empty or generic, is the dangerous case: the crawler thinks it has valid content, so whatever is in the empty shell ends up in the index.
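The retry-then-drop behaviour can be written down as a small decision function. This is a hypothetical model of the failure modes described above, not any vendor's crawler. The retry budget, base delay, hop limit and the body-length heuristic for a soft 404 are all assumed values.

```python
import random

MAX_RETRIES = 3          # assumed retry budget
BASE_DELAY_S = 1.0       # assumed initial backoff delay, in seconds
MAX_REDIRECT_HOPS = 5    # assumed redirect hop limit
MIN_BODY_CHARS = 200     # assumed threshold for "substantive" content

def backoff_delay(attempt, base=BASE_DELAY_S):
    """Exponential backoff with full jitter: a random delay in [0, base * 2**attempt]."""
    return random.uniform(0, base * 2 ** attempt)

def next_action(status, attempt, hops, body_text=""):
    """Decide what the modelled crawler does with one fetch result.

    attempt: retries already made for this URL; hops: redirects already followed.
    Returns 'retry', 'drop', 'follow', 'index' or 'index_soft_404'.
    """
    if status in (403, 429) or 500 <= status < 600:
        return "retry" if attempt < MAX_RETRIES else "drop"
    if 300 <= status < 400:
        return "follow" if hops < MAX_REDIRECT_HOPS else "drop"
    if status == 200:
        # A 200 with an empty or generic body still looks valid to the crawler,
        # so the shell is indexed anyway; the label just makes the case visible.
        if len(body_text.strip()) < MIN_BODY_CHARS:
            return "index_soft_404"
        return "index"
    return "drop"
```

The soft-404 branch is the one a site owner can't see from the crawler's side: it reports success.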

For retrieval crawlers specifically, JavaScript-only sites are functionally invisible. If the first-response HTML is one div and a script tag, the crawler reads nothing of substance. The site isn't eligible for the training corpus or the retrieval index AI systems pull from when answering live queries. This is the same problem the previous post on AI-built sites described, viewed from the bot's side rather than the site's.
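A quick self-check for the one-div-and-a-script-tag case is to count how much readable text the first-response HTML carries. The sketch below flags pages that contain script tags but little text outside script and style elements. The 200-character threshold is an assumption, not a published crawler rule, and the sample pages are hypothetical.

```python
from html.parser import HTMLParser

class ShellStats(HTMLParser):
    """Tally script tags and readable text characters in first-response HTML."""

    def __init__(self):
        super().__init__()
        self.in_script_or_style = 0
        self.script_tags = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self.in_script_or_style += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.in_script_or_style > 0:
            self.in_script_or_style -= 1

    def handle_data(self, data):
        # Only count text outside script/style, i.e. what a non-JS crawler can read.
        if self.in_script_or_style == 0:
            self.text_chars += len(data.strip())

def looks_like_empty_shell(html, min_text_chars=200):
    """True if the HTML has scripts but little text a non-JS crawler could read."""
    stats = ShellStats()
    stats.feed(html)
    stats.close()
    return stats.script_tags > 0 and stats.text_chars < min_text_chars

spa = ('<!doctype html><html><head><title>App</title></head>'
       '<body><div id="root"></div><script src="/bundle.js"></script></body></html>')
print(looks_like_empty_shell(spa))  # True
```

Running it against the raw response body, rather than what a browser shows after rendering, is the point: that body is all a retrieval crawler gets.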

Bot defence is the dominant failure mode for browser-mode agents, and it reaches retrieval crawlers too. Cloudflare's one-click AI bot blocking, available since September 2024, is active on more than a million customer sites. GPTBot is the most-blocked AI crawler in robots.txt directives globally, and the share of Cloudflare-fronted sites it can access dropped from 35.46% to 28.97% in one year. Some of this blocking is deliberate and some is collateral, but the effect is the same: a site that blocks AI crawlers also loses AI citation eligibility.
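The robots.txt side of this can be audited with Python's standard-library `urllib.robotparser`. The sketch asks whether each AI user-agent token may fetch a given URL under a sample robots.txt. The sample file and URL are hypothetical. It only reflects declared directives. It says nothing about firewall or bot-management blocking, and it only binds crawlers that choose to comply.

```python
from urllib.robotparser import RobotFileParser

# Tokens as they appear in robots.txt groups; Google-Extended belongs here
# because robots.txt is the only place it operates.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended")

def ai_access_report(robots_txt, url):
    """Map each AI user-agent token to whether robots.txt allows it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}

# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(ai_access_report(robots, "https://example.com/blog/post"))
```

With the sample file, GPTBot is reported as blocked and the other tokens fall through to the wildcard group and are allowed. The standard-library parser applies the first matching rule within a group rather than the longest-match precedence described in RFC 9309, so on files with overlapping Allow and Disallow rules its answers can differ from Google's.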

The trust dynamic is asymmetric: publishers can state a preference, but each vendor decides whether to honour it. Perplexity was caught by Cloudflare in August 2025 running stealth crawlers that rotated user-agents and IP addresses, across multiple ASNs, to evade no-crawl directives. Some of Google's user-agents (Google-Agent, Google-NotebookLM) explicitly disregard robots.txt, classing themselves as “user proxies” on the rationale that each fetch acts on a specific user's behalf rather than as a crawl. Compliance is per-vendor, not per-category.

Where the signals are converging

Structured data that depends only on publisher assertion is being deprioritised. Structured data that aligns with verifiable inventory, or that's confirmed through another channel, is being amplified.

For traditional Search, that means schema types like Product (validates against Merchant Center) and Local Business (validates against Business Profile) keep earning rich results. Schema types where the publisher could claim anything are losing visibility.

For AI search, that means visible HTML content is the input. Prose that clearly states facts is read by the LLM. Schema in the page head is ignored on direct retrieval. The fact that the page renders at all, on a request without a JavaScript runtime, is the gate.

What this means for where you put the work

For sites with existing schema markup, the practical position is simpler than the trend suggests. Keep accurate markup where it amplifies a signal that's already verifiable. Stop adding schema in the hope that it'll earn AI citations on its own. Audit whether AI retrieval crawlers can fetch and read your pages. Audit whether browser-mode agents survive the bot defences on your stack.
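The first audit can start as a script that requests a page the way a retrieval crawler would: one HTTP fetch, no JavaScript. The following is a minimal sketch using `urllib.request`. The user-agent strings are simplified approximations, and the 500-character "readable" threshold is an assumption. Because the requests come from your own IP rather than a vendor's ranges, bot-management rules may treat them differently from the real crawlers. The fetch function is injectable so the logic can be tested offline. Browser-mode agents would need a headless browser, which this does not cover.

```python
import urllib.error
import urllib.request

# Simplified, hypothetical user-agent strings; real crawlers send longer ones.
AUDIT_USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}

def http_fetch(url, user_agent, timeout=10):
    """Fetch url once without running JavaScript; return (status, body text).

    urlopen follows redirects itself. Network failures (URLError) propagate.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; report the status instead.
        return err.code, ""

def audit(url, fetch=http_fetch, min_body_chars=500):
    """Report per-agent status and whether the first response has substantive HTML.

    Body length is a crude proxy; an inline-script SPA can pass it.
    """
    report = {}
    for name, ua in AUDIT_USER_AGENTS.items():
        status, body = fetch(url, ua)
        report[name] = {
            "status": status,
            "readable": status == 200 and len(body) >= min_body_chars,
        }
    return report
```

Calling `audit("https://example.com/")` with the default fetcher makes live requests. A 403 for one agent and a 200 for the others usually points at a bot-management rule rather than robots.txt.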

The implementation work is context-specific. The retrieval crawlers that matter depend on the audience. Live failure modes vary by stack. Migration sequencing has to respect what's currently ranking on traditional Search.

Publisher-declared structured data is becoming a secondary signal. The retrieval layer, which decides whether a bot can fetch and read the page at all, is the primary one.

Sources

Google Search Central. FAQ structured data documentation with deprecation timeline. developers.google.com/search/docs/appearance/structured-data/faqpage

Google Search Central. MemberProgram (loyalty program) structured data. developers.google.com/search/docs/appearance/structured-data/loyalty-program

Ahrefs (11 May 2026). We tracked 1,885 pages adding schema. AI citations barely moved. ahrefs.com/blog/schema-ai-citations

Search Engine Journal (May 2026). Google Drops FAQ Rich Results From Search. searchenginejournal.com

Digital Applied. Agentic crawler behaviour: 30-day site log study. digitalapplied.com

Cloudflare press release (July 2025). Permission-based approach to AI crawlers. cloudflare.com

MIT Technology Review (July 2025). Cloudflare will block AI bots by default. technologyreview.com