
Why Your Website's HTML Structure Matters as Much as Your Schema Markup (And Why Content Hubs Are Your Best Shot at Getting Recommended by AI)

Dan Cartwright
7 min read

Here's something that's been bugging me.

Everyone in the SEO world is banging on about JSON-LD schema markup like it's the only thing that matters for getting found by AI. "Add schema to your site," they say. "It's essential for AI visibility."

They're not wrong. But they're only telling half the story.

I've been digging into how AI systems actually read and recommend websites, and here's what I've found: the HTML structure of your page, markup that's been around in some form since the 90s, matters just as much as any fancy schema you bolt on top. And if you want ChatGPT, Perplexity, or Google's AI to actually recommend your business, you need both working together, wrapped around content that's organised into proper information hubs. (We cover getting found by ChatGPT in a separate article.)

Let me explain.

The Bit Everyone Forgets About Structured Data

When people say "structured data," they almost always mean JSON-LD schema markup: the script block you add to your page (usually in the <head>) that tells Google "this is a local business" or "this is a product with reviews."

But here's the thing. Semantic HTML is also structured data. Tags like <article>, <header>, <nav>, <main>, and <section> aren't just for making your code look tidy. They're explicit signals to search engines and AI systems about what your content actually is.

When a crawler hits your page and sees a <main> tag, it knows that's the primary content. When it sees <nav>, it knows those are navigation links. When it finds an <article> with a clear <header> containing an <h1>, it understands the topic hierarchy immediately.

This isn't speculation. A 2025 technical study found that pages with proper semantic HTML structure rank better because crawlers can index them more efficiently. They spend less time trying to figure out what's what, which matters when you're operating under crawl budget constraints.

So while you're obsessing over your JSON-LD, ask yourself: is your actual HTML a mess of divs and spans with no semantic meaning? Because if it is, you're making the AI work harder than it needs to.
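If you want a rough answer to that question, here's a minimal sketch in Python (standard library only) that counts semantic landmark elements against generic <div> and <span> tags. The tag list and the decision to flag a missing <header>, <nav> or <main> are my own assumptions about a sensible baseline, not a rule published by Google or any AI platform.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed baseline: the HTML5 sectioning/landmark elements discussed above.
SEMANTIC_TAGS = {"header", "nav", "main", "article", "section", "aside", "footer"}
# Assumption: landmarks most pages should contain at least once.
EXPECTED_LANDMARKS = ("header", "nav", "main")


class TagCounter(HTMLParser):
    """Counts every opening tag in a document (html.parser lowercases names)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def audit_semantics(html: str) -> dict:
    """Summarise how much of a page is semantic markup versus generic wrappers."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {
        "semantic": sum(parser.counts[t] for t in SEMANTIC_TAGS),
        "generic": parser.counts["div"] + parser.counts["span"],
        "missing": [t for t in EXPECTED_LANDMARKS if parser.counts[t] == 0],
    }


div_soup = '<div class="top"><div class="menu">Home</div></div><div class="content"><span>Hi</span></div>'
print(audit_semantics(div_soup))
# {'semantic': 0, 'generic': 4, 'missing': ['header', 'nav', 'main']}
```

Zero semantic elements and a pile of divs is the "mess" I mean. Treat the numbers as a prompt to look at the template, not as a score any search engine publishes.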

[Infographic: "Structured Data Unification" compares JSON-LD (a script tag, shown with an Organization example), Microdata (attributes woven into HTML, shown with a Person example) and RDFa (attributes on div elements, shown with a Product example).]

Three Formats, Not One

Google supports three structured data formats, not just JSON-LD:

JSON-LD sits in a separate script block, completely disconnected from your content. It's easy to maintain and Google recommends it. But there's a catch: when you update your page content, you have to remember to update the JSON-LD separately. Forget to do that, and you've got a mismatch between what your page says and what your schema claims.

Microdata embeds directly into your HTML tags. So instead of having your product name in one place and your schema in another, they're woven together. Update the visible text, and the structured data updates automatically.

RDFa works much like Microdata, embedding the structured data in attributes on your existing HTML tags.
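To see why Microdata stays in sync, here's a simplified Python sketch that reads itemprop values straight out of the markup. It assumes well-formed HTML and a single, flat itemscope (no nested items), and the Person details are made up for illustration. The point is that the value it extracts is the visible text, so editing the text edits the data.

```python
from html.parser import HTMLParser

# Elements with no closing tag; Microdata puts their value in a content attribute.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataReader(HTMLParser):
    """Collects itemprop -> value pairs from flat (non-nested) Microdata."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._open = []  # (tag, itemprop or None, text buffer) for each open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if tag in VOID_TAGS:
            if prop and attrs.get("content") is not None:
                self.props[prop] = attrs["content"]
            return
        self._open.append((tag, prop, []))

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries an itemprop.
        for _, prop, buf in self._open:
            if prop:
                buf.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, prop, buf = self._open.pop()
            if prop:
                self.props[prop] = " ".join("".join(buf).split())


def read_microdata(html: str) -> dict:
    reader = MicrodataReader()
    reader.feed(html)
    reader.close()
    return reader.props


# Hypothetical solicitor bio marked up with schema.org Person properties.
bio = (
    '<div itemscope itemtype="https://schema.org/Person">'
    '<h2 itemprop="name">Jane Smith</h2>'
    '<p itemprop="jobTitle">Family Law Solicitor</p>'
    "</div>"
)
print(read_microdata(bio))
# {'name': 'Jane Smith', 'jobTitle': 'Family Law Solicitor'}
```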

Which should you use? For most sites, JSON-LD is fine. But if you're running a site where content gets updated frequently by people who aren't developers, Microdata might actually be smarter. The built-in synchronisation means less risk of your schema drifting out of sync with reality.
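If you stick with JSON-LD, it's worth checking for drift automatically. Here's a minimal Python sketch that pulls every JSON-LD block out of a page and compares its name against the visible <h1>. Treating the <h1> as the source of truth is my assumption; it suits a typical product or service page, but you'd compare different fields on other page types.

```python
import json
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects parsed JSON-LD blocks and the text of the first <h1>."""

    def __init__(self):
        super().__init__()
        self.jsonld = []
        self.h1 = None
        self._capture = None  # "jsonld" or "h1" while inside one of those elements
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture, self._buf = "jsonld", []
        elif tag == "h1" and self.h1 is None:
            self._capture, self._buf = "h1", []

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buf)
        if tag == "script" and self._capture == "jsonld":
            parsed = json.loads(text)
            # A block may hold one object or a list of them.
            self.jsonld.extend(parsed if isinstance(parsed, list) else [parsed])
            self._capture = None
        elif tag == "h1" and self._capture == "h1":
            self.h1 = " ".join(text.split())  # normalise whitespace
            self._capture = None


def find_drift(html: str, field: str = "name") -> list:
    """Return JSON-LD values for `field` that differ from the visible <h1>."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    return [
        item[field]
        for item in reader.jsonld
        if isinstance(item, dict) and field in item and item[field] != reader.h1
    ]


page = (
    "<h1>Blue Widget Pro</h1>"
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Product", "name": "Blue Widget"}'
    "</script>"
)
print(find_drift(page))  # ['Blue Widget']
```

Here someone renamed the product on the page but never touched the schema, which is exactly the mismatch described above.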

And here's something most guides won't tell you: you can use semantic HTML and schema markup together. A 2024 study found that pages combining the two performed 43% better than pages using only one approach. They reinforce each other.

[Diagram: multiple web pages and documents feed into an AI processing hub that weighs Authority, Structure and Clarity signals before marking a source as cited or recommended.]
Authority, structure, clarity. Nail all three and AI systems have no reason not to cite you.

Why This Matters for AI Recommendations

AI search is growing 165 times faster than traditional organic search. ChatGPT, Perplexity, Google AI Overviews, Bing Copilot. These systems don't just rank pages. They read them, extract information, and cite sources in their answers.

When you ask ChatGPT a question, here's roughly what happens:

  1. It interprets your query
  2. It retrieves relevant information from its training data and live web searches
  3. It evaluates content for authority and accuracy
  4. It synthesises answers from multiple sources
  5. It selects which sources to cite

That evaluation phase is where your site structure matters. Research shows these systems favour sources that are "authoritative, unambiguous, and easy to attribute." They prefer content with clear heading hierarchies, proper semantic structure, and comprehensive coverage.

In plain English: if your page is a wall of text in generic div tags, the AI has to work harder to understand it. If it's cleanly structured with proper HTML5 semantic elements, clear headings, and supporting schema, you've made its job easy. And AI systems, like humans, prefer easy.

[Diagram: a central pillar page connected to cluster pages (Guide, Case Study, Data, Tutorial, Analysis, Article), all linked to each other and to the pillar.]
Content hub architecture: A central pillar page connects to supporting cluster content, with internal links flowing between all related pages to build topical authority.

The Content Hub Advantage

Here's where it gets interesting.

The content structure that performs best for AI citations isn't individual blog posts scattered randomly across your site. It's topic clusters organised around pillar pages. Information hubs.

Think of it like this. You've got a main page on "AI Visibility for Small Businesses" (your pillar). That page links to detailed supporting articles on specific subtopics: "What is AEO?", "How Schema Markup Works", "Getting Cited by ChatGPT", and so on. Each of those articles links back to the pillar and to each other where relevant.

This isn't just good for users. It's exactly what AI systems are looking for when they assess topical authority.

The numbers back this up. Sites using hub architecture see 40-65% more organic visibility for their target topics compared to sites with scattered content. Another study found that structured content hubs generate four times the traffic of standalone posts.

Why? Because when an AI sees a well-organised collection of content around a core topic, it signals that your site is a genuine resource on that subject. Not just someone who wrote one blog post about it once.
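The links between pillar and cluster pages are easy to let slip as a hub grows, so here's a small Python sketch that checks them. It assumes you already have a map of which page links to which (from a crawl export, say); the slugs below are hypothetical and mirror the "AI Visibility for Small Businesses" example hub.

```python
# Hypothetical link map for the example hub: page slug -> slugs it links to.
HUB_LINKS = {
    "ai-visibility": {"what-is-aeo", "how-schema-markup-works", "getting-cited-by-chatgpt"},
    "what-is-aeo": {"ai-visibility", "how-schema-markup-works"},
    "how-schema-markup-works": {"ai-visibility", "what-is-aeo"},
    "getting-cited-by-chatgpt": {"how-schema-markup-works"},  # never links back to the pillar
}


def hub_gaps(links: dict, pillar: str) -> dict:
    """Find cluster pages missing a link from the pillar, or a link back to it."""
    clusters = set(links) - {pillar}
    return {
        "not_linked_from_pillar": sorted(clusters - links[pillar]),
        "no_link_back_to_pillar": sorted(p for p in clusters if pillar not in links[p]),
    }


print(hub_gaps(HUB_LINKS, "ai-visibility"))
# {'not_linked_from_pillar': [], 'no_link_back_to_pillar': ['getting-cited-by-chatgpt']}
```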

What AI Systems Actually Want

Research into how large language models interpret web content reveals some specific preferences:

Clear heading hierarchy. Pages with proper H1, H2, H3 nesting are easier for AI to parse than walls of text or div-heavy templates.

Short, focused paragraphs. LLMs prefer self-contained thoughts over dense blocks.

Navigation that mirrors your topic structure. Your menu should help both humans and AI understand how your content fits together.

Consistent terminology. Using the same terms for your services across your whole site helps AI systems with entity resolution. Inconsistency creates ambiguity.

Factual, specific content. AI systems are trained to avoid pages with excessive waffle, promotional fluff, or repeated calls-to-action that interrupt the flow. They want substance.

One study put it bluntly: "irrelevant or duplicated content fragments reduce retrieval efficiency." Every paragraph should contribute something. If it doesn't, cut it.
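Heading hierarchy is the easiest of these to check mechanically. Here's a minimal Python sketch that flags skipped levels (an H2 followed directly by an H4, say). Requiring the first heading to be an H1 is my assumption about "proper" nesting rather than something the HTML standard demands.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of every <h1>..<h6> in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    if not levels:
        return ["no headings found"]
    problems = []
    if levels[0] != 1:  # assumption: the page should open with its H1
        problems.append(f"first heading is h{levels[0]}, not h1")
    for prev, cur in zip(levels, levels[1:]):
        # Going deeper by more than one level skips a rung; going back up is fine.
        if cur > prev + 1:
            problems.append(f"skipped level: h{prev} -> h{cur}")
    return problems


print(heading_problems("<h1>Family law</h1><h2>Divorce</h2><h4>Costs</h4>"))
# ['skipped level: h2 -> h4']
```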

How This All Fits Together

Let me give you a practical example.

Say you're a family law solicitor in Bristol. You want to show up when someone asks ChatGPT "Who are the best family lawyers in Bristol?"

Here's what the AI is looking for:

At the HTML level: Your pages use semantic markup. Your practice areas are in <section> tags. Your team bios are in <article> tags. Your headings follow a logical hierarchy.

At the schema level: You've got LocalBusiness schema with your address, opening hours, and service areas. Your solicitors have Person schema with their qualifications. Your FAQ page uses FAQPage schema so the AI can grab questions and answers directly.

At the content level: You've built a hub around family law. A main page covering the topic broadly, linking to detailed pages on divorce proceedings, child custody, financial settlements, and so on. Each page is comprehensive (2,000+ words of actual substance), includes real examples from your experience, and links to relevant pages within your hub.

At the trust level: Clear author bios with qualifications. Proper contact information. Client testimonials. A secure site with HTTPS.

When all of these work together, you've made it as easy as possible for an AI to understand who you are, what you do, and why you're worth recommending.
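To make the schema level concrete, here's a minimal Python sketch that builds LocalBusiness and FAQPage JSON-LD for the Bristol example and wraps each in a script tag. The firm name, address, hours and FAQ wording are placeholders I've made up; the property names (address, openingHours, areaServed, mainEntity, acceptedAnswer) are standard Schema.org ones.

```python
import json

# Placeholder details: swap in the practice's real information.
business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Family Law",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Bristol",
        "addressCountry": "GB",
    },
    "openingHours": "Mo-Fr 09:00-17:30",
    "areaServed": "Bristol",
}


def faq_page(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs shown on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q, "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in pairs
        ],
    }


def as_script_tag(data: dict) -> str:
    """Serialise to a JSON-LD script tag; escaping '</' stops text closing the tag early."""
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


faqs = faq_page([
    ("How long does a divorce take?", "Placeholder answer: reuse the wording of the visible FAQ."),
])
print(as_script_tag(business))
print(as_script_tag(faqs))
```

Generating the FAQ schema from the same question-and-answer pairs that render on the page is one way to keep the two from drifting apart.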

The Practical Takeaway

You don't need to choose between semantic HTML and schema markup. You need both. And you need them wrapped around content that's organised properly.

Here's what I'd focus on:

Audit your HTML first. Before you touch your schema, check whether your pages actually use semantic elements. If everything's in divs, fix that.

Make sure your schema matches reality. If your JSON-LD says one thing and your visible content says another, you've got a problem. Consider whether Microdata might be more maintainable for frequently-updated content.

Build topic clusters, not random posts. Plan your content around pillar pages with supporting articles. Link them together deliberately.

Write for substance, not padding. AI systems are getting better at spotting fluff. Every section should earn its place.

Keep your terminology consistent. If you call it "AI visibility" on one page and "answer engine optimisation" on another, you're creating unnecessary confusion.
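If you've got a lot of pages, a crude automated check helps with that last point. This Python sketch scans page text for variant terms you've decided not to use and reports where they appear. The variant list and the example pages are hypothetical; you'd build your own from whatever vocabulary you settle on.

```python
import re

# Hypothetical style guide: variant phrasing -> the term chosen for use everywhere.
PREFERRED_TERMS = {
    "answer engine optimisation": "AI visibility",
    "answer engine optimization": "AI visibility",
    "AI search visibility": "AI visibility",
}


def terminology_report(pages: dict) -> dict:
    """Map each page URL to (variant, preferred term) pairs found in its text."""
    report = {}
    for url, text in pages.items():
        hits = [
            (variant, preferred)
            for variant, preferred in PREFERRED_TERMS.items()
            # Word boundaries avoid matching inside longer words; case is ignored.
            if re.search(rf"\b{re.escape(variant)}\b", text, flags=re.IGNORECASE)
        ]
        if hits:
            report[url] = hits
    return report


pages = {
    "/services": "We run AI visibility audits for small businesses.",
    "/blog/what-is-aeo": "Answer Engine Optimisation explained in plain English.",
}
print(terminology_report(pages))
# {'/blog/what-is-aeo': [('answer engine optimisation', 'AI visibility')]}
```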

This isn't rocket science. It's just being systematic about how you structure your site and content. But most businesses don't bother, which means the ones who do have an advantage.

The AI systems are only going to get more important. Getting this right now puts you ahead of the curve.

Have questions about how your website is performing? Visit www.scopesite.co.uk.

Written by Daniel Cartwright
(Founder of Scopesite and V.O.I.C.E™)

Sources and Further Reading:

The research cited in this article comes from technical studies on semantic HTML performance, Google's structured data documentation, Schema.org specifications, and analyses of AI citation patterns from platforms including ChatGPT, Perplexity, and Google AI Overviews. Key sources include Search Engine Journal's technical SEO research, Schema App's structured data comparisons, and SimilarWeb's 2025 Generative AI Report on AI discovery trends.

Tags: Schema & Structured Data
