Understanding Structured Data and its Role in Semantic SEO

Author: Mike Ciffone

Published: 09.18.2021 / Updated: 02.13.2024

Structured data is very important to SEO. Over the past decade, search engines have been increasingly using structured data to interpret what is on our web pages.

Getting familiar with structured data will greatly benefit website owners as well as beginner and intermediate SEOs.

Experienced webmasters and SEOs can benefit from advanced methods of customizing structured data that strengthen the semantics we convey to search engines.


The Semantic Web & Semantic Search

The “Semantic Web” refers to the world of real objects on the internet, not just web documents. Real world objects are the sort of data you find in databases, such as Wikipedia.

It assigns URIs (Uniform Resource Identifiers) to many things – web pages, people, cars, and even abstract ideas.
“Semantic Web” is also W3C’s (World Wide Web Consortium) vision of the Web of linked data. The purpose of the Semantic Web is to preserve a shared understanding about what a URI identifies.

In 2008, the W3C note Cool URIs for the Semantic Web cited the following two main requirements for how real-world objects or things should be identified on the web:

  1. A URI should provide information about what it identifies in a machine-readable (e.g. Resource Description Framework or “RDF”) and human-readable (e.g. Hyper Text Markup Language or “HTML”) format.
  2. There should be no confusion between identifiers for Web documents and identifiers for other resources. URIs are meant to identify only one of them, so one URI can’t stand for both a Web document and a real-world object.

Remember that second requirement because we’ll come back to it.

The easiest way to understand semantic search is to think of it as searching by meaning. It allows for more accurate information retrieval and also helps provide answers to questions that may not be easily found through typical keyword searches.

This means that websites can be found and ranked based on their relevance to specific phrases or subject matter, and not just based on keywords alone.

The use of the term semantic search is not very consistent, but this is largely because of how fast new techniques to understand meaning have evolved over the past decade.

Google’s Timeline: The semantic evolution happened fast

2012 – Google Knowledge Graph & things, not strings
2013 – Conversational search
2015 – RankBrain – an artificial intelligence (AI) system used by Google to understand search intent
2019 – BERT (Bidirectional Encoder Representations from Transformers) – a natural language processing (NLP) pre-training method developed by Google


Structured Data: A Method to Convey Meaning

Broadly, structured data refers to any data that is organized in a predefined format or schema, making it easily searchable and identifiable by algorithms and database systems. It provides a standard way to share information about content, so that a computer can understand it and present it in meaningful ways (e.g., the author of a news article or a blog).

On the web, structured data can be used to annotate (or “mark up”) content, making it easier for search engines to understand the context of the information on web pages.

This is achieved through the use of formats such as JSON-LD, Microdata, and RDFa, which allow SEOs and Marketers to specify information about the page content, such as products, recipes, reviews, and events, in a way that can be directly processed by search engines to improve search results and enable rich snippets in search listings.

As you may know or have already gathered, structured data enables search engines to more accurately serve results that match a user’s query. Adding it to your website can improve its relevance for certain queries.

In summary, structured data is instrumental in realizing the vision of the Semantic Web by providing a means to explicitly define, link, and share data in a semantically rich and machine-understandable format.

This not only facilitates better data integration and collaboration across the web but also paves the way for more intelligent and adaptive web services and applications.

Schema.org

With the advancements in cognitive computing and machine learning that took place around the 2005-2010 period, there came an increasing need for a structured data format that computers could easily understand.

To meet this demand, the Schema.org initiative was launched in 2011 by the major search engines (Google, Bing, and Yahoo, later joined by Yandex), who wanted a shared way to describe objects on pages so they could be better understood by machines.

Schema.org provides a framework that helps you describe what your content means and how it is related to other things on your website. At a minimum, schema will help search engines more accurately index your content.
Assuming your content is also good, implementing structured data may help some of your pages rank higher in organic results.

If structured data is like speech, schema.org is like a dialect of the language being spoken. Specifically, schema.org is a Schema Vocabulary. It is the primary vocabulary we use in structured data for SEO.

Structured Data Basics

Structured data is simply data that is machine readable. When we write or generate structured data, we are creating a machine-readable set of facts.

There are also semi-structured data and unstructured data. These are less formally organized, which makes them harder for machines to read.

Technically, we can write schema markup in several ways. You don’t have to use just one. Later on I’ll explain why you might use both JSON-LD and Microdata on one page.

  • Microdata
    • Via extension to HTML
  • RDFa
    • Via extension to HTML
  • JSON-LD (recommended)
    • Placed/injected in a <script> tag, usually in the <head> of an HTML document
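
To make the JSON-LD option concrete, here is a minimal Python sketch of how such a script tag could be generated from a plain dictionary. The helper name `to_jsonld_script` and the example values are assumptions made for illustration; they are not part of any library or of the markup used on this site.

```python
import json


def to_jsonld_script(data: dict) -> str:
    """Serialize a dict as a JSON-LD <script> tag (hypothetical helper)."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script tag early.
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder facts, not real data.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Article headline",
    "author": {"@type": "Person", "name": "John Doe"},
}

print(to_jsonld_script(article))
```

Because the output is just a string, it can be dropped into a template wherever your CMS lets you add markup to the page.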

JSON-LD (JavaScript Object Notation for Linking Data)

JSON-LD was only recently (2015ish) adopted as the standard amongst popular search engines, but it’s been gaining traction for almost a decade now. In 2015 Google began supporting the format, and since 2017 it has officially recommended using it for structured data.

On the left is a bar graph showing the global usage statistics of structured data formats. Currently, only 38% of sites are using JSON-LD.

When someone who identifies as an SEO broadly mentions structured data, they probably mean JSON-LD written in the schema.org vocabulary.

Technically speaking, the XML in your Sitemap.xml file is also structured data. We just don’t commonly associate it with structured data in the same sort of way as we do with JSON-LD.

The point to understand here is that structured data is not just a thing SEOs use to generate SERP features. We’ll dive deeper into the distinction between writing structured data for enhancing search appearance vs writing it for Semantic SEO over the next few sections.


Rich Snippets/SERP Features

Industry chatter about structured data peaked back when Google first announced rich snippets/SERP features, but since then has decreased significantly despite its importance.

Over the years, featured snippets have become somewhat controversial within the SEO community. If you’re new to SEO, it’s important to understand why.

As you may know, there are typically 10 results on a SERP. Anywhere on a SERP that a result can appear we call SERP real estate. This goes for all results, so it can describe the position of both paid and organic results.

Once you add ads, possibly a map pack, maybe a knowledge panel, and then things like answer boxes, the traditional organic results tend to get less attention.

The SERP features that SEOs care about the most are featured snippets, which highlight content from pages in the organic results. Commonly referred to as answer boxes, these results look cool, contain more content (text + media), and directly answer a user’s search.

These are desired because they show up before the normal organic results. We call this spot position zero.

If you click on the link below an answer box, it will take you to the page that Google extracted the information from.

This is typically the case, but there are times when the featured snippet text is absent from the article linked.

From a user experience perspective, answer boxes are pretty great. Google exists to provide people quality information both quickly and reliably. These snippets help us get the answers we want fast.

However, if you think about it solely from the perspective of an SEO, there are two primary caveats:

  1. Since the answer is shown directly to the user, a zero-click search becomes highly likely. This means less traffic.
  2. Google ultimately decides the featured snippet, so they can’t be manipulated like rankings can. Also, the information presented is not exclusive to the linked result.
    1. Example: Article A and Article B have great infographics to accompany the featured snippet, but Article C has the most informative content and direct answer. Google presents a snippet with images from A and B, but awards the (unlikely) click to Article C.

SERP Feature Gallery

Google’s SERP Feature Gallery provides detailed documentation on the types of features it displays that use structured data. Some of them are automatically generated, such as the Knowledge Graph and answer boxes, but most of them can be coded for on your website.

Below is a screenshot showing the feature guides section of Google’s search appearance documentation.

The first feature guide listed is for articles. You can see on the left all of the additional guides available.

Let’s check out what the JSON-LD code looks like for an article that is eligible to be shown with an enhanced appearance in the search results on Google.


    <html amp>
    <head>
        <title>Article headline</title>
        <script type="application/ld+json">
        {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "mainEntityOfPage": {
            "@type": "WebPage",
            "@id": "https://google.com/article"
        },
        "headline": "Article headline",
        "image": [
            "https://example.com/photos/1x1/photo.jpg",
            "https://example.com/photos/4x3/photo.jpg",
            "https://example.com/photos/16x9/photo.jpg"
        ],
        "datePublished": "2015-02-05T08:00:00+08:00",
        "dateModified": "2015-02-05T09:20:00+08:00",
        "author": {
            "@type": "Person",
            "name": "John Doe",
            "url": "http://example.com/profile/johndoe123"
        },
        "publisher": {
            "@type": "Organization",
            "name": "Google",
            "logo": {
                "@type": "ImageObject",
                "url": "https://google.com/logo.jpg"
            }
        }
        }
        </script>
    </head>
    <body>
    </body>
    </html>

Keep in mind that when this markup was first documented, article pages had to be AMP (Accelerated Mobile Pages) to appear in the Top Stories carousel. Google dropped that requirement in 2021, so non-AMP pages can now be eligible too.

For a non-AMP page, the visible benefit of article schema may be as modest as a thumbnail in the result. However, a non-AMP article schema is where you might start testing objects to optimize for entity associations.

To explain what I mean, here’s an infographic from a LinkedIn post a while back.


Semantic SEO

A little while ago I said that structured data is not just something SEOs use to create SERP features.
So if not just for SERP features, why should we care about structured data?

What is Semantic SEO?

Semantic SEO is the process of building more meaning into your content. By optimizing for intent and not just answering a simple query, our goal is to create content that will answer multiple questions at once.

This allows you to provide value and depth on one page instead of having users continuously alter their queries and visit many different pages in search of answers.

A mantra that I’ve given to SEO interns over the years is that “Good SEO means you don’t have to open multiple results in new tabs.”

Quick Tips for Semantic Optimization

  • Keyword Mapping 
    • Find co-occurring terms, variants, think of other ways the thing is referred to
    • Include core keywords in: title tags, h1s, URL slugs, etc
    • Include co-occurring/variants in: body content, meta descriptions, alt text, anywhere else that makes sense
    • Research methods: Live search results, competitor analysis tools, Ahrefs Content Explorer, keyword research tools, custom scrapers, and more.
    • Other research methods: Surveys, Interviews
  • Phrase Mapping & Theme Modeling
    • Look at pages that rank on the first page for a given search, identify unique phrases that appear in close proximity multiple times
    • Be specific: prefer “Former US President, Barack Obama” over “The former president”
    • Avoid Idioms and other figurative/non-literal words
    • Include in: H1-H6, body content, anchor text, alt text, captions, anywhere else that makes sense
    • Research methods: People also ask, auto complete, keyword research tools (Ubersuggest is free), Answer the Public, Wikipedia, DBpedia, and more
  • Remember the 5 W’s – Who, What, Where, When, Why
    • Who/what: Name of the thing (Ex MozCon)
    • What: Primary function, usage, purpose, etc (SEO Conference)
    • Where: Where it’s relevant (Seattle)
    • Who/when: When and to whom is it relevant (marketing professionals, in the summer)

Writing Custom Structured Data

Schema.org is very expansive. There are all sorts of ways we can customize our structured data beyond the scope of Google’s SERP feature gallery. As we’ve discussed so far, the more semantics we can create, the better. Being diligent about marking up as much of your content as possible is a great way to get a leg up over your competitors.


Representing Multiple Entities and Types on a Page

When crafting structured data for a webpage, you might encounter situations where multiple items or entities need to be described simultaneously. The decision on whether to “nest” these items within each other or to list them as separate entities depends on their relationship and the context in which they are presented. Understanding when and how to use each approach can significantly enhance the effectiveness of your structured data.

Using Separate Objects

Definition: Listing separate objects involves creating distinct blocks of JSON-LD for each item or entity on your page that stands alone or does not have a direct hierarchical relationship with other items.

When to Use:

  • Distinct Entities: Use separate objects for items that function independently on the page, such as breadcrumbs, images, videos, FAQs, and reviews.
  • Enhanced Clarity: Separating items into individual blocks can improve the readability of your structured data, making it easier for search engines to understand and index each element accurately.
  • Modular Updates: When items might be updated independently of each other, maintaining them as separate objects simplifies content management and updates.

Example: A webpage with an article, a video tutorial, and user reviews would benefit from separate JSON-LD blocks for each component to clearly delineate their distinct nature and content.
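
As a rough illustration of the example above, the following Python sketch renders one JSON-LD block per entity. The helper `to_script` and every value in the three dictionaries are hypothetical placeholders, so treat this as one possible arrangement rather than a required pattern.

```python
import json


def to_script(entity: dict) -> str:
    """Render one entity as its own JSON-LD block (hypothetical helper)."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(entity, indent=2)
        + "\n</script>"
    )


# Placeholder entities that stand alone on the page.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How to Patch a Bike Tire",
}
video = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Tire patching tutorial",
    "uploadDate": "2024-02-13",
}
review = {
    "@context": "https://schema.org",
    "@type": "Review",
    "itemReviewed": {"@type": "Product", "name": "Example patch kit"},
    "reviewRating": {"@type": "Rating", "ratingValue": 4},
    "author": {"@type": "Person", "name": "Jane Doe"},
}

# One block per entity, so each can be updated or removed on its own.
page_markup = "\n".join(to_script(e) for e in (article, video, review))
print(page_markup)
```

Each block carries its own @context, which is what lets it be read independently of the others.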

The Power of Nesting

Nesting involves embedding objects within other objects to create a hierarchical structure. This approach is used to indicate relationships between entities, such as parent-child or container-contained relationships.

When to Use:

  • Related Entities: Nesting is ideal for representing entities that have a direct relationship with one another. For example, an article entity might contain nested author and comment entities, indicating the author of the article and comments related to it.
  • Complex Data Structures: Use nesting to convey complex data relationships, such as products within a specific category, services offered by an organization, or episodes within a TV series.
  • Efficient Data Grouping: Nesting allows for the efficient grouping of related information, reducing redundancy and improving the coherence of the structured data.

Example: An event page could use nesting to include the event’s location (Place) entity within the event (Event) entity, seamlessly linking the venue’s details with the event information.
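
The event example can be sketched in Python as a dictionary nested inside another dictionary. The venue name, city, and date below are made-up placeholders; only the property names (location, address, and so on) come from the schema.org vocabulary.

```python
import json

# Placeholder venue; nested objects do not repeat @context.
venue = {
    "@type": "Place",
    "name": "Example Convention Center",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Seattle",
        "addressRegion": "WA",
    },
}

event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example SEO Conference",
    "startDate": "2024-06-03",
    # Nesting: the Place is the value of the Event's location property.
    "location": venue,
}

print(json.dumps(event, indent=2))
```

The nesting itself states the relationship: the address belongs to the place, and the place is where the event happens.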

Best Practices

  • Consistency: Maintain consistency in how you use nesting and separate objects across your site to aid in the predictable organization and interpretation of your structured data.
  • Testing: Utilize tools like Google’s Rich Results Test and the Schema Markup Validator to validate your JSON-LD markup, ensuring that it accurately reflects the intended structure and relationships.
  • Documentation: Document your structured data approach, especially the logic behind nesting and separating objects, to facilitate maintenance and future development.
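
Before pasting markup into an online validator, a quick local check can catch basic mistakes. The sketch below is only an assumption about what such a check might look like: it confirms the text parses as JSON and flags nested objects that lack an @type. The function name `find_untyped` and the rule itself are my own invention, not a schema.org or Google requirement.

```python
import json


def find_untyped(node, path="$"):
    """Return paths of objects that lack an @type (hypothetical lint rule)."""
    problems = []
    if isinstance(node, dict):
        if "@type" not in node:
            problems.append(path)
        for key, value in node.items():
            if key.startswith("@"):
                continue  # skip JSON-LD keywords such as @context
            problems.extend(find_untyped(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_untyped(item, f"{path}[{index}]"))
    return problems


raw = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "publisher": {
    "@type": "Organization",
    "name": "Example Publisher",
    "logo": {"url": "https://example.com/logo.jpg"}
  }
}
"""

# json.loads raises json.JSONDecodeError on malformed input,
# for example a stray // comment or a trailing comma.
data = json.loads(raw)
print(find_untyped(data))  # → ['$.publisher.logo']
```

This does not replace the validators mentioned above; it only shortens the feedback loop while you are editing.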

Creating a Custom Organization Schema

Here’s an Organization script that describes the entity “Example Software Company” and includes a SoftwareApplication schema.

In this example, we’re informing search engines about a tech company that makes a software application called “SalesFarce CRM”. On average, the app has a rating of 2.6/5 stars.

What is interesting is that we’re also informing search engines that a company called “B2B Software Sales Company” is a seller of the product, despite it being free.

These semantics might be incredibly important, because now if the seller publishes content about the company, search engines can better understand their relationship.


This script identifies an entity as a third-party seller of software:

{
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Software Company",
    "url": "https://example.com/",
    "logo": "https://picsum.photos/200",
    "hasOfferCatalog": {
        "@type": "OfferCatalog",
        "name": "Software as a service",
        "alternateName": "SaaS",
        "itemListElement": [
            {
                "@type": "Offer",
                "seller": {
                    "@type": "Organization",
                    "name": "B2B Software Sales Company",
                    "url": "https://www.example-software-global.net/",
                    "logo": "https://picsum.photos/200"
                },
                "itemOffered": {
                    "@type": "SoftwareApplication",
                    "name": "SalesFarce CRM",
                    "operatingSystem": "All",
                    "applicationCategory": "WebApplication",
                    "aggregateRating": {
                        "@type": "AggregateRating",
                        "ratingValue": "2.6",
                        "ratingCount": "8864"
                    },
                    "offers": {
                        "@type": "Offer",
                        "price": "0",
                        "priceCurrency": "USD"
                    }
                }
            }
        ]
    },
    "sameAs": [
        "https://twitter.com/example-software-company",
        "https://linkedin.com/example-software-company",
        "https://facebook.com/example-software-company"
    ]
}

We can see the Rich Result Test checks out just fine.

Testing Variations: Custom Organization Schema

The schema below contains a custom organization schema that I’ve been testing on my agency’s website. It’s designed to tell search engines about our location, the areas we serve, as well as our core services. It passes both Google’s Rich Results Test and the Schema Markup Validator (Beta).


{
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Ciffone Digital",
    "legalName": "Ciffone Digital, LLC",
    "slogan": "Disrupt the Status Quo",
    "founder": "Mike Ciffone",
    "url": "https://ciffonedigital.com",
    "logo": "https://ciffonedigital.com/wp-content/uploads/2021/03/Ciffone-Digital-Logo-Primary-CD.png",
    "email": "[email protected]",
    "telephone": "(312) 508-3012",
    "areaServed": ["US","GB","CA"],
    "availableLanguage": [
        {
            "@type": "Language",
            "name": "English"
        }
    ],
    "location": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Chicago",
            "addressRegion": "IL",
            "postalCode": "60610"
        }
    },
    "hasOfferCatalog": {
        "@type": "OfferCatalog",
        "name": "Digital Marketing services",
        "itemListElement": [
            {
                "@type": "Offer",
                "itemOffered": {
                    "@type": "Service",
                    "name": "Search Engine Optimization",
                    "alternateName": "SEO"
                }
            },
            {
                "@type": "Offer",
                "itemOffered": {
                    "@type": "Service",
                    "name": "Content Marketing"
                }
            },
            {
                "@type": "Offer",
                "itemOffered": {
                    "@type": "Service",
                    "name": "Website Development"
                }
            },
            {
                "@type": "Offer",
                "itemOffered": {
                    "@type": "Service",
                    "name": "Pay-Per-Click",
                    "alternateName": "PPC"
                }
            }
        ]
    },
    "sameAs": [
        "https://ciffonedigital.com",
        "https://github.com/Ciffone-Digital",
        "https://www.linkedin.com/company/ciffone-digital/",
        "https://twitter.com/ciffone_digital",
        "https://www.facebook.com/ciffonedigital"
    ]
}

Grouped Offers with ItemList for Price Drops and Seasonal Promotions

Here’s an eCommerce example. This custom schema.org markup employs an ItemList to group multiple offers for a single product or service on our website. This approach is designed to semantically enrich the presentation of special offers, such as price drops and seasonal promotions, providing our visitors with clear, structured information that highlights the value and temporal nature of each promotion.

Key features:

  • Structured Collection of Offers: By grouping offers under an ItemList, we create a semantically structured collection that implies a relationship between the offers. This indicates to our visitors that these are not just random promotions but are related offers, providing options like different pricing tiers or conditions for the same product.
  • Semantic Highlighting of Promotions: Each offer within the ItemList is named descriptively (e.g., “Spring Sale” for discounted offers and “Regular Price” for standard pricing), directly communicating the nature and value of the promotion. This clarity helps visitors understand the special conditions, such as limited-time availability or seasonal relevance.

{
	"@context": "https://schema.org/",
	"@type": "Product",
	"name": "Men's Raincoat",
	"url": "https://example.com/products/raincoat",
	"image": [
		"https://example.com/photos/1x1/photo.jpg",
		"https://example.com/photos/4x3/photo.jpg",
		"https://example.com/photos/16x9/photo.jpg"
		],
	"description": "Waterproof men's raincoat perfect for spring",
	"sku": "0446310786",
	"mpn": "925872",
	"brand": {
		"@type": "Brand",
		"name": "Example Nature Outdoors",
		"sameAs": "https://wikipedia.com/wiki/example-nature-outdoors",
		"url": "https://example-nature-outdoors.com"
	},
	"aggregateRating": {
		"@type": "AggregateRating",
		"ratingValue": 4.4,
		"reviewCount": 189
	},
	"review": {
		"@type": "Review",
		"reviewRating": {
			"@type": "Rating",
			"ratingValue": 4,
			"bestRating": 5
		},
		"author": {
			"@type": "Person",
			"knowsAbout": "Hiking and the outdoors",
			"name": "Jane Doe",
			"sameAs": "https://instagram.com/jane-doe-outdoors"
		},
		"positiveNotes": {
			"@type": "ItemList",
			"itemListElement": [
				{ "@type": "ListItem", "position": 1, "name": "Very waterproof" },
				{ "@type": "ListItem", "position": 2, "name": "Extremely durable" }
			]
		},
		"negativeNotes": {
			"@type": "ItemList",
			"itemListElement": [
				{ "@type": "ListItem", "position": 1, "name": "Limited colors available" }
			]
		}
	},
	"offers": {
		"@type": "ItemList",
		"itemListElement": [
			{
				"@type": "Offer",
				"name": "Spring Sale",
				"priceSpecification": {
					"@type": "PriceSpecification",
					"price": 80.99,
					"priceCurrency": "USD"
				},
				"priceValidUntil": "2024-06-20",
				"itemCondition": "https://schema.org/NewCondition",
				"availability": "https://schema.org/InStock"
			},
			{
				"@type": "Offer",
				"name": "Regular Price",
				"priceSpecification": {
					"@type": "PriceSpecification",
					"price": 160.99,
					"priceCurrency": "USD"
				},
				"itemCondition": "https://schema.org/NewCondition",
				"availability": "https://schema.org/InStock"
			}
		]
	},
	"hasMerchantReturnPolicy": {
		"@type": "MerchantReturnPolicy",
		"applicableCountry": "US",
		"returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
		"merchantReturnDays": 30,
		"returnMethod": "https://schema.org/ReturnByMail",
		"returnFees": "https://schema.org/FreeReturn"
	}
}

Reasons to Use Both JSON-LD and Microdata

Search engines can have trouble connecting JSON-LD structured data to the textual information on a webpage. The confusion occurs because objects in the document <head> could be very far from the semantic content on the page. This can make verifying the information on web pages more difficult, especially for less common schema types, properties, and classes.

One pro of using microdata is that since the code is integrated in the HTML, it appears very close to the actual information it is referencing. This can potentially make search engines more confident in the relationship thanks to values being directly associated with texts on a web page.
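
To show the difference in proximity, here is a small Python sketch that emits the same facts both ways: once as Microdata wrapped around the visible text, and once as a detached JSON-LD block. The function names and the author details are hypothetical; itemscope, itemtype, and itemprop are the standard Microdata attributes.

```python
import json
from html import escape

# Placeholder facts about an author; not taken from a real page.
facts = {"name": "Jane Doe", "jobTitle": "Technical SEO"}


def person_microdata(person: dict) -> str:
    """Wrap the visible text itself in Microdata attributes."""
    return (
        '<div itemscope itemtype="https://schema.org/Person">'
        f'<span itemprop="name">{escape(person["name"])}</span>, '
        f'<span itemprop="jobTitle">{escape(person["jobTitle"])}</span>'
        "</div>"
    )


def person_jsonld(person: dict) -> str:
    """Describe the same facts in a JSON-LD block, apart from the text."""
    data = {"@context": "https://schema.org", "@type": "Person", **person}
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'


print(person_microdata(facts))
print(person_jsonld(facts))
```

Generating both from one dictionary keeps the two formats from drifting apart, which matters if you decide to publish them side by side.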

Alternatively, Schema.org provides WebPageElement, which accepts the properties cssSelector and xpath.

Properties from WebPageElement

  • cssSelector (expected type: CssSelectorType): A CSS selector, e.g. of a SpeakableSpecification or WebPageElement. In the latter case, multiple matches within a page can constitute a single conceptual “Web page element”.
  • xpath (expected type: XPathType): An XPath, e.g. of a SpeakableSpecification or WebPageElement. In the latter case, multiple matches within a page can constitute a single conceptual “Web page element”.

There are also a handful of more specific types for WebPageElement.
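
As a sketch of how WebPageElement might be used, the Python below builds a WebPage whose hasPart entries point at regions of the page by CSS selector. The URL, selectors, and the helper `page_element` are illustrative assumptions, and I can’t promise that any particular search engine acts on these selectors.

```python
import json


def page_element(name: str, css_selector: str) -> dict:
    """Describe a region of the page by CSS selector (hypothetical helper)."""
    return {
        "@type": "WebPageElement",
        "name": name,
        "cssSelector": css_selector,
    }


# The URL and selectors are placeholders for a real page's markup.
page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://example.com/article",
    "hasPart": [
        page_element("Author bio", "#author-bio"),
        page_element("Key takeaways", ".takeaways li"),
    ],
}

print(json.dumps(page, indent=2))
```

The second selector matches several list items at once, which fits the note above that multiple matches can make up a single conceptual element.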

Formatting & Syntax Resources (W3C Documentation)


This post is an ongoing project that I will continue updating over time. Hopefully this was helpful. For any questions, feel free to reach out to me via email (just submit our form), Twitter, or LinkedIn.

I’ve been answering questions over on ProWebmasters for a little over a month now. Tag your question with “Semantic SEO” or “JSON-LD” – I have filters set for those so I’ll see it come through.