Understanding Structured Data & the Importance of Semantics for SEO

SEO, Web Development, Semantic Search, Structured Data

Author: Mike Ciffone

Published: 09.18.2021 / Updated: 09.20.2021

Structured data is very important to SEO. Over the past decade, search engines have been increasingly using structured data to interpret what is on our web pages.

Getting familiar with structured data will greatly benefit website owners as well as beginner and intermediate SEOs.

Experienced webmasters and SEOs can benefit from advanced methods of customizing structured data, which can be used to strengthen the semantics we convey to search engines.


The Semantic Web & Semantic Search

The “Semantic Web” refers to the world of real objects on the internet, not just web documents. Real-world objects are the sorts of things described in databases and knowledge bases such as Wikipedia.

It assigns URIs (Uniform Resource Identifiers) to many things – web pages, people, cars, and even abstract ideas.
“Semantic Web” is also W3C’s (World Wide Web Consortium) vision of the Web of linked data. The purpose of the Semantic Web is to preserve a shared understanding about what a URI identifies.

In 2008, Cool URIs for the Semantic Web cited the following two main requirements for how real-world objects or things should be identified on the web:

  1. A URI should provide information about what it identifies in a machine-readable (e.g. Resource Description Framework or “RDF”) and human-readable (e.g. Hyper Text Markup Language or “HTML”) format.
  2. There should be no confusion between identifiers for Web documents and identifiers for other resources. URIs are meant to identify only one of them, so one URI can’t stand for both a Web document and a real-world object.

Remember that second requirement because we’ll come back to it.
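
In JSON-LD, one way to respect that second requirement is to give the page and the thing it describes different `@id` values. The sketch below uses the hash-URI approach; the example.com URLs, the `#person` fragment, and the name "Jane Doe" are hypothetical placeholders, not values from any real site.

```python
import json

# Hypothetical URIs: one for the web document, one for the real-world person.
PAGE_URI = "https://example.com/about-jane"
PERSON_URI = "https://example.com/about-jane#person"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "WebPage",
            "@id": PAGE_URI,
            # The page is *about* the person; it is not the person.
            "about": {"@id": PERSON_URI},
        },
        {
            "@type": "Person",
            "@id": PERSON_URI,
            "name": "Jane Doe",
        },
    ],
}

print(json.dumps(graph, indent=2))
```

Because the two identifiers differ, a statement about the person (a name, a job title) can't be mistaken for a statement about the document.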

The easiest way to understand semantic search is to think of it as searching by meaning. It allows for more accurate information retrieval and also helps provide answers to questions that may not be easily found through typical keyword searches.

This means that websites can be found and ranked based on their relevance to specific phrases or subject matter, and not just based on keywords alone.

The use of the term semantic search is not very consistent, but this is largely because of how fast new techniques to understand meaning have evolved over the past decade.

Google’s Timeline: The semantic evolution happened fast

2012 – Google Knowledge Graph & “things not strings”
2013 – Conversational search
2015 – RankBrain (machine learning based)
2019 – BERT and “neural matching”


Structured Data: A Method to Convey Meaning

Structured data provides a standard way to share information about content, so that a computer can understand it and present it in meaningful ways—for example, the author of a news article or a blog.

As you may know or have already gathered, it enables search engines to more accurately serve results that match a user’s query.

Adding structured data to your website can improve your site’s relevance for certain types of searches and make it easier for search engine crawlers to index your site. It can also help search engines more quickly find new content after you’ve published it.

Schema.org

With the advancements in cognitive computing and machine learning that took place around the 2005-2010 period, there came an increasing need for a structured data format that computers could easily understand.

To meet this demand, the Schema.org initiative was launched in 2011 by the major search engines (Google, Bing, and Yahoo, joined later that year by Yandex), who wanted a shared way to describe objects on pages so they could be better understood by machines.

Schema.org provides a framework that helps you describe what your content means and how it is related to other things on your website. At a minimum, schema will help search engines more accurately index your content.
Assuming your content is also good, implementing structured data may help some of your pages rank higher in organic results.

If structured data is like speech, schema.org is like a dialect of the language being spoken. Specifically, schema.org is a Schema Vocabulary. It is the primary vocabulary we use in structured data for SEO.

Structured Data Basics

Structured data is simply data that is machine readable. When we write or generate structured data, we are creating a machine-readable set of facts.

There are also semi-structured and unstructured data. These have less formal organization, which makes them harder for machines to read.

Technically, we can write schema markup in several ways, and you don’t have to use just one. Later on I’ll explain why you might use both JSON-LD and Microdata on one page.

  • Microdata
    • Via extension to HTML
  • RDFa
    • Via extension to HTML
  • JSON-LD (recommended)
    • Usually placed/injected in the <head> of an HTML document
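
To make the JSON-LD option concrete, here is a minimal Python sketch of how a template or build step might inject it. The helper name `render_json_ld` and the sample headline are illustrative assumptions, not part of any particular CMS or library.

```python
import json


def render_json_ld(data: dict) -> str:
    """Serialize a dict as a JSON-LD <script> tag for the document <head>."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Understanding Structured Data",
}

tag = render_json_ld(article)
print(tag)
```

Microdata and RDFa, by contrast, are written as attributes on the HTML elements themselves, so there is no separate block to inject.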

JSON-LD (JavaScript Object Notation for Linked Data)

JSON-LD was only recently adopted as the standard among popular search engines, but it has been gaining traction for almost a decade now. Google began supporting the format in 2015 and, as of 2017, officially recommends using it for structured data.

The bar graph of global usage statistics for structured data formats shows that, currently, only 38% of sites are using JSON-LD.

When someone that identifies as an SEO broadly mentions structured data, they probably mean JSON-LD written in the schema.org vocabulary.

Technically speaking, the XML in your Sitemap.xml file is also structured data. We just don’t commonly associate it with structured data in the same sort of way as we do with JSON-LD.
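
To see why a sitemap counts as structured data, note how little code a machine needs to read it. The sketch below parses a tiny, hypothetical sitemap with Python's standard library; the URLs and dates are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical sitemap; real ones use the same sitemaps.org namespace.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2021-09-18</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/structured-data</loc>
    <lastmod>2021-09-20</lastmod>
  </url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP_XML)
# Every <url> entry yields the same predictable fields.
entries = [
    (url.findtext("sm:loc", namespaces=NS), url.findtext("sm:lastmod", namespaces=NS))
    for url in root.findall("sm:url", NS)
]

for loc, lastmod in entries:
    print(loc, lastmod)
```

Each entry has a known shape (a location and a last-modified date), which is exactly what makes the file machine readable.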

The point to understand here is that structured data is not just a thing SEOs use to generate SERP features. We’ll dive deeper into the distinction between writing structured data for enhancing search appearance vs writing it for Semantic SEO over the next few sections.


Rich Snippets/SERP Features

Industry chatter about structured data peaked back when Google first announced rich snippets/SERP features, but since then has decreased significantly despite its importance.

Over the years, featured snippets have become somewhat controversial within the SEO community. If you’re new to SEO, it’s important to understand why.

As you may know, there are typically 10 results on a SERP. Anywhere on a SERP that a result can appear we call SERP real estate. This goes for all results, so it can describe the position of both paid and organic results.

With the addition of ads, possibly a map pack, maybe a knowledge panel, and then things like answer boxes, the traditional organic results tend to get less attention.

The rich snippets that SEOs care about the most are featured snippets, which highlight content taken from pages in the organic results. Commonly referred to as answer boxes, these results look cool, contain more content (text + media), and directly answer a user’s search.

These are desirable because they show up before the normal organic results. We call this spot position zero.

If you click on the link below an answer box, it will take you to the page that Google extracted the information from.

This is typically the case, but there are times when the featured snippet text is absent from the article linked.
From a user experience perspective, answer boxes are pretty great. Google exists to provide people quality information both quickly and reliably. These snippets help us get the answers we want fast.

However, if you think about it solely from the perspective of an SEO, there are two primary caveats:

  1. Since the answer is shown directly to the user, a zero-click search becomes highly likely. This means less traffic.
  2. Google ultimately decides what appears in the featured snippet, so it can’t be manipulated the way rankings can. Also, the information presented is not exclusive to the linked result.
    1. Example: Article A and Article B have great infographics to accompany the featured snippet, but Article C has the most informative content and direct answer. Google presents a snippet with images from A and B, but awards the (unlikely) click to Article C.

SERP Feature Gallery

Google’s SERP Feature Gallery provides detailed documentation on the types of features it displays that use structured data. Some of them are automatically generated, such as the Knowledge Graph and answer boxes, but most of them can be coded for on your website.

Below is a screenshot showing the feature guides section of Google’s search appearance documentation.

The first feature guide listed is for articles. You can see on the left all of the additional guides available.

Let’s check out what the JSON-LD code looks like for an article that is eligible to be shown with an enhanced appearance in the search results on Google.


    <html amp>
    <head>
        <title>Article headline</title>
        <script type="application/ld+json">
        {
            "@context": "https://schema.org",
            "@type": "NewsArticle",
            "mainEntityOfPage": {
                "@type": "WebPage",
                "@id": "https://google.com/article"
            },
            "headline": "Article headline",
            "image": [
                "https://example.com/photos/1x1/photo.jpg",
                "https://example.com/photos/4x3/photo.jpg",
                "https://example.com/photos/16x9/photo.jpg"
            ],
            "datePublished": "2015-02-05T08:00:00+08:00",
            "dateModified": "2015-02-05T09:20:00+08:00",
            "author": {
                "@type": "Person",
                "name": "John Doe",
                "url": "http://example.com/profile/johndoe123"
            },
            "publisher": {
                "@type": "Organization",
                "name": "Google",
                "logo": {
                    "@type": "ImageObject",
                    "url": "https://google.com/logo.jpg"
                }
            }
        }
        </script>
    </head>
    <body>
    </body>
    </html>

Keep in mind that Google originally required AMP (Accelerated Mobile Pages) for articles to appear in the Top Stories carousel. That requirement was dropped with the page experience update in 2021, so non-AMP pages with article markup are eligible too.
That said, let’s break down the requirements
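
As a rough aid, the sketch below checks a JSON-LD string for the top-level properties used in the article example above. The list is an assumption that simply mirrors that example; Google's Article documentation is the authority on what is actually required or recommended, and the helper name `missing_properties` is made up for illustration.

```python
import json
from typing import List

# Assumption: this list mirrors the properties in the example above,
# not an official list of required properties.
EXPECTED_PROPERTIES = [
    "headline", "image", "datePublished", "dateModified", "author", "publisher",
]


def missing_properties(json_ld: str) -> List[str]:
    """Return the expected top-level properties absent from a JSON-LD string."""
    data = json.loads(json_ld)
    return [prop for prop in EXPECTED_PROPERTIES if prop not in data]


snippet = json.dumps({
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Article headline",
    "datePublished": "2015-02-05T08:00:00+08:00",
})

print(missing_properties(snippet))
```

A check like this only confirms that keys are present. It says nothing about whether the values are valid, so the Rich Results Test is still the place to verify eligibility.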


Semantic SEO

A little while ago I said that structured data is not just something SEOs use to create SERP features.
So if not just for SERP features, why should we care about structured data?

What is Semantic SEO?

Semantic SEO is the process of building more meaning into your content. By optimizing for intent and not just answering a simple query, our goal is to create content that will answer multiple questions at once.

This allows you to provide value and depth on one page instead of having users continuously alter their queries and visit many different pages in search of answers.

A mantra that I’ve given to SEO interns over the years is that “Good SEO means you don’t have to open multiple results in new tabs.”

Quick Tips for Semantic Optimization

  • Keyword Mapping 
    • Find co-occurring terms, variants, think of other ways the thing is referred to
    • Include core keywords in: title tags, h1s, URL slugs, etc
    • Include co-occurring/variants in: body content, meta descriptions, alt text, anywhere else that makes sense
    • Research methods: Live search results, competitor analysis tools, Ahrefs Content Explorer, keyword research tools, custom scrapers, and more.
    • Other research methods: Surveys, Interviews
  • Phrase Mapping & Theme Modeling
    • Look at pages that rank on the first page for a given search, identify unique phrases that appear in close proximity multiple times
    • Be specific: prefer “Former US President Barack Obama” over “the former president”
    • Avoid idioms and other figurative/non-literal words
    • Include in: H1-H6, body content, anchor text, alt text, captions, anywhere else that makes sense
    • Research methods: People also ask, auto complete, keyword research tools (Ubersuggest is free), Answer the Public, Wikipedia, DBpedia, and more
  • Remember the 5 W’s – Who, What, Where, When, Why
    • Who/what: Name of the thing (Ex MozCon)
    • What: Primary function, usage, purpose, etc (SEO Conference)
    • Where: Where it’s relevant (Seattle)
    • Who/when: When and to whom is it relevant (marketing professionals, in the summer)
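
The phrase-mapping step above can be roughed out in a few lines of Python. This is a simplified sketch: it treats "phrases that appear multiple times" as word n-grams shared by at least two pages, and the three sample texts are hypothetical stand-ins for content you would collect from first-page results.

```python
import re
from collections import Counter


def shared_phrases(documents, n=2, min_docs=2):
    """Return n-word phrases that appear in at least `min_docs` documents."""
    doc_counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Use a set so each phrase counts once per document.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        doc_counts.update(phrases)
    return sorted(p for p, count in doc_counts.items() if count >= min_docs)


# Hypothetical stand-ins for text scraped from first-page results.
pages = [
    "Structured data helps search engines understand content.",
    "Search engines use structured data to build rich results.",
    "Rich results depend on valid markup.",
]

print(shared_phrases(pages))
# → ['rich results', 'search engines', 'structured data']
```

On real pages you would want to filter out stop-word phrases and try several values of `n`, but even this crude version surfaces candidate terms for headings and body copy.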

Writing Custom Structured Data

Schema.org is very expansive. There are all sorts of ways we can customize our structured data beyond the scope of Google’s SERP feature gallery. As we’ve discussed so far, the more semantics we can create, the better. Being diligent about marking up as much of your content as possible is a great way to get a leg up over your competitors.

Representing Multiple Types of Things on a Page

We have two options when we want to mark up multiple items (formally objects) on the same page. We can “nest” them or we can list them separately.

Using Separate Objects

When we need to mark up items on our pages that are literally things such as breadcrumbs, images, videos, etc – separating items into individual blocks can make the most sense.

Nesting

To distinguish properties from entities, JSON-LD prefers nested objects. For example, a list of labels may be assigned under the same property. This gives us a lot of flexibility when defining data relationships such as “parent” or “child”.

If you want to provide information about an entity such as an organization or a person, nesting is typically a good choice.
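
The two options can be contrasted in a short Python sketch. All names and URLs here are hypothetical; the point is only the shape of the data, with separate objects serialized as independent blocks and a nested object living inside its parent's property.

```python
import json

# Option 1: separate objects, each rendered in its own JSON-LD block.
breadcrumbs = {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Blog",
         "item": "https://example.com/blog"},
    ],
}
video = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Intro to structured data",
}
separate_blocks = [json.dumps(obj, indent=2) for obj in (breadcrumbs, video)]

# Option 2: nesting, where one entity is the value of another's property.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Software Company",
    "founder": {"@type": "Person", "name": "Jane Doe"},
}
nested_block = json.dumps(organization, indent=2)

print(len(separate_blocks), "separate blocks")
print(nested_block)
```

Nesting states the relationship explicitly (this person founded this organization), while separate blocks leave the items unrelated unless you link them yourself.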

Creating a Custom Organization Schema

Here’s an organization script that references the entity “Example Software Company” while including a SoftwareApplication schema.

In this example, we’re informing search engines about a tech company that makes a software application called “SalesFarce CRM”. On average, the app has a rating of 2.6/5 stars.

What is interesting is that we’re also informing search engines that a company called “B2B Software Sales Company” is a seller of the product, despite it being free.

These semantics might be incredibly important, because now if the seller publishes content about the company, search engines can better understand their relationship.


The script below identifies an entity as a third-party seller of software. In schema.org, seller is a property of Offer, so it sits on the offer rather than directly on the organization.

    {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": "Example Software Company",
        "url": "https://example.com/",
        "logo": "https://picsum.photos/200",
        "hasOfferCatalog": {
            "@type": "OfferCatalog",
            "name": "Software as a service",
            "alternateName": "SaaS",
            "itemListElement": [
                {
                    "@type": "Offer",
                    "seller": {
                        "@type": "Organization",
                        "name": "B2B Software Sales Company",
                        "url": "https://www.example-software-global.net/",
                        "logo": "https://picsum.photos/200"
                    },
                    "itemOffered": {
                        "@type": "SoftwareApplication",
                        "name": "SalesFarce CRM",
                        "operatingSystem": "All",
                        "applicationCategory": "WebApplication",
                        "aggregateRating": {
                            "@type": "AggregateRating",
                            "ratingValue": "2.6",
                            "ratingCount": "8864"
                        },
                        "offers": {
                            "@type": "Offer",
                            "price": "0",
                            "priceCurrency": "USD"
                        }
                    }
                }
            ]
        },
        "sameAs": [
            "https://twitter.com/example-software-company",
            "https://linkedin.com/example-software-company",
            "https://facebook.com/example-software-company"
        ]
    }

We can see the Rich Results Test checks out just fine.

Testing Variations

The script below contains a custom organization schema that I’ve been testing on my agency’s website. It’s designed to tell search engines about our location, the areas we serve, as well as our core services. It passes both Google’s Rich Results Test and the Schema Markup Validator (Beta).


    {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": "Ciffone Digital",
        "legalName": "Ciffone Digital, LLC",
        "slogan": "Disrupt the Status Quo",
        "founder": "Mike Ciffone",
        "url": "https://ciffonedigital.com",
        "logo": "https://ciffonedigital.com/wp-content/uploads/2021/03/Ciffone-Digital-Logo-Primary-CD.png",
        "email": "hello@ciffonedigital.com",
        "telephone": "(312) 508-3012",
        "areaServed": ["US", "GB", "CA"],
        "availableLanguage": [
            {
                "@type": "Language",
                "name": "English"
            }
        ],
        "location": {
            "@type": "Place",
            "address": {
                "@type": "PostalAddress",
                "addressLocality": "Chicago",
                "addressRegion": "IL",
                "postalCode": "60610"
            }
        },
        "hasOfferCatalog": {
            "@type": "OfferCatalog",
            "name": "Digital Marketing services",
            "itemListElement": [
                {
                    "@type": "Offer",
                    "itemOffered": {
                        "@type": "Service",
                        "name": "Search Engine Optimization",
                        "alternateName": "SEO"
                    }
                },
                {
                    "@type": "Offer",
                    "itemOffered": {
                        "@type": "Service",
                        "name": "Content Marketing"
                    }
                },
                {
                    "@type": "Offer",
                    "itemOffered": {
                        "@type": "Service",
                        "name": "Website Development"
                    }
                },
                {
                    "@type": "Offer",
                    "itemOffered": {
                        "@type": "Service",
                        "name": "Pay-Per-Click",
                        "alternateName": "PPC"
                    }
                }
            ]
        },
        "sameAs": [
            "https://ciffonedigital.com",
            "https://github.com/Ciffone-Digital",
            "https://www.linkedin.com/company/ciffone-digital/",
            "https://twitter.com/ciffone_digital",
            "https://www.facebook.com/ciffonedigital"
        ]
    }
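
If you maintain a script like this for more than a handful of services, generating the catalog can be less error-prone than hand-editing nested braces. The helper below is a hypothetical sketch (the function name `offer_catalog` is made up); it reproduces the `hasOfferCatalog` structure from the script above out of a plain list.

```python
import json


def offer_catalog(name, services):
    """Build a hasOfferCatalog value from (service name, alternate name) pairs."""
    items = []
    for service_name, alternate_name in services:
        service = {"@type": "Service", "name": service_name}
        if alternate_name:  # only emit alternateName when one exists
            service["alternateName"] = alternate_name
        items.append({"@type": "Offer", "itemOffered": service})
    return {"@type": "OfferCatalog", "name": name, "itemListElement": items}


catalog = offer_catalog(
    "Digital Marketing services",
    [
        ("Search Engine Optimization", "SEO"),
        ("Content Marketing", None),
        ("Website Development", None),
        ("Pay-Per-Click", "PPC"),
    ],
)

print(json.dumps({"hasOfferCatalog": catalog}, indent=2))
```

Adding a service then becomes a one-line change to the list, and the output always comes out as balanced, valid JSON.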

Reasons to Use Both JSON-LD and Microdata

Search engines can have trouble connecting JSON-LD structured data to the textual information on a web page. The confusion occurs because objects in the document <head> can be very far from the semantic content they describe, which makes verifying the information on the page more difficult, especially for less commonly seen schema types and properties.

One pro of using microdata is that since the code is integrated in the HTML, it appears very close to the actual information it is referencing. This can potentially make search engines more confident in the relationship thanks to values being directly associated with texts on a web page.
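
To show what "close to the actual information" means in practice, here is a small Python sketch that renders an organization's visible details with microdata attributes. The function name `organization_microdata` is an assumption for illustration; the `itemscope`, `itemtype`, and `itemprop` attributes are the standard microdata ones.

```python
from html import escape


def organization_microdata(name: str, telephone: str) -> str:
    """Render an Organization as HTML with schema.org microdata attributes."""
    return (
        '<div itemscope itemtype="https://schema.org/Organization">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        f'  <span itemprop="telephone">{escape(telephone)}</span>\n'
        "</div>"
    )


html_fragment = organization_microdata("Ciffone Digital", "(312) 508-3012")
print(html_fragment)
```

The markup and the visible text are the same characters in the same element, so there is nothing to reconcile between what the user sees and what the structured data claims.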

Alternatively, Schema.org provides WebPageElement, which accepts the properties cssSelector and xpath.

Properties from WebPageElement

Property | Expected Type | Description
cssSelector | CssSelectorType | A CSS selector, e.g. of a SpeakableSpecification or WebPageElement. In the latter case, multiple matches within a page can constitute a single conceptual “Web page element”.
xpath | XPathType | An XPath, e.g. of a SpeakableSpecification or WebPageElement. In the latter case, multiple matches within a page can constitute a single conceptual “Web page element”.

There are also a handful of more specific types for WebPageElement.
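
As a sketch of how WebPageElement might be used, the Python below builds a WebPage whose `hasPart` points at an on-page element through `cssSelector`. The page URL, the element name, and the `#services-list` selector are hypothetical, and whether a given search engine acts on this markup is an open question rather than something this example demonstrates.

```python
import json

# Hypothetical page URL and CSS selector; adjust both to your own markup.
page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": "https://example.com/services",
    "hasPart": {
        "@type": "WebPageElement",
        "name": "Service list",
        # Points at the on-page element that holds the described content.
        "cssSelector": "#services-list",
    },
}

print(json.dumps(page, indent=2))
```

This keeps everything in JSON-LD while still telling a parser where on the page the described content lives.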

Formatting & Syntax Resources (W3C Documentation)


This post is an ongoing project that I will continue updating over time. Hopefully this was helpful. For any questions feel free to reach out to me via email (just submit our form), Twitter, or LinkedIn.

I’ve been answering questions over on ProWebmasters for a little over a month now. Tag your question with “Semantic SEO” or “JSON-LD” – I have filters set for those so I’ll see it come through.