Introduction
Semantic models are having a Wayne’s World moment. Let me explain.

When Queen released Bohemian Rhapsody in 1975, it barely made a dent in the US. It was just another song, hardly anything to rhapsodize about. It wasn’t until Wayne’s World revived the song 17 years later that it became a cultural phenomenon - skyrocketing up the Billboard charts and cementing its status as one of the greatest rock songs ever made.
Semantic models are also not new, but interest in them is surging because they are the key ingredient in making AI-ready data products.
In this article, we’ll unpack what these models are, both in traditional and modern contexts, and why they matter for the smartest and most impressive people alive today: data product managers.
What is a Semantic Model?
Semantic models define the meaning and relationships of data so that machines can accurately interpret and reason about the complex information comprising a data product.
As an analogy, think of a semantic model as a data dictionary for machines. Data dictionaries tell analysts what the fields in their tables mean and provide context for how the data should be used. The goal is to make the chaotic symphony of internal tables interpretable by that newly hired analyst, so they don’t do something silly like forget to convert English units to metric and destroy a Martian probe.
Traditional vs Modern Semantic Models
Semantic models fall into two main categories: traditional and modern. Modern models build on the foundation of traditional ones rather than replacing them entirely. You can think of traditional models as the substrate, with modern models adding a key ingredient to make them LLM-friendly.
The traditional model is a symbolic framework: a set of human-defined rules and labels, much like a business glossary or taxonomy. Common elements include the following (a toy sketch follows the list):
- Schema or Ontology: The blueprint that defines your data and how it’s organized.
  - Format: JSON-LD, OWL/RDF/XML, or YAML
- Glossary: A mini dictionary that explains each term in plain language.
  - Format: Usually CSV, Markdown, TXT, or JSON
- Relationships: Connections between things, like “store is in a city.”
  - Format: RDF triples in a serialization like JSON-LD
- Data Types and Constraints: Rules for what kind of data is allowed.
  - Format: JSON, OWL, ShEx
- Example Queries: Sample queries that show how the data works in real life.
  - Format: JSON, Markdown
- Mappings: Translations between the model and other systems’ terms.
  - Format: CSV, SKOS, JSON
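To make those ingredients concrete, here’s a toy sketch of the elements expressed as plain Python data structures. Every name in it (stores, trail footwear, and so on) is a hypothetical example, and a real model would be serialized in the formats listed above rather than kept in Python:

```python
# A toy "traditional" semantic model expressed as plain Python data.
# In practice these pieces would be serialized as JSON-LD, OWL, CSV, etc.

# Schema/ontology: entities and their attributes.
schema = {
    "Store": {"fields": ["store_id", "city", "sq_footage"]},
    "Sale": {"fields": ["sale_id", "store_id", "product_category", "net_amount"]},
}

# Glossary: plain-language definitions of each term.
glossary = {
    "city": "The municipality in which a store is located.",
    "trail_footwear": "Product category covering hiking boots and trail shoes.",
}

# Relationships: subject-predicate-object triples (RDF-style).
relationships = [
    ("Store", "is_located_in", "City"),
    ("Sale", "occurred_at", "Store"),
]

# Data types and constraints: rules for what kind of data is allowed.
constraints = {
    "net_amount": {"type": "decimal", "min": 0},
    "store_id": {"type": "string", "pattern": r"^S\d{6}$"},
}

# Example query: shows consumers how the data works in real life.
example_query = """
SELECT city, SUM(net_amount) AS net_sales
FROM sales JOIN stores USING (store_id)
WHERE product_category = 'trail_footwear'
GROUP BY city
ORDER BY net_sales DESC;
"""
```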
Modern semantic models differ in that they add one key ingredient - semantic vectors, also known as embeddings - which capture the meaning of data in a way that LLMs specifically can understand. You’ve probably noticed that LLMs would rather hallucinate an incorrect response than say they’re not sure about something. Embeddings help with this problem: they improve the accuracy of LLM responses by pointing the LLM to the relevant data, as illustrated by the following examples (and the code sketch after them).
- Improved Semantic Retrieval: Embeddings help systems retrieve relevant data, even when the user’s language doesn’t match the data exactly.
  - Example: A user asks “in what town do we sell the most hiking boots?”
  - Underlying data: no keyword match for “town” (the tables use “city”) or “hiking boots” (the tables use “trail footwear”).
  - Without embeddings: the relevant data would be missed.
  - With embeddings: the LLM finds the relevant data and provides a correct answer.
- Enabling Multimodal Understanding: Embeddings translate non-text inputs into vectors that can be compared to a text-based query, allowing the system to retrieve relevant data across formats.
  - Example: A user asks “which stores have the most foot traffic?”
  - Underlying data: a heat map of foot traffic overlaid with store logos.
  - Without embeddings: the image and text are disconnected, and the relevant data would be missed.
  - With embeddings: the model links the visual heat map to the question and provides the right answer.
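Here’s a minimal sketch of the retrieval idea behind the first example, using the open-source sentence-transformers library. The column descriptions, dataset, and model choice are illustrative assumptions, not part of any particular platform:

```python
from sentence_transformers import SentenceTransformer

# Small general-purpose embedding model (illustrative choice).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Plain-language descriptions of columns in a hypothetical dataset.
column_docs = [
    "city: the municipality in which each store is located",
    "trail_footwear_units: units of trail footwear sold per store",
    "sq_footage: retail floor space of each store",
]

query = "in what town do we sell the most hiking boots?"

# Embed the query and the column descriptions into the same vector space.
doc_vecs = model.encode(column_docs, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity; the vectors are normalized, so a dot product suffices.
scores = doc_vecs @ query_vec
for doc, score in sorted(zip(column_docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")

# "city" and "trail_footwear_units" rank highest even though the query
# contains neither keyword - that's the retrieval win embeddings provide.
```

Multimodal retrieval works the same way, except the heat-map image is embedded by a vision-capable encoder (a CLIP-style model, for example) into the same vector space as the text query.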
Why are Semantic Models Important in the AI Era?
It’s worth taking a step back to ask (or remind ourselves) why businesses are interested in LLM analytics in the first place. The goal is data democratization, a concept so hackneyed that it prompts eye rolls among data nerds, yet interest has never been stronger. The appeal of LLMs is their potential to remove bottlenecks between questions and answers. Rather than routing every query through an analyst or BI tool, LLMs promise a future where decision-makers can ask natural-language questions - “What was last month’s net sales growth in Toledo?” - and get accurate answers instantly.
Every major data platform now offers an AI co-pilot to deliver on this promise: Snowflake has Cortex, Google has Vertex, Databricks has Genie, and so on. These co-pilots work by using an LLM to convert a text input like “What was the net sales growth in Toledo last month?” into a SQL statement the query engine can understand.
But this only works if the LLM understands what “Toledo” and “net sales growth” mean in the context of the company’s data. Without a semantic model, it most likely will not.
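As a rough illustration, here’s the slice of semantic-model context a co-pilot might be handed alongside that question, and the SQL it could then plausibly generate. The metric definition, table, and column names are all hypothetical, and each platform expects its own format rather than Python dicts:

```python
# A hypothetical slice of semantic-model context handed to the co-pilot.
# Names, structure, and definitions are illustrative only.
semantic_context = {
    "metrics": {
        "net_sales_growth": {
            "description": "Month-over-month % change in net sales",
            "based_on": ["monthly_sales.net_sales"],
        }
    },
    "dimensions": {
        "city": {
            "description": "Store city; 'Toledo' means Toledo, Ohio (not Spain)",
        }
    },
    "tables": {"monthly_sales": ["month", "city", "net_sales"]},
}

# Grounded in that context, "What was last month's net sales growth in
# Toledo?" can be compiled into something like:
generated_sql = """
WITH toledo AS (
    SELECT month, net_sales
    FROM monthly_sales
    WHERE city = 'Toledo'
)
SELECT (net_sales - LAG(net_sales) OVER (ORDER BY month))
       / LAG(net_sales) OVER (ORDER BY month) AS net_sales_growth
FROM toledo
ORDER BY month DESC
LIMIT 1;
"""
```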

Why Do Semantic Models Matter for Data Product Teams?
Imagine you’re a PM at a leading POI data provider, and you’re in charge of productizing the company’s flagship locations dataset. The dataset is unique, thorough, and accurate - and you’ve made all the necessary investments to make it consumable. You’ve written an exquisite data dictionary, built out notebooks, and provided a shortlist of common queries… the end product is wonderful in every way, truly an analyst’s dream.
Here’s the rub: your top prospects don’t want to read your Shakespearean data dictionary, open a notebook, or get within 12 parsecs of SQL. They want to use an LLM. Not just any LLM, either (certainly not your proprietary LLM): they want to use their LLM.
You begin to sweat…what if their LLM doesn’t understand my dataset? What if their LLM doesn’t know what Toledo is? How do I explain what Toledo is?!
This increasingly common scenario strikes at the heart of the LLM zeitgeist: how do you make a proprietary, structured dataset “semantically legible” to any LLM stack a customer might be using? Welcome to the semantic interoperability problem.
What is the Semantic Interoperability Problem?
Semantic interoperability is the ability for an AI agent managed by one team or company to successfully interpret data from another. Today, you might be able to build a semantic model that makes querying your data products seamless. But once the data leaves the four walls of your infrastructure, all that work goes to waste.
If the world aligned around one standardized semantic model then this problem wouldn’t be so hard, but each platform (e.g. Snowflake, BigQuery, Databricks, Redshift, etc.) has of course developed its own.
These competing models aren’t just technical variations; they encode different assumptions, metadata structures, and integration patterns. As a result, the same dataset might need to be semantically modeled in subtly different ways depending on the customer’s platform of choice.
Solving Semantic Interoperability
The easiest way to solve semantic interoperability (shameless plug incoming) is to work with Bobsled. We find semantic models extremely interesting, and there’s nothing we’d love more than to help you think through this problem (no strings attached, reach out here). Building semantic models is far from the only thing we do, but it’s one of the most interesting.
But let’s say you’re suffering from not-invented-here (NIH) syndrome and want to go it alone. Here’s how to actually tackle semantic interoperability: build not just a portable semantic layer, but platform-specific semantic adapters. Each platform has its own way of ingesting and interpreting semantic meaning, and your job is to align with each one’s preferred models and specs.
Your to-dos would look like this (a sketch of the resulting bundle follows the list):
- Define formal schemas per platform (e.g. Cortex, BigQuery’s vector format, Databricks’ semantic lakehouse structure).
- Provide prompt examples and glossary per environment, tuned to LLMs that operate within each platform’s ecosystem.
- Precompute embeddings and metadata in the form expected by each system.
- Bundle it all in a structured directory with clear documentation and connectors.
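Here’s a rough sketch of what materializing that bundle could look like. The directory layout, file names, and platform groupings are all hypothetical - each platform’s actual spec differs, so treat this as the shape of the deliverable rather than a spec:

```python
from pathlib import Path

# Hypothetical layout for per-platform "semantic adapters." Real platform
# specs differ; this only illustrates the shape of the deliverable.
BUNDLE = {
    "snowflake_cortex/semantic_model.yaml": "# Cortex-flavored schema + metrics",
    "bigquery/embeddings.jsonl": "",  # precomputed vectors, one record per row
    "databricks/glossary.md": "# Term definitions tuned for Databricks",
    "shared/example_prompts.md": "# Prompt/answer pairs per environment",
    "README.md": "# How to load each adapter into its platform",
}

def write_bundle(root: str = "semantic_adapters") -> None:
    """Materialize the adapter bundle as a directory tree."""
    for rel_path, contents in BUNDLE.items():
        path = Path(root) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(contents)

if __name__ == "__main__":
    write_bundle()
```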
By doing the above, you’re no longer just building a semantic layer, you’re creating semantic fluency across the stack. That’s the difference between shipping a dataset and shipping an AI-native data product.