The Sledhouse Data Product Lakehouse White Paper

Introduction

Sledhouse is Bobsled’s cutting-edge data product lakehouse designed to transform how data providers create, fulfill, and govern data products. Built on a multi-cloud, zero-copy architecture, Sledhouse offers fine-grained product customization and fulfillment at a fraction of the cost and complexity of traditional data feeds.

Sledhouse replaces complex and inefficient home-built data feeds with a unified data fulfillment system. Leading data companies like Dun & Bradstreet, CoreLogic, ZoomInfo, and Deutsche Boerse rely on Bobsled to deliver the right data to the right customers where they want it, all while dramatically reducing the infrastructure and labor costs traditionally required to maintain data feeds.

In this white paper, we will cover Sledhouse’s:

Key capabilities
Architecture
Data flow
Security

Key Capabilities

Bobsled offers a powerful suite of features that free up engineering teams and empower product, sales and operations to deliver better experiences to customers:

Granular entitlements

Empower operations and sales teams to configure the columns and rows to share with customers. Pre-configure common slices or customize based on customer needs. All without creating a ticket.

Global fulfillment

From a single platform, securely deliver analytics-ready data products to customers in the platforms where they work. Supports FTP, cloud storage and native cloud data warehouse sharing. Complete coverage without pipelines, credentials or accounts to manage.

Unified governance

Track usage and entitlements across every customer. Get alerts when deliveries fail, explore user behavior with telemetry, and audit lineage even for customized products.

Zero-copy architecture

Customize and fulfill data products directly to customers without replicating data. Sledhouse’s zero-copy fulfillment engine leverages the growing adoption of open table formats to enable highly performant, egress-free fulfillment of data products directly to a customer’s cloud platform (e.g., Snowflake, Databricks).

Egress-free sharing

Sledhouse is built on an egress-free, S3-compatible data lake, so you pay egress only once for each data product.
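To make the cost claim concrete, the sketch below compares egress charges under two models: paying cloud egress on every customer delivery, versus paying egress once to land the product in egress-free storage. The product size, customer count and per-GB rate are illustrative assumptions only, not Bobsled or cloud-vendor pricing.

```python
def egress_cost_per_delivery(size_gb: float, deliveries: int, rate_per_gb: float) -> float:
    """Egress paid every time the product leaves the source cloud."""
    return size_gb * deliveries * rate_per_gb


def egress_cost_pay_once(size_gb: float, rate_per_gb: float) -> float:
    """Egress paid a single time, when the product is landed in egress-free storage."""
    return size_gb * rate_per_gb


# Illustrative assumptions only: a 500 GB product, 40 customers, $0.09 per GB.
SIZE_GB = 500
CUSTOMERS = 40
RATE_PER_GB = 0.09

per_delivery = egress_cost_per_delivery(SIZE_GB, CUSTOMERS, RATE_PER_GB)
pay_once = egress_cost_pay_once(SIZE_GB, RATE_PER_GB)
print(f"per-delivery: ${per_delivery:,.2f}  pay-once: ${pay_once:,.2f}")
```

Under these assumptions the per-delivery cost grows linearly with the number of customers, while the pay-once cost stays fixed.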

Architecture

Sledhouse is architected to radically reduce the cost of managing and supporting end-to-end data fulfillment workflows while maintaining the flexibility and security modern data businesses require. By replacing disparate data stores and pipelines with a single, globally interoperable data product lakehouse, data providers can accelerate customers’ time-to-insight while reducing the incremental cost of fulfillment.

Sledhouse High-Level Architecture

Ingestion Engine and Table Maintenance Service

Sledhouse uses our Spark-based Ingestion and Table Maintenance Engines to seamlessly and cost-efficiently maintain Iceberg-based versions of select data assets.

Ingestion

Replicates data to Iceberg in R2 using multiple approaches depending on the use case, supporting both incremental updates and full overwrites

Table Maintenance

Manages background maintenance jobs to compact, update, and delete data, ensuring optimal performance and data integrity
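Compaction is the maintenance task that most directly affects query performance: many small data files are rewritten into fewer, larger ones. The sketch below shows one simple way such work could be planned, greedily grouping small files up to a target size. It is a conceptual illustration only; the function name, the 128 MB target and the greedy strategy are assumptions, not a description of how the Table Maintenance service is implemented.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 128) -> List[List[int]]:
    """Greedily group small files into batches totalling at most target_mb.

    Files already at or above the target are left alone, and a batch is only
    kept if it merges at least two files.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0

    def flush() -> None:
        # Rewriting a single file on its own gains nothing, so skip it.
        if len(current) > 1:
            groups.append(list(current))

    for size in sorted(s for s in file_sizes_mb if s < target_mb):
        if current and current_total + size > target_mb:
            flush()
            current.clear()
            current_total = 0
        current.append(size)
        current_total += size
    flush()
    return groups


# Hypothetical file sizes in MB; the 200 MB file is already large enough.
plan = plan_compaction([10, 20, 30, 200, 64, 5])
```

Here the four smallest files are merged into one batch, while the 64 MB file is left alone because no other small file fits alongside it within the target.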

Interoperable Data Lake

Sledhouse’s Interoperable Data Lake lets teams store data products in a format that supports cross-platform fulfillment with minimal infrastructure costs.

Cloudflare R2

All data in Sledhouse is stored in Cloudflare’s egress-free storage, R2. This dramatically reduces one of the major cost drivers for both zero-copy and distributed multi-cloud fulfillment.

Benefits

Elimination of data transfer egress

Global distribution network, ensuring low-latency access for users worldwide across data platforms

S3-compatible API, facilitating seamless integration with existing tools and workflows
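Because R2 exposes an S3-compatible API, existing S3 tooling can generally be pointed at it by changing the endpoint. The sketch below builds the connection settings a typical S3 client would need. The account ID and credentials are placeholders, the helper name is hypothetical, and how Sledhouse itself issues or scopes credentials is not covered here.

```python
def r2_s3_client_kwargs(account_id: str, access_key_id: str, secret_access_key: str) -> dict:
    """Return keyword arguments for an S3-compatible client aimed at Cloudflare R2."""
    if not account_id:
        raise ValueError("account_id is required")
    return {
        # R2's S3 endpoint is scoped to the Cloudflare account.
        "endpoint_url": f"https://{account_id}.r2.cloudflarestorage.com",
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "region_name": "auto",  # R2 uses "auto" in place of an AWS region
    }


# Placeholder values for illustration only. With boto3 installed, these settings
# could be passed as boto3.client("s3", **kwargs).
kwargs = r2_s3_client_kwargs("<ACCOUNT_ID>", "<ACCESS_KEY_ID>", "<SECRET_ACCESS_KEY>")
```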

Apache Iceberg

We employ Apache Iceberg as our open table format, providing interoperability across data platforms. This keeps queries performant no matter which platform they run on.

Benefits

ACID-compliant lakehouse table format

Optimized performance for large-scale data processing tasks

Snapshot, time travel and rollback features, enhancing data versioning and recovery options
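The snapshot model behind time travel and rollback can be pictured with a toy table in which every commit produces a new immutable snapshot. This is a conceptual model only, not the Apache Iceberg API; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToySnapshotTable:
    """Toy model: each commit appends an immutable snapshot of the table's rows."""
    snapshots: List[Tuple[str, ...]] = field(default_factory=lambda: [()])
    current: int = 0  # index of the snapshot readers see by default

    def commit(self, rows: List[str]) -> int:
        """Add rows on top of the current snapshot and return the new snapshot id."""
        self.snapshots.append(self.snapshots[self.current] + tuple(rows))
        self.current = len(self.snapshots) - 1
        return self.current

    def read(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        """Read the current snapshot, or an older one (time travel)."""
        return self.snapshots[self.current if snapshot_id is None else snapshot_id]

    def rollback(self, snapshot_id: int) -> None:
        """Point the table back at an earlier snapshot without deleting history."""
        self.current = snapshot_id


table = ToySnapshotTable()
first = table.commit(["row-a"])
second = table.commit(["row-b"])
latest_rows = table.read()         # both rows
earlier_rows = table.read(first)   # time travel to the first commit
table.rollback(first)              # readers now see the first commit again
```

Because old snapshots are never modified, a rollback is only a pointer change, and the later snapshot remains readable by its id.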

Data Product Catalog

The Data Product Catalog is where users create and manage data products. Users can create new data products by generating logical views of core tables stored in the Data Lake. Fulfillment teams can further customize on a per-customer basis.

Benefits

Zero-copy data product creation and management

UI and API-based customization

Detailed lineage for all products fulfilled to customers
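One way to picture a zero-copy data product is as a view definition over a core table: a column list plus an optional row filter. The sketch below renders such a definition as generic SQL. The table, view and column names are hypothetical, and the SQL is deliberately generic rather than the statement Sledhouse issues on any particular platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataProductSlice:
    """A logical slice of a core table: selected columns and an optional row filter."""
    view_name: str
    source_table: str
    columns: List[str]
    row_filter: Optional[str] = None  # assumed to be a trusted SQL predicate


def render_view_sql(product: DataProductSlice) -> str:
    """Render the slice as a CREATE VIEW statement; no data is copied."""
    if not product.columns:
        raise ValueError("a data product must expose at least one column")
    sql = (
        f"CREATE VIEW {product.view_name} AS "
        f"SELECT {', '.join(product.columns)} FROM {product.source_table}"
    )
    if product.row_filter:
        sql += f" WHERE {product.row_filter}"
    return sql


# Hypothetical per-customer slice: two columns, US rows only.
us_slice = DataProductSlice(
    view_name="acme_companies_us",
    source_table="core.companies",
    columns=["company_id", "name"],
    row_filter="country = 'US'",
)
statement = render_view_sql(us_slice)
```

A per-customer customization then amounts to a second slice over the same source table, with a different column list or filter.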

Global Fulfillment Engine

The Global Fulfillment Engine powers fulfillment within Sledhouse to every supported destination (cloud data warehouses, cloud storage, FTP). It offers zero-copy fulfillment (in select locations) and localized fulfillment via Bobsled’s first-of-its-kind distribution network. (More detailed data flows for zero-copy fulfillment can be found in the Data Flow section.)

Data Access

Sledhouse implements a comprehensive access management system:

Integration

Data access to Iceberg tables via external tables and views in data warehouses

Access

Fine-grained access control managed through data product views

Zero-copy fulfillment

Sledhouse is the first commercial platform to offer cross-platform zero-copy fulfillment. With zero-copy fulfillment, Bobsled uses external tables to generate logical views of Iceberg tables for customers natively within their cloud data warehouses. (Please see the Data Flow section for more details.)

Benefits

No incremental storage or egress fees

No replica datasets to maintain

Performant queries due to native support of Iceberg by major cloud data warehouses

Data distribution network

Sledhouse also offers the ability to localize data within the platform or region of a customer. This improves query performance for data warehouse destinations. In most cases, data is replicated within single-tenant staging accounts managed by Bobsled.

Benefits

Improved query performance

Egress-free replication

Managed Infrastructure/Sharing Protocols

Sledhouse permissions data products to customers using a network of fully-managed, single-tenant cloud accounts. These accounts enable Sledhouse to permission products using the native sharing protocols of each platform. Customers receive data as a “data share” accessible with no ETL.

Benefits

Secure, single-tenant infrastructure that is fully transferable

Credential-free permissioning for cloud destinations

No ETL for customers

Control Plane

To streamline operations and enhance user experience, Sledhouse offers:

Management

Intuitive UI and comprehensive REST API for high-level, declarative configuration of Sledhouse tables and data products

Automation

Streamlined workflows for data product creation and distribution

This robust technical foundation allows Sledhouse to revolutionize data sharing and product management through on-demand data product customization, global fulfillment and unified governance across every customer, all with limited technical support.

Data Flow

Below, we offer a step-by-step description of the data flow when sharing data via zero-copy fulfillment. Data is encrypted both in motion and at rest throughout the entire workflow.

Data Flow Diagram: Zero Copy Fulfillment

Data is incrementally replicated from the source platform (e.g., Google BigQuery, Snowflake) to Iceberg tables in R2, using multiple replication patterns

Iceberg tables are created and managed by a Spark-based Table Maintenance service

External tables are set up in destination platforms on top of the Iceberg tables

Data product views are created on top of external tables and added to shares/listings

Data consumers are granted access to data shares or listings

Consumer queries in destination platforms (e.g. Snowflake) are served from Iceberg tables in R2
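The chain described in the steps above (consumer, share, view, external table, Iceberg table) can be modeled as a few lookups. The toy model below shows why the flow is zero-copy: differently customized views for different consumers resolve to the same stored table. All names and the storage path are hypothetical, and real platforms enforce these grants through their own native sharing features.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyFulfillment:
    iceberg_tables: Dict[str, str] = field(default_factory=dict)   # Iceberg table -> storage location
    external_tables: Dict[str, str] = field(default_factory=dict)  # external table -> Iceberg table
    views: Dict[str, str] = field(default_factory=dict)            # view -> external table
    shares: Dict[str, Set[str]] = field(default_factory=dict)      # share -> views it contains
    grants: Dict[str, Set[str]] = field(default_factory=dict)      # consumer -> shares granted

    def resolve(self, consumer: str, view: str) -> str:
        """Return the storage location that serves a consumer's query on a view."""
        granted = self.grants.get(consumer, set())
        if not any(view in self.shares.get(share, set()) for share in granted):
            raise PermissionError(f"{consumer} has no share containing {view}")
        return self.iceberg_tables[self.external_tables[self.views[view]]]


# Hypothetical setup: two consumers, two customized views, one stored table.
flow = ToyFulfillment(
    iceberg_tables={"companies": "example-bucket/warehouse/companies"},
    external_tables={"ext_companies": "companies"},
    views={"acme_us_view": "ext_companies", "globex_eu_view": "ext_companies"},
    shares={"acme_share": {"acme_us_view"}, "globex_share": {"globex_eu_view"}},
    grants={"acme": {"acme_share"}, "globex": {"globex_share"}},
)
acme_location = flow.resolve("acme", "acme_us_view")
globex_location = flow.resolve("globex", "globex_eu_view")
```

Both consumers are served from the same location, and a consumer asking for a view outside its shares is refused.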

Security

Bobsled is trusted by the world’s largest data companies to power fulfillment of their most sensitive data.

SOC 2 & ISO 27001 Certified

SOC 2 and ISO 27001 are just the start. As an infrastructure company, Bobsled has security in its DNA.

GDPR & CCPA Compliant

Bobsled is compliant with major modern privacy standards, offering solutions for DSAR and consent requirements.

Secure, single-tenant architecture

All data in Bobsled is processed through single-tenant environments, isolated and owned by each customer.

Encryption in motion and at rest

Data in Bobsled is encrypted in motion and at rest throughout the entire platform.

Conclusion

The emergence of data feeds as one of the three pillars of data monetization, alongside SaaS apps and APIs, means that data companies now need the security, scalability and reliability offered by commercially built infrastructure. Sledhouse dramatically simplifies the fulfillment process for analytical data by providing a single platform to streamline the creation, fulfillment and governance of data products no matter where they are being consumed.

Engineering teams at companies like Dun & Bradstreet, CoreLogic, Deutsche Boerse and ZoomInfo rely on Bobsled because it helps improve the customer experience while reducing costs.

Speed up sales cycles

Get prospects and customers querying your data and discovering insight in minutes no matter where they work.

Eliminate engineering burden

Empower non-technical teams to get customers up and running on the data they want without engineering support.

Drive down cloud costs

Radically reduce the storage, egress and compute costs required to customize and fulfill data in the cloud.