Delivering Events to Data Warehouses: S3, BigQuery, Snowflake, and ClickHouse

By Aylon · 5 min read

Your enterprise customers don't just want webhooks. They want events loaded directly into their data infrastructure. Here's how that works.

Enterprise customers increasingly expect their SaaS vendors to deliver data directly into their data infrastructure. Webhooks are table stakes. What closes deals is the ability to say: "Yes, we can load events directly into your BigQuery dataset" or "We push to your S3 bucket in real time."

This is where B2B data delivery is heading, and it works quite differently from webhook delivery.

Why warehouses matter for enterprise sales

When an enterprise prospect asks "can you send data to our Snowflake?", they are really asking: "Can we incorporate your product data into our analytics and reporting without building a custom pipeline?"

The answer to this question can make or break a deal. Enterprise data teams have standardized on warehouse-centric architectures. They want all operational data flowing into a central warehouse where their BI tools, ML pipelines, and reporting systems can access it. If your product data requires a custom ETL job to reach their warehouse, that is friction, and friction kills deals.

How warehouse delivery differs from webhooks

Webhook delivery is request-response: you POST a JSON payload and get back a status code. Warehouse delivery is fundamentally different in several ways.

Batching is required. You can technically write one row per event, but doing so against BigQuery or Snowflake is slow and expensive, because their ingestion paths are designed for bulk operations. You need to accumulate events, batch them on a time or size window, and write them as a group. This introduces buffering, flush logic, and a new category of failure modes.
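
As a rough illustration of the accumulate-and-flush loop, here is a minimal in-memory sketch in Python. The class name `EventBatcher`, its parameters, and the default thresholds are assumptions made for this example, not part of any particular product or library.

```python
import time
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class EventBatcher:
    """Accumulates events and flushes on a size or age threshold (illustrative only)."""

    def __init__(
        self,
        flush: Callable[[List[Event]], None],
        max_events: int = 500,          # hypothetical default size window
        max_age_seconds: float = 30.0,  # hypothetical default time window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[Event] = []
        self._opened_at = 0.0  # when the first event of the current batch arrived

    def add(self, event: Event) -> None:
        if not self._buffer:
            self._opened_at = self._clock()
        self._buffer.append(event)
        if len(self._buffer) >= self._max_events:
            self.flush_now()

    def tick(self) -> None:
        """Call periodically (for example from a timer) to enforce the time window."""
        if self._buffer and self._clock() - self._opened_at >= self._max_age:
            self.flush_now()

    def flush_now(self) -> None:
        if not self._buffer:
            return
        batch = list(self._buffer)
        self._flush(batch)    # if this raises, the buffer is kept so the flush can be retried
        self._buffer.clear()
```

The buffer here lives only in memory, so a crash loses whatever has not been flushed. That is the failure mode a durable buffer has to close.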

Schema matters. Warehouses are schema-on-write (or at least schema-aware). Your event payloads need to map to table columns. When your event schema evolves, the warehouse table may need to evolve too. This requires schema detection, migration handling, and backward compatibility logic.
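
To make schema detection concrete, the sketch below infers a flat column mapping from a batch of event payloads. The type names (`INTEGER`, `FLOAT`, and so on) are hypothetical, warehouse-neutral labels chosen for this example; a real implementation would translate them to each destination's column types.

```python
from typing import Any, Dict, Iterable, Optional


def column_type(value: Any) -> str:
    """Map a Python value to a hypothetical, warehouse-neutral type name."""
    # bool must be checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    return "JSON"  # nested objects and arrays fall back to a semi-structured column


def infer_schema(events: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Infer column types from top-level fields across a batch of events."""
    schema: Dict[str, str] = {}
    for event in events:
        for name, value in event.items():
            if value is None:
                continue  # a null carries no type information
            seen: Optional[str] = schema.get(name)
            new = column_type(value)
            if seen is None or seen == new:
                schema[name] = new
            elif {seen, new} == {"INTEGER", "FLOAT"}:
                schema[name] = "FLOAT"  # widen mixed numeric fields
            else:
                schema[name] = "JSON"   # conflicting types: keep the raw value
    return schema
```

Falling back to a semi-structured column on conflict is one possible policy. Rejecting the record or routing it to a dead-letter destination would be equally defensible.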

Authentication is heavier. S3 needs IAM roles or access keys with specific bucket permissions. BigQuery needs service account JSON with dataset-level grants. Snowflake needs account identifiers, warehouse names, database and schema paths, and either key-pair or password authentication. Each credential type needs secure storage, validation, and rotation support.

File formats matter. S3 delivery is really file delivery. You need to decide on format (JSON, NDJSON, Parquet, CSV), partitioning strategy (by date, by event type, by hour), file naming conventions, and compression. These choices affect how easily the customer's downstream tools can consume the data.
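
One way to pin down the partitioning and naming decisions is a single function that builds the object key. The sketch below assumes Hive-style `key=value` path segments, which many query engines recognize as partitions. The function name, the segment names, and the `.ndjson.gz` suffix are choices made for this example.

```python
import re
from datetime import datetime, timezone


def object_key(prefix: str, event_type: str, batch_start: datetime, batch_id: str) -> str:
    """Build a partitioned object key; `batch_start` must be timezone-aware."""
    ts = batch_start.astimezone(timezone.utc)  # partition on UTC to avoid DST surprises
    safe_type = re.sub(r"[^A-Za-z0-9_.-]", "_", event_type)
    parts = [
        prefix.strip("/"),
        f"event_type={safe_type}",
        f"date={ts:%Y-%m-%d}",
        f"hour={ts:%H}",
        f"{ts:%Y%m%dT%H%M%SZ}-{batch_id}.ndjson.gz",
    ]
    return "/".join(p for p in parts if p)  # skip an empty prefix


key = object_key(
    "events/",
    "order.created",
    datetime(2024, 3, 5, 14, 7, 9, tzinfo=timezone.utc),
    "b42",
)
# key == "events/event_type=order.created/date=2024-03-05/hour=14/20240305T140709Z-b42.ndjson.gz"
```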

What a good implementation looks like

A solid warehouse delivery system handles these concerns:

For S3 and GCS:

  • Batch events by configurable time window or size threshold
  • Write as NDJSON (one JSON object per line, the most universally compatible format)
  • Partition files by date and event type
  • Support customer-specified bucket, prefix, and region
  • Validate credentials on destination setup with a test write
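
The NDJSON step in that list is small enough to show in full. This is a sketch using only the Python standard library; gzip compression is an assumption here, since the list above does not prescribe a codec.

```python
import gzip
import json
from typing import Any, Dict, Iterable, List


def encode_ndjson_gz(events: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize events as one JSON object per line, then gzip the result."""
    # json.dumps escapes newlines inside strings, so each event stays on one line
    lines = (json.dumps(e, separators=(",", ":"), sort_keys=True) for e in events)
    payload = "".join(line + "\n" for line in lines)
    return gzip.compress(payload.encode("utf-8"))


def decode_ndjson_gz(blob: bytes) -> List[Dict[str, Any]]:
    """Inverse of encode_ndjson_gz, handy for test writes and debugging."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]
```

The resulting bytes are what you would upload as a single object. A round trip through `decode_ndjson_gz` is a cheap check to run alongside the test write at destination setup.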

For BigQuery:

  • Batch events and use streaming inserts (the legacy insertAll API or the newer Storage Write API) or load jobs, depending on volume
  • Auto-detect schema from event payloads
  • Handle schema evolution gracefully (add new columns, never drop existing ones)
  • Support customer-specified project, dataset, and table
  • Validate credentials with a test insert
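
The "add new columns, never drop existing ones" rule can be expressed as a small planning step that runs before each load. The sketch below is independent of any BigQuery client library; the function name and the plain-dict schema representation are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, str]  # column name -> type name


def plan_additive_migration(table: Schema, incoming: Schema) -> Tuple[Schema, List[str]]:
    """Return (columns to add, conflicting columns). Never proposes dropping a column."""
    to_add = {name: typ for name, typ in incoming.items() if name not in table}
    conflicts = sorted(
        name for name, typ in incoming.items()
        if name in table and table[name] != typ
    )
    return to_add, conflicts


table = {"id": "INTEGER", "legacy_field": "STRING"}
incoming = {"id": "STRING", "plan": "STRING"}
to_add, conflicts = plan_additive_migration(table, incoming)
# to_add == {"plan": "STRING"}; conflicts == ["id"]; "legacy_field" is left alone
```

Columns that disappear from the payload are simply ignored, and type conflicts are reported rather than resolved, so the caller can decide whether to coerce, reject, or dead-letter the affected records.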

For Snowflake:

  • Stage data via internal or external stage
  • Use COPY INTO for bulk loading
  • Handle Snowflake-specific authentication (key-pair recommended)
  • Support customer-specified warehouse, database, schema, and table
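
For the bulk-load step, the statement itself is short. The sketch below only builds the SQL string and does not connect to Snowflake. The stage name, the JSON file format, and the `MATCH_BY_COLUMN_NAME` option are assumptions for this example, and the identifier check is deliberately strict because the names come from customer configuration.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")
_PATH = re.compile(r"[A-Za-z0-9_./=-]*")


def _ident(name: str) -> str:
    """Allow only simple unquoted identifiers to keep customer input out of the SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_copy_into(database: str, schema: str, table: str, stage: str, path: str) -> str:
    """Build a COPY INTO statement that loads staged JSON files into a table."""
    if not _PATH.fullmatch(path):
        raise ValueError(f"unsafe stage path: {path!r}")
    target = ".".join(_ident(part) for part in (database, schema, table))
    return (
        f"COPY INTO {target} "
        f"FROM '@{_ident(stage)}/{path}' "  # quoted because the path may contain '='
        "FILE_FORMAT = (TYPE = 'JSON') "
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
    )


sql = build_copy_into(
    "ANALYTICS", "RAW", "EVENTS", "EVENT_STAGE",
    "event_type=order.created/date=2024-03-05/",
)
```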

For ClickHouse:

  • Batch via the HTTP interface with JSONEachRow format
  • Respect async insert thresholds to avoid small-part fragmentation
  • Support both ClickHouse Cloud and self-hosted endpoints
  • Handle backpressure when the cluster is busy
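
A sketch of the first and last points: building a JSONEachRow insert for the HTTP interface, and retrying with exponential backoff when the cluster pushes back. The function names are hypothetical, the table name is assumed to be trusted, and treating HTTP 429 and 5xx responses as retryable is an assumption you should check against how your cluster actually signals overload.

```python
import json
import random
import time
from typing import Any, Callable, Dict, List, Tuple
from urllib.parse import quote, urlencode


def build_insert(base_url: str, table: str, rows: List[Dict[str, Any]]) -> Tuple[str, bytes]:
    """Return the URL and request body for a batched JSONEachRow insert."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"  # table name assumed trusted
    url = f"{base_url.rstrip('/')}/?{urlencode({'query': query}, quote_via=quote)}"
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return url, body


def send_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` (which performs the POST and returns the HTTP status) with retries."""
    status = 0
    for attempt in range(max_attempts):
        status = send()
        if status != 429 and status < 500:
            return status  # success or a non-retryable client error
        if attempt < max_attempts - 1:
            # exponential backoff with jitter so retries do not arrive in lockstep
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return status
```

Keeping the HTTP call behind a `send` callable means the retry policy can be tested without a running cluster and reused with whichever HTTP client you prefer.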

The operational complexity

Beyond the initial integration, warehouse destinations introduce operational concerns that webhook destinations do not have:

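Buffering failures. If your batching layer crashes before flushing, you lose events. You need durable buffering, plus flushes that are effectively exactly-once: in practice, at-least-once retries combined with idempotent or deduplicated writes at the destination.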
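
To illustrate what durable buffering means at its simplest, here is a single-process sketch that journals every event to disk before acknowledging it, and derives a deterministic batch ID so a replayed flush can be deduplicated downstream. The class and its file layout are assumptions for this example; a production system would more likely sit on a queue or log service.

```python
import hashlib
import json
import os
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class JournaledBuffer:
    """Append-only on-disk journal. Assumes a single writer and no concurrent flush."""

    def __init__(self, path: str) -> None:
        self._path = path

    def append(self, event: Event) -> None:
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")
            f.flush()
            os.fsync(f.fileno())  # the event survives a process crash after this returns

    def pending(self) -> List[Event]:
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def flush(self, write: Callable[[str, List[Event]], None]) -> None:
        events = self.pending()
        if not events:
            return
        raw = json.dumps(events, sort_keys=True).encode("utf-8")
        batch_id = hashlib.sha256(raw).hexdigest()[:16]  # same contents -> same ID
        write(batch_id, events)  # if this raises, the journal is left untouched
        os.remove(self._path)    # a crash before this line replays the same batch_id
```

On restart, calling `flush` before accepting new events replays any unflushed batch under the same ID, which is what lets the destination side discard duplicates.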

Partial batch failures. A batch of 1,000 events where 3 fail schema validation should not reject the entire batch. You need per-record error handling and dead-letter routing for individual failed records.
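
Per-record handling can be as simple as splitting the batch before the write. In this sketch, `validate` stands in for whatever schema check you run, and `require_fields` is a hypothetical helper included only to make the example runnable.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]


def split_batch(
    events: List[Event],
    validate: Callable[[Event], None],
) -> Tuple[List[Event], List[Dict[str, Any]]]:
    """Separate loadable events from failures, keeping the reason for each failure."""
    good: List[Event] = []
    dead_letters: List[Dict[str, Any]] = []
    for event in events:
        try:
            validate(event)
        except ValueError as exc:
            dead_letters.append({"event": event, "error": str(exc)})
        else:
            good.append(event)
    return good, dead_letters


def require_fields(*names: str) -> Callable[[Event], None]:
    """Hypothetical validator: raise ValueError if any named field is missing."""
    def validate(event: Event) -> None:
        missing = [n for n in names if n not in event]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
    return validate


batch = [{"id": 1, "type": "order.created"}, {"type": "order.created"}, {"id": 3, "type": "x"}]
good, dead = split_batch(batch, require_fields("id", "type"))
# good holds the two valid events; dead holds the one record missing "id"
```

The valid records proceed to the warehouse write, while the dead-letter list goes wherever failed records are stored for inspection and replay.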

Cost awareness. Warehouse operations have direct cost implications for your customers. Streaming inserts in BigQuery cost more than batch loads. Snowflake compute costs vary by warehouse size. Your system should be efficient enough that customers do not get surprised by their cloud bills.

Why this is hard to build well

Every destination type is essentially a miniature integration project. Supporting S3, BigQuery, Snowflake, and ClickHouse well means building and maintaining four separate delivery engines, each with its own authentication, batching, schema, and error handling logic. The cost adds up fast.

This is exactly the kind of focused infrastructure work that makes sense as a platform rather than a bespoke project. Pushrail supports 18 destination types across webhooks, object storage, warehouses, message queues, streams, databases, and analytics, with unified routing, retry, replay, and observability across all of them.

When your enterprise prospect asks "can you deliver to our warehouse?", the answer should be "yes, which one?", not "let me check with engineering."
