Object storage

Send events to Amazon S3

Deliver events as batched, partitioned NDJSON files into a customer-owned S3 bucket, with idempotent writes.

What is Amazon S3?

Amazon S3 is AWS's object storage service. As a destination, it is the simplest way to hand a customer a durable, queryable copy of every event your product emits about them, without operating a streaming pipeline. Pushrail writes batched NDJSON files into a bucket and key prefix the customer controls; their data engineering team treats it like any other S3-backed data lake.

Why deliver events to Amazon S3

  • Customer owns the data: it lands in their account, is billed to them, and is governed by their IAM policies.
  • No streaming infrastructure needed: analysts query the bucket with Athena, Snowflake external tables, Spark, or anything else that reads S3.
  • Cheap durability: S3 is one of the least expensive places to keep years of event history, with no database to provision.
  • Broad tool compatibility: almost every analytics, ML, or compliance tool reads S3 natively.

How Pushrail delivers events to Amazon S3

The S3 adapter accumulates events into NDJSON batches per (destination, hour, event type) and flushes a batch when it reaches the configured size threshold or when the time window closes, whichever happens first. Files are written under a deterministic key like `s3://<bucket>/<prefix>/event_type=order.completed/dt=2026-05-26/hour=14/part-<sha>.ndjson`. Each part file's name embeds a content hash, so replaying the same batch produces an identical key and overwrites the existing object with the same bytes. That makes overwrites safe and idempotent.
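
The key scheme above can be sketched as follows. This is an illustration, not Pushrail's actual implementation: the choice of SHA-256, the 16-character truncation, and the function names `serialize_batch` and `part_key` are all assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def serialize_batch(events):
    """Serialize events as NDJSON with stable key order, so equal batches yield equal bytes."""
    lines = [json.dumps(e, sort_keys=True, separators=(",", ":")) for e in events]
    return ("\n".join(lines) + "\n").encode("utf-8")


def part_key(prefix, event_type, window_start, body):
    """Build the partitioned object key for one serialized batch."""
    # Assumption: SHA-256 truncated to 16 hex chars stands in for the "<sha>" placeholder.
    digest = hashlib.sha256(body).hexdigest()[:16]
    return (
        f"{prefix.rstrip('/')}/event_type={event_type}"
        f"/dt={window_start:%Y-%m-%d}/hour={window_start:%H}"
        f"/part-{digest}.ndjson"
    )


events = [{"eventType": "order.completed", "payload": {"orderId": "ord_38a91f"}}]
window = datetime(2026, 5, 26, 14, tzinfo=timezone.utc)
body = serialize_batch(events)
key = part_key("raw/pushrail/", "order.completed", window, body)
```

Because the key depends only on the partition values and the batch bytes, serializing the same events again yields the same key, which is what makes a replayed write an overwrite rather than a duplicate.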

Auth and credentials

Customers grant Pushrail write access to a specific bucket and key prefix using either an IAM role with `sts:AssumeRole` (recommended, because it involves no long-lived credentials) or an access key pair scoped via IAM policy. The role ARN or access key is stored encrypted at rest. Rotation is first-class: the customer updates the destination, in-flight batches drain on the old credentials, and new batches use the new ones. Pushrail's egress IP is published so customers can restrict the bucket policy to that source.
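
As a rough sketch of what the customer-side setup might look like, the snippet below builds two standard IAM policy documents: a write policy limited to one bucket prefix, and a role trust policy that requires an external ID. The helper names are hypothetical, and the Pushrail principal ARN shown is a placeholder, not a real account; the actual value would come from Pushrail.

```python
import json


def write_policy(bucket, key_prefix):
    """Identity policy allowing object writes only under one prefix of one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{key_prefix}*",
            }
        ],
    }


def trust_policy(pushrail_principal_arn, external_id):
    """Trust policy letting one principal assume the role, gated on an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": pushrail_principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = write_policy("acme-pushrail-events", "raw/pushrail/")
# Placeholder principal: substitute the ARN that Pushrail actually publishes.
trust = trust_policy("arn:aws:iam::111111111111:root", "acme-ext-7Pq2")
print(json.dumps(policy, indent=2))
print(json.dumps(trust, indent=2))
```

Scoping the resource to the prefix means the role cannot write anywhere else in the bucket, and the external ID condition guards against another party reusing the role ARN.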

Batching, retries, and replay

By default, a batch flushes when it reaches 1 MB or 1,000 events, or when the 60-second flush window closes, whichever comes first. Transient S3 errors (5xx, throttling) are retried with exponential backoff; permanent errors (AccessDenied, NoSuchBucket) fail fast to the dead-letter queue (DLQ). Replay re-runs the same batch logic and writes to the same deterministic key, so re-running a window does not duplicate rows for downstream consumers.
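
The flush rule can be illustrated with a small in-memory batcher for a single (destination, hour, event type) partition. This is a sketch only: the `Batcher` class, its method names, and the reading of 1 MB as 1,000,000 bytes are assumptions, and a real adapter would also handle persistence and concurrency.

```python
import time


class Batcher:
    """Collects NDJSON lines and flushes on size, count, or age, whichever comes first."""

    def __init__(self, max_bytes=1_000_000, max_events=1000, max_age_sec=60.0,
                 clock=time.monotonic):
        self.max_bytes = max_bytes      # assumption: 1 MB treated as 1,000,000 bytes
        self.max_events = max_events
        self.max_age_sec = max_age_sec
        self.clock = clock              # injectable so the window can be tested
        self.lines = []
        self.size = 0
        self.opened_at = None

    def add(self, line):
        """Append one line; return the flushed batch if a threshold was hit, else None."""
        if not self.lines:
            self.opened_at = self.clock()  # the window starts with the first event
        self.lines.append(line)
        self.size += len(line)
        if self.size >= self.max_bytes or len(self.lines) >= self.max_events:
            return self.flush()
        return None

    def tick(self):
        """Call periodically; flush a non-empty batch whose window has elapsed."""
        if self.lines and self.clock() - self.opened_at >= self.max_age_sec:
            return self.flush()
        return None

    def flush(self):
        """Hand back the current lines and reset the batch."""
        batch, self.lines, self.size = self.lines, [], 0
        return batch
```

A caller would pass each flushed batch to the writer, and call `tick()` on a timer so that quiet partitions still flush when their window closes.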

Example payload

Pushrail accepts the canonical event shape on POST /v1/events. Below is the request body your service sends to ingest one event.

{
  "eventType": "order.completed",
  "occurredAt": "2026-05-26T14:21:08.493Z",
  "source": "billing-service",
  "customerExternalId": "acct_8K2zRq",
  "idempotencyKey": "order_38a91f-completed",
  "correlationId": "req_4f30b2",
  "payload": {
    "orderId": "ord_38a91f",
    "amount": 12900,
    "currency": "USD",
    "items": [
      { "sku": "PR-PRO-MONTHLY", "qty": 1, "price": 12900 }
    ]
  },
  "metadata": {
    "tier": "pro",
    "region": "us-east-1"
  }
}
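
A minimal sketch of sending that body from Python, using only the standard library. The base URL, the bearer-token header, and the `build_ingest_request` helper are assumptions for illustration; only the POST /v1/events path comes from the text above, so check your Pushrail account for the real host and auth scheme.

```python
import json
import urllib.request


def build_ingest_request(base_url, api_key, event):
    """Build (but do not send) a POST /v1/events request carrying one event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/events",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {api_key}",
        },
    )


event = {
    "eventType": "order.completed",
    "occurredAt": "2026-05-26T14:21:08.493Z",
    "source": "billing-service",
    "customerExternalId": "acct_8K2zRq",
    "idempotencyKey": "order_38a91f-completed",
    "payload": {"orderId": "ord_38a91f", "amount": 12900, "currency": "USD"},
}
# Hypothetical host; replace with the real ingest endpoint.
request = build_ingest_request("https://api.example.invalid", "test-key", event)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```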

Example configuration

These are the fields your customer fills in to point Pushrail at their S3 bucket.

{
  "type": "S3",
  "name": "Customer S3 archive",
  "bucket": "acme-pushrail-events",
  "keyPrefix": "raw/pushrail/",
  "region": "us-east-1",
  "auth": {
    "mode": "ASSUME_ROLE",
    "roleArn": "arn:aws:iam::123456789012:role/PushrailWriter",
    "externalId": "acme-ext-7Pq2"
  },
  "batchSizeMb": 1,
  "flushIntervalSec": 60
}

Common use cases

  • Give enterprise customers a clean, queryable archive of every event your product emits about them.
  • Feed a customer's data lake or lakehouse without operating a streaming pipeline on their behalf.
  • Satisfy procurement requirements such as 'we need raw event data delivered to our own storage'.
  • Support compliance use cases where the customer must retain raw events in their own jurisdiction.

Frequently asked questions

Do I need to build an S3 export pipeline myself?

No. Pushrail's S3 adapter batches events into partitioned NDJSON files and writes them into the customer's bucket for you. You send one canonical event to Pushrail's ingest API and the adapter handles batching, partitioning, and idempotent keys.

Whose AWS account and credentials are used?

The customer's. Events land in their bucket, are billed to their account, and are governed by their IAM policies. They grant write access with an IAM role via sts:AssumeRole (recommended) or a scoped access key pair; either credential is stored encrypted at rest. Pushrail's egress IP is published so they can restrict the bucket policy to that source.

How are the files organized?

Events are written as NDJSON batches partitioned by event type, date, and hour under a deterministic key like event_type=order.completed/dt=2026-05-26/hour=14/part-<sha>.ndjson. By default, a batch flushes at 1 MB, at 1,000 events, or after a 60-second window, whichever comes first.
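
For downstream consumers, a sketch of how the layout might be read back: one helper recovers the partition values from an object key, and another parses an NDJSON body into events. The regular expression and helper names are assumptions based on the example key above.

```python
import json
import re

# Matches the partition segments of the example key layout.
KEY_RE = re.compile(
    r"event_type=(?P<event_type>[^/]+)"
    r"/dt=(?P<dt>\d{4}-\d{2}-\d{2})"
    r"/hour=(?P<hour>\d{2})"
    r"/part-(?P<sha>[^/.]+)\.ndjson$"
)


def parse_key(key):
    """Return the partition values encoded in an object key, or None if it does not match."""
    match = KEY_RE.search(key)
    return match.groupdict() if match else None


def parse_ndjson(body):
    """Parse an NDJSON body (bytes) into a list of events, skipping blank lines."""
    return [json.loads(line) for line in body.decode("utf-8").splitlines() if line.strip()]


parts = parse_key(
    "raw/pushrail/event_type=order.completed/dt=2026-05-26/hour=14/part-0a1b2c.ndjson"
)
rows = parse_ndjson(b'{"eventType":"order.completed"}\n{"eventType":"order.completed"}\n')
```

The `key=value` path segments follow the Hive-style partition convention, which query engines such as Athena and Spark can typically map to partition columns.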

What happens if a write fails, and are replays safe?

Transient errors like 5xx or throttling retry with exponential backoff; permanent errors like AccessDenied or NoSuchBucket fail fast to the dead-letter queue. Every attempt is recorded in the delivery logs, and replay re-runs the same batch logic and writes to the same deterministic key, so re-running a window does not duplicate rows for downstream consumers.
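
The retry behaviour described here might look roughly like the sketch below. It is not Pushrail's code: the jitter strategy, the attempt limit, routing to the dead-letter queue once retries are exhausted, and the names `is_retryable`, `backoff_delay`, and `deliver` are all assumptions. `SlowDown` is the error code S3 returns when it throttles requests.

```python
import random
import time

PERMANENT_CODES = {"AccessDenied", "NoSuchBucket"}
THROTTLING_CODES = {"SlowDown"}


def is_retryable(error_code, http_status):
    """Transient failures are 5xx responses or throttling; permanent codes never retry."""
    if error_code in PERMANENT_CODES:
        return False
    return http_status >= 500 or error_code in THROTTLING_CODES


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Exponential backoff with full jitter: up to min(cap, base * 2**attempt) seconds."""
    return rng() * min(cap, base * (2 ** attempt))


def deliver(put, max_attempts=5, sleep=time.sleep):
    """Call put() until it succeeds, fails permanently, or attempts run out.

    put() is a hypothetical callable returning (error_code, http_status),
    with (None, 200) meaning success.
    """
    for attempt in range(max_attempts):
        error_code, http_status = put()
        if http_status < 300:
            return "delivered"
        if not is_retryable(error_code, http_status):
            return "dlq"  # permanent error: fail fast
        sleep(backoff_delay(attempt))
    return "dlq"  # assumption: exhausted retries also go to the dead-letter queue
```

Jitter spreads retries out so that many batches throttled at the same moment do not all retry in lockstep.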
