Architecture — Webhook Reliability Gateway
Overview
GetHook is an API-first webhook reliability layer. It accepts inbound webhooks, persists them durably, and forwards them to configured destinations. It also allows outbound event publishing for SaaS platforms that send webhooks to their customers.
Design Principles
- Durability first — events are persisted before `202 Accepted` is returned.
- Crash safety — the worker uses `FOR UPDATE SKIP LOCKED` so crashes don't lose work.
- Idempotency — duplicate `external_event_id` values are detected and ignored.
- Simple over clever — a Postgres job queue for the MVP (no Kafka, no Redis).
- Tenant isolation — all data is scoped by `account_id`.
Components
API Server (cmd/api)
A standard `net/http` server. It handles:
- Ingest endpoint (`POST /ingest/{token}`) — public, token-authenticated
- Management API (`/v1/*`) — API-key authenticated
Middleware stack (outer → inner):
- Request size limit (5 MB)
- Structured logger
- Panic recovery
- Auth middleware (on protected routes)
Delivery Worker (cmd/worker)
Polls the `events` table for due work:

```sql
SELECT ... FROM events
WHERE status IN ('queued', 'retry_scheduled')
  AND (next_attempt_at IS NULL OR next_attempt_at <= NOW())
FOR UPDATE SKIP LOCKED
LIMIT 10
```

Claims events by setting `status = 'delivering'`, then dispatches them concurrently.
Scheduler (cmd/scheduler)
A lightweight process that runs two independent goroutines, each on its own ticker:
RetentionCleaner (default: every 1 hour)

- Deletes terminal events (`delivered`, `dead_letter`, `replayed`) older than each account's `retention_days` window
- Uses a single `DELETE … USING accounts` join — no per-account loops
- Skips in-flight events (`queued`, `delivering`, `retry_scheduled`)
- Runs one immediate pass on startup, then on every tick
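A sketch of what that single set-based statement might look like; the `id`, `created_at`, and timestamp arithmetic are assumptions, only the table names, status values, and `retention_days` column come from this document:

```sql
-- Hypothetical retention pass: one DELETE joined against accounts so
-- each tenant's own retention_days window applies. Only terminal
-- statuses are listed, so in-flight events are never touched.
DELETE FROM events
USING accounts
WHERE events.account_id = accounts.id
  AND events.status IN ('delivered', 'dead_letter', 'replayed')
  AND events.created_at < NOW() - make_interval(days => accounts.retention_days);
```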
ExportProcessor (default: every 30 seconds)

- Claims queued `export_jobs` rows using `FOR UPDATE SKIP LOCKED` inside a transaction
- Runs `events.ListForExport` with user-supplied filters (direction, status, date range), capped at 10 000 rows
- Encodes the result as JSON or CSV and stores it in `export_jobs.result_data`
- The API server serves the result via `GET /v1/exports/{id}/download`
Env vars: `SCHEDULER_RETENTION_INTERVAL` (default `1h`), `SCHEDULER_EXPORT_POLL_INTERVAL` (default `30s`).
Forwarder (internal/delivery)
For each event:
- Fetch destination config
- Build HTTP POST with signed headers
- Apply destination auth (bearer token, custom header)
- Record a `DeliveryAttempt` regardless of outcome
- Return success/error to the worker
Retry Logic
Exponential backoff schedule (hardcoded for MVP, configurable per route later):
- Attempt 1: immediate
- Attempt 2: +30s
- Attempt 3: +2m
- Attempt 4: +10m
- Attempt 5: +1h
- → Dead Letter Queue
Security
- API keys: random 32-byte key prefixed `hk_`, stored as a SHA-256 hash
- Signing secrets: AES-256-GCM encrypted at rest
- HMAC signing: `t=<unix>,v1=<HMAC-SHA256(secret, t.body)>` (Stripe-compatible)
- Header filtering: sensitive headers (`Authorization`, `Cookie`) stripped before storage
Data Model
```
Account (tenant)
├── retention_days (per-account, drives scheduler cleanup)
├── APIKey (many)
├── Source (inbound endpoint)
│   └── Route → Destination
├── Destination (forwarding target)
├── Event (inbound | outbound)
│   └── DeliveryAttempt (one per try)
├── ExportJob (async export request)
├── BrandSettings (white-label)
└── CustomDomain (white-label)
```

Event Lifecycle
```
received → accepted → queued → delivering → delivered
                                   │
                                   ├── retry_scheduled → (back to queued)
                                   └── dead_letter (max retries exceeded)
```
Any terminal state can be replayed → creates a new `queued` event.

Database
PostgreSQL. Migrations are tracked in the `schema_migrations` table and applied in filename order.
Key indexes:
- `events(status, next_attempt_at)` — worker polling
- `events(account_id)` — tenant-scoped queries
- `api_keys(hashed_key)` — auth lookups
- `sources(path_token)` — ingest routing
Queue Mechanism
PostgreSQL-based job queue using `FOR UPDATE SKIP LOCKED`. This provides:
- At-least-once delivery (safe to crash)
- Concurrent workers without double-processing
- No external dependencies (Redis/SQS) for MVP
Upgrade path: replace the polling with a proper queue (PGMQ, Redis streams, SQS) without changing the worker interface.
White-Labeling
Primitives in place for MVP:
- `BrandSettings` — company name, logo, colors, docs title
- `CustomDomain` — domain + TLS status + verification token
- Sources can be associated with a custom domain for branded ingest URLs
- HMAC signing headers use a generic `Webhook-Signature` key (customizable later)
Scaling Considerations (Post-MVP)
- Multiple worker replicas: safe due to `SKIP LOCKED`
- Large payloads: store the body in S3-compatible storage, keep a reference in `events.body`
- High fanout: process routes in parallel within the worker
- Rate limiting: add per-account limits using a token bucket (Redis-backed)
- Observability: structured JSON logs today; add OpenTelemetry spans per event/attempt