Internal names, providers, and exact numbers have been abstracted or generalized for confidentiality — the architecture patterns and trade-offs described are accurate.
Context
In a B2B platform with many microservices, every service eventually needs to send notifications — operations alerts to Slack, customer-facing emails, webhook callbacks to third parties. Without a shared service, each microservice ships its own SMTP credentials, Slack tokens, retry logic, rate-limit handling, and recipient resolution. The cost compounds: each service inherits the operational risk of every channel it touches, and channel-specific quirks get reinvented in different ways across the platform.
Notifier was extracted to centralize that. Producers publish a logical notification intent — "alert operator group X about event Y" — and Notifier handles credentials, formatting, retries, deduplication, and the channel-specific quirks.
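A minimal sketch of what such a logical intent might look like — the field names and values here are illustrative assumptions, not Notifier's actual schema:

```typescript
// Hypothetical shape of a logical notification intent. Field names are
// invented for this sketch; Notifier's real schema may differ.
interface NotificationIntent {
  event: string;                      // what happened, e.g. "deploy.failed"
  audience: string;                   // logical recipient, e.g. "operator-group-x"
  severity: "info" | "warning" | "critical";
  payload: Record<string, unknown>;   // event-specific context for formatting
}

// A producer publishes only the intent. It never names Slack, an SMTP
// server, or a webhook URL; Notifier resolves all of that downstream.
const intent: NotificationIntent = {
  event: "deploy.failed",
  audience: "operator-group-x",
  severity: "critical",
  payload: { service: "billing", region: "eu-west-1" },
};

console.log(JSON.stringify(intent));
```

The key property is what the payload omits: no channel, no credentials, no formatting. Everything channel-specific stays out of the producer.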
Architecture
producer service ──► SNS (notification bus) ──► SQS (per-channel queues)
                                                          │
                                                          ▼
                                                  Bun Lambda workers
                                                          │
                         ┌────────────────────────────────┼────────────────────────────────┐
                         ▼                                ▼                                ▼
                       Slack                            Email                           Webhook
                   (channel API)                     (SMTP/SES)                       (HTTP POST)
Surface
Producers don't pick a channel. They publish a logical intent — "notify Y of X" — and Notifier resolves channel preferences, formatting, and delivery. The same event can fan out to multiple channels based on operator preferences without the producer knowing about it.
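The fan-out step can be sketched as a pure lookup from audience to channels. The preference table and fallback policy below are assumptions for illustration; in Notifier the preferences would be looked up per operator:

```typescript
type Channel = "slack" | "email" | "webhook";

// Illustrative preference table. In the real service this would be
// operator-configured data, not a hardcoded map.
const preferences: Record<string, Channel[]> = {
  "operator-group-x": ["slack", "email"],
  "customer-acme": ["email", "webhook"],
};

// Resolve an intent's audience to the channels it fans out to.
// Falling back to email for unknown audiences is an assumption made
// here so nothing is silently dropped, not necessarily the real policy.
function resolveChannels(audience: string): Channel[] {
  return preferences[audience] ?? ["email"];
}
```

Because this resolution happens inside Notifier, changing an operator's preferences changes delivery for every producer at once, with no producer redeploys.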
Pub/sub topology
Notifications hit a single SNS topic. Channel-specific SQS queues subscribe with filter policies, so only relevant messages reach each channel's workers. Adding a new channel is a queue + worker, not a redeployment of every producer.
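The routing relies on SNS attribute-based filtering. The sketch below shows a filter policy in the shape SNS uses for exact string matching, plus a tiny re-implementation of that matching so the behavior is visible; the attribute names are illustrative:

```typescript
// A per-channel SQS subscription attaches a filter policy so only
// messages tagged for that channel reach its workers.
type FilterPolicy = Record<string, string[]>;

// Policy attached to the Slack queue's subscription (attribute name
// "channel" is an assumption for this sketch).
const slackQueuePolicy: FilterPolicy = { channel: ["slack"] };

// Minimal stand-in for SNS exact-match filtering: every policy key must
// appear in the message attributes with one of the allowed values.
function matches(
  policy: FilterPolicy,
  attributes: Record<string, string>,
): boolean {
  return Object.entries(policy).every(([key, allowed]) =>
    allowed.includes(attributes[key]),
  );
}
```

Adding a channel then means attaching one more filtered subscription — producers keep publishing to the same topic, unaware anything changed.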
Custom Bun Lambda runtime
Workers run on AWS Lambda using a custom Bun runtime instead of the stock Node.js runtime. Bun starts faster on cold paths and runs TypeScript natively without a build step, which compounds across many small workers handling bursty traffic.
The runtime ships as a Lambda custom runtime layer wrapping the Bun binary, exposing the Lambda Runtime API. Workers stay as small TypeScript files; the runtime handles the bootstrap.
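The bootstrap is essentially a long-poll loop against the Lambda Runtime API. A sketch, assuming the standard `_HANDLER` convention of `file.export` and the documented Runtime API paths — error reporting and init failure handling are omitted for brevity:

```typescript
// "_HANDLER" arrives as "file.export", e.g. "index.handler".
function parseHandler(spec: string): { file: string; fn: string } {
  const dot = spec.lastIndexOf(".");
  return { file: spec.slice(0, dot), fn: spec.slice(dot + 1) };
}

// Minimal runtime loop: fetch the next invocation, run the handler,
// post the result back. Real bootstraps also report errors to the
// /error endpoints; that is elided here.
async function runtimeLoop(api: string): Promise<void> {
  const { file, fn } = parseHandler(process.env._HANDLER ?? "index.handler");
  const handler = (await import(`./${file}.ts`))[fn];
  while (true) {
    const next = await fetch(`http://${api}/2018-06-01/runtime/invocation/next`);
    const requestId = next.headers.get("lambda-runtime-aws-request-id")!;
    const result = await handler(await next.json());
    await fetch(
      `http://${api}/2018-06-01/runtime/invocation/${requestId}/response`,
      { method: "POST", body: JSON.stringify(result) },
    );
  }
}

// Only loop when actually running inside Lambda.
if (process.env.AWS_LAMBDA_RUNTIME_API) {
  runtimeLoop(process.env.AWS_LAMBDA_RUNTIME_API);
}
```

Because Bun imports TypeScript directly, the handler file referenced by `_HANDLER` needs no transpilation step before deploy.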
Per-channel resilience
Each channel queue has its own retry policy and DLQ, so a Slack outage doesn't back-pressure email delivery. Webhooks have signed payloads and per-target backoff so a flaky consumer doesn't block traffic to healthy ones.
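The webhook hardening described above can be sketched in two small pieces: an HMAC signature over the payload, and a per-target exponential backoff schedule. The hash choice, constants, and function names are assumptions for illustration:

```typescript
import { createHmac } from "node:crypto";

// Sign the payload so consumers can verify it came from Notifier.
// SHA-256 HMAC and hex encoding are assumptions for this sketch.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Per-target exponential backoff: each failing target tracks its own
// attempt count, so one flaky consumer never delays the healthy ones.
function backoffMs(attempt: number, baseMs = 500, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

In practice the signature travels in a request header alongside the POST body, and the backoff state is keyed by target, not by message — that keying is what gives the per-target isolation.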
Trade-offs
Custom Lambda runtime instead of stock Node.js. Custom runtimes carry ownership cost — security updates, runtime bugs, opaque error modes. The benefit is faster cold starts (notifications are bursty), zero build pipeline (TypeScript runs as-is), and a smaller per-function package. Across dozens of small workers, the wins add up.
SNS + per-channel SQS instead of one queue. A single queue would be simpler operationally. The benefit of fan-out is per-channel isolation — failure in one channel doesn't slow others, channels can be added/removed independently, and per-channel retry tuning is possible.
Logical intents instead of pre-rendered messages. Producers publishing "notify Y of X" rather than "send THIS Slack message to THAT channel" pushes formatting and recipient resolution into Notifier. The benefit is that channel-specific logic — mentions, threading, escalation policies, opt-outs — lives in one place instead of duplicated across every producer service.
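What "formatting lives in one place" looks like in practice: one intent rendered differently per channel. The shapes and escalation rule below are invented for the sketch:

```typescript
// Minimal intent shape for rendering purposes (illustrative).
interface RenderableIntent {
  event: string;
  severity: string;
  payload: Record<string, string>;
}

// Slack rendering owns Slack-specific concerns like mentions; the
// "<!here> on critical" escalation rule is a hypothetical policy.
function renderSlack(intent: RenderableIntent): { text: string } {
  const mention = intent.severity === "critical" ? "<!here> " : "";
  return { text: `${mention}*${intent.event}*: ${JSON.stringify(intent.payload)}` };
}

// Email rendering owns subject-line conventions.
function renderEmailSubject(intent: RenderableIntent): string {
  return `[${intent.severity.toUpperCase()}] ${intent.event}`;
}
```

When the escalation policy changes — say, mentioning a different group on critical events — only these renderers change; no producer is touched.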
Outcome
- Centralization. Channel credentials, rate limits, and retry behavior live in one service instead of every microservice.
- Channel isolation. A Slack outage stops Slack, not email. A flaky webhook target doesn't stall the platform.
- Speed of adding channels. A new channel becomes a queue + worker, with zero changes to producers.
- TypeScript without ceremony. The Bun runtime removes the build step, so the service stays close to plain TypeScript files — easier to read, faster to change.