Internal names, providers, and exact numbers have been abstracted or generalized for confidentiality — the architecture patterns and trade-offs described are accurate.
Context
In a B2B platform with many microservices, every service eventually needs to send notifications — operations alerts to Slack, customer-facing emails, webhook callbacks to third parties. Without a shared service, each microservice ships its own SMTP credentials, Slack tokens, retry logic, rate-limit handling, and recipient resolution. The cost compounds: each service inherits the operational risk of every channel it touches, and channel-specific quirks get reinvented in different ways across the platform.
Notifier was extracted to centralize that. Producers publish a logical notification intent — "alert operator group X about event Y" — and Notifier handles credentials, formatting, retries, deduplication, and the channel-specific quirks.
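A minimal sketch of what such a logical intent might look like — the field names and values here are illustrative assumptions, not Notifier's actual schema:

```typescript
// Hypothetical shape of a logical notification intent. Field names are
// invented for this sketch; Notifier's real schema may differ.
interface NotificationIntent {
  event: string;                      // what happened, e.g. "deploy.failed"
  audience: string;                   // logical recipient, e.g. "operator-group-x"
  severity: "info" | "warning" | "critical";
  payload: Record<string, unknown>;   // event-specific context for formatting
}

// A producer publishes only the intent. It never names Slack, an SMTP
// server, or a webhook URL; Notifier resolves all of that downstream.
const intent: NotificationIntent = {
  event: "deploy.failed",
  audience: "operator-group-x",
  severity: "critical",
  payload: { service: "billing", region: "eu-west-1" },
};

console.log(JSON.stringify(intent));
```

The key property is what the payload omits: no channel, no credentials, no formatting. Everything channel-specific stays out of the producer.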
Architecture
producer service ──► SNS (notification bus) ──► SQS (per-channel queues)
                                                          │
                                                          ▼
                                                  Bun Lambda workers
                                                          │
                         ┌────────────────────────────────┼────────────────────────────────┐
                         ▼                                ▼                                ▼
                       Slack                            Email                           Webhook
                   (channel API)                     (SMTP/SES)                       (HTTP POST)
Surface
Producers don't pick a channel. They publish a logical intent — "notify Y of X" — and Notifier resolves channel preferences, formatting, and delivery. The same event can fan out to multiple channels based on operator preferences without the producer knowing about it.
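The fan-out step can be sketched as a pure lookup from audience to channels. The preference table and fallback policy below are assumptions for illustration; in Notifier the preferences would be looked up per operator:

```typescript
type Channel = "slack" | "email" | "webhook";

// Illustrative preference table. In the real service this would be
// operator-configured data, not a hardcoded map.
const preferences: Record<string, Channel[]> = {
  "operator-group-x": ["slack", "email"],
  "customer-acme": ["email", "webhook"],
};

// Resolve an intent's audience to the channels it fans out to.
// Falling back to email for unknown audiences is an assumption made
// here so nothing is silently dropped, not necessarily the real policy.
function resolveChannels(audience: string): Channel[] {
  return preferences[audience] ?? ["email"];
}
```

Because this resolution happens inside Notifier, changing an operator's preferences changes delivery for every producer at once, with no producer redeploys.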
Pub/sub topology
Notifications hit a single SNS topic. Channel-specific SQS queues subscribe with filter policies, so only relevant messages reach each channel's workers. Adding a new channel is a queue + worker, not a redeployment of every producer.
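The routing relies on SNS attribute-based filtering. The sketch below shows a filter policy in the shape SNS uses for exact string matching, plus a tiny re-implementation of that matching so the behavior is visible; the attribute names are illustrative:

```typescript
// A per-channel SQS subscription attaches a filter policy so only
// messages tagged for that channel reach its workers.
type FilterPolicy = Record<string, string[]>;

// Policy attached to the Slack queue's subscription (attribute name
// "channel" is an assumption for this sketch).
const slackQueuePolicy: FilterPolicy = { channel: ["slack"] };

// Minimal stand-in for SNS exact-match filtering: every policy key must
// appear in the message attributes with one of the allowed values.
function matches(
  policy: FilterPolicy,
  attributes: Record<string, string>,
): boolean {
  return Object.entries(policy).every(([key, allowed]) =>
    allowed.includes(attributes[key]),
  );
}
```

Adding a channel then means attaching one more filtered subscription — producers keep publishing to the same topic, unaware anything changed.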
Custom Bun Lambda runtime
Workers run on AWS Lambda using a custom Bun runtime instead of the stock Node.js runtime. Bun starts faster on cold paths and runs TypeScript natively without a build step, which compounds across many small workers handling bursty traffic.
The runtime ships as a Lambda custom runtime layer wrapping the Bun binary, exposing the Lambda Runtime API. Workers stay as small TypeScript files; the runtime handles the bootstrap.
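The bootstrap is essentially a long-poll loop against the Lambda Runtime API. A sketch, assuming the standard `_HANDLER` convention of `file.export` and the documented Runtime API paths — error reporting and init failure handling are omitted for brevity:

```typescript
// "_HANDLER" arrives as "file.export", e.g. "index.handler".
function parseHandler(spec: string): { file: string; fn: string } {
  const dot = spec.lastIndexOf(".");
  return { file: spec.slice(0, dot), fn: spec.slice(dot + 1) };
}

// Minimal runtime loop: fetch the next invocation, run the handler,
// post the result back. Real bootstraps also report errors to the
// /error endpoints; that is elided here.
async function runtimeLoop(api: string): Promise<void> {
  const { file, fn } = parseHandler(process.env._HANDLER ?? "index.handler");
  const handler = (await import(`./${file}.ts`))[fn];
  while (true) {
    const next = await fetch(`http://${api}/2018-06-01/runtime/invocation/next`);
    const requestId = next.headers.get("lambda-runtime-aws-request-id")!;
    const result = await handler(await next.json());
    await fetch(
      `http://${api}/2018-06-01/runtime/invocation/${requestId}/response`,
      { method: "POST", body: JSON.stringify(result) },
    );
  }
}

// Only loop when actually running inside Lambda.
if (process.env.AWS_LAMBDA_RUNTIME_API) {
  runtimeLoop(process.env.AWS_LAMBDA_RUNTIME_API);
}
```

Because Bun imports TypeScript directly, the handler file referenced by `_HANDLER` needs no transpilation step before deploy.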
Per-channel resilience
Each channel queue has its own retry policy and DLQ, so a Slack outage doesn't back-pressure email delivery. Webhooks have signed payloads and per-target backoff so a flaky consumer doesn't block traffic to healthy ones.
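The webhook hardening described above can be sketched in two small pieces: an HMAC signature over the payload, and a per-target exponential backoff schedule. The hash choice, constants, and function names are assumptions for illustration:

```typescript
import { createHmac } from "node:crypto";

// Sign the payload so consumers can verify it came from Notifier.
// SHA-256 HMAC and hex encoding are assumptions for this sketch.
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Per-target exponential backoff: each failing target tracks its own
// attempt count, so one flaky consumer never delays the healthy ones.
function backoffMs(attempt: number, baseMs = 500, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

In practice the signature travels in a request header alongside the POST body, and the backoff state is keyed by target, not by message — that keying is what gives the per-target isolation.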
Trade-offs
Custom Lambda runtime instead of stock Node.js. Custom runtimes carry ownership cost — security updates, runtime bugs, opaque error modes. The benefit is faster cold starts (notifications are bursty), zero build pipeline (TypeScript runs as-is), and a smaller per-function package. Across dozens of small workers, the wins add up.
SNS + per-channel SQS instead of one queue. A single queue would be simpler operationally. The benefit of fan-out is per-channel isolation — failure in one channel doesn't slow others, channels can be added/removed independently, and per-channel retry tuning is possible.
Logical intents instead of pre-rendered messages. Producers publishing "notify Y of X" rather than "send THIS Slack message to THAT channel" pushes formatting and recipient resolution into Notifier. The benefit is that channel-specific logic — mentions, threading, escalation policies, opt-outs — lives in one place instead of duplicated across every producer service.
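What "formatting lives in one place" looks like in practice: one intent rendered differently per channel. The shapes and escalation rule below are invented for the sketch:

```typescript
// Minimal intent shape for rendering purposes (illustrative).
interface RenderableIntent {
  event: string;
  severity: string;
  payload: Record<string, string>;
}

// Slack rendering owns Slack-specific concerns like mentions; the
// "<!here> on critical" escalation rule is a hypothetical policy.
function renderSlack(intent: RenderableIntent): { text: string } {
  const mention = intent.severity === "critical" ? "<!here> " : "";
  return { text: `${mention}*${intent.event}*: ${JSON.stringify(intent.payload)}` };
}

// Email rendering owns subject-line conventions.
function renderEmailSubject(intent: RenderableIntent): string {
  return `[${intent.severity.toUpperCase()}] ${intent.event}`;
}
```

When the escalation policy changes — say, mentioning a different group on critical events — only these renderers change; no producer is touched.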
Outcome
- Centralization. Channel credentials, rate limits, and retry behavior live in one service instead of every microservice.
- Channel isolation. A Slack outage stops Slack, not email. A flaky webhook target doesn't stall the platform.
- Speed of adding channels. A new channel becomes a queue + worker, with zero changes to producers.
- TypeScript without ceremony. The Bun runtime removes the build step, so the service stays close to plain TypeScript files — easier to read, faster to change.