About
Senior Software Engineer with 12 years at Google, Zing Health, Ambience Health, and Salt AI — spanning ML infrastructure, healthcare platforms, clinical AI, and AI workflow orchestration.
What I build
Orchestration layers, async task pipelines, EHR integrations, and the monitoring that tells you when something breaks. I've built these systems at Google-scale, in Medicare Advantage healthcare, and at AI infrastructure startups.
Why AI + Healthcare
Healthcare IT has real stakes: a broken clinical workflow means a physician can't see a patient's chart. AI opens up new ways to tackle these problems, but making AI tools reliable in production is serious engineering. That's the part I focus on.
Where I do my best work
Early-stage or scaling startup
End-to-end ownership, direct impact.
Healthcare or high-stakes domain
Where reliability has real consequences.
AI infrastructure
Making ML models useful in production.
Cross-functional teams
Engineers, clinicians, and researchers together.

David Chang
Senior Software Engineer
How I Design Systems
Architecture is about decisions, not tools.
01
Failure-first design
Before sketching the happy path, I ask: what does this look like when it fails? Idempotency, retry semantics, and dead-letter handling are part of the initial design — not afterthoughts.
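A minimal sketch of how those three concerns can fit together in one task runner. It uses only the standard library. The names `run_task`, `TransientError`, `done`, and `dead_letters` are hypothetical and chosen for illustration, and the in-memory dict and list stand in for a durable result store and a real dead-letter queue.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, dropped connections)."""


def run_task(
    task_id: str,
    payload: Any,
    handler: Callable[[Any], Any],
    done: Dict[str, Any],
    dead_letters: List[dict],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> Optional[Any]:
    # Idempotency: a task id that already completed is never executed again.
    if task_id in done:
        return done[task_id]

    for attempt in range(1, max_attempts + 1):
        try:
            result = handler(payload)
        except TransientError as exc:
            if attempt == max_attempts:
                # Dead-letter handling: park the task for inspection instead of dropping it.
                dead_letters.append(
                    {"task_id": task_id, "payload": payload, "error": str(exc)}
                )
                return None
            # Retry semantics: exponential backoff between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
        else:
            done[task_id] = result
            return result
    return None
```

Calling `run_task` twice with the same `task_id` runs the handler at most once to completion, and a task that exhausts its attempts ends up in `dead_letters` rather than disappearing.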
02
Simple over clever
Celery + Redis solves 90% of async task problems without Kafka's operational overhead. I pick the tool that fits the actual problem, not the most impressive one.
03
Observability by design
A system you cannot see is a system you cannot trust. Distributed tracing, structured logging, and meaningful metrics are designed in from the start.
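As one small illustration of the structured-logging part, the sketch below emits JSON log lines that carry a trace id, using only Python's standard `logging` module. It is not OpenTelemetry: in a real system the trace id would typically come from the tracing library. `JsonFormatter`, `build_logger`, and the `trace_id` and `step` fields are assumed names for this example.

```python
import json
import logging
import uuid
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so logs can be filtered by field."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Fields passed via `extra=` become attributes on the record.
            "trace_id": getattr(record, "trace_id", None),
            "step": getattr(record, "step", None),
        }
        return json.dumps(entry)


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated setup
    logger.propagate = False  # keep output on this handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def new_trace_id() -> str:
    """Generate an id to attach to every log line of one request or workflow run."""
    return uuid.uuid4().hex
```

A call such as `logger.info("sync started", extra={"trace_id": new_trace_id(), "step": "fetch"})` then produces a single JSON line that can be joined with other lines sharing the same `trace_id`.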
System Design Evolution
v1 · Prototype
Brittle: Synchronous. Single process. No retry logic.
Failure mode: Fails silently. Blocks on slow tasks.
v2 · Async
Better: Celery + Redis. Tasks queued asynchronously. Basic retries.
Failure mode: No idempotency. Partial failures corrupt state.
v3 · Production
Production-grade: Idempotent tasks. Checkpointed state. DLQs. Prometheus. K8s isolation.
Failure modes from v1 and v2 addressed.
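The step from v2 to v3 hinges on checkpointed state: a run that dies halfway resumes from the last completed step instead of starting over or leaving partial results behind. Below is a minimal sketch under stated assumptions. `run_pipeline` is a hypothetical helper, and the `checkpoints` dict stands in for a durable store such as a database table.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(
    run_id: str,
    steps: List[Step],
    checkpoints: Dict[str, dict],
    initial: Any = None,
) -> Any:
    # Resume from the saved checkpoint if this run was started before.
    saved = checkpoints.get(run_id, {"next_step": 0, "state": initial})
    current = saved["state"]

    for index in range(saved["next_step"], len(steps)):
        _name, fn = steps[index]
        # If fn raises, the checkpoint still points at this step,
        # so a later call retries it without redoing earlier steps.
        current = fn(current)
        checkpoints[run_id] = {"next_step": index + 1, "state": current}

    return current
```

Because a step interrupted partway through is executed again on resume, each step still needs to be idempotent; checkpointing limits how much work is repeated, it does not remove that requirement.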
Key Tradeoffs
Option A | vs Option B | I choose | Because
Celery + Redis | Kafka | Celery | Simpler operations, built-in retry primitives, sufficient throughput for workflow fan-out
FastAPI | Django REST | FastAPI | Async-first, lower latency, auto-generated OpenAPI docs; fits orchestration API patterns
Idempotent tasks | Stateful execution | Idempotent | Turns hardware failures into cheap retries instead of corrupted state
Canary rollout | Blue-green deploy | Canary | Detects quality regressions under real traffic before full rollout
K8s namespaces | Separate VMs | Namespaces | Per-workload isolation on shared cluster infrastructure, at lower cost
OpenTelemetry | Log correlation | OTel | End-to-end trace spans across microservices; find failures in minutes
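To make the canary row concrete, here is a small sketch of the kind of gate that can decide whether a canary is promoted. `Window`, `canary_decision`, and both thresholds are assumptions for illustration; real rollouts usually compare several signals (latency, output quality scores), not only error rate.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error counts observed over the same time window."""

    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: Window,
    canary: Window,
    min_requests: int = 500,          # assumed minimum sample before judging
    max_rate_increase: float = 0.01,  # assumed tolerated absolute increase
) -> str:
    # Too little real traffic on the canary: keep observing.
    if canary.requests < min_requests:
        return "wait"
    # Canary is clearly worse than the stable version: stop the rollout.
    if canary.error_rate > baseline.error_rate + max_rate_increase:
        return "rollback"
    return "promote"
```

With a baseline of `Window(10000, 50)`, a canary of `Window(1000, 30)` is rolled back, while `Window(1000, 8)` is promoted.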
Design philosophy
“The best systems are boring. Simple components, clear failure boundaries, and good observability. Save cleverness for the problem you're actually trying to solve.”