AI for System Design Interviews: 6 Walkthroughs
Six classic system design prompts, solved live with the Sottos overlay watching the whiteboard. Includes the exact structure each walkthrough follows and what a defensible answer looks like.

TL;DR — System design interviews break candidates not on knowledge but on structure. This piece walks through six prompts using the same five-step scaffold the Sottos Design mode emits — clarify, scope, sketch, scale, defend. Each walkthrough takes five to seven minutes and ends with a follow-up answer you can give without the overlay.
How the loop usually breaks
Two failure modes dominate. First, the candidate dives into a component diagram before agreeing on scope; the interviewer pulls them back, and time runs out. Second, the candidate names every distributed-systems primitive they've ever read about without explaining why: load balancer, cache, Kafka, sharding, all in one breath.
The fix is structural. The same five-step scaffold works on every prompt: clarify, scope, sketch, scale, defend. Sottos's Design mode emits cue cards in that order — your job is to drive the conversation, not narrate the cards.
1. Design a URL shortener (Bitly)
Clarify: read volume vs write volume, expected QPS, character set for short codes, link expiry semantics.
Scope: v1 covers shorten + redirect. Custom aliases, analytics dashboards, and link expiry are v2 unless the interviewer pulls them in.
Sketch: API gateway → shortener service (generates code, writes to KV) → redirect service (KV lookup, 301). Cache the hottest 1% in front of the KV.
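Scale: code generation is the bottleneck. Options: base62-encode an auto-increment ID (no collisions, but a single sequence to coordinate), hash the URL and check for collisions, or hand out codes from a pre-allocated pool. Discuss the trade-offs out loud.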
Defend: why KV and not Postgres? Random access, hot-key skew, eventual consistency tolerable on a redirect.
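As a minimal sketch of the auto-increment option, the snippet below base62-encodes a counter and keeps the mapping in a plain dict standing in for the KV store. The names `encode_base62` and `Shortener` are hypothetical, and a real service would need a coordinated ID source rather than an in-process integer.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class Shortener:
    """Toy shortener: an in-memory dict stands in for the KV store."""

    def __init__(self) -> None:
        self._next_id = 0   # assumption: a single local sequence
        self._store = {}    # short code -> long URL

    def shorten(self, url: str) -> str:
        code = encode_base62(self._next_id)
        self._next_id += 1
        self._store[code] = url
        return code

    def resolve(self, code: str):
        # Returns None for unknown codes; a redirect service would answer 404.
        return self._store.get(code)
```

Because every ID is unique, this variant never needs a collision check; the cost is that sequential codes are guessable, which is one of the trade-offs worth saying out loud.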
2. Design a distributed counter (likes / view counts)
Clarify: write QPS, read consistency tolerance, granularity (per-post or per-user-per-post).
Scope: single counter per entity, eventual consistency, no negative counts.
Sketch: front-line counters sharded by entity ID, with a periodic flush to a durable store. Reads hit the local shard; on a miss they fall back to the store.
Scale: hot-key (a viral post) collapses to one shard. Mitigate with key splitting + aggregated reads.
Defend: what if a shard fails between flush intervals? Replication factor, write-ahead log, accepting the trade between durability and write latency.
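The key-splitting idea can be sketched as follows, assuming each hot entity is spread over a fixed number of sub-keys and reads sum them. `ShardedCounter` and its parameters are hypothetical names, and two dicts stand in for the front-line shards and the durable store.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Toy counter: writes go to a random sub-key, reads aggregate all of them."""

    def __init__(self, num_splits: int = 8, rng=None) -> None:
        self.num_splits = num_splits
        self._rng = rng or random.Random()
        self._pending = defaultdict(int)   # (entity_id, split) -> unflushed count
        self._durable = defaultdict(int)   # entity_id -> flushed total

    def increment(self, entity_id: str, amount: int = 1) -> None:
        # Picking a random split spreads a viral entity across sub-keys.
        split = self._rng.randrange(self.num_splits)
        self._pending[(entity_id, split)] += amount

    def read(self, entity_id: str) -> int:
        # Aggregated read: durable total plus every unflushed split.
        pending = sum(
            self._pending.get((entity_id, s), 0) for s in range(self.num_splits)
        )
        return self._durable.get(entity_id, 0) + pending

    def flush(self) -> None:
        # Periodic flush: fold pending increments into the durable store.
        for (entity_id, _split), count in self._pending.items():
            self._durable[entity_id] += count
        self._pending.clear()
```

In this sketch anything still in `_pending` is lost if the process dies before `flush`, which is exactly the durability versus write-latency trade the Defend step asks about.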
3. Design a news feed ranker
Clarify: pull vs push, max followee fan-out, ranking signals, refresh cadence.
Scope: v1 is pull-based with a precomputed candidate set per user. Ranking is offline-trained, online-scored.
Sketch: candidate generation → feature service → scoring service → final ranking. Caches at every layer.
Scale: tail latency on scoring dominates. Quantize features, batch infer, prewarm hot users.
Defend: the celebrity fan-out problem. Decouple celebrity posts into a separate path; merge at ranking time.
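One way to picture the merge-at-ranking-time step is the sketch below: precomputed candidates and celebrity posts arrive on separate paths, get deduplicated, and the top k by score are returned. `Post` and `rank_feed` are hypothetical names, and the `score` field stands in for whatever the scoring service would produce.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    author_id: str
    score: float  # assumption: already computed by the scoring service


def rank_feed(precomputed, celebrity_posts, k):
    """Merge the two candidate paths, drop duplicates, return the top-k by score."""
    seen = set()
    candidates = []
    for post in list(precomputed) + list(celebrity_posts):
        if post.post_id not in seen:
            seen.add(post.post_id)
            candidates.append(post)
    # nlargest avoids sorting the whole candidate set when k is small.
    return heapq.nlargest(k, candidates, key=lambda p: p.score)
```

Keeping celebrity posts out of the per-user precomputed set means a single celebrity post is stored once and pulled in at read time, instead of being written into millions of candidate sets.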
4. Design a real-time chat backend
Clarify: 1:1 vs group, presence + read receipts, max group size, offline delivery.
Scope: v1 is 1:1 chat with persistence and offline delivery. Groups are capped at under 256 members. Receipts are asynchronous.
Sketch: WebSocket gateway → routing service → per-conversation queue → fan-out to recipients. Persist to a log; index for history.
Scale: connection density per server, sharding by conversation, ordering guarantees.
Defend: exactly-once vs at-least-once delivery. Why at-least-once + idempotent client wins for chat.
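A small sketch of the idempotent-client half of that argument, assuming the server attaches a unique message ID and a per-conversation sequence number to every delivery. `ChatClient` and its fields are hypothetical names.

```python
class ChatClient:
    """Toy client that tolerates redelivery by deduplicating on message ID."""

    def __init__(self) -> None:
        self._seen_ids = set()
        self.messages = []  # list of (seq, text), kept in sequence order

    def receive(self, message_id: str, seq: int, text: str) -> bool:
        """Apply a delivery; return False if it was a duplicate."""
        if message_id in self._seen_ids:
            return False  # safe to ack again, nothing changes on screen
        self._seen_ids.add(message_id)
        self.messages.append((seq, text))
        # Sorting by sequence number restores order after out-of-order arrival.
        self.messages.sort(key=lambda pair: pair[0])
        return True
```

With this in place the server can simply retry until it sees an ack, which is far cheaper to build and operate than an end-to-end exactly-once guarantee.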
5. Design an API rate limiter
Clarify: per-user vs per-IP vs per-endpoint, burst tolerance, global vs regional limits.
Scope: per-user per-endpoint with a token bucket. Local to each gateway with eventual sync.
Sketch: gateway middleware reads + decrements a counter in the local cache. Replicated counter pushed to a central aggregator every N ms.
Scale: clock skew across gateways, double-counting at boundaries.
Defend: leaky bucket vs token bucket vs sliding window. The trade-off is between allowing bursts and enforcing a steady rate.
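For reference, a token bucket local to one gateway can be sketched as below. `TokenBucket`, `capacity`, and `refill_rate` are hypothetical names, the clock is injectable so the behavior can be checked deterministically, and the sync to a central aggregator is left out.

```python
import time


class TokenBucket:
    """Toy token bucket: bursts up to `capacity`, sustained `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily based on elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A usage example would be one bucket per (user, endpoint) pair held in the gateway's local cache. The monotonic clock sidesteps wall-clock jumps on a single host, though it does nothing for skew between gateways.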
6. Design search autocomplete
Clarify: prefix-only or fuzzy, ranking signals, personalization, max latency target.
Scope: prefix completion, ranked by popularity. Personalization is a v2 layer.
Sketch: trie or FST in memory per shard, periodically rebuilt from logs. Front-end debounce; backend completes in <50ms p99.
Scale: trending terms shift the top-K hourly. Rebuilds need to be safe under concurrent reads.
Defend: why a trie and not a Bloom filter or a B-tree? Operational simplicity at the read latency you need.
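The prefix-completion core can be sketched with a plain in-memory trie, assuming popularity is a simple count stored on the node where a term ends. `TrieNode` and `Autocomplete` are hypothetical names, and a production build would likely precompute top-K per node rather than walk the subtree on every query.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children = {}
        self.count = 0  # popularity of the term ending here, 0 if none ends here


class Autocomplete:
    """Toy prefix completer ranked by popularity count."""

    def __init__(self) -> None:
        self._root = TrieNode()

    def add(self, term: str, count: int = 1) -> None:
        node = self._root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def complete(self, prefix: str, k: int = 5):
        # Walk down to the node for the prefix, if it exists.
        node = self._root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every term under that node with an iterative DFS.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, term = stack.pop()
            if current.count > 0:
                results.append((current.count, term))
            for ch, child in current.children.items():
                stack.append((child, term + ch))
        # Highest count first; ties broken alphabetically for stable output.
        results.sort(key=lambda pair: (-pair[0], pair[1]))
        return [term for _, term in results[:k]]
```

Rebuilding from logs then amounts to constructing a fresh `Autocomplete` off to the side and swapping the reference, so concurrent readers never see a half-built structure.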
What the overlay does (and doesn't do)
Sottos's Design mode emits the cue cards: clarify, scope, sketch, scale, defend. It does not draw the diagram for you. Questions caught. Context held. Answers shaped while you stay present. The whiteboard is yours.
If the interviewer asks "why this approach?", close the overlay and answer from the cue card you read fifteen seconds ago. That's the defensibility test.
Frequently asked
Can AI really keep up with a live system-design conversation?
It can keep up. The constraint is whether you can stay in the conversation while glancing at structured cues. The five-step scaffold is short on purpose — five labels, never more than three bullets each.
How do I practice without a real interviewer?
Sottos's practice round mode replays each of these prompts with realistic interviewer follow-ups. Same overlay, no recruiter on the line. Free download.
What's the most common follow-up question in system design?
"What breaks first as the system grows?" is the single most common probe. Have an answer for each of your components.
Should I memorize a list of patterns?
Memorize five: sharding, replication, caching, queueing, eventual consistency. Everything else is composition. The overlay surfaces the rest at the moment you need it.
How long should each section take?
Clarify and scope under five minutes combined. Sketch ten. Scale ten. Defend five to ten. If the interviewer pulls you off-schedule, follow them — the structure is a scaffold, not a script.
Try the Design mode
Download Sottos. The Free plan includes Design mode and the practice round library.