System Design Interview Questions (With Answers for Mid–Senior Engineers)

System design interviews are the make-or-break round for mid-to-senior software engineers. They test not just what you know, but how you think through ambiguous problems at scale. Most candidates don't fail on raw technical knowledge — they fail because they jump into drawing boxes before clarifying scope, or name-drop technologies without explaining trade-offs.

Here's what this guide covers:

A 6-step framework for structuring every answer
4 common questions with concise walkthrough answers
Key concepts interviewers probe (CAP theorem, load balancing, CDN, message queues)
What interviewers actually score — and the most common mistakes

The 6-Step Answer Framework

Interviewers at Google, Meta, Amazon, and most Series B+ startups evaluate system design using roughly the same rubric. A consistent structure signals seniority more reliably than any single technical choice.

1. Clarify requirements (3–5 min)

Restate the problem and ask targeted questions before touching the whiteboard. "Should reads be strongly consistent, or can we serve slightly stale data?" is a question that shapes the entire architecture. Candidates who skip this lose scoring points before they've drawn a single component.

Define: functional requirements (what the system does), non-functional requirements (scale, latency SLO, availability target), and what's explicitly out of scope.

2. Estimate capacity

Back-of-envelope math grounds your design in reality. For a URL shortener serving 100M requests/day:

Reads: ~1,160 req/sec
Writes: ~10% of reads → ~116 req/sec
Storage: 500 bytes × 100M URLs ≈ 50 GB per 100M stored links. At ~116 writes/sec (~10M new links/day) that is ~5 GB/day, or roughly 9 TB over 5 years

These numbers don't need to be exact — they need to make your architecture choices look intentional, not arbitrary.
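The arithmetic above can be scripted so each assumption is visible. This is a minimal sketch: the function name `estimate` and its parameters are hypothetical, and the inputs (100M reads/day, a 10% write ratio, 500 bytes per record, a 5-year horizon) are the interview assumptions from this example, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(reads_per_day: int, write_ratio: float,
             bytes_per_record: int, years: int) -> dict:
    """Back-of-envelope capacity numbers; every input is an assumption."""
    reads_per_sec = reads_per_day / SECONDS_PER_DAY
    writes_per_sec = reads_per_sec * write_ratio
    # Total records created if the write rate is sustained for the whole period.
    new_records = writes_per_sec * SECONDS_PER_DAY * 365 * years
    return {
        "reads_per_sec": reads_per_sec,
        "writes_per_sec": writes_per_sec,
        "gb_per_100m_records": bytes_per_record * 100_000_000 / 1e9,
        "tb_over_period": new_records * bytes_per_record / 1e12,
    }


numbers = estimate(reads_per_day=100_000_000, write_ratio=0.10,
                   bytes_per_record=500, years=5)
for name, value in numbers.items():
    print(f"{name}: {value:,.1f}")
```

Writing the estimate this way also shows how storage grows when the assumed write rate is sustained over the whole period, which is worth saying out loud in the interview.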

3. Define the data model and APIs

Sketch the core entities and their relationships before drawing any boxes. Schema-first thinking catches problems that architecture diagrams miss. Pick two or three key API endpoints and define the request/response shape.
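As an illustration for the URL shortener used throughout this guide, the entity and one endpoint's request/response shape might be sketched as follows. All names here (`ShortLink`, `CreateLinkRequest`, `CreateLinkResponse`, the `POST /links` route, and the `short.example` domain) are hypothetical, and the in-memory dict only stands in for a real table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional

BASE_URL = "https://short.example"  # placeholder domain (assumption)


@dataclass(frozen=True)
class ShortLink:
    """Core entity: one row per short code."""
    short_code: str
    long_url: str
    created_at: datetime
    expires_at: Optional[datetime] = None


@dataclass(frozen=True)
class CreateLinkRequest:
    """Body of a hypothetical POST /links call."""
    long_url: str
    custom_alias: Optional[str] = None


@dataclass(frozen=True)
class CreateLinkResponse:
    short_code: str
    short_url: str


def create_link(req: CreateLinkRequest,
                store: Dict[str, ShortLink]) -> CreateLinkResponse:
    # Placeholder code generation: a real service would use a hash or an
    # ID allocator instead of the current table size.
    code = req.custom_alias or f"{len(store):07d}"
    if code in store:
        raise ValueError(f"short code {code!r} is already taken")
    store[code] = ShortLink(code, req.long_url, datetime.now(timezone.utc))
    return CreateLinkResponse(code, f"{BASE_URL}/{code}")
```

Writing the shapes down first surfaces questions such as "what happens when a custom alias is already taken?" before any boxes are drawn.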

4. Draw the high-level architecture

Now draw the components: clients, load balancer, application servers, databases, cache layer, CDN. Explain data flow as you draw. Every component you add should come with a sentence on why it's there.

5. Deep dive into 2–3 components

This is usually the most heavily weighted dimension (around 30% of the score in a typical rubric). Pick the components most interesting to your specific design and go deep: how does the cache evict entries? How does the load balancer pick a server? Surface-level answers cap you at mid-level regardless of your title.

6. Discuss trade-offs

"I chose NoSQL because we need horizontal write scaling and can tolerate eventual consistency" beats "I chose MongoDB" every time. Name the alternative you considered and explain why you rejected it. Flag failure modes: what breaks first under peak load, and how do you detect it?

4 Common System Design Questions (With Brief Answers)

Design a URL Shortener (e.g., TinyURL)

What it tests: hashing, redirects, read-heavy scaling, database choice.

Clarify: What is the read-to-write ratio (typically 10:1 or higher)? Are custom aliases needed? Is click analytics required? Do links expire?

Core design:

Component	Choice	Reason
Short code generation	Base62 hash (7 chars = 62⁷ ≈ 3.5T URLs)	Large keyspace keeps collisions rare (a uniqueness check is still needed)
Primary store	Relational DB (Postgres)	Strong consistency for `shortCode → longURL` mapping
Read cache	Redis in front of DB	10:1 read ratio; cache-hit rate >95% for popular links
Redirect	301 (permanent) vs 302 (temporary)	302 if you need click analytics; 301 reduces server load

Trade-off to name: truncating a hash to 7 Base62 characters means two different URLs can end up with the same code. You'll need a collision check on insert, retrying with a salt when the code is already taken. That is acceptable at low write rates, but at 10K writes/sec you'd consider pre-generating short codes offline and handing them out from a ready pool.
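The hash-and-retry idea can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the helper names `base62` and `shorten`, the choice of SHA-256, the `#attempt` salt, and the dict standing in for the database are all hypothetical.

```python
import hashlib
import string
from typing import Dict

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
CODE_LENGTH = 7


def base62(n: int, length: int = CODE_LENGTH) -> str:
    """Encode the lowest `length` Base62 digits of n as a fixed-width string."""
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def shorten(long_url: str, store: Dict[str, str], max_attempts: int = 5) -> str:
    """Return a short code for long_url, retrying with a salt on collision."""
    for attempt in range(max_attempts):
        salted = f"{long_url}#{attempt}".encode()
        digest = int.from_bytes(hashlib.sha256(salted).digest()[:8], "big")
        code = base62(digest)
        # In a real database this would be an INSERT guarded by a unique
        # constraint; setdefault plays that role for the in-memory dict.
        existing = store.setdefault(code, long_url)
        if existing == long_url:
            return code  # either newly stored or already mapped to this URL
    raise RuntimeError("could not find a free short code")
```

Because the salt sequence is deterministic, shortening the same URL twice returns the same code, which may or may not be desirable depending on whether per-user links are a requirement.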

Design a Social Media Feed (e.g., Twitter/X timeline)

What it tests: fan-out strategies, eventual consistency, read vs write trade-offs, caching at scale.

Clarify: are we optimizing for read speed or write simplicity? How many followers can a user have? Do celebrities (10M+ followers) need special handling?

Two fan-out models:

Fan-out on write (push): when a user posts, pre-compute and push the post ID into every follower's feed cache. Reads are O(1). But a celebrity post triggers millions of cache writes — expensive.
Fan-out on read (pull): compute the feed fresh on each read by merging followees' recent posts. Reads are slower but writes are trivially cheap.

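Production answer: a hybrid. Most users get push. Accounts above a follower threshold (say, >1M followers; the exact cutoff is a tuning decision) use pull, and their posts are merged into the feed at read time. Large social platforms, Twitter among them, have publicly described hybrids along these lines.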

Key storage: a timeline cache (Redis sorted set ordered by timestamp) per user. The underlying post store can be Cassandra — wide-column, write-optimized, and horizontally scalable.
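A toy, in-memory version of the hybrid model may make the read-time merge easier to explain. Everything here is an assumption for illustration: the `FeedService` class, its method names, and the plain dicts that stand in for the timeline cache and the post store.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # follower count above which we stop pushing


class FeedService:
    def __init__(self, celebrity_threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.posts = defaultdict(list)      # author -> [(timestamp, post_id)]
        self.timelines = defaultdict(list)  # user -> pushed (timestamp, post_id)

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.threshold

    def post(self, author: str, post_id: str, ts: int) -> None:
        self.posts[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Fan-out on write: copy the entry into every follower's timeline.
            for follower in self.followers[author]:
                self.timelines[follower].append((ts, post_id))

    def feed(self, user: str, limit: int = 10) -> list:
        # Start from the precomputed timeline; a set removes duplicates if an
        # author crossed the threshold after some posts were already pushed.
        entries = set(self.timelines[user])
        # Fan-out on read: pull celebrity posts and merge them in.
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.posts[author])
        newest_first = heapq.nlargest(limit, entries)
        return [post_id for _ts, post_id in newest_first]
```

In a real system the pull step would read only the most recent posts per celebrity, not the full history as this sketch does.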

Design a Rate Limiter

What it tests: distributed coordination, algorithm trade-offs, atomicity.

Clarify: per user, per IP, or per API key? Sliding window or fixed window? Should limits be globally consistent, or are per-node estimates OK?

Common algorithms:

Algorithm	Pros	Cons
Fixed window counter	Simple, fast	Bursts at window edges (up to 2× the limit across a boundary)
Sliding window log	Precise	High memory: stores every request timestamp
Token bucket	Allows bursts up to the bucket size while capping the average rate	Slightly more complex to implement

Implementation: store counters in Redis with a TTL matching the window. Use a Lua script (or INCR + EXPIRE wrapped in a MULTI/EXEC transaction) to make the increment-and-check atomic. A single Redis node works at moderate scale; at high scale, shard by user ID.

Trade-off to name: per-node counters (no shared Redis) are faster but allow users to exceed the limit if they hit multiple nodes. A good answer acknowledges the consistency/throughput trade-off explicitly.
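For the deep dive, it helps to be able to write one of these algorithms from memory. Below is a minimal single-process token bucket. The class name, parameters, and injectable `clock` are assumptions made for this sketch; a shared deployment would keep the same two fields (token count and last-refill time) in Redis and run the refill-and-check step atomically there.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_sec` tokens/sec."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill lazily based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per client key (user ID, IP, or API key).
buckets = {}


def allow_request(key: str, capacity: int = 10, refill_per_sec: float = 5.0) -> bool:
    bucket = buckets.setdefault(key, TokenBucket(capacity, refill_per_sec))
    return bucket.allow()
```

This version is not thread-safe; that gap is exactly the atomicity concern the Redis-based implementation has to solve.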

Design a Distributed Cache

What it tests: consistent hashing, replication, eviction policies, cache coherence.

Clarify: read-heavy or write-heavy? Tolerate stale reads? How large is the dataset? Single-region or multi-region?

Core design decisions:

Partitioning: use consistent hashing to distribute keys across nodes. Adding or removing a node remaps only about K/n keys on average (K keys, n nodes) instead of nearly all of them.
Replication: each key replicated to N nodes (typically N=3). Reads can go to any replica (eventual consistency) or the primary only (strong consistency).
Eviction: LRU (least recently used) is the default. LFU (least frequently used) works better for workloads with stable "hot" keys.
Cache invalidation: the hardest part. Write-through (write to cache and DB simultaneously) keeps cache fresh but adds write latency. Write-around (write to DB, invalidate cache) is simpler but causes a cache miss on the next read.
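The partitioning point can be demonstrated with a small hash ring. This is a sketch under assumptions: the `HashRing` class, the use of SHA-256, and 100 virtual nodes per server are illustrative choices, not a description of any particular cache product.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing with virtual nodes to smooth the key distribution."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, name) for pos, name in self._ring if name != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"user:{i}" for i in range(2000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-e")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fifth node")
```

With five nodes the moved fraction should land near one fifth, and every moved key goes to the new node; with naive `hash(key) % n` partitioning, most keys would move instead.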

Trade-off to name: a cache that tolerates stale reads (e.g., product catalog) can use a longer TTL and async refresh. A cache that can't (e.g., account balance) needs write-through or cache-aside with strict invalidation — at the cost of added complexity.
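The eviction and write-path choices above fit in a few lines, which makes them a convenient deep-dive topic. In this sketch `LRUCache` and `CachedStore` are hypothetical names, a plain dict stands in for the database, and `None` is treated as "not cached", so storing `None` as a value is not supported.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

    def invalidate(self, key) -> None:
        self._data.pop(key, None)


class CachedStore:
    def __init__(self, cache: LRUCache, db: dict):
        self.cache = cache
        self.db = db
        self.misses = 0

    def read(self, key):
        value = self.cache.get(key)
        if value is None:
            self.misses += 1
            value = self.db.get(key)
            if value is not None:
                self.cache.put(key, value)  # populate on miss
        return value

    def write_through(self, key, value) -> None:
        # Update DB and cache together: the next read is a hit.
        self.db[key] = value
        self.cache.put(key, value)

    def write_around(self, key, value) -> None:
        # Update DB only and drop the cached copy: the next read is a miss.
        self.db[key] = value
        self.cache.invalidate(key)
```

The `misses` counter is only there to make the difference between the two write paths observable.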

Key Concepts Interviewers Probe

CAP Theorem

The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read returns the latest write), Availability (every request gets a response), and Partition tolerance (the system operates despite network splits).

In practice, partition tolerance is non-negotiable in any real distributed system, because network failures happen. So the real choice is: during a partition, do you sacrifice consistency (AP system, e.g., Cassandra, or DynamoDB with its default eventually consistent reads) or availability (CP system, e.g., HBase, ZooKeeper)?

Cite this when choosing a database. "We'll use Cassandra because our use case tolerates eventual consistency and needs high write availability" is a strong answer.

Load Balancing

Load balancing distributes incoming traffic across multiple servers to prevent any single node from becoming a bottleneck. Common algorithms: round-robin (simple, ignores server load), least connections (routes to the server with the fewest active requests), and IP hash (ensures the same client always hits the same server, which is useful for session affinity).
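If the interviewer asks how the load balancer picks a server, the three strategies can each be written in a few lines. The class names and the `pick`/`release` methods below are assumptions for illustration and do not mirror any specific load balancer's configuration.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through servers in order, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, client_ip: str = "") -> str:
        return next(self._cycle)


class LeastConnections:
    """Route to the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip: str = "") -> str:
        server = min(self.active, key=self.active.get)  # ties: first listed
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the request finishes so the count reflects live load.
        self.active[server] -= 1


class IPHash:
    """Map each client IP to a fixed server for session affinity."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip: str) -> str:
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]
```

Note that `IPHash` uses a plain modulo, so changing the server list reshuffles most clients; consistent hashing is the usual remedy when that matters.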

Nginx is a widely used open-source load balancer and reverse proxy. At higher scale, hardware appliances (F5) or managed cloud load balancers (AWS ALB) often sit in front of software ones.

Horizontal vs Vertical Scaling

Vertical scaling: give one server more CPU/RAM. Simple, but there's a ceiling, and it creates a single point of failure. Horizontal scaling: add more servers. Requires your application to be stateless (or externalize state to a shared store). Most modern systems prefer horizontal scaling for resilience and cost-efficiency.

SQL vs NoSQL

Factor	SQL (e.g., Postgres, MySQL)	NoSQL (e.g., DynamoDB, Cassandra, MongoDB)
Data structure	Fixed schema, relational	Flexible schema, document or wide-column
Consistency	Strong (ACID)	Often eventual; tunable in many systems
Scaling	Primarily vertical; read replicas for read traffic	Horizontal natively
Best for	Complex queries, joins, transactions	High write throughput, variable schema

CDN and Message Queues

A content delivery network caches static assets (images, JS, CSS) at edge nodes close to users, cutting latency for global traffic. Mention CDNs for any design with heavy static content or read-heavy global users.

A message queue (Kafka, RabbitMQ, SQS) decouples producers from consumers, enabling async processing and buffering traffic spikes. Use one when work can be processed later without blocking the user: email notifications, activity logs, or non-urgent feed updates to followers.
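The decoupling idea can be shown with Python's standard-library queue standing in for a real broker. This is only an in-process analogy: `handle_signup`, `email_worker`, and the message shape are hypothetical, and a production queue would add durability, retries, and multiple consumers.

```python
import queue
import threading


def handle_signup(user_email: str, outbox: queue.Queue) -> str:
    """Producer: enqueue the slow work and respond immediately."""
    outbox.put({"type": "welcome_email", "to": user_email})
    return "201 Created"


def email_worker(outbox: queue.Queue, sent: list) -> None:
    """Consumer: drain the queue at its own pace until a None sentinel arrives."""
    while True:
        message = outbox.get()
        if message is None:
            break
        sent.append(message["to"])  # stand-in for actually sending the email


outbox = queue.Queue()
sent = []
worker = threading.Thread(target=email_worker, args=(outbox, sent))
worker.start()

responses = [handle_signup(email, outbox)
             for email in ("a@example.com", "b@example.com")]

outbox.put(None)  # tell the worker to stop once the backlog is processed
worker.join()
print(responses, sent)
```

The request handler returns before any email is sent, which is the property to point at when justifying a queue in the design.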

How Interviewers Score System Design

Rubrics vary by company, but most evaluate five dimensions, weighted roughly as follows:

Requirements gathering (~15%) — did you clarify before designing? Skipping this loses points before you draw anything.
High-level architecture (~25%) — right components, correct data flow, each component justified.
Deep dive (~30%) — the highest-weighted area. Can you explain how a component works under the hood, not just that it exists?
Trade-off reasoning (~20%) — do you name alternatives and explain why you rejected them?
Communication (~10%) — does the interviewer understand your reasoning in real time? State every decision aloud before drawing it.

Level-specific expectations: An L4/SWE-III candidate who produces a clean high-level architecture with sound choices earns a strong score. An L6/Staff candidate who produces the same answer gets downleveled — at senior levels, interviewers expect proactive discussion of failure modes, cost implications, monitoring strategy, and multi-region considerations without being prompted.

Common Mistakes to Avoid

Jumping straight to the solution. The first 3–5 minutes of clarifying requirements are not a formality — they change the design.
Naming a technology without explaining why. "I'd use Kafka" means nothing without "because we need durable, ordered message delivery with replay capability and 100K+ events/sec."
Ignoring failure modes. What happens when the cache goes down? When a DB node fails? What's the alert threshold? This is what separates people who design systems on paper from people who've operated them in production.
No trade-offs. Every architectural choice is a trade-off. If you're not naming what you gave up, the interviewer assumes you don't know.
Spending too long on one part. Cover the full design first, then go deep on 2–3 components. A 45-minute interview spent entirely on the database schema is a miss.

FAQ

How long is a typical system design interview?

Most system design interviews run 45–60 minutes. Aim to spend the first 5 minutes on requirements, 5 minutes on capacity estimates, 15 minutes on high-level design, and 15–20 minutes deep-diving into specific components. Save a few minutes at the end to revisit trade-offs.

Do I need to memorize exact numbers for capacity estimates?

No. Interviewers care that you can reason through scale, not that your numbers are precise. Know rough orders of magnitude and state them as assumptions: a single web server handles on the order of 10K simple req/sec, a relational DB node on the order of 10K simple reads/sec, and RAM is roughly 100–1,000× faster than SSD for random reads. From there, derive the math live in the interview.

Should I use SQL or NoSQL in my answer?

It depends on the requirements you clarified. SQL (Postgres, MySQL) is the right default when you need ACID transactions or complex joins. NoSQL (Cassandra, DynamoDB, MongoDB) is better for high write throughput, variable schema, or horizontal scaling without complex queries. The key is explaining why, not just which.

How is system design different at senior vs mid-level?

Mid-level candidates are expected to produce a sound high-level design with reasonable component choices. Senior candidates are expected to proactively discuss failure modes, cost reasoning, monitoring strategy, and multi-region resilience — without being asked. The rubric is the same; the bar for each dimension is higher.