
A DevOps interview rarely tests memorized definitions. Interviewers want to see whether you can reason about pipelines, debug a cluster under pressure, and explain trade-offs the way you would to a teammate at 2 a.m. during an incident. This guide collects real DevOps interview questions grouped by the categories most loops cover: CI/CD, containers and Kubernetes, infrastructure as code, monitoring and observability, Linux, networking, and Git, plus behavioral questions. Each question includes a short answer or a note on what the interviewer is actually probing for.
Read these as conversation starters, not flashcards. The best preparation is being able to talk through *why* you would choose one approach over another. Where you have hands-on experience, anchor your answer in a concrete project. Where you do not, say so and reason from first principles — interviewers respect honesty over bluffing far more than candidates expect.
How DevOps interview questions are weighted by level
Before the questions, it helps to know what gets emphasized at each seniority band. The same topic surfaces at every level, but the depth expected shifts.
| Topic area | Junior focus | Mid-level focus | Senior / staff focus |
|---|---|---|---|
| CI/CD | Pipeline stages, what a build does | Caching, parallelism, deployment strategies | Pipeline-as-platform, multi-team standards |
| Kubernetes | Pods, deployments, services | Probes, resource limits, debugging crashes | Scaling, networking, cluster cost and security |
| Infrastructure as Code | Writing basic Terraform | State, modules, drift | Multi-account/region design, policy enforcement |
| Observability | Reading dashboards and logs | Defining SLIs and alerts | SLO strategy, reducing alert fatigue |
| Incident response | Following a runbook | Driving a fix | Owning postmortems and prevention |
Use this table to calibrate. If you are interviewing for a senior role, do not stop at definitions — push toward trade-offs, blast radius, and what you would do differently next time.
CI/CD pipeline questions
CI/CD is the heart of DevOps, so expect this category early and often. Continuous integration and continuous delivery automate the path from a code commit to a running release.
What is the difference between continuous delivery and continuous deployment?
Continuous delivery means every change that passes the pipeline is *ready* to ship, but a human still approves the final push to production. Continuous deployment removes that gate — every passing change goes live automatically. The distinction matters: deployment requires far stronger automated testing and rollback confidence. Interviewers ask this to check that you understand the human and risk dimensions, not just the acronyms.
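The difference comes down to where the production gate sits. The Python sketch below is a toy model, not any CI system's API; the `Change` class, the `promote` function, and the mode names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A commit that has been through the pipeline (hypothetical model)."""
    commit: str
    pipeline_passed: bool


def promote(change: Change, mode: str, human_approved: bool = False) -> bool:
    """Return True if this change should go to production."""
    if not change.pipeline_passed:
        return False  # a red build never ships, in either model
    if mode == "continuous_deployment":
        return True  # no manual gate: every green change goes live
    if mode == "continuous_delivery":
        return human_approved  # releasable, but a person makes the final call
    raise ValueError(f"unknown mode: {mode!r}")
```

With the human gate removed, the `pipeline_passed` check is the only thing between a commit and users, which is why continuous deployment demands stronger automated tests.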
Walk me through a typical CI/CD pipeline you have built.
Describe the real stages: trigger on commit or pull request, install dependencies, run unit and integration tests, build an artifact or container image, scan for vulnerabilities, push to a registry, then deploy to staging and production. A Jenkins pipeline or GitHub Actions workflow expresses these as discrete stages or jobs. Mention how you handle failures: fail fast, surface logs, and never deploy on a red build.
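The fail-fast ordering can be sketched in plain Python. This is not Jenkins or GitHub Actions syntax; it only models the control flow, and the `run_pipeline` function and stage names are hypothetical.

```python
from typing import Callable

# A stage is a name plus a function that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns (succeeded, names_of_completed_stages).
    """
    completed: list[str] = []
    for name, step in stages:
        if not step():
            # Surface the failure and skip everything after it,
            # so a red build can never reach a deploy stage.
            print(f"stage {name!r} failed; remaining stages skipped")
            return False, completed
        completed.append(name)
    return True, completed
```

For example, with stages `test`, `build`, `scan`, and `deploy_staging`, a failing `scan` returns `(False, ["test", "build"])` and the deploy function is never called.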
How do you handle secrets in a pipeline?
Never hard-code them. Pull secrets at runtime from a manager like Vault, AWS Secrets Manager, or the CI system's encrypted variable store, and scope them to the narrowest job that needs them. Rotate regularly and keep them out of logs. The interviewer wants to hear that secrets never land in source control or build output.
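A minimal sketch of runtime secret loading, assuming the CI system or a secrets-manager agent injects the value as an environment variable (a common pattern, but not the only one; SDK calls to Vault or AWS Secrets Manager are not shown). The `Secret` wrapper and the `DEPLOY_TOKEN` name are hypothetical.

```python
import os


class Secret:
    """Holds a secret value but hides it from str(), repr(), and f-strings."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # Call this only at the point of use (e.g. building an auth header).
        return self._value

    def __repr__(self) -> str:
        return "Secret('****')"

    __str__ = __repr__


def load_secret(name: str) -> Secret:
    """Read a secret injected into the environment at runtime.
    Fails loudly instead of falling back to a hard-coded default."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"secret {name} is not set")
    return Secret(value)
```

Because the wrapper masks its own string form, an accidental `print(token)` or `logging.info("%s", token)` writes `Secret('****')` instead of the value.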
A deployment failed in production. How do you roll back?
The clean answer is that rollback should be a defined, tested action — not improvisation. Depending on your strategy, that means redeploying the previous artifact, flipping traffic back in a blue-green setup, or scaling the old version up while the new one drains. Stress that you would also capture what failed so the fix is permanent. The signal here is whether you treat rollback as a first-class capability.
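In a blue-green setup, rollback is cheap because the previous version is still running. The toy model below illustrates that; `BlueGreenRouter` is hypothetical, and a real setup would also run health checks before switching traffic.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Two environments; exactly one receives live traffic."""
    versions: dict[str, str]  # environment name -> deployed artifact version
    live: str = "blue"

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str) -> None:
        # Deploy to the idle environment, then switch traffic to it.
        target = self.idle()
        self.versions[target] = version
        self.live = target

    def rollback(self) -> None:
        # The previous version is still running on the other environment,
        # so rollback is just flipping traffic back: no rebuild needed.
        self.live = self.idle()

    def serving(self) -> str:
        return self.versions[self.live]
```

Usage: start with `BlueGreenRouter(versions={"blue": "v1", "green": ""})`, call `deploy("v2")`, and if v2 misbehaves, `rollback()` puts v1 back in front of users immediately.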
Containers and Kubernetes questions
Containerization is foundational, and Kubernetes is the orchestration default. Be ready to go deep.
What problem do containers solve?
Docker packages an application with its dependencies into a portable image that runs identically across laptops, CI runners, and production. This kills the "works on my machine" class of bug and makes deployments reproducible. A strong answer also distinguishes containers from VMs: containers share the host kernel and are lighter, while each VM runs a full guest operating system on virtualized hardware.
What is the difference between a Docker image and a container?
An image is the immutable, layered template; a container is a running instance of that image. You can start many containers from one image. Interviewers ask this to confirm you grasp the build-versus-run distinction — it underpins everything from caching to registries.
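The build-versus-run split can be modeled in a few lines. This is a conceptual sketch only; real images are filesystem layers managed by the container runtime, and the `Image`/`Container` classes and the `myapp:1.0` reference are hypothetical.

```python
import itertools
from dataclasses import dataclass, field

_container_ids = itertools.count(1)


@dataclass(frozen=True)
class Image:
    """Immutable template: a reference plus read-only layers."""
    ref: str
    layers: tuple[str, ...]


@dataclass
class Container:
    """A running instance of an image, with its own writable layer."""
    image: Image
    container_id: int = field(default_factory=lambda: next(_container_ids))
    writable_layer: dict[str, str] = field(default_factory=dict)

    def write(self, path: str, data: str) -> None:
        # Writes land in this container's layer only; the shared image is unchanged.
        self.writable_layer[path] = data
```

Two containers built from `Image("myapp:1.0", ("base-os", "dependencies", "app-code"))` share the same image object, but a file one writes never appears in the other, and the image itself cannot be modified.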
Explain the core Kubernetes objects you work with.
Cover pods (the smallest deployable unit, one or more containers), deployments (declarative management and rolling updates of pod replicas), services (stable networking and load balancing to pods), and ConfigMaps and Secrets (configuration). The Kubernetes overview frames this as a declarative system: you describe desired state and the control loop reconciles reality toward it. Lead with that reconciliation idea — it is the mental model that separates strong candidates.
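The reconciliation idea fits in one function. This is a simplified illustration of a control loop's decision step, not Kubernetes controller code; `reconcile` and the workload names are hypothetical.

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """One pass of a simplified control loop.

    Both dicts map a workload name to a replica count. The return value is
    the list of actions that would move the actual state toward the desired one.
    """
    actions: list[str] = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} pod(s) for {name}")
        elif have > want:
            actions.append(f"stop {have - want} pod(s) for {name}")
    return actions
```

A real controller runs this comparison continuously, so when a pod dies and `web` drops from 3 replicas to 2, the next pass simply produces `start 1 pod(s) for web`. Nobody has to issue a restart command.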
A pod is stuck in CrashLoopBackOff. How do you debug it?
Walk through your actual triage: kubectl describe pod to see events and the restart reason, then kubectl logs (including --previous for the crashed container) to read the failure. Common causes are a bad command or entrypoint, a missing config or secret, a failing liveness probe, or out-of-memory kills, which describe reports as OOMKilled. A failing readiness probe on its own does not cause restarts; it only stops traffic to the pod. The interviewer cares about your systematic process, not the single right answer.
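It also helps to know why the status says "BackOff": the Kubernetes pod lifecycle documentation describes a default restart delay that starts at 10 seconds, doubles after each crash, and is capped at five minutes, resetting once a container runs cleanly for 10 minutes. The sketch below reproduces that default schedule; the function name is hypothetical, and newer releases may make the timing configurable, so check the docs for your version.

```python
def crashloop_delays(restarts: int, initial: int = 10, cap: int = 300) -> list[int]:
    """Seconds the kubelet waits before each successive restart,
    under the documented default exponential back-off."""
    delays: list[int] = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double, but never exceed the cap
    return delays
```

After a handful of crashes the pod spends most of its time waiting, which is why logs from `--previous` are often more useful than watching the pod live.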
What is the difference between liveness and readiness probes?
A liveness probe tells Kubernetes whether to restart a container; a readiness probe tells it whether to send traffic. A container can be alive but not ready — for example, still warming a cache. Misconfiguring these causes either traffic to broken pods or needless restart loops, so this question filters for real operational experience.
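A toy service makes the split concrete. The `App` class is hypothetical; in a real deployment each method would back an HTTP endpoint referenced by the probe's httpGet path, and Kubernetes treats status codes from 200 to 399 as success.

```python
class App:
    """Toy service showing why liveness and readiness are separate checks."""

    def __init__(self) -> None:
        self.cache_warm = False   # still loading data after startup
        self.deadlocked = False   # process is wedged and cannot recover

    def liveness(self) -> int:
        # Fail only when a restart is the right fix.
        return 500 if self.deadlocked else 200

    def readiness(self) -> int:
        # Fail while warming up: withhold traffic, but do not restart.
        return 200 if self.cache_warm and not self.deadlocked else 503
```

During warm-up the service reports alive but not ready, so it is left running and kept out of the load balancer. If the readiness logic were wired to the liveness probe instead, Kubernetes would kill the pod before the cache ever finished loading.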
Infrastructure as Code questions
Managing infrastructure declaratively is now table stakes. Terraform is the most common subject.
What is infrastructure as code and why does it matter?
It means provisioning and managing infrastructure through versioned, declarative files rather than manual console clicks. The payoffs are reproducibility, peer review through pull requests, an audit trail, and the ability to recreate an environment from scratch. Frame it as bringing software-engineering discipline to operations.
What is Terraform state and why does it matter?
State is Terraform's record of what real infrastructure maps to your configuration. Without it, Terraform cannot know what to create, update, or destroy. In a team, you store state remotely (for example in S3 with locking) so people do not clobber each other. Mishandling state — losing it or running concurrent applies — is a classic outage cause, which is exactly why this gets asked.
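Conceptually, a plan is a diff between desired configuration and recorded state. The sketch below is not Terraform's implementation (it ignores in-place updates versus replacements, dependencies, and provider logic), and the resource addresses in the example are hypothetical.

```python
def plan(config: dict[str, dict], state: dict[str, dict]) -> dict[str, list[str]]:
    """Diff desired configuration against recorded state, Terraform-style."""
    return {
        # In config but not in state: Terraform believes it does not exist yet.
        "create": sorted(config.keys() - state.keys()),
        # In both, but attributes differ.
        "update": sorted(k for k in config.keys() & state.keys() if config[k] != state[k]),
        # In state but removed from config.
        "destroy": sorted(state.keys() - config.keys()),
    }
```

The example also shows why losing state is dangerous: `plan(config, {})` puts every resource in `create`, so Terraform would try to build duplicates of infrastructure that already exists.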
How do you handle configuration drift?
Drift is when real infrastructure diverges from code, usually because someone made a manual change. Detect it with terraform plan, which refreshes state against real infrastructure and shows the diff, and resolve it by either updating the code to accept the change (using terraform import if someone created a resource by hand) or running terraform apply to revert it. The deeper answer is prevention: lock down console write access so code is the only path to change.
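The detection half of that workflow can be sketched as a comparison between the last recorded state and what the cloud API reports now. This is a conceptual illustration, not how Terraform refreshes internally; `detect_drift` and the resource addresses are hypothetical.

```python
def detect_drift(state: dict[str, dict], live: dict[str, dict]) -> list[str]:
    """Compare the last recorded state with what the cloud API reports now."""
    findings: list[str] = []
    for address in sorted(state):
        if address not in live:
            findings.append(f"{address}: deleted outside of code")
            continue
        recorded, actual = state[address], live[address]
        for attr in sorted(recorded.keys() | actual.keys()):
            if recorded.get(attr) != actual.get(attr):
                findings.append(
                    f"{address}.{attr}: expected {recorded.get(attr)!r}, "
                    f"found {actual.get(attr)!r}"
                )
    return findings
```

Note what the loop cannot see: a resource created by hand never appears in `state`, so it is never checked. That blind spot is exactly why hand-made resources have to be imported before the tooling can manage them.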
What is the difference between Terraform and a configuration tool like Ansible?
Terraform provisions infrastructure declaratively and tracks state; Ansible is procedural and excels at configuring software *inside* machines after they exist. They are complementary — many teams use Terraform to stand up servers and Ansible or a startup script to configure them. Showing where each fits signals architectural maturity.
Monitoring and observability questions
You cannot operate what you cannot see. Expect questions about metrics, logs, traces, and alerting.
What is the difference between monitoring and observability?
Monitoring tells you *whether* a known problem is happening — is the disk full, is latency high? Observability is the broader property of being able to ask *new* questions about your system's internal state from its outputs, so you can debug failures you did not anticipate. The three pillars usually cited are metrics, logs, and traces.
How would you design alerting that does not exhaust the on-call engineer?
Alert on symptoms users feel, not every internal blip. Tie alerts to service-level objectives, set sensible thresholds, and make every page actionable with a clear runbook. Prometheus is a common metrics and alerting backend here. The interviewer is probing for judgment — alert fatigue is a real operational failure, and naming it shows you have been on call.
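One widely cited way to tie pages to an SLO is burn-rate alerting, described in Google's SRE Workbook: page when the error budget is being consumed far faster than sustainable, confirmed over both a long and a short window. The threshold of 14.4 corresponds to spending 2% of a 30-day budget in one hour. The Python below only shows the decision logic; in practice you would express it as a Prometheus alerting rule, and the function names here are hypothetical.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    return error_ratio / (1 - slo)


def should_page(err_1h: float, err_5m: float, slo: float = 0.999,
                threshold: float = 14.4) -> bool:
    """Page only when the long window shows a fast burn AND the short
    window confirms it is still happening (so the page clears after a fix)."""
    return burn_rate(err_1h, slo) > threshold and burn_rate(err_5m, slo) > threshold
```

A brief spike that has already recovered fails the 5-minute check, and a slow leak fails the 1-hour check, so neither wakes anyone up; slower burns are better handled by lower-urgency tickets.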
What is an SLI, SLO, and error budget?
An SLI is the measured indicator (for example, request success rate). An SLO is the target you commit to over a window (for example, 99.9% success over 30 days). The error budget is the allowed shortfall, the 0.1% you can "spend" on risk like shipping features faster. When the budget is exhausted, you slow down and prioritize reliability. This vocabulary signals familiarity with modern SRE practice.
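The arithmetic is worth being able to do on a whiteboard. A 99.9% SLO over 30 days leaves 0.001 × 30 × 24 × 60 = 43.2 minutes of total downtime. The helper names below are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability the SLO allows over the window."""
    return (1 - slo) * window_days * 24 * 60


def remaining_budget(total_requests: int, failed_requests: int, slo: float) -> float:
    """Fraction of the request-based error budget still unspent.
    A negative value means the budget is overspent."""
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures
```

With 1,000,000 requests at 99.9%, roughly 1,000 failures are allowed; 250 failures leaves about 75% of the budget, while 1,500 failures puts it negative, which is the signal to slow releases.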
Linux, networking, and Git questions
DevOps runs on Linux and Git, so foundational fluency is assumed.
How do you investigate a server that is running out of memory?
Talk through the tools: top or htop for live usage, free -h for a snapshot, and ps sorted by memory to find the offender. Then reason about whether it is a leak, undersized host, or a workload that needs limits. Demonstrating a calm, ordered diagnostic flow matters more than naming every flag.
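One detail that shows depth: free reads its numbers from /proc/meminfo, and low "free" memory is normal on Linux because the kernel uses spare RAM for page cache. MemAvailable is the better pressure signal. This sketch parses meminfo-style text; the function names are hypothetical, and on a Linux host you would pass it the contents of /proc/meminfo.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.
    Most values are in kB; a few (e.g. HugePages_Total) are plain counts."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_pressure(text: str) -> float:
    """Share of memory NOT available for new work (0.0 to 1.0),
    based on MemAvailable rather than MemFree."""
    info = parse_meminfo(text)
    return 1 - info["MemAvailable"] / info["MemTotal"]
```

If this ratio climbs steadily while the same process keeps growing in ps sorted by memory, a leak is more likely than an undersized host.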
Explain what happens when you type a URL and press enter.
This classic networking question checks depth: DNS resolution to an IP, a TCP connection (and TLS handshake for HTTPS), the HTTP request, server processing, and the browser rendering the response. You do not need every detail, but a coherent end-to-end story shows you understand the stack your services run on.
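The ordering, and where the scheme and port change the story, can be captured with Python's standard urllib.parse. This sketch makes no network calls and assumes HTTP/1.1 or HTTP/2 over TCP (HTTP/3 runs over QUIC on UDP instead); `request_steps` is a hypothetical helper.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def request_steps(url: str) -> list[str]:
    """List the high-level steps a browser takes to load `url`."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # The Host header includes the port only when it was given explicitly.
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    steps = [
        f"DNS: resolve {parts.hostname} to an IP address",
        f"TCP: connect to that IP on port {port}",
    ]
    if parts.scheme == "https":
        steps.append(f"TLS: handshake and verify the certificate for {parts.hostname}")
    steps.append(f"HTTP: send GET {path} with Host: {host}")
    steps.append("Server: process the request and return a response")
    steps.append("Browser: render the response, fetching any sub-resources")
    return steps
```

Compare `request_steps("https://example.com/docs?page=2")` with `request_steps("http://example.com:8080")`: the second connects on port 8080 and skips the TLS step entirely.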
What is the difference between git merge and git rebase?
Merge combines branches and preserves the full history with a merge commit; rebase replays your commits on top of another branch for a linear history. The rule of thumb: rebase local work to keep it tidy, but never rebase shared branches others have pulled. Knowing *when* to use each — not just the definitions — is the real signal.
Behavioral and scenario questions
DevOps is a collaboration role by definition. As AWS describes it, DevOps is as much a cultural philosophy of shared responsibility as it is a toolset, and the term itself, a blend of "development" and "operations," reflects that goal of breaking down silos between the two.
Tell me about a production incident you handled.
Use a structured format like STAR (Situation, Task, Action, Result). Describe what broke, how you detected it, what you did to mitigate, and — most importantly — what you changed afterward to prevent a recurrence. Blameless framing and a concrete prevention step are what land this answer. For deeper structure, see our behavioral interview guide.
How do you handle a disagreement with a developer about a deployment?
Show that you lead with shared goals — reliability and shipping value — rather than territory. Bring data, propose a low-risk path like a canary or feature flag, and document the decision. Interviewers want collaborators, not gatekeepers who block teams.
How would you design a CI/CD system for a team migrating from manual deploys?
This open-ended scenario tests system thinking. Start by understanding current pain, then propose incremental steps: version control discipline, automated tests, a build pipeline, staging, and finally automated production deploys with rollback. Emphasize bringing the team along culturally, not just the tooling. This overlaps heavily with broader system design interview questions, so review those too.
The preparation edge most candidates miss
Most candidates over-index on tooling trivia and under-index on the interviewer. Knowing the team's actual stack, recent incidents, and what the hiring manager cares about lets you tailor every answer — and ask sharper questions back.
That intelligence is exactly what Articuler gives you. Drawing on 980M+ professional profiles and semantic matching, it finds the hiring manager or team lead behind the role, builds a Playbook on what they care about, and helps you reach them with AI-personalized outreach that lands roughly 8x the reply rate of the typical 5–8% cold message. Walking into a DevOps loop already understanding the team's priorities turns generic answers into ones that sound like you already work there. Pair it with Articuler's meeting prep before the call.
Conclusion
Strong DevOps interview answers share a pattern: they explain a trade-off, ground it in real operational experience, and stay calm under scenario pressure. Cover the categories here — CI/CD, containers and Kubernetes, infrastructure as code, observability, Linux and Git, and behavioral — and practice talking through each out loud rather than reciting. Then do the work most candidates skip: research the specific team you are about to meet. For more cloud-specific drilling, work through our AWS interview questions next.
FAQ
How should I prepare for a DevOps interview?
Map your prep to the categories above and build a short story for each major tool you have used, including one incident you handled. Practice explaining trade-offs aloud, set up a small home-lab project if you lack production depth, and research the company's stack so you can tailor answers. A mock interview tool helps you rehearse delivery.
How long does a DevOps interview process usually take?
A full loop typically runs two to four rounds over two to four weeks: an initial screen, one or two technical deep-dives (often including a hands-on or scenario exercise), and a behavioral or hiring-manager conversation. Senior roles add a system-design or architecture round. Timelines vary widely by company size.
What is the most common DevOps interview mistake?
Reciting definitions without trade-offs. Saying "Kubernetes orchestrates containers" earns nothing; explaining *when* you would choose it over a simpler platform and what it costs you operationally earns a lot. The second most common mistake is bluffing on something you have not used instead of reasoning honestly from fundamentals.
Do I need to code in a DevOps interview?
Often, yes, but usually scripting and configuration rather than algorithm puzzles. Expect to write or debug a Bash or Python script, a Dockerfile, a Kubernetes manifest, or Terraform. Some companies also include a general coding round, so reviewing general technical interview questions is worthwhile.