Context: when “nothing changed” but the system still breaks

We were running a Django backend on Kubernetes, fronted by Gunicorn with multiple workers.
No major deployments. No architectural changes. Traffic was growing steadily, without unusual spikes.

Then a pattern started to emerge: pod memory usage kept increasing over time.
Not tied to specific requests. No sudden jumps — just a slow, linear climb.

Eventually, Kubernetes did exactly what it was designed to do:
OOMKilled → pod restart.

The system came back up. Memory dropped to zero.
And then the cycle repeated.

The initial question seemed simple:

“Where is the memory leak?”


Initial assumptions (and why they were wrong)

Like most backend teams, our first instinct was to look at the code:

  • Are objects not being garbage collected?
  • Are queries returning excessively large datasets?
  • Is there an in-process cache holding references too long?
  • Is there a logic bug in request handlers?

We audited the code, enabled debugging, and ran profilers.
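
As an illustration of the kind of check involved (a sketch, not necessarily the exact tooling we used), tracemalloc can compare heap snapshots around a batch of requests to see whether allocations actually track traffic:

    import tracemalloc

    def run_batch_of_requests():
        # Stand-in for replaying a set of representative requests against
        # the app (e.g. via Django's test client); details don't matter here.
        ...

    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    run_batch_of_requests()
    after = tracemalloc.take_snapshot()

    # Top allocation growth, grouped by source line. If memory tracked
    # traffic, a request-bound leak would show up here.
    for stat in after.compare_to(before, "lineno")[:10]:
        print(stat)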
But one detail didn’t add up:

Memory was growing with time, not with user behavior.

Restarting the pod reset memory to zero.
Without restarts, memory never went down.

This didn’t look like a specific logic bug.
It looked much more like a lifecycle problem than a business logic problem.


Three layers that must be distinguished (and are often conflated)

At this point, the question was no longer “is there a memory leak?”
but rather “at which layer is the leak happening?”

1. Gunicorn — the process layer

Gunicorn is responsible for:

  • Worker lifecycle
  • Request lifecycle
  • Copy-on-write behavior when preloading the app
  • Recycling workers via max_requests

But Gunicorn does not understand containers.
It knows nothing about cgroups or memory limits.
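
To make that concrete: the cgroup limit and a worker's own RSS are two numbers that nothing in Gunicorn ever reads or compares. A minimal sketch of reading both yourself (assuming Linux with cgroup v2; the paths differ on cgroup v1):

    # Gunicorn never looks at either of these values.
    def cgroup_memory_limit_bytes():
        with open("/sys/fs/cgroup/memory.max") as f:
            value = f.read().strip()
        return None if value == "max" else int(value)  # None = no limit set

    def process_rss_bytes():
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) * 1024  # reported in kB
        return None

    print("limit:", cgroup_memory_limit_bytes(), "rss:", process_rss_bytes())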


2. Kubernetes — the container layer

Kubernetes manages:

  • Pod lifecycle
  • cgroup memory
  • OOMKill behavior
  • Restart policies

But Kubernetes does not understand Python processes.
It cannot distinguish between a “clean” worker and a “dirty” one.
When memory limits are exceeded, it kills the entire pod.


3. Application — the code layer

The application itself may:

  • Hold in-process caches
  • Maintain global state
  • Use third-party libraries with opaque memory behavior

But the application does not control lifecycle decisions unless explicitly designed to.
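
A deliberately simplified illustration of the first two points (not our actual code): a module-level cache with no eviction, which makes each worker's footprint grow for as long as the process lives:

    # Hypothetical example: an in-process cache that only ever grows.
    _profile_cache = {}  # module-level dict, lives as long as the worker process

    def load_profile(user_id):
        # Stand-in for a real query; returns a payload of non-trivial size.
        return {"id": user_id, "blob": "x" * 10_000}

    def get_profile(user_id):
        if user_id not in _profile_cache:
            _profile_cache[user_id] = load_profile(user_id)
        return _profile_cache[user_id]

Nothing here is “wrong” for any single request, yet the process never gives that memory back.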


The wrong question vs. the right question

The wrong question (very common)

“Should we restart via Kubernetes or via Gunicorn to fix memory leaks?”

This question already assumes that restart is the solution.


The right question (what we learned later)

“Who should own the memory lifecycle?”

  • Kubernetes excels at managing containers
  • Gunicorn excels at managing processes
  • The application should focus on business logic

Letting Kubernetes “fix” a process-level problem by restarting pods
is like using a sledgehammer to fix a wristwatch.


The technical decision: recycle workers, not pods

We made a deliberate choice:

  • Stop relying on pod restarts to handle memory growth
  • Proactively recycle Gunicorn workers using:
      • max_requests
      • max_requests_jitter

The goal was not to “eliminate leaks,” but rather:

Never allow a worker to live long enough to become dangerous.
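
In Gunicorn terms this is a couple of lines of configuration. A minimal sketch (worker count and thresholds are illustrative, not our production values):

    # gunicorn.conf.py (sketch)
    workers = 4

    # Recycle each worker after roughly this many requests...
    max_requests = 1000
    # ...plus a random per-worker offset, so workers don't all restart at once.
    max_requests_jitter = 100

The jitter matters: without it, workers that started together tend to hit the threshold together, and a memory problem turns into a restart storm.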

Why this worked better

  • Worker restarts:
      • Free memory for that specific process
      • Do not affect other workers
  • Pods stay alive:
      • No full cache loss
      • No mass cold starts
  • Smaller blast radius when things go wrong

Trade-offs (there are always trade-offs)

This approach is not free:

  • Recycling workers means losing in-process cache
  • Cold starts happen slightly more often
  • Some requests see higher latency when a new worker spins up

But in return:

  • Memory usage becomes predictable
  • System behavior stabilizes
  • Surprise pod restarts disappear
  • Incident frequency drops significantly

This is a controlled trade-off, not a gamble.


Second-order effects we didn’t expect

Over time, the benefits went beyond memory stability:

  • Debugging became easier with clear process lifecycles
  • Monitoring became more meaningful (no more “fake” memory resets)
  • The team stopped treating restarts as fixes
  • We started asking better questions:
    “Why does this worker need more memory than others?”

In other words,
the right solution forced us to look deeper, instead of hiding behind restarts.
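
One way to make that last question answerable (a sketch using Gunicorn's server hooks, not something the setup above requires) is to record memory per worker process, so growth is attributable to a specific worker instead of being averaged away at the pod level:

    # gunicorn.conf.py (sketch): log per-worker memory after each request.
    import os
    import resource

    def post_request(worker, req, environ, resp):
        # ru_maxrss is this worker's peak RSS, in kilobytes on Linux.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        worker.log.info("pid=%d peak_rss=%.1f MiB", os.getpid(), peak_kb / 1024)

In practice you would sample this or ship it to a metrics system rather than log every request, but the point stands: per-worker numbers are what turn “why does this worker need more memory?” into a question you can actually investigate.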


The scars we carry

Looking back, we made several mistakes:

  • We trusted Kubernetes too early
  • We confused “can restart” with “should restart”
  • We used operational mechanisms to mask design issues

Kubernetes is powerful.
But it does not replace thinking about responsibility boundaries.


A checklist before you reach for “restart”

If you’re facing a similar issue, ask yourself:

  • Does memory grow with requests or with time?
  • Are workers being proactively recycled?
  • Is restart fixing the cause, or merely resetting the symptom?
  • If traffic increases 10×, what breaks first?
  • Who owns lifecycle decisions: the app, the process, or the container?

If you can’t answer these questions,
restart is just false confidence.


Conclusion

When systems are small, restart feels like a solution.
As systems grow, restart becomes the smell of a design problem.

The most dangerous failures don’t live in code.
They live at the boundaries between layers — where no one truly owns the responsibility.


Personal note

This article is not saying “everyone should configure Gunicorn this way.”
It is a reminder that:

You must ask the right questions before choosing the tool.
