Context: when “nothing changed” but the system still breaks

We were running a Django backend on Kubernetes, fronted by Gunicorn with multiple workers.
No major deployments. No architectural changes. Traffic was growing steadily, without unusual spikes.

Then a pattern started to emerge: pod memory usage kept increasing over time.
Not tied to specific requests. No sudden jumps — just a slow, linear climb.

Eventually, Kubernetes did exactly what it was designed to do:
OOMKilled → pod restart.

The system came back up. Memory dropped to zero.
And then the cycle repeated.

The initial question seemed simple:

“Where is the memory leak?”


Initial assumptions (and why they were wrong)

Like most backend teams, our first instinct was to look at the code:

  • Are objects not being garbage collected?
  • Are queries returning excessively large datasets?
  • Is there an in-process cache holding references too long?
  • Is there a logic bug in request handlers?

We audited the code, enabled debugging, and ran profilers.
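
As an illustration of the kind of check involved (a sketch, not necessarily the exact tooling we used), tracemalloc can compare heap snapshots around a batch of requests to see whether allocations actually track traffic:

    import tracemalloc

    def run_batch_of_requests():
        # Stand-in for replaying a set of representative requests against
        # the app (e.g. via Django's test client); details don't matter here.
        ...

    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    run_batch_of_requests()
    after = tracemalloc.take_snapshot()

    # Top allocation growth, grouped by source line. If memory tracked
    # traffic, a request-bound leak would show up here.
    for stat in after.compare_to(before, "lineno")[:10]:
        print(stat)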
But one detail didn’t add up:

Memory was growing with time, not with user behavior.

Restarting the pod reset memory to zero.
Without restarts, memory never went down.

This didn’t look like a specific logic bug.
It looked much more like a lifecycle problem than a business logic problem.


Three layers that must be distinguished (and are often conflated)

At this point, the question was no longer “is there a memory leak?”
but rather “at which layer is the leak happening?”

1. Gunicorn — the process layer

Gunicorn is responsible for:

  • Worker lifecycle
  • Request lifecycle
  • Copy-on-write behavior when preloading the app
  • Recycling workers via max_requests

But Gunicorn does not understand containers.
It knows nothing about cgroups or memory limits.
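
To make that concrete: the cgroup limit and a worker's own RSS are two numbers that nothing in Gunicorn ever reads or compares. A minimal sketch of reading both yourself (assuming Linux with cgroup v2; the paths differ on cgroup v1):

    # Gunicorn never looks at either of these values.
    def cgroup_memory_limit_bytes():
        with open("/sys/fs/cgroup/memory.max") as f:
            value = f.read().strip()
        return None if value == "max" else int(value)  # None = no limit set

    def process_rss_bytes():
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) * 1024  # reported in kB
        return None

    print("limit:", cgroup_memory_limit_bytes(), "rss:", process_rss_bytes())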


2. Kubernetes — the container layer

Kubernetes manages:

  • Pod lifecycle
  • cgroup memory
  • OOMKill behavior
  • Restart policies

But Kubernetes does not understand Python processes.
It cannot distinguish between a “clean” worker and a “dirty” one.
When memory limits are exceeded, it kills the entire pod.


3. Application — the code layer

The application itself may:

  • Hold in-process caches
  • Maintain global state
  • Use third-party libraries with opaque memory behavior

But the application does not control lifecycle decisions unless explicitly designed to.
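
A deliberately simplified illustration of the first two points (not our actual code): a module-level cache with no eviction, which makes each worker's footprint grow for as long as the process lives:

    # Hypothetical example: an in-process cache that only ever grows.
    _profile_cache = {}  # module-level dict, lives as long as the worker process

    def load_profile(user_id):
        # Stand-in for a real query; returns a payload of non-trivial size.
        return {"id": user_id, "blob": "x" * 10_000}

    def get_profile(user_id):
        if user_id not in _profile_cache:
            _profile_cache[user_id] = load_profile(user_id)
        return _profile_cache[user_id]

Nothing here is “wrong” for any single request, yet the process never gives that memory back.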


The wrong question vs. the right question

The wrong question (very common)

“Should we restart via Kubernetes or via Gunicorn to fix memory leaks?”

This question already assumes that restart is the solution.


The right question (what we learned later)

“Who should own the memory lifecycle?”

  • Kubernetes excels at managing containers
  • Gunicorn excels at managing processes
  • The application should focus on business logic

Letting Kubernetes “fix” a process-level problem by restarting pods
is like using a sledgehammer to fix a wristwatch.


The technical decision: recycle workers, not pods

We made a deliberate choice:

  • Stop relying on pod restarts to handle memory growth
  • Proactively recycle Gunicorn workers using:
      • max_requests
      • max_requests_jitter

The goal was not to “eliminate leaks,” but rather:

Never allow a worker to live long enough to become dangerous.
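
In Gunicorn terms this is a couple of lines of configuration. A minimal sketch (worker count and thresholds are illustrative, not our production values):

    # gunicorn.conf.py (sketch)
    workers = 4

    # Recycle each worker after roughly this many requests...
    max_requests = 1000
    # ...plus a random per-worker offset, so workers don't all restart at once.
    max_requests_jitter = 100

The jitter matters: without it, workers that started together tend to hit the threshold together, and a memory problem turns into a restart storm.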

Why this worked better

  • Worker restarts:
      • Free memory for that specific process
      • Do not affect other workers
  • Pods stay alive:
      • No full cache loss
      • No mass cold starts
  • Smaller blast radius when things go wrong

Trade-offs (there are always trade-offs)

This approach is not free:

  • Recycling workers means losing in-process cache
  • Cold starts happen slightly more often
  • Some requests see higher latency when a new worker spins up

But in return:

  • Memory usage becomes predictable
  • System behavior stabilizes
  • Surprise pod restarts disappear
  • Incident frequency drops significantly

This is a controlled trade-off, not a gamble.


Second-order effects we didn’t expect

Over time, the benefits went beyond memory stability:

  • Debugging became easier with clear process lifecycles
  • Monitoring became more meaningful (no more “fake” memory resets)
  • The team stopped treating restarts as fixes
  • We started asking better questions:
    “Why does this worker need more memory than others?”

In other words,
the right solution forced us to look deeper, instead of hiding behind restarts.
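
One way to make that last question answerable (a sketch using Gunicorn's server hooks, not something the setup above requires) is to record memory per worker process, so growth is attributable to a specific worker instead of being averaged away at the pod level:

    # gunicorn.conf.py (sketch): log per-worker memory after each request.
    import os
    import resource

    def post_request(worker, req, environ, resp):
        # ru_maxrss is this worker's peak RSS, in kilobytes on Linux.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        worker.log.info("pid=%d peak_rss=%.1f MiB", os.getpid(), peak_kb / 1024)

In practice you would sample this or ship it to a metrics system rather than log every request, but the point stands: per-worker numbers are what turn “why does this worker need more memory?” into a question you can actually investigate.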


The scars we carry

Looking back, we made several mistakes:

  • We trusted Kubernetes too early
  • We confused “can restart” with “should restart”
  • We used operational mechanisms to mask design issues

Kubernetes is powerful.
But it does not replace thinking about responsibility boundaries.


A checklist before you reach for “restart”

If you’re facing a similar issue, ask yourself:

  • Does memory grow with requests or with time?
  • Are workers being proactively recycled?
  • Is restart fixing the cause, or merely resetting the symptom?
  • If traffic increases 10×, what breaks first?
  • Who owns lifecycle decisions: the app, the process, or the container?

If you can’t answer these questions,
restart is just false confidence.


Conclusion

When systems are small, restart feels like a solution.
As systems grow, restart becomes the smell of a design problem.

The most dangerous failures don’t live in code.
They live at the boundaries between layers — where no one truly owns the responsibility.


Personal note

This article is not saying “everyone should configure Gunicorn this way.”
It is a reminder that:

You must ask the right questions before choosing the tool.
