It looks like the kubelet refuses to finalize the deletion of a pod while the pod is in crash loop backoff (i.e. inside the backoff window), probably because it is waiting out the backoff and holding the sync loop blocked in the meantime.
1. Create a pod that crash loops
2. Wait until the backoff is > 1m (3-4 crashes)
3. Delete the pod
Expected: kubelet acknowledges the delete request immediately and cleans up the pod (the main container would be stopped already, so deletion should be almost instantaneous)
Actual: pod sits in Terminating for over 1m (looks like until the backoff period expires)
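For reference, a minimal reproduction of the steps above (the pod name "crasher" and the busybox image are arbitrary choices; any container that exits non-zero will do):

# cat crasher.yaml
apiVersion: v1
kind: Pod
metadata:
  name: crasher
spec:
  restartPolicy: Always
  containers:
  - name: crasher
    image: busybox
    command: ["/bin/false"]
# oc create -f crasher.yaml
# oc get pod crasher -w
Watch until the restart count reaches 3-4 and the status shows CrashLoopBackOff.
# time oc delete pod crasher
With the bug present, the delete hangs in Terminating instead of returning promptly.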
This bug is incredibly annoying when debugging or working with pods. It makes me want to break things, which is sad :(
Still working to run this down upstream
The delay is only a function of the terminationGracePeriod (30s), not the backoff timeout (up to 5m), so we are looking at a delay in the tens of seconds, not minutes.
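If that is right, shrinking the grace period should shrink the observed delay to match; a quick way to check (the value 5 is arbitrary):

# time oc delete pod crasher --grace-period=5

A delete that comes back in roughly 5s rather than 30s would confirm the delay tracks the grace period and not the backoff.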
Trying to figure out why the kubelet does not clean up the failed container once the pod gets its deletionTimestamp set.
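While the pod is stuck in Terminating, both halves of the state can be seen side by side with jsonpath queries against standard pod fields:

# oc get pod crasher -o jsonpath='{.metadata.deletionTimestamp}'
# oc get pod crasher -o jsonpath='{.status.containerStatuses[0].state}'

The first prints a timestamp as soon as the delete is accepted; the second shows the container still parked in a CrashLoopBackOff waiting state, which is what the kubelet is apparently not tearing down until the backoff expires.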
This has been an issue since at least 1.6 according to the upstream issue, so it isn't a regression. It is annoying and affects pods in general, not just StatefulSet pods. Not a blocker in my mind, though, so deferring to z-stream.
Clayton, if you really want this to be a blocker, feel free to move it back.
WIP upstream PR:
Previous upstream PR abandoned.
New upstream PR:
# oc version
features: Basic-Auth GSSAPI Kerberos SPNEGO
And the terminating pod is deleted immediately now.
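Verified with the same reproduction as above: with the pod several crashes into its backoff,

# time oc delete pod crasher

now returns within the grace period instead of hanging for the remainder of the backoff window.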
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.