It looks like the kubelet refuses to finalize the deletion of a pod while the pod is in crash loop backoff (in the backoff window), probably because it's waiting and holding the sync loop blocked.

Scenario (repro sketch below):
1. Create a pod that crash loops
2. Wait until the backoff is > 1m (3-4 crashes)
3. Delete the pod

Expected:
1. Kubelet acknowledges the delete request immediately and cleans up the pod (the main container is already stopped, so the delete should be almost instantaneous)

Actual:
1. Pod sits in Terminating for 1m+ (looks like until the backoff period expires)

This bug is incredibly annoying when debugging or working with pods. It makes me want to break things, which is sad :(
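For reference, a minimal repro sketch, assuming a working kubectl context; the pod name (crashpod) and the busybox image are arbitrary choices for illustration:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: crashpod
spec:
  containers:
  - name: crash
    image: busybox
    command: ["/bin/sh", "-c", "exit 1"]
EOF

# restartPolicy defaults to Always, so the container crash loops.
# Watch until the pod shows CrashLoopBackOff with a backoff over 1m
# (usually 3-4 restarts), then time the delete:
kubectl get pod crashpod -w
time kubectl delete pod crashpod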
Upstream issue: https://github.com/kubernetes/kubernetes/issues/57865#issuecomment-358183236
Still working to run this down upstream: https://github.com/kubernetes/kubernetes/issues/57865

The delay is only a factor of the terminationGracePeriodSeconds (30s by default), not the backoff timeout (up to 5m), so we are looking at a delay in the tens of seconds, not minutes. Trying to figure out why the kubelet does not clean up the failed container once the pod gets its deletionTimestamp set.
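A rough way to confirm that the delay tracks the grace period rather than the backoff window (reusing the crashpod example above; a diagnostic sketch, not part of any fix):

# Default grace period is 30s:
kubectl get pod crashpod -o jsonpath='{.spec.terminationGracePeriodSeconds}'

# If the hang is a function of the grace period, shortening it should
# shorten the observed delay correspondingly:
time kubectl delete pod crashpod --grace-period=5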
This has been an issue since at least 1.6 according to the upstream issue, so it isn't a regression. It is annoying and affects pods in general, not just StatefulSet pods. Not a blocker in my mind, though, so deferring to z-stream. Clayton, if you really want this to be a blocker, feel free to move it back.
WIP upstream PR: https://github.com/kubernetes/kubernetes/pull/62170
Previous upstream PR abandoned. New upstream PR: https://github.com/kubernetes/kubernetes/pull/63321 Origin PR: https://github.com/openshift/origin/pull/19580
Checked on:

# oc version
oc v3.10.0-0.46.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-127.ec2.internal:8443
openshift v3.10.0-0.46.0
kubernetes v1.10.0+b81c8f8

Terminating pods are now deleted immediately.
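Rough verification steps, for the record (same crashpod manifest as in the repro sketch above; exact timings will vary):

# Recreate the crash-looping pod, wait for a backoff over 1m, then:
time oc delete pod crashpod
# The command now returns within a few seconds and the pod is cleaned up,
# instead of sitting in Terminating for the rest of the grace period.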
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816