Bug 1541476 - Pods in crash loop backoff can't be deleted until the crash loop backoff period expires
Summary: Pods in crash loop backoff can't be deleted until the crash loop backoff period expires
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.10.0
Assignee: Robert Krawitz
QA Contact: weiwei jiang
Depends On:
Reported: 2018-02-02 16:44 UTC by Clayton Coleman
Modified: 2018-07-30 19:09 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2018-07-30 19:09:00 UTC
Target Upstream Version:


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:09:34 UTC

Description Clayton Coleman 2018-02-02 16:44:58 UTC
It looks like the kubelet refuses to finalize the deletion of a pod while the pod is in crash loop backoff (in the backoff window), probably because it is waiting out the backoff and blocking the sync loop.


Steps to Reproduce:
1. Create a pod that crash loops
2. Wait until the backoff is > 1m (3-4 crashes)
3. Delete the pod
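One minimal way to get a pod into crash loop backoff is a container that exits immediately under `restartPolicy: Always`. A sketch of such a manifest (the pod name and image are illustrative, not from this report):

```yaml
# Illustrative manifest: the container exits right away, so the
# kubelet keeps restarting it and the pod enters CrashLoopBackOff.
apiVersion: v1
kind: Pod
metadata:
  name: crashloop-demo
spec:
  restartPolicy: Always
  containers:
  - name: crash
    image: busybox
    command: ["sh", "-c", "exit 1"]
```

Create it with `oc create -f`, watch the restarts accumulate with `oc get pod crashloop-demo -w`, then delete it once the backoff window has grown.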


Expected results:
1. Kubelet acknowledges the delete request immediately and cleans up the pod (the main container would be stopped already, so delete should be almost instantaneous)


Actual results:
1. Pod sits in Terminating for 1m+ (looks like until the backoff period expires)
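For context on step 2 of the reproducer, the crash loop backoff window grows exponentially per restart. A rough sketch of the schedule, assuming the upstream kubelet defaults of a 10s initial delay doubling up to a 5m cap:

```python
# Sketch of a kubelet-style crash loop backoff schedule.
# Assumes: 10s initial delay, doubling per restart, capped at 300s (5m).
INITIAL_BACKOFF = 10   # seconds
MAX_BACKOFF = 300      # seconds (5 minutes)

def backoff_after(restarts: int) -> int:
    """Backoff delay (seconds) applied after the given number of crashes."""
    if restarts < 1:
        return 0
    return min(INITIAL_BACKOFF * 2 ** (restarts - 1), MAX_BACKOFF)

schedule = [backoff_after(n) for n in range(1, 8)]
print(schedule)  # [10, 20, 40, 80, 160, 300, 300]
```

After three to four crashes the window is already 40-80s, which is why the reproducer waits for 3-4 crashes to push the backoff past a minute.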

This bug is incredibly annoying when debugging or working with pods.  It makes me want to break things, which is sad :(

Comment 1 Seth Jennings 2018-02-02 16:46:50 UTC
Upstream issue:

Comment 2 Seth Jennings 2018-02-07 21:47:47 UTC
Still working to run this down upstream

The delay is only a factor of the terminationGracePeriod (30s), not the backoff timeout (up to 5m).  So we are looking at a delay in the tens of seconds, not minutes.

Trying to figure out why the kubelet does not clean up the failed container once the pod gets its deletionTimestamp set.
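The observation above can be checked directly: the delay tracks the pod's grace period, not the backoff timer. A hypothetical spec fragment lowering the standard terminationGracePeriodSeconds field (names illustrative):

```yaml
# Illustrative: shrinking the grace period shortens the observed
# deletion delay, since the delay tracks terminationGracePeriodSeconds
# (default 30s), not the crash loop backoff window.
apiVersion: v1
kind: Pod
metadata:
  name: crashloop-demo
spec:
  terminationGracePeriodSeconds: 5
  containers:
  - name: crash
    image: busybox
    command: ["sh", "-c", "exit 1"]
```

While debugging, forcing deletion with `oc delete pod <name> --grace-period=0` is the usual way to sidestep the delay entirely.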

Comment 3 Seth Jennings 2018-02-13 16:02:00 UTC
This has been an issue since at least 1.6 according to the upstream issue, so it isn't a regression.  It is annoying and affects pods in general, not just StatefulSet pods.  Not a blocker in my mind though, so deferring to z-stream.

Clayton, if you really want this to be a blocker, feel free to move it back.

Comment 4 Seth Jennings 2018-04-10 03:50:31 UTC
WIP upstream PR:

Comment 5 Seth Jennings 2018-05-01 18:00:17 UTC
Previous upstream PR abandoned.

New upstream PR:

Origin PR:

Comment 7 weiwei jiang 2018-05-16 09:03:46 UTC
Checked on 
# oc version 
oc v3.10.0-0.46.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-127.ec2.internal:8443
openshift v3.10.0-0.46.0
kubernetes v1.10.0+b81c8f8

Terminating pods are now deleted immediately.

Comment 9 errata-xmlrpc 2018-07-30 19:09:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

