Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1364243

Summary: Terminating Pod does not get rescheduled to another node when node is NotReady
Product: OpenShift Container Platform
Reporter: Vikas Laad <vlaad>
Component: Node
Assignee: Derek Carr <decarr>
Status: CLOSED ERRATA
QA Contact: Vikas Laad <vlaad>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3.0
CC: agoldste, aos-bugs, jokerman, mmccomas, tdawson, vlaad, weliang, wmeng, xtian
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text: undefined
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-18 12:51:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vikas Laad 2016-08-04 19:19:59 UTC
Description of problem:
Pod got stuck in Terminating state; see https://bugzilla.redhat.com/show_bug.cgi?id=1364176

Docker was not responding on that node, so I rebooted it. Now the node is not coming back, possibly because of https://bugzilla.redhat.com/show_bug.cgi?id=1362109

Since the node is now in NotReady state, the pod should be rescheduled to another node, but that is not happening.

root@300node-support-2: ~/svt/openshift_scalability # oc get pods --all-namespaces -o wide
NAMESPACE           NAME                          READY     STATUS        RESTARTS   AGE       IP            NODE 
clusterproject266   deploymentconfig2v0-1-9us8s   1/1       Terminating   0          1d        172.21.5.5    192.1.1.63 


root@300node-support-2: ~/svt/openshift_scalability # oc get nodes | grep 192.1.1.63  
192.1.1.63    NotReady                   6d      


Version-Release number of selected component (if applicable):
openshift v3.3.0.10 
kubernetes v1.3.0+57fb9ac  
etcd 2.3.0+git 

How reproducible:


Steps to Reproduce:
1. While a pod is terminating, the node it is scheduled on becomes NotReady

Actual results:
Pod is stuck in Terminating state and the project does not get deleted.

Expected results:
Pod should be rescheduled to another Ready node

Additional info:

Comment 1 Andy Goldstein 2016-08-04 19:29:25 UTC
If you wait > 5 minutes, does the DeploymentConfig create a new pod on another node?

Comment 2 Vikas Laad 2016-08-04 19:55:52 UTC
No, this Terminating pod has been stuck for a day. The node has been NotReady for a few hours now, and a replacement pod still has not been created on another node.

Comment 3 Andy Goldstein 2016-08-04 20:02:11 UTC
Derek, would you mind looking at this? I think this may reproduce on a multi-node cluster by just stopping Docker on one node and waiting >5 minutes to see if the NodeController evicts the pods on the NotReady node.
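
A rough reproduction sketch along those lines (the node name, and the assumption that docker is managed by systemd on the node, are mine, not from this report):

# On the node under test, stop the container runtime
$ systemctl stop docker

# From a master, watch the node transition from Ready to NotReady
$ oc get nodes -w

# Once the node is NotReady, check whether the node controller has
# evicted/replaced the pods that were running on it (the default
# pod eviction timeout is about 5 minutes)
$ oc get pods --all-namespaces -o wide | grep <node-name>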

Comment 4 Andy Goldstein 2016-08-04 20:03:44 UTC
I do want to clarify that pods never get rescheduled. If you have a scalable resource (replication controller, deployment config), that will attempt to create new pods to replace failed ones, but a pod by itself is never moved or rescheduled. Just wanted to make sure that's clear :-)
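
A minimal illustration of the difference (object and image names are hypothetical, not from this bug):

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: ReplicationController
metadata:
  name: demo-rc
spec:
  replicas: 1
  selector:
    app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: nginx
EOF

# If the replication controller's pod is deleted or evicted, the
# controller creates a brand-new replacement pod (possibly on a
# different node):
$ oc delete pod -l app=demo
$ oc get pods -l app=demo

# A bare pod created without any controller is simply gone once it is
# deleted or evicted; nothing recreates it.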

Comment 5 Derek Carr 2016-08-12 15:07:12 UTC
*** Bug 1365657 has been marked as a duplicate of this bug. ***

Comment 6 Derek Carr 2016-08-12 20:25:01 UTC
To summarize the full set of discussion topics in this thread:

1. The kubelet will wait 5 minutes before transitioning a node from a Ready to NotReady state if the kubelet's container runtime goes down.  I think this time is too long, and it is not tunable by operators since it is hard-coded.

See upstream issue to try and come to a consensus: 
https://github.com/kubernetes/kubernetes/issues/30534

2. The node controller does not evict a Pod if it is in Terminating state and it is the ONLY pod scheduled to that node that requires eviction.  The node controller identifies that the pods on the node should be evicted, but because this is the only pod on the node and it has a TerminationGracePeriodSeconds, the current logic skips the delete on it and it never goes into the terminating evictor queue.  (A diagnostic sketch follows the upstream link below.)

See upstream issue to try and determine how to refactor:
https://github.com/kubernetes/kubernetes/issues/30536
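
One way to confirm a pod is in this state (the pod and namespace are taken from the output earlier in this report; the jsonpath invocation is my sketch, not an official diagnostic):

$ oc get pod deploymentconfig2v0-1-9us8s -n clusterproject266 \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.spec.terminationGracePeriodSeconds}{"\n"}'

# A non-empty deletionTimestamp plus a grace period means the pod is
# terminating; per the issue above, the node controller will skip it if
# it is the only pod left on the NotReady node.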

The operator can forcefully delete the pod in question by doing:
$ oc delete pods <pod> --grace-period=0
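
With the pod from this report, and assuming --grace-period=0 removes the API object immediately without waiting for the unreachable kubelet to respond, that would look like:

$ oc delete pod deploymentconfig2v0-1-9us8s -n clusterproject266 --grace-period=0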

Given this is an edge case and its fix requires a larger refactor, I am marking this UPCOMING_RELEASE and hope to get fixes into Kubernetes 1.4 to be picked up by OpenShift upon that rebase.

Comment 7 Derek Carr 2016-08-15 17:23:23 UTC
*** Bug 1343157 has been marked as a duplicate of this bug. ***

Comment 8 Derek Carr 2016-08-18 15:44:20 UTC
Upstream PR for the node controller not removing a terminating pod from a node when it is the only pod on the node:

https://github.com/kubernetes/kubernetes/pull/30624

Comment 9 Derek Carr 2016-08-18 18:01:48 UTC
Origin PR
https://github.com/openshift/origin/pull/10503

Comment 10 Derek Carr 2016-09-30 14:35:11 UTC
This should be fixed, as the requisite Origin PR above has merged.

Comment 11 Vikas Laad 2016-10-28 16:16:58 UTC
Tested with the following scenario (a rough command sketch follows the list):
- Created 2 nodes cluster
- Created projects which has pods on both the nodes
- Stopped docker on one of the nodes
- Deleted projects immediately
- Node becomes NotReady and the pods stay in Terminating state (this is the state where they were previously stuck)
- After a few minutes the pods are gone; the node is still in NotReady state
- Started docker back on that node; the node is Ready and everything is good.
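
A sketch of the equivalent commands (the project name is a placeholder, and docker is assumed to be managed by systemd on the node):

# Stop docker on one of the two nodes
$ systemctl stop docker

# Immediately delete the projects whose pods run on that node
$ oc delete project <project>

# The node goes NotReady and its pods sit in Terminating; with the fix,
# the pods are removed after a few minutes even though the node is
# still NotReady
$ oc get nodes
$ oc get pods --all-namespaces -o wide

# Start docker again; the node returns to Ready
$ systemctl start docker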

Comment 12 Vikas Laad 2016-10-28 16:18:10 UTC
Verified in the following versions:

openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 14 errata-xmlrpc 2017-01-18 12:51:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066