Bug 1364243 - Terminating Pod does not get rescheduled to another node when node is NotReady
Summary: Terminating Pod does not get rescheduled to another node when node is NotReady
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Derek Carr
QA Contact: Vikas Laad
URL:
Whiteboard:
Duplicates: 1343157 1365657
Depends On:
Blocks:
 
Reported: 2016-08-04 19:19 UTC by Vikas Laad
Modified: 2017-03-08 18:43 UTC
CC: 9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 12:51:59 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2017:0066 (normal, SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.4 RPM Release Advisory - last updated 2017-01-18 17:23:26 UTC

Description Vikas Laad 2016-08-04 19:19:59 UTC
Description of problem:
Pod got stuck in Terminating state; see https://bugzilla.redhat.com/show_bug.cgi?id=1364176

Docker was not responding on that node, so I did a reboot. Now the node is not coming back, possibly because of https://bugzilla.redhat.com/show_bug.cgi?id=1362109

Since the node is now showing in NotReady state, the pod should be rescheduled to another node, but that is not happening.

root@300node-support-2: ~/svt/openshift_scalability # oc get pods --all-namespaces -o wide
NAMESPACE           NAME                          READY     STATUS        RESTARTS   AGE       IP            NODE 
clusterproject266   deploymentconfig2v0-1-9us8s   1/1       Terminating   0          1d        172.21.5.5    192.1.1.63 


root@300node-support-2: ~/svt/openshift_scalability # oc get nodes | grep 192.1.1.63  
192.1.1.63    NotReady                   6d      


Version-Release number of selected component (if applicable):
openshift v3.3.0.10 
kubernetes v1.3.0+57fb9ac  
etcd 2.3.0+git 

How reproducible:


Steps to Reproduce:
1. While a pod is in Terminating state, the node it is scheduled to becomes NotReady

Actual results:
Pod is stuck in Terminating state and the project does not get deleted.

Expected results:
Pod should be rescheduled to another Ready node

Additional info:

Comment 1 Andy Goldstein 2016-08-04 19:29:25 UTC
If you wait > 5 minutes, does the DeploymentConfig create a new pod on another node?

Comment 2 Vikas Laad 2016-08-04 19:55:52 UTC
No, this Terminating pod has been stuck for a day. The node has been NotReady for a few hours now, and still no replacement pod was created on another node.

Comment 3 Andy Goldstein 2016-08-04 20:02:11 UTC
Derek, would you mind looking at this? I think this may reproduce on a multi-node cluster by just stopping Docker on one node and waiting >5 minutes to see if the NodeController evicts the pods on the NotReady node.
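A rough reproduction sketch along those lines (the node name is illustrative):

# on the chosen node
$ systemctl stop docker

# from a master: watch the node flip to NotReady, then check whether its pods get replaced
$ oc get nodes -w
$ oc get pods --all-namespaces -o wide | grep <node-name>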

Comment 4 Andy Goldstein 2016-08-04 20:03:44 UTC
I do want to clarify that pods never get rescheduled. If you have a scalable resource (replication controller, deployment config), that will attempt to create new pods to replace failed ones, but a pod by itself is never moved or rescheduled. Just wanted to make sure that's clear :-)
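One way to see that distinction (the project name is illustrative): the replication controller behind the deployment config is what restores the replica count, so the lost pod is replaced by a new pod with a new name rather than the old pod being moved:

$ oc get rc -n <project>                # desired vs. current replica counts
$ oc get pods -n <project> -o wide      # the replacement pod has a new name and may land on another node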

Comment 5 Derek Carr 2016-08-12 15:07:12 UTC
*** Bug 1365657 has been marked as a duplicate of this bug. ***

Comment 6 Derek Carr 2016-08-12 20:25:01 UTC
To summarize the full set of discussion topics in this thread:

1. The kubelet will wait 5 minutes before transitioning from a Ready to NotReady state if the kubelet's container runtime goes down.  I think this time is too long, and it is not tunable by operators since it is hard-coded.

See upstream issue to try and come to a consensus: 
https://github.com/kubernetes/kubernetes/issues/30534
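One way to observe this window (illustrative commands, not taken from this report): stop the container runtime on a node and watch its Ready condition; the flip to NotReady only shows up after roughly five minutes:

$ oc get nodes -w
$ oc describe node <node-name>          # the Conditions section shows Ready and its LastTransitionTime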

2. The node controller does not evict a Pod if it is in Terminating state and it is the ONLY pod scheduled to that node that requires eviction.  This is because the node controller identifies that the pods on the node should be evicted, but because it is the only pod on the node and it has a TerminationGracePeriodSeconds, the current logic skips the delete on it, and it never goes into the terminating evictor queue.

See upstream issue to try and determine how to refactor:
https://github.com/kubernetes/kubernetes/issues/30536

The operator can forcefully delete the pod in question by doing:
$ oc delete pods <pod> --grace-period=0
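For example, against the stuck pod from the original report (pod name and project taken from the description above):

$ oc delete pod deploymentconfig2v0-1-9us8s -n clusterproject266 --grace-period=0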

Given this is an edge-case, and its fix requires a larger refactor, I am marking this UPCOMING_RELEASE and hope to get fixes into kubernetes 1.4 to be picked up by OpenShift upon that rebase.

Comment 7 Derek Carr 2016-08-15 17:23:23 UTC
*** Bug 1343157 has been marked as a duplicate of this bug. ***

Comment 8 Derek Carr 2016-08-18 15:44:20 UTC
Upstream PR for the node controller not removing a terminating pod from a node when it is the only pod on the node:

https://github.com/kubernetes/kubernetes/pull/30624

Comment 9 Derek Carr 2016-08-18 18:01:48 UTC
Origin PR
https://github.com/openshift/origin/pull/10503

Comment 10 Derek Carr 2016-09-30 14:35:11 UTC
This should be fixed, as the requisite Origin PR above has merged.

Comment 11 Vikas Laad 2016-10-28 16:16:58 UTC
Tested with the following scenario:
- Created a 2-node cluster
- Created projects which have pods on both nodes
- Stopped Docker on one of the nodes
- Deleted the projects immediately
- Node becomes NotReady and pods stay in Terminating state (this is the state they previously got stuck in)
- After a few minutes the pods are gone, node is still in NotReady state
- Started Docker back on that node; node is Ready and everything is good.
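Roughly, the verification above corresponds to (commands are illustrative, not a transcript of the actual run):

$ systemctl stop docker                 # on one of the two nodes
$ oc delete project <project>           # immediately afterwards
$ oc get pods -n <project> -o wide -w   # pods sit in Terminating, then disappear after a few minutes
$ systemctl start docker                # node returns to Ready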

Comment 12 Vikas Laad 2016-10-28 16:18:10 UTC
Verified in the following version:

openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 14 errata-xmlrpc 2017-01-18 12:51:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

