Description of problem:
When doing a forceful node evacuation, the pods get stuck in "Terminating" state. The RC creates new pods, but they are not scheduled to any node (even with enough resources available on other nodes). They remain stuck at Pending, assigned to no node, until the initially evicted node rejoins the cluster.

Version-Release number of selected component (if applicable):
OSE 3.1

How reproducible:
Forcefully evacuate a node

Steps to Reproduce:
1. oadm manage-node <nodeName> --evacuate --force

Actual results:
Pods are not started on other nodes

Expected results:
Pods should be started on other nodes
I attempted to recreate the reported issue, but had no luck.

# openshift version
openshift v3.1.1.6-33-g81eabcc
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

I set up an OpenShift cluster (single master, two nodes) in three OpenStack instances. I created a project and an application and scaled it up so that the application had two pods, one running on each node.

I stopped one node with an immediate ungraceful shutdown. It took about 30s for the node to switch to NotReady and about 5 minutes for the old pod to be considered dead and the new pod to be scheduled onto the remaining node. However, I did not observe the pod on the terminated node getting stuck in Terminating state. When I brought the node back up, I scaled up to 3, then down to 2, and the pods rebalanced across the two nodes. Other than the 5 minute delay, which might arguably be too long, this worked as I expected.

I also tried gracefully evacuating the node, which also worked as expected.

# oadm manage-node node1 --schedulable=false
# oadm manage-node node1 --evacuate
(a new pod was immediately rescheduled to node2 with no pods stuck in Terminating)
# oadm manage-node node1 --schedulable=true
I then scaled up to 3, then down to 2, and the pods rebalanced across the two nodes.

In both situations, I was not able to reproduce the pod hung in Terminating state. Is there any additional information on how I might recreate this issue?
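For anyone retrying the reproduction, here is a minimal sketch of a filter for spotting pods in the two states this report is about. It was not part of the original test run. The helper name `stuck_pods` is hypothetical, and it assumes the default `oc get pods -o wide` column layout, in which STATUS is the third column.

```shell
# Hypothetical helper (not from the original report): reads pod listing
# output on stdin and prints the name and status of every pod that is
# Pending or Terminating. Assumes STATUS is the third whitespace-separated
# column and that the first line is a header.
stuck_pods() {
  awk 'NR > 1 && ($3 == "Pending" || $3 == "Terminating") { print $1, $3 }'
}

# Example use against a live cluster (run after evacuating the node):
#   oc get pods -o wide | stuck_pods
```

An empty result after the evacuation would match the behaviour described in the comment above. Any line printed would point to a pod worth inspecting with `oc describe pod`.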
Harald, can we close this?
Closing as customer was unable to reproduce. Please reopen in the future if necessary.