Bug 1701099 - Pods are still forever shown Running after the corresponding hosting node is powered off
Summary: Pods are still forever shown Running after the corresponding hosting node is powered off
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
: 4.4.0
Assignee: Maciej Szulik
QA Contact: zhou ying
: 1715672 1738243 (view as bug list)
Depends On:
Reported: 2019-04-18 05:13 UTC by Xingxing Xia
Modified: 2019-11-06 14:13 UTC (History)
9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:

Attachments

System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1694079 urgent CLOSED Add tool that can restore expired certificates in case a cluster was suspended for longer period of time 2019-12-04 15:27:26 UTC

Description Xingxing Xia 2019-04-18 05:13:22 UTC
Description of problem:
This bug is filed separately for https://bugzilla.redhat.com/show_bug.cgi?id=1672894#c16 .
Pods are still shown Running long after the corresponding hosting node is powered off.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a NextGen cluster
2. Power off one master by ssh'ing to it and running `shutdown -h now`
3. Check the pods that were Running on the powered-off node.

Actual results:
3. The pods are still shown Running after a long time.

Expected results:
3. The pods are not shown Running.

Additional info:

Comment 1 Seth Jennings 2019-04-18 20:11:19 UTC
What is "a long time"?  We expect them to show Running for 5m until the node controller evicts all the pods on the non-responsive node, moving them to NodeLost state.

Comment 2 Xingxing Xia 2019-04-19 01:38:54 UTC
(In reply to Seth Jennings from comment #1)
> What is "a long time"?

Forever. 40 hours have elapsed since the time of https://bugzilla.redhat.com/show_bug.cgi?id=1672894#c15 , and the pods of the powered-off node (ip-172-31-167-112.us-east-2.compute.internal) on the env are still observed Running:
$ oc get no ip-172-31-167-112.us-east-2.compute.internal
NAME                                           STATUS     ROLES    AGE   VERSION
ip-172-31-167-112.us-east-2.compute.internal   NotReady   master   41h   v1.12.4+509916ce1

$ oc get po -o wide -n openshift-apiserver | grep ip-172-31-167-112.us-east-2.compute.internal
apiserver-bmgcw   1/1     Running   0          41h   ip-172-31-167-112.us-east-2.compute.internal   <none>           <none>

[xxia@fedora29 my]$ oc get po -o wide --all-namespaces | grep ip-172-31-167-112.us-east-2.compute.internal
kube-system                                             etcd-member-ip-172-31-167-112.us-east-2.compute.internal                2/2     Running       0          41h   ip-172-31-167-112.us-east-2.compute.internal   <none>           <none>
openshift-apiserver                                     apiserver-bmgcw                                                         1/1     Running       0          41h      ip-172-31-167-112.us-east-2.compute.internal   <none>           <none>

Comment 3 Seth Jennings 2019-04-22 14:02:14 UTC
Those pods are mirror pods (i.e. apiserver mirrors of static pod manifests on the node).  I'm not sure if mirror pod status is ever updated by anything other than the node.  By definition, there is no higher-level controller that could start the pod on another node.

Comment 4 Seth Jennings 2019-04-22 16:22:13 UTC
This is not a blocker and is probably by design (i.e. not a bug).  Confirming...

Comment 6 Sunil Choudhary 2019-05-31 08:19:13 UTC
*** Bug 1715672 has been marked as a duplicate of this bug. ***

Comment 7 Vadim Zharov 2019-06-19 21:56:38 UTC
It is not related only to static/mirror pods, but also to all pods created by DaemonSets, so I don't think it is by design.
For example, if you shut down a master node and run

oc get pods -n openshift-image-registry -o wide

it will show that pods are Running on all nodes, including the master node.

At the same time, Prometheus shows that these pods are NOT "Ready": querying kube_pod_status_ready{namespace="openshift-etcd",condition="true"} returns only two pods, not the three that `oc get pods -n openshift-etcd` shows.
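The gap described above can be illustrated offline. A minimal, self-contained sketch (the pod list JSON below is assumed sample data, not output from a live cluster, where it would come from `oc get pods -o json`) contrasts the count of phase=Running pods, which is what `oc get pods` reflects, with the count of pods whose Ready condition is True, which is what kube_pod_status_ready{condition="true"} reports:

```shell
# Hypothetical sample: three etcd pods, all phase=Running, one not Ready.
cat <<'EOF' > /tmp/pods.json
{"items": [
  {"metadata": {"name": "etcd-member-master-0"},
   "status": {"phase": "Running",
              "conditions": [{"type": "Ready", "status": "True"}]}},
  {"metadata": {"name": "etcd-member-master-1"},
   "status": {"phase": "Running",
              "conditions": [{"type": "Ready", "status": "False"}]}},
  {"metadata": {"name": "etcd-member-master-2"},
   "status": {"phase": "Running",
              "conditions": [{"type": "Ready", "status": "True"}]}}
]}
EOF
# Pods `oc get pods` would show as Running (by phase) -> prints 3
jq '[.items[] | select(.status.phase == "Running")] | length' /tmp/pods.json
# Pods the kube_pod_status_ready{condition="true"} metric counts -> prints 2
jq '[.items[].status.conditions[]
    | select(.type == "Ready" and .status == "True")] | length' /tmp/pods.json
```

Both counts come from the same pod objects; the discrepancy is purely which status field is read.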

Comment 8 Seth Jennings 2019-08-05 17:55:15 UTC
I took down master-1 in my test cluster, waited 10m, and this is the result:

$ oc get pod --all-namespaces -owide | grep master-1 | grep Running
openshift-apiserver                                     apiserver-plmkl                                                   1/1     Running       0          53m   master-1   <none>           <none>
openshift-cluster-node-tuning-operator                  tuned-9dsn5                                                       1/1     Running       0          55m   master-1   <none>           <none>
openshift-controller-manager                            controller-manager-bc7s9                                          1/1     Running       0          51m   master-1   <none>           <none>
openshift-dns                                           dns-default-8dpbm                                                 2/2     Running       0          58m    master-1   <none>           <none>
openshift-etcd                                          etcd-member-master-1                                              2/2     Running       0          59m   master-1   <none>           <none>
openshift-image-registry                                node-ca-tdxpk                                                     1/1     Running       0          54m   master-1   <none>           <none>
openshift-kube-apiserver                                kube-apiserver-master-1                                           3/3     Running       0          53m   master-1   <none>           <none>
openshift-kube-controller-manager                       kube-controller-manager-master-1                                  2/2     Running       0          53m   master-1   <none>           <none>
openshift-kube-scheduler                                openshift-kube-scheduler-master-1                                 1/1     Running       0          52m   master-1   <none>           <none>
openshift-machine-config-operator                       machine-config-daemon-7npbl                                       1/1     Running       0          58m   master-1   <none>           <none>
openshift-machine-config-operator                       machine-config-server-r4q87                                       1/1     Running       0          58m   master-1   <none>           <none>
openshift-monitoring                                    node-exporter-66st4                                               2/2     Running       0          54m   master-1   <none>           <none>
openshift-multus                                        multus-admission-controller-qjbmg                                 1/1     Running       0          60m   master-1   <none>           <none>
openshift-multus                                        multus-xftv5                                                      1/1     Running       0          60m   master-1   <none>           <none>
openshift-sdn                                           ovs-r45nv                                                         1/1     Running       0          60m   master-1   <none>           <none>
openshift-sdn                                           sdn-controller-nlxs2                                              1/1     Running       0          60m   master-1   <none>           <none>
openshift-sdn                                           sdn-kvfsf                                                         1/1     Running       1          59m   master-1   <none>           <none>

All of these are either static pods or DS pods.

The DS does track that the pod on the offline node is not available, and the node is not included in the desired count.

$ oc get ds -n openshift-machine-config-operator machine-config-server
NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
machine-config-server   2         2         2       2            2           node-role.kubernetes.io/master=   62m

$ oc get pod -ojson -n openshift-machine-config-operator machine-config-server-r4q87 | jq '{phase: .status.phase, conditions: .status.conditions}'
{
  "phase": "Running",
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-08-05T16:41:29Z",
      "status": "True",
      "type": "Initialized"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-08-05T16:41:31Z",
      "status": "False",
      "type": "Ready"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-08-05T16:41:31Z",
      "status": "True",
      "type": "ContainersReady"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2019-08-05T16:41:29Z",
      "status": "True",
      "type": "PodScheduled"
    }
  ]
}
The pod phase is Running (what `oc get pod` shows) but the condition `Type: Ready` is `False`, which is not reflected in `oc get pod` output.

This is more a source of confusion than it is a bug.

The `oc get pod` printer has always listed a pod `STATUS` column which doesn't correspond directly to anything in the pod status.  It most closely mirrors status.phase (e.g. Pending, Running, Succeeded, Failed, Unknown), but it can take on other values that are not a pod phase as well (e.g. ImagePullBackOff).

Sending to CLI to see if they want to make this clearer in the pod printer.
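Until the printer changes, the Ready condition can be surfaced manually. A hedged sketch (the pod JSON below is assumed sample data matching comment 8's output; on a cluster the input would come from `oc get pod <name> -o json`) prints name, phase, and the Ready condition side by side with jq:

```shell
# Assumed sample pod: phase Running, Ready condition False.
cat <<'EOF' > /tmp/pod.json
{"metadata": {"name": "machine-config-server-r4q87"},
 "status": {"phase": "Running",
            "conditions": [{"type": "Ready", "status": "False"}]}}
EOF
# Print NAME, PHASE, and the Ready condition as one tab-separated row.
jq -r '[.metadata.name, .status.phase,
        (.status.conditions[] | select(.type == "Ready") | .status)]
       | @tsv' /tmp/pod.json
# prints: machine-config-server-r4q87    Running    False
```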

Comment 9 Seth Jennings 2019-08-07 13:39:12 UTC
*** Bug 1738243 has been marked as a duplicate of this bug. ***

Comment 10 Maciej Szulik 2019-08-08 13:46:08 UTC
There's nothing in the pod definition nor status that would allow the CLI to provide better information when invoking the `oc get` command.
`oc describe` on a pod will show a warning only if probes were defined, because those will fail. This might require a deeper upstream
discussion, so I'm moving this to 4.3.

Seth, do you have any idea whether the kubelet could provide such an "unavailable" condition, or something like that, on pods?

Comment 11 Seth Jennings 2019-08-08 19:10:26 UTC
The only thing I could think of is that we show container readiness in the `x/y` form but not pod readiness.  Maybe we show the status as `Running,NotReady` to reflect this?  That was my only potential idea.
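As a sketch of that idea (the pod JSON below is assumed sample data; this is not the actual printer change), the combined status string can be derived from the phase plus the Ready condition:

```shell
# Assumed sample status: phase Running, Ready condition False.
cat <<'EOF' > /tmp/pod-status.json
{"status": {"phase": "Running",
            "conditions": [{"type": "Ready", "status": "False"}]}}
EOF
# Append ",NotReady" to the phase when the pod's Ready condition is False.
jq -r '.status.phase +
       (if (.status.conditions[] | select(.type == "Ready") | .status) == "False"
        then ",NotReady" else "" end)' /tmp/pod-status.json
# prints: Running,NotReady
```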

Comment 12 Maciej Szulik 2019-08-19 14:42:17 UTC
> The only thing I could think of is that we show container readiness in the `x/y` form but not pod readiness.  Maybe we show the status as `Running,NotReady` to reflect this?  That was my only potential idea.

I think that's a reasonable middle ground; I'll work on a PR.

Comment 13 Maciej Szulik 2019-11-06 14:13:47 UTC
This will have to happen upstream first; moving to 4.4.
