
Bug 1775252

Summary: [Enhancement] Missing degraded condition when the static pod installer is unable to create pods due to networking errors
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.3.0
Target Release: 4.3.z
Status: CLOSED ERRATA
Severity: low
Priority: low
Reporter: Michal Fojtik <mfojtik>
Assignee: Michal Fojtik <mfojtik>
QA Contact: Xingxing Xia <xxia>
CC: aos-bugs, eparis, mfojtik, scuppett, xxia
Clone Of: 1764629
Clones: 1782791
Bug Depends On: 1782791, 1782793, 1782795
Bug Blocks: 1764629
Last Closed: 2020-07-01 15:02:34 UTC

Comment 2 Xingxing Xia 2019-12-11 08:15:35 UTC
Tried the steps of bug 1764629#c8; after the steps, got:
[xxia 2019-12-11 16:04:58 my]$ oc get po -n openshift-kube-apiserver --show-labels
NAME                                                         READY   STATUS              RESTARTS   AGE     LABELS
...
installer-7-xxia-1-7f894-m-0.c.openshift-qe.internal         0/1     Completed           0          5h50m   app=installer
installer-7-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     Completed           0          5h52m   app=installer
installer-7-xxia-1-7f894-m-2.c.openshift-qe.internal         0/1     Completed           0          5h48m   app=installer
installer-8-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     ContainerCreating   0          3m10s   app=installer
kube-apiserver-xxia-1-7f894-m-0.c.openshift-qe.internal      3/3     Running             0          5h50m   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-xxia-1-7f894-m-1.c.openshift-qe.internal      3/3     Running             0          5h52m   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-xxia-1-7f894-m-2.c.openshift-qe.internal      3/3     Running             0          5h48m   apiserver=true,app=openshift-kube-apiserver,revision=7
...
[xxia 2019-12-11 16:05:33 my]$ oc describe po -n openshift-kube-apiserver installer-8-xxia-1-7f894-m-1.c.openshift-qe.internal
...
  Warning  NetworkNotReady  2m44s (x25 over 3m32s)  kubelet, xxia-1-7f894-m-1.c.openshift-qe.internal  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
[xxia 2019-12-11 16:06:00 my]$ oc get kubeapiserver cluster -o yaml
...
  conditions:
  - lastTransitionTime: "2019-12-11T02:10:36Z"
    status: "False"
    type: InstallerControllerDegraded
  - lastTransitionTime: "2019-12-11T02:07:45Z"
    message: 3 nodes are active; 3 nodes are at revision 7; 0 nodes have achieved
      new revision 8
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-11T08:02:24Z"
    message: 3 nodes are at revision 7; 0 nodes have achieved new revision 8
    status: "True"
    type: Progressing
...
  - lastTransitionTime: "2019-12-11T07:47:35Z"
    message: The master node(s) "xxia-1-7f894-m-1.c.openshift-qe.internal" not ready
    reason: MasterNodesReady
    status: "True"
    type: NodeControllerDegraded
...
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodNetworkingDegraded
...
  latestAvailableRevision: 8
  latestAvailableRevisionReason: ""
  nodeStatuses:
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-1.c.openshift-qe.internal
    targetRevision: 8
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-0.c.openshift-qe.internal
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-2.c.openshift-qe.internal
  readyReplicas: 0

None of the InstallerPod* conditions changed to True. Could you give a hint on how to verify the bug?

Comment 3 Michal Fojtik 2019-12-11 10:03:19 UTC
You should see the InstallerPodNetworkingDegraded condition set *after* 5 minutes (which is the maximum time we allow CNI to fix itself before we set that condition).
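
As an illustration of the grace period described above, here is a minimal Go sketch (the type and function names are invented; this is not the actual openshift/library-go controller code): the condition only flips to True once the networking failure has been observed continuously for more than 5 minutes.

package main

import (
	"fmt"
	"time"
)

// podNetworkFailure is an assumed minimal view of what a controller would
// track for an installer pod that cannot get its sandbox networked.
type podNetworkFailure struct {
	podName   string
	message   string    // e.g. the FailedCreatePodSandBox event message
	firstSeen time.Time // when the networking error was first observed
}

// networkingGracePeriod matches the 5-minute window described above.
const networkingGracePeriod = 5 * time.Minute

// networkingDegradedStatus returns the condition status and message,
// honoring the grace period: CNI is given time to fix itself first.
func networkingDegradedStatus(f *podNetworkFailure, now time.Time) (status, message string) {
	if f == nil || now.Sub(f.firstSeen) < networkingGracePeriod {
		// Still inside the grace window; do not go degraded yet.
		return "False", ""
	}
	return "True", fmt.Sprintf("Pod %q observed degraded networking: %s", f.podName, f.message)
}

func main() {
	f := &podNetworkFailure{
		podName:   "installer-8-example-node",
		message:   "Failed create pod sandbox: Missing CNI default network",
		firstSeen: time.Now().Add(-6 * time.Minute), // failing for 6 minutes
	}
	status, msg := networkingDegradedStatus(f, time.Now())
	fmt.Println(status, msg) // True Pod "installer-8-example-node" observed degraded networking: ...
}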

Comment 4 Xingxing Xia 2019-12-11 14:07:14 UTC
Tried again. After 18m, InstallerPodNetworkingDegraded is still not set; only InstallerPodContainerWaitingDegraded is set, but its "message" is truncated after "because", so the cause is not shown:
[xxia 2019-12-11 22:00:23 my]$ oc get po -n openshift-kube-apiserver --show-labels | grep installer
installer-10-xxia-1-7f894-m-0.c.openshift-qe.internal         0/1     Completed           0          40m     app=installer
installer-10-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     Completed           0          42m     app=installer
installer-10-xxia-1-7f894-m-2.c.openshift-qe.internal         0/1     Completed           0          38m     app=installer
installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     ContainerCreating   0          18m     app=installer
...
[xxia 2019-12-11 22:01:30 my]$ oc get kubeapiserver cluster -o yaml
...
status:
  conditions:
...
  - lastTransitionTime: "2019-12-11T02:07:45Z"
    message: 3 nodes are active; 3 nodes are at revision 10; 0 nodes have achieved
      new revision 11
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-11T13:43:15Z"
    message: 3 nodes are at revision 10; 0 nodes have achieved new revision 11
    status: "True"
    type: Progressing
...
  - lastTransitionTime: "2019-12-11T13:36:47Z"
    message: The master node(s) "xxia-1-7f894-m-1.c.openshift-qe.internal" not ready
    reason: MasterNodesReady
    status: "True"
    type: NodeControllerDegraded
...
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2019-12-11T13:48:52Z"
    message: 'Pod "installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal" on node
      "xxia-1-7f894-m-1.c.openshift-qe.internal" container "installer" is waiting
      for 18m33.058366683s because '
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodNetworkingDegraded
...

Comment 7 Xingxing Xia 2019-12-16 07:31:31 UTC
Tested a 4.3.0-0.nightly-2019-12-13-180405 env; still got the result in http://file.rdu.redhat.com/~xxia/bug-1775252-result-for-c6.txt :
For InstallerPodContainerWaitingDegraded, the `because ""` still does not say what the cause is;
InstallerPodNetworkingDegraded is not True, even though "network is not ready" is reported.
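
The empty `because ""` suggests the condition text is built from containerStatus.State.Waiting.Message, which the kubelet leaves empty here (the detail only appears in the NetworkNotReady event) while Waiting.Reason is "ContainerCreating". A minimal Go sketch of that failure mode and one plausible mitigation, falling back to Reason when Message is empty (waitingMessage and its arguments are invented for illustration, not the actual operator code):

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
)

// waitingMessage mimics the shape of the condition message seen above:
// `Pod "..." on node "..." container "installer" is waiting for 18m... because ""`.
func waitingMessage(podName, nodeName, containerName string, w *corev1.ContainerStateWaiting, waited time.Duration) string {
	detail := w.Message
	if detail == "" {
		// With Reason "ContainerCreating" and an empty Message, the message
		// ends in `because ""`; falling back to Reason at least names the
		// waiting state.
		detail = w.Reason
	}
	return fmt.Sprintf("Pod %q on node %q container %q is waiting for %s because %q",
		podName, nodeName, containerName, waited, detail)
}

func main() {
	w := &corev1.ContainerStateWaiting{Reason: "ContainerCreating"} // Message left empty, as in the report
	fmt.Println(waitingMessage("installer-11-example", "example-node", "installer", w, 18*time.Minute))
}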

Comment 8 Xingxing Xia 2019-12-16 08:31:05 UTC
*** Bug 1782795 has been marked as a duplicate of this bug. ***

Comment 11 Michal Fojtik 2020-06-18 08:42:16 UTC
Xingxing Xia, can you please retest? If it still does not work, can you please capture the installer pod YAML (to check whether the pod status carries the reason why it is stuck)?

Comment 14 Xingxing Xia 2020-06-23 11:47:10 UTC
Verified in 4.3.0-0.nightly-2020-06-23-075250 using the steps from the comments above. After 5 minutes, the message and reason appeared in the InstallerPod* conditions:
$ oc get kubeapiserver cluster -o yaml
...
  - lastTransitionTime: "2020-06-23T10:45:30Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on node
      "ip-10-0-131-75.ap-northeast-2.compute.internal" container "installer" is waiting
      for 6m55.3685872s because ""
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: 'Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on
      node "ip-10-0-131-75.ap-northeast-2.compute.internal" observed degraded networking:
      Failed create pod sandbox: rpc error: code = Unknown desc = failed to create
      pod network sandbox k8s_installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal_openshift-kube-apiserver_0a7eecda-3697-44d3-bfe8-ecccfa327d3f_0(8bf0fdb3acd0d390df7361dc70bd97a544f3e468f58c7129146a1d480e818309):
      Multus: [openshift-kube-apiserver/installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal]:
      PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for
      the condition'
    reason: FailedCreatePodSandBox
    status: "True"
    type: InstallerPodNetworkingDegraded
...
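
For anyone scripting this check instead of reading the full `oc get kubeapiserver cluster -o yaml` output, here is a minimal Go sketch that prints only the InstallerPod* conditions (my own illustration, not QE tooling; it assumes a recent client-go and a kubeconfig at the default location):

package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the same kubeconfig oc would use.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// kubeapiserver "cluster" is a cluster-scoped operator.openshift.io/v1 resource.
	gvr := schema.GroupVersionResource{Group: "operator.openshift.io", Version: "v1", Resource: "kubeapiservers"}
	obj, err := client.Resource(gvr).Get(context.TODO(), "cluster", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	conditions, _, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
	if err != nil {
		panic(err)
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		if t, _ := cond["type"].(string); strings.HasPrefix(t, "InstallerPod") {
			fmt.Printf("%s=%v reason=%v message=%v\n", t, cond["status"], cond["reason"], cond["message"])
		}
	}
}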

Comment 16 errata-xmlrpc 2020-07-01 15:02:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2628