Bug 1775252 - [Enhancement] Missing degraded condition when the static pod installer is unable to create pods due to networking errors
Summary: [Enhancement] Missing degraded condition when the static pod installer is unable to create pods due to networking errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.z
Assignee: Michal Fojtik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Duplicates: 1782795
Depends On: 1782791 1782793 1782795
Blocks: 1764629
 
Reported: 2019-11-21 15:39 UTC by Michal Fojtik
Modified: 2020-07-01 15:02 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1764629
Clones: 1782791
Environment:
Last Closed: 2020-07-01 15:02:34 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift library-go pull 643 0 None closed Bug 1775252: installerpodstate: fix reporting InstallerPodNetworkingDegraded 2021-02-11 12:35:45 UTC
Red Hat Product Errata RHBA-2020:2628 0 None None None 2020-07-01 15:02:49 UTC

Comment 2 Xingxing Xia 2019-12-11 08:15:35 UTC
Tried the steps of bug 1764629#c8; after the steps, got:
[xxia 2019-12-11 16:04:58 my]$ oc get po -n openshift-kube-apiserver --show-labels
NAME                                                         READY   STATUS              RESTARTS   AGE     LABELS
...
installer-7-xxia-1-7f894-m-0.c.openshift-qe.internal         0/1     Completed           0          5h50m   app=installer
installer-7-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     Completed           0          5h52m   app=installer
installer-7-xxia-1-7f894-m-2.c.openshift-qe.internal         0/1     Completed           0          5h48m   app=installer
installer-8-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     ContainerCreating   0          3m10s   app=installer
kube-apiserver-xxia-1-7f894-m-0.c.openshift-qe.internal      3/3     Running             0          5h50m   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-xxia-1-7f894-m-1.c.openshift-qe.internal      3/3     Running             0          5h52m   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-xxia-1-7f894-m-2.c.openshift-qe.internal      3/3     Running             0          5h48m   apiserver=true,app=openshift-kube-apiserver,revision=7
...
[xxia 2019-12-11 16:05:33 my]$ oc describe po -n openshift-kube-apiserver installer-8-xxia-1-7f894-m-1.c.openshift-qe.internal
...
  Warning  NetworkNotReady  2m44s (x25 over 3m32s)  kubelet, xxia-1-7f894-m-1.c.openshift-qe.internal  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
[xxia 2019-12-11 16:06:00 my]$ oc get kubeapiserver cluster -o yaml
...
  conditions:
  - lastTransitionTime: "2019-12-11T02:10:36Z"
    status: "False"
    type: InstallerControllerDegraded
  - lastTransitionTime: "2019-12-11T02:07:45Z"
    message: 3 nodes are active; 3 nodes are at revision 7; 0 nodes have achieved
      new revision 8
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-11T08:02:24Z"
    message: 3 nodes are at revision 7; 0 nodes have achieved new revision 8
    status: "True"
    type: Progressing
...
  - lastTransitionTime: "2019-12-11T07:47:35Z"
    message: The master node(s) "xxia-1-7f894-m-1.c.openshift-qe.internal" not ready
    reason: MasterNodesReady
    status: "True"
    type: NodeControllerDegraded
...
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodNetworkingDegraded
...
  latestAvailableRevision: 8
  latestAvailableRevisionReason: ""
  nodeStatuses:
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-1.c.openshift-qe.internal
    targetRevision: 8
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-0.c.openshift-qe.internal
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-2.c.openshift-qe.internal
  readyReplicas: 0

None of the InstallerPod* conditions changed to True. Could you hint at how to verify the bug?

Comment 3 Michal Fojtik 2019-12-11 10:03:19 UTC
You should see the InstallerPodNetworkingDegraded condition set *after* 5 minutes (which is the maximum time we allow CNI to fix itself before we set that condition).
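
For illustration, a minimal self-contained Go sketch of that grace-period idea. This is not the actual library-go implementation from PR 643; all names here (networkingGracePeriod, installerPod, networkingDegraded) are hypothetical:

package main

import (
	"fmt"
	"time"
)

// Maximum time we allow CNI to fix itself before reporting the condition.
const networkingGracePeriod = 5 * time.Minute

// installerPod is a stand-in for the fields the operator would read from
// the real installer pod and its NetworkNotReady-style events.
type installerPod struct {
	name            string
	pendingSince    time.Time
	networkingError string // event message, "" if no networking error seen
}

// networkingDegraded reports InstallerPodNetworkingDegraded only when the
// pod has been stuck on a networking error for longer than the grace
// period, giving CNI time to recover on its own.
func networkingDegraded(pod installerPod, now time.Time) (bool, string) {
	if pod.networkingError == "" {
		return false, ""
	}
	if now.Sub(pod.pendingSince) < networkingGracePeriod {
		return false, "" // still inside the window where CNI may fix itself
	}
	return true, fmt.Sprintf("Pod %q observed degraded networking: %s",
		pod.name, pod.networkingError)
}

func main() {
	pod := installerPod{
		name:            "installer-8-node-1",
		pendingSince:    time.Now().Add(-10 * time.Minute),
		networkingError: "network is not ready: runtime network not ready",
	}
	if degraded, msg := networkingDegraded(pod, time.Now()); degraded {
		fmt.Println("InstallerPodNetworkingDegraded=True:", msg)
	}
}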

Comment 4 Xingxing Xia 2019-12-11 14:07:14 UTC
Tried again. After 18m, InstallerPodNetworkingDegraded is still not set; only InstallerPodContainerWaitingDegraded is set, but its "message" is truncated, so it does not say what it is "because" of:
[xxia 2019-12-11 22:00:23 my]$ oc get po -n openshift-kube-apiserver --show-labels | grep installer
installer-10-xxia-1-7f894-m-0.c.openshift-qe.internal         0/1     Completed           0          40m     app=installer
installer-10-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     Completed           0          42m     app=installer
installer-10-xxia-1-7f894-m-2.c.openshift-qe.internal         0/1     Completed           0          38m     app=installer
installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     ContainerCreating   0          18m     app=installer
...
[xxia 2019-12-11 22:01:30 my]$ oc get kubeapiserver cluster -o yaml
...
status:
  conditions:
...
  - lastTransitionTime: "2019-12-11T02:07:45Z"
    message: 3 nodes are active; 3 nodes are at revision 10; 0 nodes have achieved
      new revision 11
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-11T13:43:15Z"
    message: 3 nodes are at revision 10; 0 nodes have achieved new revision 11
    status: "True"
    type: Progressing
...
  - lastTransitionTime: "2019-12-11T13:36:47Z"
    message: The master node(s) "xxia-1-7f894-m-1.c.openshift-qe.internal" not ready
    reason: MasterNodesReady
    status: "True"
    type: NodeControllerDegraded
...
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2019-12-11T13:48:52Z"
    message: 'Pod "installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal" on node
      "xxia-1-7f894-m-1.c.openshift-qe.internal" container "installer" is waiting
      for 18m33.058366683s because '
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodNetworkingDegraded
...

Comment 7 Xingxing Xia 2019-12-16 07:31:31 UTC
Tested a 4.3.0-0.nightly-2019-12-13-180405 env, still got http://file.rdu.redhat.com/~xxia/bug-1775252-result-for-c6.txt :
For InstallerPodContainerWaitingDegraded, the `because ""` does not say what the cause is;
InstallerPodNetworkingDegraded is not True, even though "network is not ready" is reported.
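
The empty `because ""` suggests the condition message appends the container's waiting message verbatim, and the kubelet leaves that message empty while only Reason: ContainerCreating is set. A minimal Go sketch of one way to keep such a message informative, falling back to the reason when the message is empty (a hypothetical helper, not the actual library-go code):

package main

import (
	"fmt"
	"time"
)

// waitingMessage builds an InstallerPodContainerWaitingDegraded-style
// message. If the kubelet supplied no waiting message, fall back to the
// waiting reason so the text never ends in `because ""`.
func waitingMessage(pod, node, container string, waiting time.Duration, reason, message string) string {
	cause := message
	if cause == "" {
		cause = reason
	}
	return fmt.Sprintf("Pod %q on node %q container %q is waiting for %s because %q",
		pod, node, container, waiting, cause)
}

func main() {
	// With an empty message, the fallback yields `because "ContainerCreating"`.
	fmt.Println(waitingMessage("installer-11-node-1", "node-1", "installer",
		18*time.Minute, "ContainerCreating", ""))
}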

Comment 8 Xingxing Xia 2019-12-16 08:31:05 UTC
*** Bug 1782795 has been marked as a duplicate of this bug. ***

Comment 11 Michal Fojtik 2020-06-18 08:42:16 UTC
Xingxing Xia, can you please retest? If it still does not work, can you please capture the installer pod YAML (to check whether the pod status carries the reason why it is stuck)?
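
For reference, a small Go sketch that pulls the waiting reason/message out of a pod's JSON (e.g. piped from `oc get pod <installer-pod> -n openshift-kube-apiserver -o json`), shown only to illustrate which status fields carry that information; the helper itself is hypothetical:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// pod models just the status fields relevant to a stuck installer pod:
// status.containerStatuses[].state.waiting.{reason,message}.
type pod struct {
	Status struct {
		ContainerStatuses []struct {
			Name  string `json:"name"`
			State struct {
				Waiting *struct {
					Reason  string `json:"reason"`
					Message string `json:"message"`
				} `json:"waiting"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

func main() {
	var p pod
	if err := json.NewDecoder(os.Stdin).Decode(&p); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, cs := range p.Status.ContainerStatuses {
		if w := cs.State.Waiting; w != nil {
			fmt.Printf("container %s waiting: reason=%q message=%q\n",
				cs.Name, w.Reason, w.Message)
		}
	}
}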

Comment 14 Xingxing Xia 2020-06-23 11:47:10 UTC
Verified in 4.3.0-0.nightly-2020-06-23-075250 using the steps from the comments above. After 5 minutes, got the message and reason in the InstallerPod* conditions:
$ oc get kubeapiserver cluster -o yaml
...
  - lastTransitionTime: "2020-06-23T10:45:30Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on node
      "ip-10-0-131-75.ap-northeast-2.compute.internal" container "installer" is waiting
      for 6m55.3685872s because ""
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: 'Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on
      node "ip-10-0-131-75.ap-northeast-2.compute.internal" observed degraded networking:
      Failed create pod sandbox: rpc error: code = Unknown desc = failed to create
      pod network sandbox k8s_installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal_openshift-kube-apiserver_0a7eecda-3697-44d3-bfe8-ecccfa327d3f_0(8bf0fdb3acd0d390df7361dc70bd97a544f3e468f58c7129146a1d480e818309):
      Multus: [openshift-kube-apiserver/installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal]:
      PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for
      the condition'
    reason: FailedCreatePodSandBox
    status: "True"
    type: InstallerPodNetworkingDegraded
...

Comment 16 errata-xmlrpc 2020-07-01 15:02:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2628

