Bug 1775252 - [Enhancement] Missing degraded condition when the static pod installer is unable to create pods due to networking errors
Summary: [Enhancement] Missing degraded condition when the static pod installer is unable to create pods due to networking errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.z
Assignee: Michal Fojtik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Duplicates: 1782795
Depends On: 1782791 1782793 1782795
Blocks: 1764629
 
Reported: 2019-11-21 15:39 UTC by Michal Fojtik
Modified: 2020-07-01 15:02 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1764629
Clones: 1782791
Environment:
Last Closed: 2020-07-01 15:02:34 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift library-go pull 643 0 None closed Bug 1775252: installerpodstate: fix reporting InstallerPodNetworkingDegraded 2021-02-11 12:35:45 UTC
Red Hat Product Errata RHBA-2020:2628 0 None None None 2020-07-01 15:02:49 UTC

Comment 2 Xingxing Xia 2019-12-11 08:15:35 UTC
Tried the steps of bug 1764629#c8; after the steps, got:
[xxia 2019-12-11 16:04:58 my]$ oc get po -n openshift-kube-apiserver --show-labels
NAME                                                         READY   STATUS              RESTARTS   AGE     LABELS
...
installer-7-xxia-1-7f894-m-0.c.openshift-qe.internal         0/1     Completed           0          5h50m   app=installer
installer-7-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     Completed           0          5h52m   app=installer
installer-7-xxia-1-7f894-m-2.c.openshift-qe.internal         0/1     Completed           0          5h48m   app=installer
installer-8-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     ContainerCreating   0          3m10s   app=installer
kube-apiserver-xxia-1-7f894-m-0.c.openshift-qe.internal      3/3     Running             0          5h50m   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-xxia-1-7f894-m-1.c.openshift-qe.internal      3/3     Running             0          5h52m   apiserver=true,app=openshift-kube-apiserver,revision=7
kube-apiserver-xxia-1-7f894-m-2.c.openshift-qe.internal      3/3     Running             0          5h48m   apiserver=true,app=openshift-kube-apiserver,revision=7
...
[xxia 2019-12-11 16:05:33 my]$ oc describe po -n openshift-kube-apiserver installer-8-xxia-1-7f894-m-1.c.openshift-qe.internal
...
  Warning  NetworkNotReady  2m44s (x25 over 3m32s)  kubelet, xxia-1-7f894-m-1.c.openshift-qe.internal  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
[xxia 2019-12-11 16:06:00 my]$ oc get kubeapiserver cluster -o yaml
...
  conditions:
  - lastTransitionTime: "2019-12-11T02:10:36Z"
    status: "False"
    type: InstallerControllerDegraded
  - lastTransitionTime: "2019-12-11T02:07:45Z"
    message: 3 nodes are active; 3 nodes are at revision 7; 0 nodes have achieved
      new revision 8
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-11T08:02:24Z"
    message: 3 nodes are at revision 7; 0 nodes have achieved new revision 8
    status: "True"
    type: Progressing
...
  - lastTransitionTime: "2019-12-11T07:47:35Z"
    message: The master node(s) "xxia-1-7f894-m-1.c.openshift-qe.internal" not ready
    reason: MasterNodesReady
    status: "True"
    type: NodeControllerDegraded
...
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodNetworkingDegraded
...
  latestAvailableRevision: 8
  latestAvailableRevisionReason: ""
  nodeStatuses:
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-1.c.openshift-qe.internal
    targetRevision: 8
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-0.c.openshift-qe.internal
  - currentRevision: 7
    nodeName: xxia-1-7f894-m-2.c.openshift-qe.internal
  readyReplicas: 0

None of the InstallerPod* conditions changed to True. Could you hint at how to verify the bug?

Comment 3 Michal Fojtik 2019-12-11 10:03:19 UTC
You should see the InstallerPodNetworkingDegraded condition set *after* 5 minutes (which is the maximum time we allow CNI to fix itself before we set that condition).
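
For illustration, a minimal self-contained Go sketch of that grace-period idea. This is not the actual library-go implementation from PR 643; all names here (networkingGracePeriod, installerPod, networkingDegraded) are hypothetical:

package main

import (
	"fmt"
	"time"
)

// Maximum time we allow CNI to fix itself before reporting the condition.
const networkingGracePeriod = 5 * time.Minute

// installerPod is a stand-in for the fields the operator would read from
// the real installer pod and its NetworkNotReady-style events.
type installerPod struct {
	name            string
	pendingSince    time.Time
	networkingError string // event message, "" if no networking error seen
}

// networkingDegraded reports InstallerPodNetworkingDegraded only when the
// pod has been stuck on a networking error for longer than the grace
// period, giving CNI time to recover on its own.
func networkingDegraded(pod installerPod, now time.Time) (bool, string) {
	if pod.networkingError == "" {
		return false, ""
	}
	if now.Sub(pod.pendingSince) < networkingGracePeriod {
		return false, "" // still inside the window where CNI may fix itself
	}
	return true, fmt.Sprintf("Pod %q observed degraded networking: %s",
		pod.name, pod.networkingError)
}

func main() {
	pod := installerPod{
		name:            "installer-8-node-1",
		pendingSince:    time.Now().Add(-10 * time.Minute),
		networkingError: "network is not ready: runtime network not ready",
	}
	if degraded, msg := networkingDegraded(pod, time.Now()); degraded {
		fmt.Println("InstallerPodNetworkingDegraded=True:", msg)
	}
}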

Comment 4 Xingxing Xia 2019-12-11 14:07:14 UTC
Tried again. After 18m, InstallerPodNetworkingDegraded is still not set; only InstallerPodContainerWaitingDegraded is set, but its "message" is truncated, so it does not say what it is "because" of:
[xxia 2019-12-11 22:00:23 my]$ oc get po -n openshift-kube-apiserver --show-labels | grep installer
installer-10-xxia-1-7f894-m-0.c.openshift-qe.internal         0/1     Completed           0          40m     app=installer
installer-10-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     Completed           0          42m     app=installer
installer-10-xxia-1-7f894-m-2.c.openshift-qe.internal         0/1     Completed           0          38m     app=installer
installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal         0/1     ContainerCreating   0          18m     app=installer
...
[xxia 2019-12-11 22:01:30 my]$ oc get kubeapiserver cluster -o yaml
...
status:
  conditions:
...
  - lastTransitionTime: "2019-12-11T02:07:45Z"
    message: 3 nodes are active; 3 nodes are at revision 10; 0 nodes have achieved
      new revision 11
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-11T13:43:15Z"
    message: 3 nodes are at revision 10; 0 nodes have achieved new revision 11
    status: "True"
    type: Progressing
...
  - lastTransitionTime: "2019-12-11T13:36:47Z"
    message: The master node(s) "xxia-1-7f894-m-1.c.openshift-qe.internal" not ready
    reason: MasterNodesReady
    status: "True"
    type: NodeControllerDegraded
...
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2019-12-11T13:48:52Z"
    message: 'Pod "installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal" on node
      "xxia-1-7f894-m-1.c.openshift-qe.internal" container "installer" is waiting
      for 18m33.058366683s because '
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodNetworkingDegraded
...

Comment 7 Xingxing Xia 2019-12-16 07:31:31 UTC
Tested a 4.3.0-0.nightly-2019-12-13-180405 env, still got http://file.rdu.redhat.com/~xxia/bug-1775252-result-for-c6.txt :
For InstallerPodContainerWaitingDegraded, the `because ""` does not say what the cause is;
InstallerPodNetworkingDegraded is not True, even though "network is not ready" is reported.
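
The empty `because ""` suggests the condition message appends the container's waiting message verbatim, and the kubelet leaves that message empty while only Reason: ContainerCreating is set. A minimal Go sketch of one way to keep such a message informative, falling back to the reason when the message is empty (a hypothetical helper, not the actual library-go code):

package main

import (
	"fmt"
	"time"
)

// waitingMessage builds an InstallerPodContainerWaitingDegraded-style
// message. If the kubelet supplied no waiting message, fall back to the
// waiting reason so the text never ends in `because ""`.
func waitingMessage(pod, node, container string, waiting time.Duration, reason, message string) string {
	cause := message
	if cause == "" {
		cause = reason
	}
	return fmt.Sprintf("Pod %q on node %q container %q is waiting for %s because %q",
		pod, node, container, waiting, cause)
}

func main() {
	// With an empty message, the fallback yields `because "ContainerCreating"`.
	fmt.Println(waitingMessage("installer-11-node-1", "node-1", "installer",
		18*time.Minute, "ContainerCreating", ""))
}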

Comment 8 Xingxing Xia 2019-12-16 08:31:05 UTC
*** Bug 1782795 has been marked as a duplicate of this bug. ***

Comment 11 Michal Fojtik 2020-06-18 08:42:16 UTC
Xingxing Xia, can you please retest? If it still does not work, can you please capture the installer pod YAML (to check whether the pod status carries the reason why it is stuck)?
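
For reference, a small Go sketch that pulls the waiting reason/message out of a pod's JSON (e.g. piped from `oc get pod <installer-pod> -n openshift-kube-apiserver -o json`), shown only to illustrate which status fields carry that information; the helper itself is hypothetical:

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// pod models just the status fields relevant to a stuck installer pod:
// status.containerStatuses[].state.waiting.{reason,message}.
type pod struct {
	Status struct {
		ContainerStatuses []struct {
			Name  string `json:"name"`
			State struct {
				Waiting *struct {
					Reason  string `json:"reason"`
					Message string `json:"message"`
				} `json:"waiting"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

func main() {
	var p pod
	if err := json.NewDecoder(os.Stdin).Decode(&p); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, cs := range p.Status.ContainerStatuses {
		if w := cs.State.Waiting; w != nil {
			fmt.Printf("container %s waiting: reason=%q message=%q\n",
				cs.Name, w.Reason, w.Message)
		}
	}
}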

Comment 14 Xingxing Xia 2020-06-23 11:47:10 UTC
Verified in 4.3.0-0.nightly-2020-06-23-075250 using the steps from the comments above. After 5 minutes, got the message and reason in the InstallerPod* conditions:
$ oc get kubeapiserver cluster -o yaml
...
  - lastTransitionTime: "2020-06-23T10:45:30Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on node
      "ip-10-0-131-75.ap-northeast-2.compute.internal" container "installer" is waiting
      for 6m55.3685872s because ""
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: 'Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on
      node "ip-10-0-131-75.ap-northeast-2.compute.internal" observed degraded networking:
      Failed create pod sandbox: rpc error: code = Unknown desc = failed to create
      pod network sandbox k8s_installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal_openshift-kube-apiserver_0a7eecda-3697-44d3-bfe8-ecccfa327d3f_0(8bf0fdb3acd0d390df7361dc70bd97a544f3e468f58c7129146a1d480e818309):
      Multus: [openshift-kube-apiserver/installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal]:
      PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for
      the condition'
    reason: FailedCreatePodSandBox
    status: "True"
    type: InstallerPodNetworkingDegraded
...

Comment 16 errata-xmlrpc 2020-07-01 15:02:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2628

