Bug 1801634

Summary: [Placeholder] GCP OVN 4.4 install jobs consistently timing out waiting for multus
Product: OpenShift Container Platform
Component: Networking
Networking sub component: ovn-kubernetes
Version: 4.4
Target Milestone: ---
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Alexander Constantinescu <aconstan>
Assignee: Alexander Constantinescu <aconstan>
QA Contact: zhaozhanqi <zzhao>
Docs Contact:
CC: aconstan, anusaxen, bbennett, dcbw, dosmith, hongkliu, jchaloup, jlebon, pmuller, zzhao
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1779863
Environment:
Last Closed: 2020-05-04 11:35:29 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 1779863    

Description Alexander Constantinescu 2020-02-11 12:35:58 UTC
+++ This bug was initially created as a clone of Bug #1779863 +++

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/17

Installing from release registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-04-191539
level=warning msg="Found override for release image. Please be warned, this is not advised"
level=info msg="Consuming Install Config from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-4txbhlvq-f342c.origin-ci-int-gce.dev.openshift.com:6443..."
level=info msg="API v1.16.2 up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not yet scheduled on any nodes"
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/artifacts/installer/log-bundle-20191204200453.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

I'm not sure exactly why the DaemonSet isn't being scheduled; the masters at least look like they came up.
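
For anyone triaging a similar failure from the bootstrap gather or a live cluster, here is a minimal sketch of the commands I'd expect to show why the DaemonSet isn't being scheduled (assuming a working kubeconfig; the namespace and DaemonSet name are the ones from the operator message above):

```
# Does the DaemonSet exist, and how many pods does it want vs. have scheduled?
oc -n openshift-multus get daemonset multus-admission-controller -o wide

# Scheduling-related events (taints, unsatisfied node selectors, etc.)
oc -n openshift-multus get events --sort-by=.lastTimestamp

# The cluster-network-operator's own view of the rollout
oc get clusteroperator network -o yaml
```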

--- Additional comment from Douglas Smith on 2019-12-04 22:05:40 UTC ---

This looks very similar to something reported yesterday to the openshift-sdn team.

In that similar case, the following was reported:

```
DaemonSet "openshift-multus/multus" rollout is not making progress
```

It was reported by the cluster-network-operator in these logs: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11961#1:build-log.txt%3A6927

It should be noted that the multus-admission-controller is not a dependency of any other function in the cluster (that is, multus doesn't need to wait for it, and neither does openshift-sdn).

That being said, I believe that this is a symptom and not a cause of the core issue.

Did the master nodes become ready? That was also a symptom of the issue reported yesterday:

```
NodeControllerDegraded: The master node(s) \"ip-10-0-129-3.ec2.internal\" not ready
```
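
To answer that from a must-gather or a live cluster, something like the following should show whether the masters went Ready and which condition is holding them back (a sketch; the node name is the one from the quoted error and will differ per cluster):

```
# NotReady masters usually mean the CNI never came up on them
oc get nodes -o wide

# Conditions on the suspect master (node name copied from the error above)
oc describe node ip-10-0-129-3.ec2.internal | grep -A 10 "Conditions:"

# Operators reporting NodeControllerDegraded
oc get clusteroperators
```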

--- Additional comment from Anurag saxena on 2019-12-05 14:15:01 UTC ---

GCP-OVN installation is blocked in 4.3, though, for various reasons; see https://bugzilla.redhat.com/show_bug.cgi?id=1748162

--- Additional comment from Jan Chaloupka on 2019-12-10 17:00:31 UTC ---



--- Additional comment from Dan Williams on 2020-01-29 13:00:46 UTC ---

GCP + OVN is fine now that MTU issues have been sorted out. Do we have recent CI failures here that we can debug?
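
For anyone verifying that on a given cluster, the MTU the operator applied can be read back from the operator's Network config. A rough sketch (the jsonpath assumes spec.defaultNetwork.ovnKubernetesConfig.mtu is set; it may come back empty if the operator chose the default):

```
# Full network configuration as applied by the cluster-network-operator
oc get network.operator.openshift.io cluster -o yaml

# Just the OVN-Kubernetes MTU, if explicitly set
oc get network.operator.openshift.io cluster \
  -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.mtu}'
```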

--- Additional comment from Weibin Liang on 2020-01-29 15:06:59 UTC ---

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1748162#c64

OVN can be installed in GCP cluster in 4.4.0-0.nightly-2020-01-16-113546
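
For anyone reproducing that result, the relevant knob is the networkType field in install-config.yaml. A rough sketch of the flow (the directory name and the sed edit are only illustrative):

```
openshift-install create install-config --dir gcp-ovn
# switch the default SDN to OVN-Kubernetes before creating the cluster
sed -i 's/networkType: .*/networkType: OVNKubernetes/' gcp-ovn/install-config.yaml
openshift-install create cluster --dir gcp-ovn
```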

--- Additional comment from Petr Muller on 2020-01-29 15:47:51 UTC ---

Here's a 4.3 CI failure from today:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/358

--- Additional comment from Weibin Liang on 2020-01-29 16:16:28 UTC ---

QE will retest it on the latest v4.3.

--- Additional comment from Hongkai Liu on 2020-01-31 15:47:03 UTC ---

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/366

--- Additional comment from Hongkai Liu on 2020-01-31 21:17:44 UTC ---

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/368

Comment 1 Alexander Constantinescu 2020-02-11 12:38:45 UTC
This is just a placeholder bug for the 4.3 one.

Comment 3 errata-xmlrpc 2020-05-04 11:35:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581