Bug 1779863

Summary: GCP OVN 4.3 install jobs consistently timing out waiting for multus
Product: OpenShift Container Platform Reporter: Jonathan Lebon <jlebon>
Component: NetworkingAssignee: Alexander Constantinescu <aconstan>
Networking sub component: ovn-kubernetes QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aconstan, anusaxen, bbennett, dcbw, dosmith, hongkliu, igortiunov, jchaloup, pmuller
Version: 4.3.z   
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1801634 (view as bug list) Environment:
Last Closed: 2020-02-25 06:17:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1801634    
Bug Blocks:    

Description Jonathan Lebon 2019-12-04 21:17:04 UTC
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/17

Installing from release registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-04-191539
level=warning msg="Found override for release image. Please be warned, this is not advised"
level=info msg="Consuming Install Config from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-4txbhlvq-f342c.origin-ci-int-gce.dev.openshift.com:6443..."
level=info msg="API v1.16.2 up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not yet scheduled on any nodes"
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/artifacts/installer/log-bundle-20191204200453.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

I'm not sure exactly why the daemonset isn't being scheduled. Masters look like they came up at least.

Comment 1 Douglas Smith 2019-12-04 22:05:40 UTC
This looks very similar to something reported yesterday to the openshift-sdn team.

In the case of the similar problem, it was reported that:

```
DaemonSet "openshift-multus/multus" rollout is not making progress
```

By the cluster-network-operator in these logs @ https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11961#1:build-log.txt%3A6927

It should be noted that the multus-admission-controller is not a dependency of any other function in the cluster (that is, multus doesn't need to wait for it, and neither does the openshift-sdn)

That being said, I believe that this is a symptom and not a cause of the core issue.

Did the master nodes become ready? As that was also a symptom of the issue reported yesterday that:

```
NodeControllerDegraded: The master node(s) \"ip-10-0-129-3.ec2.internal\" not ready
```

Comment 2 Douglas Smith 2019-12-04 22:05:57 UTC
This looks very similar to something reported yesterday to the openshift-sdn team.

In the case of the similar problem, it was reported that:

```
DaemonSet "openshift-multus/multus" rollout is not making progress
```

By the cluster-network-operator in these logs @ https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11961#1:build-log.txt%3A6927

It should be noted that the multus-admission-controller is not a dependency of any other function in the cluster (that is, multus doesn't need to wait for it, and neither does the openshift-sdn)

That being said, I believe that this is a symptom and not a cause of the core issue.

Did the master nodes become ready? As that was also a symptom of the issue reported yesterday that:

```
NodeControllerDegraded: The master node(s) \"ip-10-0-129-3.ec2.internal\" not ready
```

Comment 3 Anurag saxena 2019-12-05 14:15:01 UTC
GCP-OVN installation is blocked in 4.3 though due to various reasons https://bugzilla.redhat.com/show_bug.cgi?id=1748162

Comment 4 Jan Chaloupka 2019-12-10 17:00:31 UTC
*** Bug 1781695 has been marked as a duplicate of this bug. ***

Comment 5 Dan Williams 2020-01-29 13:00:46 UTC
GCP + OVN is fine now that MTU issues have been sorted out. Do we have recent CI failures here that we can debug?

Comment 6 Weibin Liang 2020-01-29 15:06:59 UTC
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1748162#c64

OVN can be installed in GCP cluster in 4.4.0-0.nightly-2020-01-16-113546

Comment 8 Weibin Liang 2020-01-29 16:16:28 UTC
QE will retest it in latest v4.3

Comment 12 zhaozhanqi 2020-02-14 02:19:46 UTC
verified this bug on 4.3.0-0.nightly-2020-02-13-214539
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/440

Comment 14 errata-xmlrpc 2020-02-25 06:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0528