Bug 1779863 - GCP OVN 4.3 install jobs consistently timing out waiting for multus
Summary: GCP OVN 4.3 install jobs consistently timing out waiting for multus
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.z
Assignee: Alexander Constantinescu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1781695
Depends On: 1801634
Blocks:
Reported: 2019-12-04 21:17 UTC by Jonathan Lebon
Modified: 2020-02-25 06:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1801634
Environment:
Last Closed: 2020-02-25 06:17:59 UTC
Target Upstream Version:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
| --- | --- | --- | --- | --- | --- | --- |
| Github | openshift cluster-network-operator pull 470 | 0 | None | closed | Bug 1779863: backport ovn-controller rbac fix to 4.3 | 2021-01-08 18:38:00 UTC |
| Red Hat Product Errata | RHBA-2020:0528 | 0 | None | None | None | 2020-02-25 06:18:12 UTC |

Description Jonathan Lebon 2019-12-04 21:17:04 UTC
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/17

Installing from release registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-04-191539
level=warning msg="Found override for release image. Please be warned, this is not advised"
level=info msg="Consuming Install Config from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-4txbhlvq-f342c.origin-ci-int-gce.dev.openshift.com:6443..."
level=info msg="API v1.16.2 up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not yet scheduled on any nodes"
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/artifacts/installer/log-bundle-20191204200453.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

I'm not sure exactly why the daemonset isn't being scheduled. Masters look like they came up at least.
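
For anyone triaging similar runs, here is a minimal sketch of pulling the cluster operator condition out of installer output like the above. It assumes every line has the `level=... msg="..."` shape shown in this log and that condition messages read `Cluster operator <name> <condition> is True with <detail>`. The helper names `parse_line` and `operator_conditions` are hypothetical and not part of the installer.

```python
import re

# Assumed line shape, taken from the log above: level=<level> msg="<message>"
LINE_RE = re.compile(r'^level=(?P<level>\w+) msg="(?P<msg>.*)"$')

# Assumed message shape: Cluster operator <name> <condition> is True with <detail>
COND_RE = re.compile(r'^Cluster operator (\S+) (\S+) is True with (.*)$')


def parse_line(line):
    """Return (level, message) for an installer log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Embedded quotes are escaped as \" in the log; undo that.
    return m.group("level"), m.group("msg").replace('\\"', '"')


def operator_conditions(lines):
    """Yield (operator, condition, detail) for each cluster operator message."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _level, msg = parsed
        m = COND_RE.match(msg)
        if m:
            yield m.group(1), m.group(2), m.group(3)


if __name__ == "__main__":
    sample = [
        r'level=info msg="Waiting up to 30m0s for bootstrapping to complete..."',
        r'level=info msg="Cluster operator network Progressing is True with '
        r'Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" '
        r'is not yet scheduled on any nodes"',
    ]
    for operator, condition, detail in operator_conditions(sample):
        print(operator, condition, detail)
```

This only surfaces which operator the installer was waiting on. It does not explain why the DaemonSet was unscheduled.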

Comment 1 Douglas Smith 2019-12-04 22:05:40 UTC
This looks very similar to something reported yesterday to the openshift-sdn team.

In that case, the cluster-network-operator reported:

```
DaemonSet "openshift-multus/multus" rollout is not making progress
```

The message appears in these logs: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11961#1:build-log.txt%3A6927

Note that the multus-admission-controller is not a dependency of any other function in the cluster: neither multus nor openshift-sdn needs to wait for it.

That said, I believe this is a symptom and not the cause of the core issue.

Did the master nodes become ready? Master nodes failing to become ready was also a symptom of the issue reported yesterday:

```
NodeControllerDegraded: The master node(s) \"ip-10-0-129-3.ec2.internal\" not ready
```

Comment 3 Anurag saxena 2019-12-05 14:15:01 UTC
Note, though, that GCP-OVN installation is blocked in 4.3 for various reasons; see https://bugzilla.redhat.com/show_bug.cgi?id=1748162

Comment 4 Jan Chaloupka 2019-12-10 17:00:31 UTC
*** Bug 1781695 has been marked as a duplicate of this bug. ***

Comment 5 Dan Williams 2020-01-29 13:00:46 UTC
GCP + OVN is fine now that MTU issues have been sorted out. Do we have recent CI failures here that we can debug?

Comment 6 Weibin Liang 2020-01-29 15:06:59 UTC
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1748162#c64

OVN can be installed on a GCP cluster with 4.4.0-0.nightly-2020-01-16-113546.

Comment 8 Weibin Liang 2020-01-29 16:16:28 UTC
QE will retest this on the latest 4.3 build.

Comment 12 zhaozhanqi 2020-02-14 02:19:46 UTC
Verified this bug on 4.3.0-0.nightly-2020-02-13-214539:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/440

Comment 14 errata-xmlrpc 2020-02-25 06:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0528

