1779863 – GCP OVN 4.3 install jobs consistently timing out waiting for multus

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	4.3.z
Assignee:	Alexander Constantinescu
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1781695
Depends On:	1801634
Blocks:

Reported:	2019-12-04 21:17 UTC by Jonathan Lebon
Modified:	2020-02-25 06:18 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1801634
Environment:
Last Closed:	2020-02-25 06:17:59 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 470	0	None	closed	Bug 1779863: backport ovn-controller rbac fix to 4.3	2021-01-08 18:38:00 UTC
Red Hat Product Errata	RHBA-2020:0528	0	None	None	None	2020-02-25 06:18:12 UTC

Description Jonathan Lebon 2019-12-04 21:17:04 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/17

Installing from release registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-04-191539
level=warning msg="Found override for release image. Please be warned, this is not advised"
level=info msg="Consuming Install Config from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-4txbhlvq-f342c.origin-ci-int-gce.dev.openshift.com:6443..."
level=info msg="API v1.16.2 up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not yet scheduled on any nodes"
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/artifacts/installer/log-bundle-20191204200453.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

I'm not sure exactly why the daemonset isn't being scheduled. The masters at least look like they came up.
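
For triaging runs like this one, the installer output above can be filtered mechanically. The following is a minimal sketch, assuming the log keeps the `level=... msg="..."` layout shown in the excerpt; the helper names `parse_installer_log` and `blocking_operator_messages` are hypothetical and not part of any OpenShift tooling.

```python
import re

# Matches lines shaped like: level=info msg="API v1.16.2 up"
# (assumption: every line of interest follows this layout)
LINE_RE = re.compile(r'^level=(?P<level>\w+) msg="(?P<msg>.*)"$')


def parse_installer_log(text):
    """Return (level, message) pairs for every line matching LINE_RE."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            # Undo the backslash-escaped quotes inside the message.
            entries.append((m.group("level"), m.group("msg").replace('\\"', '"')))
    return entries


def blocking_operator_messages(entries):
    """Return messages that report a cluster operator still Progressing."""
    return [
        msg
        for _level, msg in entries
        if msg.startswith("Cluster operator ") and " Progressing is True" in msg
    ]


# Three lines copied from the log excerpt in this report.
SAMPLE = r'''level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not yet scheduled on any nodes"
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"'''

entries = parse_installer_log(SAMPLE)
progressing = blocking_operator_messages(entries)
fatal = [msg for level, msg in entries if level == "fatal"]

print(progressing)
print(fatal)
```

This only surfaces the last operator status and the fatal line; it does not explain why the daemonset was not scheduled.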

Comment 1 Douglas Smith 2019-12-04 22:05:40 UTC

This looks very similar to something reported yesterday to the openshift-sdn team.

In that case, the cluster-network-operator reported:

```
DaemonSet "openshift-multus/multus" rollout is not making progress
```

See the logs at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11961#1:build-log.txt%3A6927

Note that the multus-admission-controller is not a dependency of any other component in the cluster: neither multus nor openshift-sdn needs to wait for it.

Given that, I believe this is a symptom of the core issue, not its cause.

Did the master nodes become ready? Masters failing to become ready was also a symptom of the issue reported yesterday:

```
NodeControllerDegraded: The master node(s) \"ip-10-0-129-3.ec2.internal\" not ready
```

Comment 3 Anurag saxena 2019-12-05 14:15:01 UTC

GCP-OVN installation is blocked in 4.3, though, for various reasons; see https://bugzilla.redhat.com/show_bug.cgi?id=1748162

Comment 4 Jan Chaloupka 2019-12-10 17:00:31 UTC

*** Bug 1781695 has been marked as a duplicate of this bug. ***

Comment 5 Dan Williams 2020-01-29 13:00:46 UTC

GCP + OVN is fine now that MTU issues have been sorted out. Do we have recent CI failures here that we can debug?

Comment 6 Weibin Liang 2020-01-29 15:06:59 UTC

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1748162#c64

OVN can be installed on a GCP cluster with 4.4.0-0.nightly-2020-01-16-113546.

Comment 7 Petr Muller 2020-01-29 15:47:51 UTC

Here's a 4.3 CI failure from today:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/358

Comment 8 Weibin Liang 2020-01-29 16:16:28 UTC

QE will retest it on the latest v4.3.

Comment 9 Hongkai Liu 2020-01-31 15:47:03 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/366

Comment 10 Hongkai Liu 2020-01-31 21:17:44 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/368

Comment 12 zhaozhanqi 2020-02-14 02:19:46 UTC

Verified this bug on 4.3.0-0.nightly-2020-02-13-214539:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/440

Comment 14 errata-xmlrpc 2020-02-25 06:17:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0528
