Bug 1801634 - [Placeholder] GCP OVN 4.4 install jobs consistently timing out waiting for multus
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Alexander Constantinescu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1779863
Reported: 2020-02-11 12:35 UTC by Alexander Constantinescu
Modified: 2020-05-04 11:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1779863
Environment:
Last Closed: 2020-05-04 11:35:29 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2020:0581
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2020-05-04 11:36:01 UTC

Description Alexander Constantinescu 2020-02-11 12:35:58 UTC
+++ This bug was initially created as a clone of Bug #1779863 +++

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/17

Installing from release registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-04-191539
level=warning msg="Found override for release image. Please be warned, this is not advised"
level=info msg="Consuming Install Config from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-4txbhlvq-f342c.origin-ci-int-gce.dev.openshift.com:6443..."
level=info msg="API v1.16.2 up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not yet scheduled on any nodes"
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/artifacts/installer/log-bundle-20191204200453.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

I'm not sure exactly why the daemonset isn't being scheduled. Masters look like they came up at least.
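
For triage, the operator condition that was still pending when bootstrap timed out can be pulled from installer output with a short script. This is a minimal sketch, assuming the `level=... msg="..."` line format shown in the log above; the helper names `parse_line` and `last_operator_status_before_fatal` are hypothetical and not part of any OpenShift tooling.

```python
import re

# Matches installer lines of the form: level=<level> msg="<message>"
LINE_RE = re.compile(r'^level=(?P<level>\w+) msg="(?P<msg>.*)"$')


def parse_line(line):
    """Return (level, message) for an installer log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Undo the escaping of quotes embedded in the message (\" -> ").
    return m.group("level"), m.group("msg").replace('\\"', '"')


def last_operator_status_before_fatal(lines):
    """Return the last 'Cluster operator ...' info message seen before the first fatal line.

    Returns None if no fatal line is found, or if no operator message preceded it.
    """
    last = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that are not in level=/msg= form
        level, msg = parsed
        if level == "fatal":
            return last
        if level == "info" and msg.startswith("Cluster operator "):
            last = msg
    return None


# Lines copied from the installer output in this report.
LOG = [
    r'level=info msg="Waiting up to 30m0s for bootstrapping to complete..."',
    r'level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/multus-admission-controller\" is not yet scheduled on any nodes"',
    r'level=info msg="Pulling debug logs from the bootstrap machine"',
    r'level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"',
]

if __name__ == "__main__":
    print(last_operator_status_before_fatal(LOG))
```

On the lines above, this reports the network operator's "not yet scheduled on any nodes" message as the last operator condition seen before the fatal timeout.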

--- Additional comment from Douglas Smith on 2019-12-04 22:05:40 UTC ---

This looks very similar to something reported yesterday to the openshift-sdn team.

In that similar case, the cluster-network-operator reported:

```
DaemonSet "openshift-multus/multus" rollout is not making progress
```

See the logs at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11961#1:build-log.txt%3A6927

It should be noted that the multus-admission-controller is not a dependency of any other function in the cluster (that is, multus doesn't need to wait for it, and neither does openshift-sdn).

That being said, I believe that this is a symptom and not a cause of the core issue.

Did the master nodes become ready? Masters not becoming ready was also a symptom of the issue reported yesterday:

```
NodeControllerDegraded: The master node(s) \"ip-10-0-129-3.ec2.internal\" not ready
```
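
One way to answer the readiness question from a gathered node list is sketched below. It assumes a Kubernetes NodeList in JSON form (for example, saved output of `oc get nodes -o json`) and assumes masters carry the `node-role.kubernetes.io/master` label. The function name `not_ready_masters` and the sample data are hypothetical and for illustration only; they are not taken from the failing job.

```python
MASTER_LABEL = "node-role.kubernetes.io/master"  # assumed master role label


def not_ready_masters(node_list):
    """Return names of master nodes whose Ready condition is not 'True'."""
    names = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        if MASTER_LABEL not in metadata.get("labels", {}):
            continue  # not a master
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(metadata.get("name", "<unnamed>"))
    return names


# Hypothetical sample data: one not-ready master, one ready master, one worker.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "ip-10-0-129-3.ec2.internal",
                         "labels": {MASTER_LABEL: ""}},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
        {
            "metadata": {"name": "master-1.example",
                         "labels": {MASTER_LABEL: ""}},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
        {
            "metadata": {"name": "worker-0.example",
                         "labels": {"node-role.kubernetes.io/worker": ""}},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
    ]
}

if __name__ == "__main__":
    print(not_ready_masters(SAMPLE))
```

A master with no Ready condition at all is treated as not ready, which matches the case where the kubelet has not reported status yet.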

--- Additional comment from Anurag saxena on 2019-12-05 14:15:01 UTC ---

GCP-OVN installation is blocked in 4.3, though, for various reasons; see https://bugzilla.redhat.com/show_bug.cgi?id=1748162

--- Additional comment from Jan Chaloupka on 2019-12-10 17:00:31 UTC ---



--- Additional comment from Dan Williams on 2020-01-29 13:00:46 UTC ---

GCP + OVN is fine now that MTU issues have been sorted out. Do we have recent CI failures here that we can debug?

--- Additional comment from Weibin Liang on 2020-01-29 15:06:59 UTC ---

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1748162#c64

OVN can be installed on a GCP cluster with 4.4.0-0.nightly-2020-01-16-113546.

--- Additional comment from Petr Muller on 2020-01-29 15:47:51 UTC ---

Here's a 4.3 CI failure from today:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/358

--- Additional comment from Weibin Liang on 2020-01-29 16:16:28 UTC ---

QE will retest it on the latest v4.3.

--- Additional comment from Hongkai Liu on 2020-01-31 15:47:03 UTC ---

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/366

--- Additional comment from Hongkai Liu on 2020-01-31 21:17:44 UTC ---

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.3/368

Comment 1 Alexander Constantinescu 2020-02-11 12:38:45 UTC
This is just a placeholder bug for the 4.3 one.

Comment 3 errata-xmlrpc 2020-05-04 11:35:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

