Bug 1987046 - periodic ci-4.8-upgrade-from-stable-4.7-e2e-*-ovn-upgrade are permafailing on service/ingress disruption
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.z
Assignee: Surya Seetharaman
QA Contact: Anurag saxena
Depends On: 1970985
Blocks: 1929396
Reported: 2021-07-28 19:32 UTC by Vadim Rutkovsky
Modified: 2021-09-07 04:14 UTC

Clone Of: 1970985
Last Closed: 2021-09-07 04:14:05 UTC

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 1167 | 0 | None | open | Bug 1987046: Add pre-puller ds to reduce upgrade downtime | 2021-07-30 09:02:28 UTC
Red Hat Product Errata | RHBA-2021:3299 | 0 | None | None | None | 2021-09-07 04:14:18 UTC

Description Vadim Rutkovsky 2021-07-28 19:32:54 UTC
+++ This bug was initially created as a clone of Bug #1970985 +++

Description of problem:

The last 5 runs are failing with:
  [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
fail [github.com/openshift/origin/test/e2e/upgrade/service/service.go:161]: Jun 11 15:04:57.712: Service was unreachable during disruption for at least 7m24s of 1h23m56s (9%):


disruption_tests: [sig-network-edge] Cluster frontend ingress remain available
Jun 11 15:04:57.713: Frontends were unreachable during disruption for at least 25m23s of 1h25m54s (30%):

Jun 11 14:22:20.615 E ns/openshift-console route/console Route stopped responding to GET requests over new connections
Jun 11 14:22:20.615 - 405s  E ns/openshift-console route/console Route is not responding to GET requests over new connections


These failures are not present in OpenShiftSDN-based cluster updates.

--- Additional comment from Casey Callendrello on 2021-06-21 14:21:33 UTC ---

Current theory: rolling out OVN node changes is somewhat disruptive. One way to make this faster would be to pre-pull the images.

Surya is currently working on a pre-pull solution for CNO.

--- Additional comment from Surya Seetharaman on 2021-07-05 09:04:12 UTC ---

1) https://github.com/openshift/cluster-network-operator/pull/1141
2) https://github.com/ovn-org/ovn-kubernetes/pull/2183

These should improve the situation, even if they are not ideal fixes. I'll push on them and get them in.

--- Additional comment from OpenShift Automated Release Tooling on 2021-07-23 04:26:13 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from Vadim Rutkovsky on 2021-07-28 07:59:13 UTC ---

Two passes on 4.8 -> 4.9 jobs:

Comment 7 errata-xmlrpc 2021-09-07 04:14:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.10 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.