Bug 1987046

Summary: periodic ci-4.8-upgrade-from-stable-4.7-e2e-*-ovn-upgrade are permafailing on service/ingress disruption
Product: OpenShift Container Platform Reporter: Vadim Rutkovsky <vrutkovs>
Component: Networking Assignee: Surya Seetharaman <surya>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: aconstan, anusaxen, cholman, philipp.dallig, surya, vpickard, wking
Version: 4.8   
Target Milestone: ---   
Target Release: 4.8.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1970985 Environment:
Last Closed: 2021-09-07 04:14:05 UTC Type: ---
Bug Depends On: 1970985    
Bug Blocks: 1929396    

Description Vadim Rutkovsky 2021-07-28 19:32:54 UTC
+++ This bug was initially created as a clone of Bug #1970985 +++

Description of problem:
https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade:

The last 5 runs are failing with:
  [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
fail [github.com/openshift/origin/test/e2e/upgrade/service/service.go:161]: Jun 11 15:04:57.712: Service was unreachable during disruption for at least 7m24s of 1h23m56s (9%):

and

disruption_tests: [sig-network-edge] Cluster frontend ingress remain available
Jun 11 15:04:57.713: Frontends were unreachable during disruption for at least 25m23s of 1h25m54s (30%):

Jun 11 14:22:20.615 E ns/openshift-console route/console Route stopped responding to GET requests over new connections
Jun 11 14:22:20.615 - 405s  E ns/openshift-console route/console Route is not responding to GET requests over new connections

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1403330789755588608

These failures do not occur in OpenShiftSDN-based cluster updates.

--- Additional comment from Casey Callendrello on 2021-06-21 14:21:33 UTC ---

Current theory: rolling out OVN node changes is somewhat disruptive. One way to make this faster would be to pre-pull the images.

Surya is currently working on a pre-pull solution for CNO.

--- Additional comment from Surya Seetharaman on 2021-07-05 09:04:12 UTC ---

1) https://github.com/openshift/cluster-network-operator/pull/1141
2) https://github.com/ovn-org/ovn-kubernetes/pull/2183

These should improve the situation, even if they are not ideal fixes. I'll push on them and get them in.

--- Additional comment from OpenShift Automated Release Tooling on 2021-07-23 04:26:13 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from Vadim Rutkovsky on 2021-07-28 07:59:13 UTC ---

Two passing runs of the 4.8 -> 4.9 jobs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade/1420237641722368000
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-ovn-upgrade/1420224131944681472

Comment 7 errata-xmlrpc 2021-09-07 04:14:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.10 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3299