Bug 1943566 - upgrade jobs are broken on ovn-kubernetes clusters
Summary: upgrade jobs are broken on ovn-kubernetes clusters
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: jamo luhrsen
QA Contact: Anurag saxena
URL:
Whiteboard: TechnicalReleaseBlocker
Depends On: 1817075 1927264 1929396 1942164 1943334 1943363 1944180 1944195 1944264 1959238 2040530 2084366
Blocks:
Reported: 2021-03-26 13:22 UTC by ravig
Modified: 2022-05-12 00:33 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-04 18:20:47 UTC
Target Upstream Version:
Embargoed:


Description ravig 2021-03-26 13:22:18 UTC
Description of problem:


The upgrade job has been broken for a long time:

https://prow.ci.openshift.org/job-history/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade


There could be multiple factors at play here. The goal is to see whether there are improvements that can be made on the networking side.

Version-Release number of selected component (if applicable):


How reproducible:

Always
Steps to Reproduce:
1.
2.
3.

Actual results:
The upgrade job fails.

Expected results:
Upgrades go smoothly.


Additional info:

Comment 3 Dan Williams 2021-04-30 13:49:38 UTC
IMO the synthetic tests that fail on "waiting for flows" are overly aggressive, and this should not be a blocker for the 4.8 release, as it is not a regression.

Instead, we should treat failures of the synthetic "waiting for flows" tests as flakes for 4.8.

Comment 4 jamo luhrsen 2021-05-12 18:23:13 UTC
One permafailing test case in these upgrade jobs is not OVN-specific and looks like it will be addressed when this 4.7 backport is merged:
https://bugzilla.redhat.com/show_bug.cgi?id=1959238

Comment 5 jamo luhrsen 2021-05-12 23:37:49 UTC
Another permafailing test case in the upgrade jobs (not specific to OVN) is "Application behind service load balancer with PDB is not disrupted".
That appears to be tracked and worked on in https://bugzilla.redhat.com/show_bug.cgi?id=1929396

Comment 6 jamo luhrsen 2021-05-13 00:06:13 UTC
The "cluster upgrade should be fast" test case also fails almost every time. There was a recent Slack discussion about this:
  https://coreos.slack.com/archives/C01CQA76KMX/p1620236543482500

Two bugs from that thread should hopefully help matters once they are fixed:
  https://bugzilla.redhat.com/show_bug.cgi?id=1942164
  https://bugzilla.redhat.com/show_bug.cgi?id=1817075

The test case has a 75m timeout before the failure will show up:
  https://github.com/openshift/origin/blob/d704a4d2ab5e55731d11770c11eacd666940b944/test/e2e/upgrade/upgrade.go#L274

The test case does pass every once in a while. In the most recent failing job the upgrade took about 77.8 minutes, so it was
barely over the 75m window:
  https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade/1392413115697598464
  "upgrade to registry.build02.ci.openshift.org/ci-op-irx3q246/release@sha256:13d044e10254d79be573be32d7b3fcdb6d0da893cee5879c98fe4889a8d1e6da took too long: 77.8378904475"
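
To illustrate the kind of check described above, here is a minimal sketch. It is not the code from openshift/origin's upgrade.go: the function name, the constant name, and the exact message format are assumptions made for illustration. Only the 75m limit and the "took too long" wording come from the job output above.

```python
from datetime import timedelta
from typing import Optional

# 75-minute limit mentioned in this comment. The constant name is hypothetical.
UPGRADE_DURATION_LIMIT = timedelta(minutes=75)


def check_upgrade_duration(
    target: str,
    elapsed: timedelta,
    limit: timedelta = UPGRADE_DURATION_LIMIT,
) -> Optional[str]:
    """Return None if the upgrade finished within the limit,
    otherwise a failure message (hypothetical helper)."""
    if elapsed <= limit:
        return None
    minutes = elapsed.total_seconds() / 60
    # Message format is an assumption modeled on the job output quoted above.
    return f"upgrade to {target} took too long: {minutes:.2f} minutes"


if __name__ == "__main__":
    # "example-release" is a placeholder, not a real release pull spec.
    print(check_upgrade_duration("example-release", timedelta(minutes=77.8378904475)))
    print(check_upgrade_duration("example-release", timedelta(minutes=74)))
```

With a hard cutoff like this, an upgrade that takes 77.8 minutes fails just the same as one that takes several hours, which is why a job that is only slightly over the window still shows up as a failure.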

Comment 8 jamo luhrsen 2021-06-04 18:20:47 UTC
Closing this bug, as it has no specific focus other than tracking other bugs that may be causing
upgrade job failures. We need a specific bug for each distinct failure in those upgrade
jobs, each marked as a blocker (or not) as appropriate. Here is the current list as I know it; I'm sure
it's not complete. I'm also sure there are other bugs not yet filed for the upgrade job, but until we make
progress on the existing bugs, it's hard to tell which failures are already being tracked.

  https://bugzilla.redhat.com/show_bug.cgi?id=1943334
  https://bugzilla.redhat.com/show_bug.cgi?id=1927264
  https://bugzilla.redhat.com/show_bug.cgi?id=1959200
  https://bugzilla.redhat.com/show_bug.cgi?id=1942164
  https://bugzilla.redhat.com/show_bug.cgi?id=1817075
  https://bugzilla.redhat.com/show_bug.cgi?id=1968021
  https://bugzilla.redhat.com/show_bug.cgi?id=1968030
  https://bugzilla.redhat.com/show_bug.cgi?id=1968009
  https://bugzilla.redhat.com/show_bug.cgi?id=1944264
  https://bugzilla.redhat.com/show_bug.cgi?id=1943363

