Bug 1996555

Summary: OpenStack 4.8 -> 4.9 upgrade is failing periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade
Product: OpenShift Container Platform
Reporter: Pierre Prinetti <pprinett>
Component: Installer
Installer sub component: OpenShift on OpenStack
Assignee: Martin André <m.andre>
QA Contact: Jon Uriarte <juriarte>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: high
CC: bparees, dgoodwin, juriarte, m.andre, sippy, stbenjam, stephenfin, vrutkovs
Version: 4.9
Keywords: Triaged
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1995387
Environment: job=periodic-ci-openshift-verification-tests-master-stable-4.9-upgrade-from-stable-4.8-openstack-ipi=all
Last Closed: 2022-09-21 15:17:51 UTC

Description Pierre Prinetti 2021-08-23 08:21:39 UTC
Now that Bug 1995387 has been fixed and the base image for the tests is available, the test has started showing legitimate failures.

Job history (relevant jobs AFTER Aug 21): https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade

One example failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade/1429521411877113856

In particular, etcd appears to be unhealthy.

Comment 6 Pierre Prinetti 2021-08-31 12:23:56 UTC
The tests still seem to be failing.

Comment 7 Martin André 2021-09-01 15:24:46 UTC
We still need to investigate why this job is failing. This remains high priority.

Comment 8 Vadim Rutkovsky 2021-09-06 15:33:02 UTC
Analyzing this job - https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade/1434468962166378496

PromeCIeus shows that:
* at 13:05, etcd commit duration and WAL sync times spiked on the .163 node
* at approximately the same time, a different node (.32) showed increased network round-trip time

This suggests that the underlying infrastructure is responsible.

Comment 10 ShiftStack Bugwatcher 2021-11-25 16:12:10 UTC
Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 11 Martin André 2022-05-11 13:25:19 UTC
*** Bug 2077270 has been marked as a duplicate of this bug. ***

Comment 12 Stephen Finucane 2022-09-21 15:17:51 UTC
We've integrated support for scheduled CI tasks with jitter into upstream Kubernetes CI infra and shouldn't be seeing this anymore. Closing as CURRENTRELEASE. We can open new bugs if this pops up again.