Bug 1996555

Summary:	OpenStack 4.8 -> 4.9 upgrade is failing periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade
Product:	OpenShift Container Platform	Reporter:	Pierre Prinetti <pprinett>
Component:	Installer	Assignee:	Martin André <m.andre>
Installer sub component:	OpenShift on OpenStack	QA Contact:	Jon Uriarte <juriarte>
Status:	CLOSED CURRENTRELEASE	Docs Contact:
Severity:	medium
Priority:	high	CC:	bparees, dgoodwin, juriarte, m.andre, sippy, stbenjam, stephenfin, vrutkovs
Version:	4.9	Keywords:	Triaged
Target Milestone:	---
Target Release:	4.9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1995387	Environment:	job=periodic-ci-openshift-verification-tests-master-stable-4.9-upgrade-from-stable-4.8-openstack-ipi=all
Last Closed:	2022-09-21 15:17:51 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Pierre Prinetti 2021-08-23 08:21:39 UTC

Once Bug 1995387 was fixed and the base image for the tests became available, the job started showing legitimate failures.

Job history (relevant jobs AFTER Aug 21): https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade

One example failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade/1429521411877113856

In particular, etcd does not seem to be healthy.

Comment 6 Pierre Prinetti 2021-08-31 12:23:56 UTC

The tests still seem to be failing.

Comment 7 Martin André 2021-09-01 15:24:46 UTC

We still need to investigate why this job is failing. Still high prio.

Comment 8 Vadim Rutkovsky 2021-09-06 15:33:02 UTC

Analyzing this job - https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-openstack-upgrade/1434468962166378496

PromeCIeus shows that:
* at 13:05, etcd commit duration and WAL sync times on the .163 node spiked
* at approximately the same time, a different node (.32) showed increased network round-trip time

It seems the infrastructure is responsible for this.
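
For anyone wanting to look at the same signals, the sketch below builds PromQL expressions for the etcd commit duration, WAL sync time, and peer round-trip time. It is not the set of queries used in the analysis above. The metric names are the standard etcd histogram metrics; the helper name, the 0.99 quantile, and the 5m rate window are assumptions chosen for illustration.

```python
# Standard etcd histogram metrics for the three signals discussed above.
ETCD_LATENCY_METRICS = {
    "backend commit duration": "etcd_disk_backend_commit_duration_seconds_bucket",
    "WAL fsync duration": "etcd_disk_wal_fsync_duration_seconds_bucket",
    "peer round-trip time": "etcd_network_peer_round_trip_time_seconds_bucket",
}


def latency_quantile_query(metric, quantile=0.99, window="5m"):
    """Build a per-instance latency quantile expression.

    Hypothetical helper: the quantile and window defaults are assumptions.
    """
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (instance, le) (rate({metric}[{window}])))"
    )


if __name__ == "__main__":
    # Print one expression per signal, ready to paste into a Prometheus UI.
    for label, metric in ETCD_LATENCY_METRICS.items():
        print(f"{label}: {latency_quantile_query(metric)}")
```

Grouping by `instance` keeps each node separate, which is what makes a spike on a single member visible.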

Comment 9 Martin André 2021-09-30 12:00:02 UTC

I wonder whether https://github.com/openshift/machine-config-operator/pull/2782 would help here; cf. https://bugzilla.redhat.com/show_bug.cgi?id=2002121.

Comment 10 ShiftStack Bugwatcher 2021-11-25 16:12:10 UTC

Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 11 Martin André 2022-05-11 13:25:19 UTC

*** Bug 2077270 has been marked as a duplicate of this bug. ***

Comment 12 Stephen Finucane 2022-09-21 15:17:51 UTC

We've integrated support for scheduled CI tasks with jitter into upstream Kubernetes CI infra and shouldn't be seeing this anymore. Closing as CURRENTRELEASE. We can open new bugs if this pops up again.
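
As an illustration of the general idea of jitter, and not of the actual CI implementation, the sketch below shifts each periodic job's start time by a stable per-job offset so that jobs sharing a schedule do not all start at the same moment. The function name, the hash-based offset, and the 900-second window are assumptions made for this example.

```python
import hashlib


def jittered_start(job_name: str, base_epoch_s: int, max_jitter_s: int = 900) -> int:
    """Return base_epoch_s plus a stable per-job offset in [0, max_jitter_s).

    Hypothetical helper: hashing the job name gives each job the same offset
    on every run, while different jobs get spread across the window.
    """
    if max_jitter_s <= 0:
        raise ValueError("max_jitter_s must be positive")
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:8], "big") % max_jitter_s
    return base_epoch_s + offset


if __name__ == "__main__":
    base = 1_632_000_000  # arbitrary example timestamp (seconds since epoch)
    for name in ("job-a", "job-b", "job-c"):
        print(name, jittered_start(name, base) - base, "seconds after the base time")
```

A deterministic offset is used here instead of a fresh random value so that a given job keeps a predictable start time across runs.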