We see the Kube API become unavailable during upgrades on different platforms.
This is not supposed to happen if graceful termination and LB endpoint reconciliation by the cloud provider work correctly.
Note: openshift-apiserver APIs are unavailable too if the kube-apiserver is not serving correctly.
This is a top-level umbrella bug. Never clone this into releases!
*** Bug 1828861 has been marked as a duplicate of this bug. ***
Work in progress.
Saving folks some click-throughs, current blockers are:
* Bug 1845410: GCP (4.6)
* Bug 1845412: AWS (4.6)
* Bug 1845414: Azure (4.6)
* Bug 1845416: GCP (4.5)
*** Bug 1852601 has been marked as a duplicate of this bug. ***
Mentioning affected test-cases for Sippy:
Kubernetes APIs remain available
OpenShift APIs remain available
OAuth APIs remain available
And with the sigs, in case that matters to Sippy:
[sig-api-machinery] Kubernetes APIs remain available
[sig-api-machinery] OpenShift APIs remain available
[sig-api-machinery] OAuth APIs remain available
*** Bug 1867237 has been marked as a duplicate of this bug. ***
This is an umbrella bug for API disruption. Labelling with UpcomingSprint.
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
*** Bug 1883207 has been marked as a duplicate of this bug. ***
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.
Lots of "operator install *" tests, mostly in 4.7, are also failing with "operator is not reporting conditions".
My impression was that this umbrella bug would remain open until all child bugs were resolved. Bug 1845410 and others are still open at the moment.
Setting blocker- as priority = low.
As build watcher, I noticed that "[sig-api-machinery] OpenShift APIs remain available" (which is in this BZ's environment field) is failing in the "release-openshift-origin-installer-old-rhcos-e2e-aws-4.7" job 50% of the time.
The failures in release-openshift-origin-installer-old-rhcos-e2e-aws-4.7 all fail with errors like the following:
API "openshift-api-available" was unreachable during disruption for at least 1m20s of 47m35s (3%), this is currently sufficient to pass the test/job but not considered completely correct:
Failing the test while logging "this is currently sufficient" seems contradictory; perhaps the pass/fail logic just needs to be fixed, at least for these particular failures, to pass outright when the threshold is not exceeded.
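To illustrate the suggestion above, here is a minimal sketch (hypothetical function and parameter names, not the actual test code) of a check that reports the disruption but only fails when the unavailable fraction exceeds the allowed threshold:

```python
def evaluate_disruption(unavailable_s: float, total_s: float,
                        threshold: float = 0.05):
    """Return (passed, message) for an API-availability check.

    Hypothetical sketch: pass outright when the disruption fraction stays
    under the threshold, instead of logging a failure that is later
    downgraded to a flake.
    """
    fraction = unavailable_s / total_s
    message = (f'API was unreachable during disruption for at least '
               f'{unavailable_s:.0f}s of {total_s:.0f}s ({fraction:.0%})')
    return fraction <= threshold, message

# The 1m20s-of-47m35s case from the log above is ~3%, under a 5% threshold,
# so it would pass cleanly rather than fail-then-flake.
passed, msg = evaluate_disruption(80, 2855)
```

The exact threshold the job uses is not stated here; 5% is an assumed placeholder.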
Miciah, an example old-RHCOS 4.7 job has:
API "openshift-api-available" was unreachable during disruption for at least 1m17s of 44m44s (3%), this is currently sufficient to pass the test/job but not considered completely correct:
But it's not failing the test. That test-case gets the one failure round above, and then a subsequent no-op "pass" round, so JUnit consumers count it under the "flaky" set, not the "failing" set. Not an awesome UX, but the best we can do today.
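The failure-round-plus-pass-round behavior described above can be sketched as follows (hypothetical helper, not the actual JUnit consumer code): a test-case whose recorded rounds end in a pass after a failure is counted as flaky, while one that only fails is counted as failing.

```python
def classify(rounds):
    """Classify one test-case from its ordered 'pass'/'fail' rounds.

    Hypothetical sketch of the convention described above: a failure round
    followed by a later pass round downgrades the case from "failing" to
    "flaky"; with no failures at all, it is simply "passing".
    """
    if 'fail' not in rounds:
        return 'passing'
    if rounds[-1] == 'pass':
        return 'flaky'
    return 'failing'
```

Under this convention the case above, one failure round plus a no-op pass round, classifies as "flaky".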
Looks like some very stale dependent bugs attached to this. Removing trt tracking for this issue.