1959290 – openshift-kube-apiserver-operator should not rely on external networking for health check

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	openshift-apiserver
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Lukasz Szaszkiewicz
QA Contact:	Yash Tripathi
Docs Contact:
URL:
Whiteboard:	LifecycleReset
Depends On:	1959285
Blocks:	1959291 1959292 1959293 1959294

Reported:	2021-05-11 08:16 UTC by Rom Freiman
Modified:	2023-09-15 01:06 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1959285
Clones:	1959291
Environment:
Last Closed:	2021-10-18 17:31:04 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-openshift-apiserver-operator pull 466	0	None	None	None	2021-08-20 11:18:06 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:31:56 UTC

Description Rom Freiman 2021-05-11 08:16:37 UTC

+++ This bug was initially created as a clone of Bug #1959285 +++

Apparently, openshift-apiserver-operator depends on a SubjectAccessReview (SAR) as part of its health check, which causes it to be restarted during a kube-apiserver rollout on single-node OpenShift (SNO).


How reproducible:

Using cluster-bot:
1. launch nightly aws,single-node
2. Update the audit log verbosity to AllRequestBodies
3. Wait for the API rollout (oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}')
4. Reboot the node to clean up the caches (oc debug node/ip-10-0-136-254.ec2.internal)
5. Wait
6. Grep the audit log:

oc adm node-logs ip-10-0-128-254.ec2.internal --path=kube-apiserver/audit.log | grep -i health | grep -i subjectaccessreviews | grep -v Unhealth > rbac.log
cat rbac.log  | jq . -C | less -r | grep 'username' | sort | uniq
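
The grep pipeline above can also be sketched in Python, which parses each audit event as JSON instead of matching on pretty-printed text. This is only an illustration under stated assumptions: the function name sar_health_usernames is hypothetical, the audit log is assumed to hold one JSON event per line, and only the event's top-level user.username field is read (the original grep on 'username' would match any field with that name).

```python
import json
from typing import Iterable, List


def sar_health_usernames(audit_lines: Iterable[str]) -> List[str]:
    """Return the sorted, distinct usernames behind health-related SAR audit events."""
    usernames = set()
    for line in audit_lines:
        lowered = line.lower()
        # Same text filters as the grep pipeline: keep lines that mention
        # "health" and "subjectaccessreviews" (case-insensitive) ...
        if "health" not in lowered or "subjectaccessreviews" not in lowered:
            continue
        # ... and drop lines containing "Unhealth" (case-sensitive, like grep -v).
        if "Unhealth" in line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        # Assumption: the requesting identity lives in the event's "user" object.
        user = event.get("user") or {}
        username = user.get("username") if isinstance(user, dict) else None
        if username:
            usernames.add(username)
    return sorted(usernames)
```

One possible way to use it is to save the output of the oc adm node-logs command to a file, pass the open file object to sar_health_usernames, and check whether the operator's service account is in the returned list.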



Actual results:
~/work/installer [master]> cat rbac.log  | jq . -C | less -r | grep 'username' | sort | uniq
    system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator",


Expected results:
The kube-apiserver-operator service account should not appear in the output.

Additional info:
Affects SNO stability during API rollouts (certificate rotation).

Comment 1 Stefan Schimanski 2021-05-11 11:01:07 UTC

Kube-apiserver is talking to itself through loopback. So I think this one is fine.

Comment 2 Rom Freiman 2021-06-08 17:58:02 UTC

Reopening following our discussion on Slack.

Comment 3 Lukasz Szaszkiewicz 2021-07-05 12:39:34 UTC

I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 4 Michal Fojtik 2021-07-08 18:13:22 UTC

This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 6 Michal Fojtik 2021-07-20 20:23:09 UTC

The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

Comment 9 Yash Tripathi 2021-09-29 11:45:05 UTC

Verified in SNO 4.9.0-0.nightly-2021-09-27-105859:
1. Launched nightly, gcp cluster

$ oc get no
NAME                                              STATUS   ROLES           AGE     VERSION
ytripath-8dhtb-master-0.c.openshift-qe.internal   Ready    master,worker   3h13m   v1.22.0-rc.0+af080cb

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-27-105859   True        False         3h3m    Cluster version is 4.9.0-0.nightly-2021-09-27-105859

2. Updated the audit log verbosity to AllRequestBodies (oc edit apiserver cluster)
3. Waited for api rollout (oc get kubeapiserver -o=jsonpath='{range.items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}')
4. Rebooted the node to clean up the caches (oc debug no/ytripath-8dhtb-master-0.c.openshift-qe.internal)
5. Waited and checked the audit log:
For KAS-O:
$ oc adm node-logs ip-10-0-128-254.ec2.internal --path=kube-apiserver/audit.log | grep -i health | grep -i subjectaccessreviews | grep -v Unhealth > rbac.log
$ cat rbac.log  | jq . -C | less -r | grep 'username' | sort | uniq
Results:
    "username": "system:apiserver",
    "username": "system:serviceaccount:openshift-authentication-operator:authentication-operator",
    "username": "system:serviceaccount:openshift-cluster-storage-operator:cluster-storage-operator",
    "username": "system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator",
As expected, "system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator" was not found.
For OAS-O:
$ oc adm node-logs ip-10-0-128-254.ec2.internal --path=openshift-apiserver/audit.log | grep -i health | grep -i subjectaccessreviews | grep -v Unhealth > rbac-oas.log
$ cat rbac-oas.log 
Nothing found.
As expected, nothing was found for "system:serviceaccount:openshift-apiserver-operator".
Based on the above, the bug is fixed; moving it to VERIFIED.
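
For reference, the rollout check in step 3 can be sketched in Python as a rough equivalent of the jsonpath expression. This is an illustration only: the function name node_installer_progressing is hypothetical, and the input is assumed to be the list document printed by oc get kubeapiserver -o json.

```python
import json
from typing import Optional, Tuple


def node_installer_progressing(kubeapiserver_json: str) -> Optional[Tuple[str, str]]:
    """Return (reason, message) of the NodeInstallerProgressing condition, if present."""
    doc = json.loads(kubeapiserver_json)
    items = doc.get("items") or []
    if not items:
        return None
    # Mirror the jsonpath: look only at the first item's status.conditions.
    conditions = (items[0].get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "NodeInstallerProgressing":
            return cond.get("reason", ""), cond.get("message", "")
    return None
```

Calling it repeatedly on fresh command output and printing the returned pair gives the same reason and message lines as the jsonpath query, which makes it easy to poll until the rollout settles.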

Comment 11 errata-xmlrpc 2021-10-18 17:31:04 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 12 Red Hat Bugzilla 2023-09-15 01:06:22 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days