Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1959292

Summary: kube-controller-manager-operator should not rely on external networking for health check
Product: OpenShift Container Platform
Reporter: Rom Freiman <rfreiman>
Component: kube-controller-manager
Assignee: ravig <rgudimet>
Status: CLOSED NOTABUG
QA Contact: zhou ying <yinzhou>
Severity: high
Priority: high
Version: 4.8
CC: aos-bugs, aos-storage-staff, mfojtik, piqin, sttts, xxia
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Clone Of: 1959291
Last Closed: 2021-06-08 15:58:19 UTC
Bug Depends On: 1959285, 1959290, 1959291
Bug Blocks: 1959293, 1959294

Description Rom Freiman 2021-05-11 08:20:27 UTC
+++ This bug was initially created as a clone of Bug #1959291 +++

+++ This bug was initially created as a clone of Bug #1959290 +++

+++ This bug was initially created as a clone of Bug #1959285 +++

Apparently, kube-controller-manager-operator has a dependency on SubjectAccessReview (SAR) requests as part of its health check, which causes it to be restarted during a kube-apiserver rollout in single-node OpenShift (SNO).
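
For context, a SubjectAccessReview is an authorization query POSTed to the kube-apiserver, so any component that issues one on its health/serving path has a hard dependency on API server availability. A minimal sketch of such a request; the user and attributes below are illustrative assumptions, not values taken from this bug's audit log:

# Illustrative only: the kind of SAR a delegated-authorization path
# would send to the API server. User and attributes are assumed.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:openshift-monitoring:prometheus-k8s
  nonResourceAttributes:
    path: /metrics
    verb: get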


How reproducible:

Using cluster-bot:
1. launch nightly aws,single-node
2. Update the audit log profile to AllRequestBodies (see the config sketch after these steps)
3. Wait for api rollout (oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}')
4. Reboot the node to clean up the caches (oc debug node/ip-10-0-136-254.ec2.internal)
5. Wait
6. Grep the audit log: 

oc adm node-logs ip-10-0-128-254.ec2.internal --path=kube-apiserver/audit.log | grep -i health | grep -i subjectaccessreviews | grep -v Unhealth > rbac.log
cat rbac.log  | jq . -C | less -r | grep 'username' | sort | uniq
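
For step 2, the audit verbosity is presumably raised through the cluster-wide APIServer config; a minimal sketch of that setting (assuming the standard config.openshift.io mechanism was used):

apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  audit:
    # AllRequestBodies is the most verbose profile: request and
    # response bodies are logged for all requests.
    profile: AllRequestBodies

Note that changing the audit profile itself triggers a new kube-apiserver revision, which is the rollout that step 3 waits for.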



Actual results:
~/work/installer [master]> cat rbac.log  | jq . -C | less -r | grep 'username' | sort | uniq
    "username": "system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator"

Expected results:
The kube-controller-manager-operator service account should not appear in health-check-related SubjectAccessReview audit entries.

Additional info:
This affects SNO stability during API server rollouts (e.g. certificate rotation).

Comment 1 ravig 2021-06-08 15:58:19 UTC
Hi Rom,

The health checks we have for KCMO target the KCM endpoint. We only have health checks against port 10257, as you can see here:

https://github.com/openshift/cluster-kube-controller-manager-operator/blob/dc54142035982bc44581936a7e90cdd9ac9ad24e/bindata/v4.1.0/kube-controller-manager/pod.yaml
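
For reference, a sketch of the shape of those checks, assuming the usual HTTPS probe against KCM's secure port (paraphrased, not copied verbatim; see the linked manifest for the authoritative definition):

livenessProbe:
  httpGet:
    # The kubelet probes KCM's secure port directly;
    # no SAR round-trip is part of this check itself.
    scheme: HTTPS
    port: 10257
    path: healthz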

When KCMO starts, it connects to the API server, and that is perhaps why you are noticing those entries in the audit log.

So, closing this BZ for now. Feel free to reopen it in case you feel otherwise.