Bug 2100233
Summary: | Baremetal UPI - apiserver high number of restarts due to poststarthook/authorization.openshift.io-bootstrapclusterroles check failed: healthz | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Gabriel Scheffer <gscheffe>
Component: | openshift-apiserver | Assignee: | Luis Sanchez <sanchezl>
Status: | CLOSED DUPLICATE | QA Contact: | Rahul Gangwar <rgangwar>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 4.10 | CC: | akashem, aygarg, bsmitley, jkaur, mfojtik, pawankum, pkhaire, sanchezl, sar, simore, slaznick, smaudet, sponnaga, vkochuku, wlewis
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-08-25 14:56:32 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Gabriel Scheffer
2022-06-22 19:26:11 UTC

Hello, I'm updating the bug on behalf of Pawan. Please find the comment below from the customer. When they revert back to the default values the API pods start failing, while with the suggested values the pods stay up and running. Also, the customer uses a proxy, and it is the standard one used across all of their OpenShift clusters.

~~~
Still it is running with the override values suggested in the ticket; if we put the default values back (failureThreshold 3), the api-server pods go into CrashLoopBackOff state.

"Set unmanaged from the ClusterVersion and scaled the operator to 0. Increased the probe failure threshold to 10 (from 3), and it seems to be holding. If we set everything back to managed it fails again.

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  overrides:
  - group: apps
    kind: Deployment
    name: openshift-apiserver-operator
    namespace: openshift-apiserver-operator
    unmanaged: true
```"
~~~

I can see from the original must-gather that the kube-apiserver is also failing in its role-bootstrapping logic.

How many role, rolebinding, clusterrole and clusterrolebinding objects are there in the cluster? Are there any admission webhooks present in the cluster that operate on RBAC resources?

Michal Fojtik also discovered that there were a few HTTP 500 responses to the OAS regarding the retrieval of some clusterrolebindings/rolebindings.

Would it be possible to get a must-gather that contains:
- audit logs
- logs of the failing openshift-apiserver pods
- kube-apiserver logs from the time period when the openshift-apiserver pods above were failing
- possibly even kube-apiserver logs that contain the kube-apiserver startup (note that the logs retrieved by must-gather can be truncated)

(In reply to Standa Laznicka from comment #27)
> I can see from the original must-gather that the kube-apiserver is also
> failing in its role-bootstrapping logic.
>
> How many role, rolebinding, clusterrole and clusterrolebinding objects are
> there in the cluster? Are there any admission webhooks present in the
> cluster that operate on RBAC resources?
>
> Michal Fojtik also discovered that there were a few HTTP 500 responses to
> the OAS regarding the retrieval of some clusterrolebindings/rolebindings.
>
> Would it be possible to get a must-gather that contains:
> - audit logs
> - logs of the failing openshift-apiserver pods
> - kube-apiserver logs from the time period when the openshift-apiserver
>   pods above were failing
> - possibly even kube-apiserver logs that contain the kube-apiserver startup
>   (note that the logs retrieved by must-gather can be truncated)

Hello Standa,

I think the customer has provided audit logs two or three times already, in comment 22 as well; weren't those helpful? Would it be possible for someone from the engineering team to get on a call and collect all the logs at once? Maybe this would help in quicker troubleshooting. I will try to ask for the required info in the meantime.

Regards,
Pawan

Fixed in 4.10.25.

If we're saying the fix was delivered in bug 2109235, we should have marked this as a duplicate so that no one has to read through every comment to arrive at that conclusion.

*** This bug has been marked as a duplicate of bug 2109235 ***
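For anyone needing to reproduce the customer's workaround while waiting for the fix, a minimal sketch with `oc` is below. It assumes the operand is the `apiserver` Deployment in the `openshift-apiserver` namespace, that the relevant probes are on the first container, and that the ClusterVersion override quoted above is already in place; none of those details are confirmed in this bug, so adjust them to the actual cluster.

```bash
# Assumption: the operator has already been marked unmanaged via the
# ClusterVersion override quoted above; scale it down so it cannot revert the change.
oc scale deployment/openshift-apiserver-operator \
  -n openshift-apiserver-operator --replicas=0

# Assumption: the operand Deployment is "apiserver" in "openshift-apiserver" and
# the main container is at index 0. Raise failureThreshold from the default 3 to 10.
oc patch deployment/apiserver -n openshift-apiserver --type=json -p '[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/failureThreshold", "value": 10},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/failureThreshold", "value": 10}
]'
```

This only relaxes the probes; per the closing comments the real fix shipped in 4.10.25 (bug 2109235), so the override and the scaled-down operator should be reverted once the cluster is on a fixed build.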
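The RBAC and webhook questions from comment #27 can be answered with plain `oc` queries; a sketch follows (it assumes only cluster-admin access, no bug-specific names):

```bash
# Count RBAC objects across the cluster.
oc get clusterroles --no-headers | wc -l
oc get clusterrolebindings --no-headers | wc -l
oc get roles --all-namespaces --no-headers | wc -l
oc get rolebindings --all-namespaces --no-headers | wc -l

# List admission webhooks and look for any that match RBAC resources.
oc get validatingwebhookconfigurations,mutatingwebhookconfigurations
oc get validatingwebhookconfigurations,mutatingwebhookconfigurations -o yaml \
  | grep -n 'rbac.authorization.k8s.io'
```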
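For the requested log collection, a sketch of the usual commands is below; `/usr/bin/gather_audit_logs` is the audit-log gatherer documented for must-gather in OCP 4.x, and the pod names in angle brackets are placeholders to fill in from `oc get pods`.

```bash
# Default must-gather, plus a second run that collects the audit logs.
oc adm must-gather
oc adm must-gather -- /usr/bin/gather_audit_logs

# Logs of the failing openshift-apiserver pods, including the previously
# crashed container instance.
oc get pods -n openshift-apiserver
oc logs -n openshift-apiserver <openshift-apiserver-pod> --previous
oc logs -n openshift-apiserver <openshift-apiserver-pod>

# kube-apiserver logs from the same time window; pulling them directly from the
# static pods helps when the copies in must-gather are truncated.
oc get pods -n openshift-kube-apiserver
oc logs -n openshift-kube-apiserver <kube-apiserver-pod> -c kube-apiserver
```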
Hello, I'm updating the Bug on behalf of Pawan. Please find the below comment from the Customer. When we are reverting back to default values API-pods are failing, while with suggested values pods are up and running. Also, the Customer uses proxy and it is the standard one used across all the openshift clusters. ~~~~ Still it is running with override values suggested in the ticket, if we put the default values (failureThreshold 3) api-server pods are in crashloopbackup state. "Set UnManaged from CV and Scaled Operator to 0. Increased probe failure to 10 (from 3), and it seems to be holding. If we set everything back to managed it fails again. ```yaml apiVersion: config.openshift.io/v1 kind: ClusterVersion metadata: name: version spec: overrides: - group: apps kind: Deployment name: openshift-apiserver-operator namespace: openshift-apiserver-operator unmanaged: true ```" ~~~ I can see from the original must-gather that the kube-apiserver is also failing in its role-bootstrapping logic. How many role, rolebinding, clusterrole and clusterrolebinding objects are there in the cluster? Are there any admission webhooks present in the cluster that operate on RBAC resources? Michal Fojtik also discovered that there were a few HTTP 500s responses to the OAS with regards to some cluster/rolebindings retrieval. Would it be possible to get a must-gather that contains: - audit logs - logs of failing openshift-apiserver pods - kube-apiserver logs from the time period when the openshift-apiserver pods above were failing - possibly even kube-apiserver logs that contain the kube-apiserver startup (note that the logs retrieved by must-gather can be truncated) (In reply to Standa Laznicka from comment #27) > I can see from the original must-gather that the kube-apiserver is also > failing in its role-bootstrapping logic. > > How many role, rolebinding, clusterrole and clusterrolebinding objects are > there in the cluster? Are there any admission webhooks present in the > cluster that operate on RBAC resources? > > Michal Fojtik also discovered that there were a few HTTP 500s responses to > the OAS with regards to some cluster/rolebindings retrieval. > > Would it be possible to get a must-gather that contains: > - audit logs > - logs of failing openshift-apiserver pods > - kube-apiserver logs from the time period when the openshift-apiserver pods > above were failing > - possibly even kube-apiserver logs that contain the kube-apiserver startup > (note that the logs retrieved by must-gather can be truncated) Hello Standa, I think customer has provided audit logs 2-3 times, in comment22 as well, wasn't those helpful? Will it be possible for someone from engineering team to go on call and collect all log for once? May be this will help in quicker troubleshooting. I will try to ask for required info in the meantime. Regards, Pawan Fixed in 4.10.25. If we're saying the fix was delivered in Bug 2109235 we should've marked this as a dupe so that no one has to read through every comment to arrive at that conclusion. *** This bug has been marked as a duplicate of bug 2109235 *** |