Bug 1691085
| Summary: | clusteroperator/kube-scheduler not ready due to static pods failing with missing RBAC |
|---|---|
| Product: | OpenShift Container Platform |
| Reporter: | ewolinet |
| Component: | Master |
| Assignee: | Michal Fojtik <mfojtik> |
| Status: | CLOSED WORKSFORME |
| QA Contact: | Xingxing Xia <xxia> |
| Severity: | low |
| Priority: | low |
| Version: | 4.1.0 |
| CC: | aos-bugs, bparees, ccoleman, gblomqui, jokerman, mfojtik, mmccomas |
| Target Milestone: | --- |
| Target Release: | 4.3.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2020-03-10 19:32:11 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Attachments: | |
Description
ewolinet 2019-03-20 20:07:26 UTC
kube-scheduler (the operator reporting the error event) belongs to the Pod team, so sending there first, but I expect they may send this to the Master team since it looks like there were kube-apiserver issues.

Created attachment 1546773 [details]
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T20:06 UTC

This has caused 18 of our 861 failures in *-e2e-aws* jobs across the whole CI system over the past 55 hours. Generated with [1]:

$ deck-build-log-plot 'clusteroperator/kube-scheduler changed Failing to True.*clusterrole.rbac.authorization.*not found'

[1]: https://github.com/wking/openshift-release/tree/debug-scripts/deck-build-log

Yes, sending to Master to figure out why these ClusterRoles do not exist.

*** Bug 1694186 has been marked as a duplicate of this bug. ***

I'm not sure this belongs with auth; I don't see anything to indicate an auth issue. The scheduler pod eventually starts up. The error log gets swept up as the last error that occurred, but the failure seems to be that the test timed out. The "clusterrole not found" issue is more likely an API server uptime issue than a problem of those ClusterRoles not existing. AWS resource limits, cert rotation timings, and possibly other things could be contributing to the delays in the test run. I'm not sure who the right owner is for those types of issues.

Since this has caused only 18 of our 861 failures, lowering priority as this might only be a transient failure.

This doesn't seem to be occurring anymore:
https://search.svc.ci.openshift.org/?search=changed+Failing+to+True%3A+StaticPodsFailing%3A+StaticPodsFailing&maxAge=336h&context=2&type=all
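
For anyone triaging a similar occurrence, a minimal spot-check of the two hypotheses discussed above (missing RBAC vs. API server availability) might look like the following. This is a hedged sketch, not from the original report; `system:kube-scheduler` is only a placeholder assumption for whichever ClusterRole the "not found" error actually names.

# Does the ClusterRole the scheduler operator complains about exist at all?
# (system:kube-scheduler is a placeholder; substitute the role named in the error.)
$ oc get clusterrole system:kube-scheduler -o name

# Were the kube-apiserver and kube-scheduler operators Available/Degraded around that time?
$ oc get clusteroperator kube-apiserver kube-scheduler

# Recent events from the scheduler namespace, newest last, to confirm the static pod eventually started.
$ oc get events -n openshift-kube-scheduler --sort-by=.lastTimestamp

If the ClusterRole is present and the operators report healthy after the fact, that is consistent with the assessment above: the "clusterrole not found" message was a transient API server availability blip swept up as the last error before the test timed out.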