Description of problem:

Lately we have SNO CI jobs failing. We found that cluster-policy-controller failed to start due to:

2022-01-22T15:55:00.211340589Z F0122 15:55:00.211297 1 cmd.go:138] open /etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt: no such file or directory
2022-01-22T15:55:00.211691699Z goroutine 1 [running]:
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.stacks(0x1)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:1038 +0x8a
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).output(0x3a75fe0, 0x3, 0x0, 0xc0006f6000, 0x1, {0x2cfcd69, 0x10}, 0xc000100000, 0x0)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:987 +0x5fd
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).printDepth(0xc000253bc0, 0x26b1c70, 0x0, {0x0, 0x0}, 0x37, {0xc00089da70, 0x1, 0x1})
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:735 +0x1ae
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).print(...)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:717
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.Fatal(...)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:1512

Link to the specific logfile:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484898355296342016/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/registry-redhat-io-openshift4-ose-must-gather-sha256-8c0b3bc10756c463f1aa6b622e396ae244079dd8f7f2f3c5d8695a777c95eec6/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-test-infra-cluster-master-0/cluster-policy-controller/cluster-policy-controller/logs/current.log

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
cluster-policy-controller failing to start

Expected results:
cluster-policy-controller starts

Additional info:
You can find must-gather logs here:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484898355296342016/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather

Another example:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484777420153163776/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/

We have a couple more if needed.
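For context, here is a minimal Go sketch of the kind of startup path that produces the fatal error above: the serving certificate is loaded unconditionally, so a missing secret aborts the whole process. The file paths come from the log; everything else (including loading the key from tls.key alongside tls.crt) is an assumption for illustration and is not the actual cluster-policy-controller code.

// Minimal sketch (NOT the actual cluster-policy-controller code) of a startup
// path that would produce the fatal error seen in the CI logs above.
package main

import (
	"crypto/tls"

	"k8s.io/klog/v2"
)

const (
	// Paths taken from the log; the key path is an assumed companion of tls.crt.
	servingCert = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt"
	servingKey  = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key"
)

func main() {
	// If service-ca-operator never populated the serving-cert secret, this
	// fails with "no such file or directory" and klog.Fatal terminates the
	// process, which is the crash loop observed in the SNO CI jobs.
	cert, err := tls.LoadX509KeyPair(servingCert, servingKey)
	if err != nil {
		klog.Fatal(err)
	}
	_ = cert
	// ... start the controller's secure listener with the loaded cert ...
}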
1. In “regular” installs, the bootstrap cluster-policy-controller ensures that UID ranges are applied in namespaces. This instance does not require service-ca certs.
2. The final cluster-policy-controller does require a service-ca cert. Hence, if the cluster-policy-controller from step 1 did not provision UID ranges in service-ca-operator's namespace, SCC admission fails and prevents service-ca-operator from starting, so the serving cert is never generated, resulting in the failure state seen here.

The fix is to make cluster-policy-controller not depend on the service-ca cert secret: it must be able to start without it, even in scenario 2. For re-bootstrapping scenarios, the cluster-policy-controller must be able to start without a service-ca generated serving certificate (one possible shape of that behavior is sketched below).
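To illustrate the direction of the fix, here is a hedged sketch of how a controller could start without the service-ca serving certificate and only load it once service-ca-operator has populated the secret. The helper name waitForServingCert, the 5-second poll interval, and the tls.key path are assumptions for illustration; this is not the actual cluster-policy-controller implementation.

// Hedged sketch: treat a missing service-ca serving cert as "not yet available"
// rather than fatal, so the controller's other loops can run immediately.
package main

import (
	"crypto/tls"
	"os"
	"time"

	"k8s.io/klog/v2"
)

const (
	servingCert = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt"
	servingKey  = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key"
)

// waitForServingCert polls for the service-ca generated certificate instead of
// exiting when it is absent, which breaks the circular dependency: the
// controller can provision namespace UID ranges so service-ca-operator can
// start and, in turn, create this very secret.
func waitForServingCert(interval time.Duration) *tls.Certificate {
	for {
		if _, err := os.Stat(servingCert); err == nil {
			cert, err := tls.LoadX509KeyPair(servingCert, servingKey)
			if err == nil {
				return &cert
			}
			klog.Warningf("serving cert present but not loadable yet: %v", err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Start the controller loops (UID range allocation, etc.) here, before the
	// serving cert exists.
	go func() {
		cert := waitForServingCert(5 * time.Second)
		_ = cert
		klog.Info("serving certificate loaded, starting secure endpoint")
		// ... start or reconfigure the secure listener with the cert ...
	}()

	// ... run controllers; block until shutdown ...
	select {}
}

The design point is that the secure endpoint becomes a late-bound concern: the controller comes up first, SCC admission can then pass for service-ca-operator, and the serving cert is picked up whenever it appears.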
Backport created: https://bugzilla.redhat.com/show_bug.cgi?id=2048484
*** Bug 1961204 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069