Bug 2045872 - SNO: cluster-policy-controller failed to start due to missing serving-cert/tls.crt
Summary: SNO: cluster-policy-controller failed to start due to missing serving-cert/tls.crt
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Filip Krepinsky
QA Contact: zhou ying
URL:
Whiteboard:
Duplicates: 1961204
Depends On:
Blocks: 2048484
 
Reported: 2022-01-25 20:19 UTC by Igal Tsoiref
Modified: 2023-09-15 01:51 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:43:43 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift cluster-kube-controller-manager-operator pull 594 (open): Bug 2045872: allow cluster-policy-controller to fallback to default cert (last updated 2022-01-28 14:20:13 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:44:05 UTC)

Description Igal Tsoiref 2022-01-25 20:19:31 UTC
Description of problem:
Recently, SNO CI jobs have been failing.

We found that cluster-policy-controller failed to start with the following error:


2022-01-22T15:55:00.211340589Z F0122 15:55:00.211297       1 cmd.go:138] open /etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt: no such file or directory
2022-01-22T15:55:00.211691699Z goroutine 1 [running]:
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.stacks(0x1)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:1038 +0x8a
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).output(0x3a75fe0, 0x3, 0x0, 0xc0006f6000, 0x1, {0x2cfcd69, 0x10}, 0xc000100000, 0x0)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:987 +0x5fd
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).printDepth(0xc000253bc0, 0x26b1c70, 0x0, {0x0, 0x0}, 0x37, {0xc00089da70, 0x1, 0x1})
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:735 +0x1ae
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).print(...)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:717
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.Fatal(...)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:1512

Link to the specific log file:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484898355296342016/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/registry-redhat-io-openshift4-ose-must-gather-sha256-8c0b3bc10756c463f1aa6b622e396ae244079dd8f7f2f3c5d8695a777c95eec6/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-test-infra-cluster-master-0/cluster-policy-controller/cluster-policy-controller/logs/current.log
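
For context, the fatal above comes from a startup path that loads the serving cert pair unconditionally; if the file is absent, the process exits immediately and the pod crash-loops. A minimal sketch of that pattern (illustrative only, not the actual cluster-policy-controller source; the cert paths are taken from the log above):

package main

import (
	"crypto/tls"

	"k8s.io/klog/v2"
)

func main() {
	// An unconditional load like this produces the klog Fatal (and the
	// stack trace) seen above whenever the serving-cert secret has not
	// been mounted yet.
	cert, err := tls.LoadX509KeyPair(
		"/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt",
		"/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key",
	)
	if err != nil {
		klog.Fatal(err) // exits the process, so the container restarts repeatedly
	}
	_ = cert // would normally be handed to the HTTPS server config
}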



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
cluster-policy-controller fails to start

Expected results:
cluster-policy-controller starts

Additional info:
You can find must-gather logs here:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484898355296342016/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather

Another example:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484777420153163776/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/

We have a couple more examples if needed.

Comment 1 Sergiusz Urbaniak 2022-01-26 14:39:42 UTC
1. In “regular” installs, the bootstrap cluster-policy-controller ensures that UID ranges are applied in namespaces. This instance does not require service-ca certs.
2. The final cluster-policy-controller does require a service-ca cert. If the bootstrap cluster-policy-controller from step 1 did not provision UID ranges in the service-ca-operator's namespace, SCC admission fails and prevents service-ca-operator from starting; the serving certificate is then never issued, and the final cluster-policy-controller cannot start, which is the failure state seen here.

The fix is to make cluster-policy-controller not depend on the service-ca cert secret: it must be able to start without a service-ca-generated serving certificate, even in scenario 2 and in rebootstrapping scenarios. A sketch of the fallback idea is below.
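
As a hedged illustration only (the actual change is in the linked PR 594 against openshift/cluster-kube-controller-manager-operator), the fallback amounts to preferring the service-ca-issued serving cert and falling back to a default cert when the secret has not been populated yet. The fallback path and helper name below are hypothetical:

package main

import (
	"crypto/tls"
	"fmt"
	"os"
)

const (
	// Path from the error in this bug; populated by service-ca-operator.
	servingCert = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt"
	servingKey  = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key"
	// Hypothetical location of a default (e.g. self-signed) cert pair that is
	// always present, used only until the real serving cert appears.
	defaultCert = "/etc/kubernetes/static-pod-resources/secrets/default-serving-cert/tls.crt"
	defaultKey  = "/etc/kubernetes/static-pod-resources/secrets/default-serving-cert/tls.key"
)

// loadServingCert prefers the service-ca-issued pair and falls back to the
// default pair, so startup no longer hard-fails while service-ca-operator
// is still coming up.
func loadServingCert() (tls.Certificate, error) {
	certPath, keyPath := servingCert, servingKey
	if _, err := os.Stat(certPath); os.IsNotExist(err) {
		// serving-cert secret not provisioned yet: fall back instead of fataling
		certPath, keyPath = defaultCert, defaultKey
	}
	return tls.LoadX509KeyPair(certPath, keyPath)
}

func main() {
	cert, err := loadServingCert()
	if err != nil {
		fmt.Fprintf(os.Stderr, "no usable serving cert: %v\n", err)
		os.Exit(1)
	}
	_ = cert // hand off to the controller's HTTPS server config
}

With this shape, the bootstrap deadlock breaks: cluster-policy-controller starts on the default cert, provisions what SCC admission needs, service-ca-operator comes up and issues the real serving cert, and the controller can then pick it up (e.g. via a cert reload or restart).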

Comment 10 Filip Krepinsky 2022-01-31 10:51:52 UTC
Backport created: https://bugzilla.redhat.com/show_bug.cgi?id=2048484

Comment 13 Filip Krepinsky 2022-04-04 21:50:41 UTC
*** Bug 1961204 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2022-08-10 10:43:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 18 Red Hat Bugzilla 2023-09-15 01:51:19 UTC
The needinfo request[s] on this closed bug have been removed because they have been unresolved for 365 days.

