Description of problem:

Lately we have SNO CI jobs failing. We found that cluster-policy-controller failed to start due to:

2022-01-22T15:55:00.211340589Z F0122 15:55:00.211297 1 cmd.go:138] open /etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt: no such file or directory
2022-01-22T15:55:00.211691699Z goroutine 1 [running]:
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.stacks(0x1)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:1038 +0x8a
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).output(0x3a75fe0, 0x3, 0x0, 0xc0006f6000, 0x1, {0x2cfcd69, 0x10}, 0xc000100000, 0x0)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:987 +0x5fd
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).printDepth(0xc000253bc0, 0x26b1c70, 0x0, {0x0, 0x0}, 0x37, {0xc00089da70, 0x1, 0x1})
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:735 +0x1ae
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.(*loggingT).print(...)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:717
2022-01-22T15:55:00.211691699Z k8s.io/klog/v2.Fatal(...)
2022-01-22T15:55:00.211691699Z 	k8s.io/klog/v2.0/klog.go:1512

Link to the specific logfile:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484898355296342016/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/registry-redhat-io-openshift4-ose-must-gather-sha256-8c0b3bc10756c463f1aa6b622e396ae244079dd8f7f2f3c5d8695a777c95eec6/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-test-infra-cluster-master-0/cluster-policy-controller/cluster-policy-controller/logs/current.log

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
cluster-policy-controller failing to start

Expected results:
cluster-policy-controller starts

Additional info:
You can find must-gather logs here:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484898355296342016/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather

Another example:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-single-node-live-iso/1484777420153163776/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/

We have a couple more if needed.
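For context, here is a minimal Go sketch of the kind of startup path that produces the fatal error above: the serving certificate is loaded unconditionally, so a missing secret aborts the whole process. The file paths come from the log; everything else (including loading the key from tls.key alongside tls.crt) is an assumption for illustration and is not the actual cluster-policy-controller code.

// Minimal sketch (NOT the actual cluster-policy-controller code) of a startup
// path that would produce the fatal error seen in the CI logs above.
package main

import (
	"crypto/tls"

	"k8s.io/klog/v2"
)

const (
	// Paths taken from the log; the key path is an assumed companion of tls.crt.
	servingCert = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt"
	servingKey  = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key"
)

func main() {
	// If service-ca-operator never populated the serving-cert secret, this
	// fails with "no such file or directory" and klog.Fatal terminates the
	// process, which is the crash loop observed in the SNO CI jobs.
	cert, err := tls.LoadX509KeyPair(servingCert, servingKey)
	if err != nil {
		klog.Fatal(err)
	}
	_ = cert
	// ... start the controller's secure listener with the loaded cert ...
}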
1. In “regular” installs, the bootstrap cluster-policy-controller ensures that UID ranges are applied in namespaces. This instance does not require service-ca certs.
2. The final cluster-policy-controller does require a service-ca cert. Hence, if the cluster-policy-controller from step 1 did not provision UID ranges in service-ca-operator's namespace, SCC admission fails and prevents service-ca-operator from starting, so the serving cert is never generated, resulting in the failure state seen here.

The fix is to make cluster-policy-controller not depend on the service-ca cert secret: it must be able to start without it, even in scenario 2. For re-bootstrapping scenarios, the cluster-policy-controller must be able to start without a service-ca generated serving certificate (one possible shape of that behavior is sketched below).
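To illustrate the direction of the fix, here is a hedged sketch of how a controller could start without the service-ca serving certificate and only load it once service-ca-operator has populated the secret. The helper name waitForServingCert, the 5-second poll interval, and the tls.key path are assumptions for illustration; this is not the actual cluster-policy-controller implementation.

// Hedged sketch: treat a missing service-ca serving cert as "not yet available"
// rather than fatal, so the controller's other loops can run immediately.
package main

import (
	"crypto/tls"
	"os"
	"time"

	"k8s.io/klog/v2"
)

const (
	servingCert = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt"
	servingKey  = "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key"
)

// waitForServingCert polls for the service-ca generated certificate instead of
// exiting when it is absent, which breaks the circular dependency: the
// controller can provision namespace UID ranges so service-ca-operator can
// start and, in turn, create this very secret.
func waitForServingCert(interval time.Duration) *tls.Certificate {
	for {
		if _, err := os.Stat(servingCert); err == nil {
			cert, err := tls.LoadX509KeyPair(servingCert, servingKey)
			if err == nil {
				return &cert
			}
			klog.Warningf("serving cert present but not loadable yet: %v", err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Start the controller loops (UID range allocation, etc.) here, before the
	// serving cert exists.
	go func() {
		cert := waitForServingCert(5 * time.Second)
		_ = cert
		klog.Info("serving certificate loaded, starting secure endpoint")
		// ... start or reconfigure the secure listener with the cert ...
	}()

	// ... run controllers; block until shutdown ...
	select {}
}

The design point is that the secure endpoint becomes a late-bound concern: the controller comes up first, SCC admission can then pass for service-ca-operator, and the serving cert is picked up whenever it appears.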
Backport created: https://bugzilla.redhat.com/show_bug.cgi?id=2048484
*** Bug 1961204 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069