Bug 2049907

Summary: SNO: cluster-policy-controller failed to start due to missing serving-cert/tls.crt
Product: OpenShift Container Platform
Component: kube-controller-manager
Version: 4.9
Target Release: 4.9.z
Status: CLOSED ERRATA
Severity: high
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: Filip Krepinsky <fkrepins>
Assignee: Filip Krepinsky <fkrepins>
QA Contact: RamaKasturi <knarra>
CC: aos-bugs, calfonso, fkrepins, knarra, maszulik, mfojtik, openshift-bugzilla-robot, rfreiman, rgudimet, surbania, tkatarki, yinzhou
Clone Of: 2048484
Bug Depends On: 2048484
Last Closed: 2022-02-14 12:00:57 UTC

Comment 4 RamaKasturi 2022-02-10 12:29:16 UTC
Checked the cluster-policy-controller logs from recent CI jobs and could not find the reported error.

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-single-node-live-iso/1488560014497943552/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-1f3994a75464c01f1953aaeda23c2a02c477e1b5ea36eb3434123ecccd141b0c/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-test-infra-cluster-master-0/cluster-policy-controller/cluster-policy-controller/logs/current.log -> Feb1st

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-single-node-live-iso/1489736870999887872/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-1f3994a75464c01f1953aaeda23c2a02c477e1b5ea36eb3434123ecccd141b0c/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-test-infra-cluster-master-0/cluster-policy-controller/cluster-policy-controller/logs/current.log -> Feb2nd

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-single-node-live-iso/1490526977445072896/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-1f3994a75464c01f1953aaeda23c2a02c477e1b5ea36eb3434123ecccd141b0c/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-test-infra-cluster-master-0/cluster-policy-controller/cluster-policy-controller/logs/current.log -> Feb7th

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-single-node-live-iso/1491426754189856768/artifacts/e2e-metal-single-node-live-iso/baremetalds-sno-gather/artifacts/post-tests-must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-023fba035f3807a225d262ef0f32e726cbdf6aadc3959fe82283778fecb33954/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-test-infra-cluster-master-0/cluster-policy-controller/cluster-policy-controller/logs/current.log -> Feb9th
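
A log check like the one above could be scripted roughly as follows. This is only a sketch: the search pattern is an assumption derived from the bug summary (the exact log message is not quoted in this bug), the function name is hypothetical, and the sample log lines are made up for illustration rather than taken from the CI artifacts.

```python
import re


def find_error_lines(log_text, pattern=r"serving-cert/tls\.crt"):
    """Return the log lines that match the given regular expression.

    The default pattern is an assumption based on the bug summary,
    not the exact message emitted by cluster-policy-controller.
    """
    regex = re.compile(pattern)
    return [line for line in log_text.splitlines() if regex.search(line)]


# Hypothetical log excerpts, invented for this example.
clean_log = "controller started\ninformers synced\n"
failing_log = "controller starting\nhypothetical failure mentioning serving-cert/tls.crt\n"

print(find_error_lines(clean_log))    # no matching lines expected
print(find_error_lines(failing_log))  # one matching line expected
```

An empty result for each downloaded current.log would correspond to "could not find the reported error".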

I also built an SNO cluster and the installation succeeded, so I am moving the bug to the verified state.

https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/75425/

I also checked with Filip, because I did not see 4.9 encounter the issue either before or after the BZ was reported. What I learned is: "I am not sure how often this issue manifests itself in 4.9. They reported that it started hurting them only in 4.10, so the issue can be pretty rare in 4.9. But looking at the code, the issue still could potentially appear in 4.9.
Quote from https://bugzilla.redhat.com/show_bug.cgi?id=2045872: After analyzing last week's CI, this issue causes 23% of installations to fail (4/17), which basically leads to a 77% success rate (assuming this is the main reason for SNOs to fail). Just to compare, in 4.8 we had a 99.7% success rate, and in 4.9 97%."
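
For reference, the quoted percentages follow from the 4/17 figure. A minimal sketch of the arithmetic, assuming all 17 installations are counted and every failure is attributed to this issue (the function name is hypothetical):

```python
def rates(failures, total):
    """Return (failure %, success %) for the given counts."""
    failure_pct = 100 * failures / total
    return failure_pct, 100 - failure_pct


failure_pct, success_pct = rates(4, 17)
print(f"{failure_pct:.1f}% failed, {success_pct:.1f}% succeeded")
```

This gives about 23.5% failed and 76.5% succeeded, which matches the quoted 23% and 77% to within rounding.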

Based on the above, moving the bug to the verified state.

Comment 6 errata-xmlrpc 2022-02-14 12:00:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.21 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0488