Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1867120

Summary: cluster is inaccessible with kas containers showing many "tls: bad certificate" and "x509: certificate signed by unknown authority" logs
Product: OpenShift Container Platform
Reporter: Xingxing Xia <xxia>
Component: kube-apiserver
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED DUPLICATE
QA Contact: Ke Wang <kewang>
Severity: high
Docs Contact:
Priority: high
Version: 4.5
CC: aos-bugs, jima, mfojtik, xxia
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1867868
Environment:
Last Closed: 2020-08-25 03:12:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1867868

Description Xingxing Xia 2020-08-07 11:52:00 UTC
Description of problem:
The cluster is inaccessible, and the kube-apiserver (kas) containers log many "tls: bad certificate" and "x509: certificate signed by unknown authority" errors.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-08-07-024812
rhcos_ami: "rhcos-45.82.202008010929-0"

How reproducible:
Always

Steps to Reproduce:
1. Install a UPI cluster on vSphere 6.7 using the version above.

Actual results:
1. The installation failed. The installation log ended with:
...
The connection to the server api.jima45upitest01.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
Expect 3 ready master nodes, but found 0, waiting...
Try 1, before approve csr, run 'oc get node --no-headers -l 'node-role.kubernetes.io/master'; oc get csr'
The connection to the server api.jima45upitest01.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
The connection to the server api.jima45upitest01.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
Trying to approve CSR requests for nodes......
The connection to the server api.jima45upitest01.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
...

Expected results:
1. The installation should succeed.

Additional info:
SSH to a master node and check the kube-apiserver (kas) containers:
[root@control-plane-2 xxia-debug]# crictl ps -a | grep kube-apiserver
f22de68ab6af5       ccd7b0e07b19bd94230eb49b8da93821461cc3aecc1e38afe00136535adf6de2       4 hours ago         Running             kube-apiserver-insecure-readyz                0                   5baaa11702884
73cc851b3ee8e       ccd7b0e07b19bd94230eb49b8da93821461cc3aecc1e38afe00136535adf6de2       4 hours ago         Running             kube-apiserver-cert-regeneration-controller   0                   5baaa11702884
857a95b044166       ccd7b0e07b19bd94230eb49b8da93821461cc3aecc1e38afe00136535adf6de2       4 hours ago         Running             kube-apiserver-cert-syncer                    0                   5baaa11702884
81b4b51d7d341       7a8f32d752782629e9d97f69a619b1570377011a0fb544a5e7fceced81edf281       4 hours ago         Running             kube-apiserver

Check kube-apiserver logs (see attachments):
crictl logs 81b4b51d7d341 &> kas.log
crictl logs 857a95b044166 &> kube-apiserver-cert-syncer.log

kas.log shows many entries like:
I0807 07:18:08.429523       1 log.go:172] http: TLS handshake error from [::1]:56428: remote error: tls: bad certificate
and:
v1.xxxxxx.openshift.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable

kube-apiserver-cert-syncer.log shows many entries like:
Failed to list *v1.xxxx: Get https://localhost:6443/api/v1/namespaces/openshift-kube-apiserver/xxxx?limit=500&resourceVersion=0: x509: certificate signed by unknown authority
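
To gauge how often each message occurs in the captured logs, the matching lines can be counted. The following is a minimal sketch, assuming the kas.log and kube-apiserver-cert-syncer.log files written by the crictl logs commands above. count_matches is a hypothetical helper name used only for illustration; it is not part of crictl or oc.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# Usage: count_matches PATTERN FILE
count_matches() {
    # -F treats the pattern as a literal string, -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps a count of 0 from
    # being treated as a failure (for example under "set -e").
    grep -F -c -- "$1" "$2" || true
}

# Example invocations (assumed file names, taken from the commands in this report):
# count_matches 'tls: bad certificate' kas.log
# count_matches 'x509: certificate signed by unknown authority' kube-apiserver-cert-syncer.log
```

Comparing the counts from a failing node with those from a node booted from the earlier RHCOS template may help confirm whether the certificate errors are specific to the new template.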

Comment 2 jima 2020-08-10 01:19:47 UTC
This issue happens when the new RHCOS template (rhcos-45.82.202008010929-0) is used to boot the VMs.
Installation works with the previous RHCOS template (rhcos-45.82.202007141718-0).

Comment 3 jima 2020-08-25 03:12:21 UTC
The issue was traced to Bug 1870038. After applying the workaround from Bug 1870038, the installation succeeds and this issue no longer reproduces.
Closing this bug.

Comment 4 Xingxing Xia 2020-08-25 03:26:35 UTC

*** This bug has been marked as a duplicate of bug 1870038 ***