Bug 1820385
Summary: | [build-cop]kube-apiserver tls: bad certificate | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Qi Wang <qiwan>
Component: | apiserver-auth | Assignee: | Stefan Schimanski <sttts>
Status: | CLOSED DUPLICATE | QA Contact: | scheng
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | unspecified | CC: | aos-bugs, mfojtik, slaznick, vareti
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-04-06 08:13:45 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Qi Wang
2020-04-02 22:37:20 UTC
I don't think this is a kube-apiserver issue.

A sig-instrumentation e2e test failed with this error:

```
"count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1": {
    s: "promQL query: count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1 had reported incorrect results:\n
       [{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-apiserver-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]},
        {\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-controller-manager-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]},
        {\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-kube-apiserver-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]},
        {\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-service-catalog-controller-manager-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]}]",
    },
} to be empty
```

The TargetDown alerts fired for the metrics targets of the openshift-apiserver, openshift-controller-manager, kube-apiserver, and service-catalog-controller-manager operators. Looked for similar error messages in other pod logs after downloading the artifacts:

```
> grep "remote error: tls: bad certificate" pods/* -ril | sort -u
pods/openshift-apiserver-operator_openshift-apiserver-operator-7cb747b96f-smdzb_openshift-apiserver-operator.log
pods/openshift-controller-manager-operator_openshift-controller-manager-operator-5964bc7db6-dgt9q_operator.log
pods/openshift-kube-apiserver-operator_kube-apiserver-operator-56cf557f86-cp8x6_kube-apiserver-operator.log
pods/openshift-service-catalog-controller-manager-operator_openshift-service-catalog-controller-manager-operator-5554hlkkm_operator.log
```

10.131.0.9 and 10.129.2.10 are the Prometheus endpoints that scrape these metrics targets:

```
"ip": "10.131.0.9",
"nodeName": "ip-10-0-149-125.us-west-2.compute.internal",
"targetRef": {
    "kind": "Pod",
    "name": "prometheus-k8s-1",
    "namespace": "openshift-monitoring",
    "resourceVersion": "18164",
    "uid": "f5e55efa-8c6c-4b33-b29f-9c18547fd7b3"
}

"ip": "10.129.2.10",
"nodeName": "ip-10-0-140-150.us-west-2.compute.internal",
"targetRef": {
    "kind": "Pod",
    "name": "prometheus-k8s-0",
    "namespace": "openshift-monitoring",
    "resourceVersion": "18346",
    "uid": "d00ff064-f0bd-4545-a163-e9327daab087"
}
```

The serving-certs-ca-bundle configmap used by prometheus-k8s is updated only once, and that update happens well before the timestamp of the first occurrence of the error above:

```
> grep serving-certs-ca-bundle ./* -ri
./pods/openshift-service-ca_service-ca-57cf89d54d-z2w4r_service-ca-controller.log:I0402 16:36:07.890633 1 configmap.go:53] updating configmap openshift-monitoring/serving-certs-ca-bundle with the service signing CA bundle
./pods/openshift-service-ca_service-ca-57cf89d54d-z2w4r_service-ca-controller.log:I0402 16:36:08.170876 1 configmap.go:53] updating configmap openshift-monitoring/telemeter-client-serving-certs-ca-bundle with the service signing CA bundle
```

Feel free to re-assign this if you think it is related to either monitoring or networking. It could just as well be a one-off error.

This looks like an issue observed some time ago. There is a race in library-go; closing as a duplicate.

*** This bug has been marked as a duplicate of bug 1779438 ***
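
For reference, the failing promQL query can be re-run by hand against the in-cluster Prometheus to see which TargetDown alerts fired. A minimal sketch, assuming the default prometheus-k8s route in openshift-monitoring and that the token returned by `oc whoami -t` is allowed to query it (the route name and auth setup are assumptions, not taken from the report):

```sh
# Sketch: re-run the e2e test's query against the cluster's Prometheus API.
TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')

# A non-empty result lists the alerts that have been firing over the last 2h.
curl -skG "https://${HOST}/api/v1/query" \
  -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=count_over_time(ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh",alertstate="firing",severity!="info"}[2h]) >= 1'
```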
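
To check on a live cluster whether the bad-certificate errors come from a serving certificate that no longer matches the CA bundle Prometheus trusts (consistent with the library-go race referenced above), one can compare the bundle in serving-certs-ca-bundle with the certificate an affected operator's metrics endpoint actually presents. A hedged sketch, assuming `oc` cluster-admin access, `openssl` installed locally, and that the operator's metrics container listens on port 8443 (the port and deployment name are assumptions):

```sh
# Sketch under assumptions: metrics served on container port 8443, openssl available locally.

# CA bundle that prometheus-k8s uses to verify scraped metrics targets.
oc -n openshift-monitoring get configmap serving-certs-ca-bundle \
  -o jsonpath='{.data.service-ca\.crt}' > service-ca.crt

# Forward the operator metrics port locally and check the certificate it presents.
oc -n openshift-apiserver-operator port-forward deployment/openshift-apiserver-operator 8443:8443 &
sleep 2
echo | openssl s_client -connect 127.0.0.1:8443 -CAfile service-ca.crt 2>/dev/null \
  | grep 'Verify return code'
# A non-zero verify return code would mean the serving cert does not validate against
# the injected CA bundle, matching "remote error: tls: bad certificate" in the logs.
kill %1
```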