Bug 1820385 - [build-cop]kube-apiserver tls: bad certificate
Summary: [build-cop]kube-apiserver tls: bad certificate
Keywords:
Status: CLOSED DUPLICATE of bug 1779438
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Stefan Schimanski
QA Contact: scheng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-02 22:37 UTC by Qi Wang
Modified: 2020-04-06 08:13 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-06 08:13:45 UTC
Target Upstream Version:
Embargoed:



Description Qi Wang 2020-04-02 22:37:20 UTC
Description of problem:



Several failures appear in the kube-apiserver-operator log from this CI run: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/477/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn/116/artifacts/e2e-aws-ovn/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-56cf557f86-cp8x6_kube-apiserver-operator.log

I0402 17:15:40.272852       1 log.go:172] http: TLS handshake error from 10.131.0.9:47118: remote error: tls: bad certificate
I0402 17:15:56.231952       1 log.go:172] http: TLS handshake error from 10.129.2.10:60358: remote error: tls: bad certificate
I0402 17:16:10.275812       1 log.go:172] http: TLS handshake error from 10.131.0.9:47558: remote error: tls: bad certificate
I0402 17:16:26.232076       1 log.go:172] http: TLS handshake error from 10.129.2.10:60564: remote error: tls: bad certificate
I0402 17:16:40.273129       1 log.go:172] http: TLS handshake error from 10.131.0.9:48020: remote error: tls: bad certificate
I0402 17:16:56.232633       1 log.go:172] http: TLS handshake error from 10.129.2.10:60774: remote error: tls: bad certificate
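
(For reference, a sketch of pulling the same operator log locally from the CI artifacts and filtering for the error; the URL is the one quoted above.)

> curl -sO https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/477/pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn/116/artifacts/e2e-aws-ovn/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-56cf557f86-cp8x6_kube-apiserver-operator.log
> grep "tls: bad certificate" openshift-kube-apiserver-operator_kube-apiserver-operator-56cf557f86-cp8x6_kube-apiserver-operator.log | head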

Comment 1 Venkata Siva Teja Areti 2020-04-03 18:28:33 UTC
I don't think this is a kube-apiserver issue.

A sig-instrumentation e2e test failed with this error:

"count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1": {
            s: "promQL query: count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1 had reported incorrect results:\n[{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-apiserver-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-controller-manager-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-kube-apiserver-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-service-catalog-controller-manager-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585847668.037,\"41\"]}]",
        },
    }
to be empty
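
(For anyone re-checking the alert data on a live cluster, a rough sketch, assuming the Prometheus container still listens on 9090 and port-forward access is available; the query is a trimmed version of the one above.)

> oc -n openshift-monitoring port-forward pod/prometheus-k8s-0 9090:9090 &
> curl -sG http://localhost:9090/api/v1/query \
      --data-urlencode 'query=ALERTS{alertname="TargetDown",alertstate="firing"}'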



I looked for similar error messages in other pod logs after downloading the artifacts:


> grep "remote error: tls: bad certificate" pods/* -ril | sort -u
pods/openshift-apiserver-operator_openshift-apiserver-operator-7cb747b96f-smdzb_openshift-apiserver-operator.log
pods/openshift-controller-manager-operator_openshift-controller-manager-operator-5964bc7db6-dgt9q_operator.log
pods/openshift-kube-apiserver-operator_kube-apiserver-operator-56cf557f86-cp8x6_kube-apiserver-operator.log
pods/openshift-service-catalog-controller-manager-operator_openshift-service-catalog-controller-manager-operator-5554hlkkm_operator.log



10.131.0.9 and 10.129.2.10 are the Prometheus (prometheus-k8s) pods that scrape these metrics endpoints:

                            "ip": "10.131.0.9",
                            "nodeName": "ip-10-0-149-125.us-west-2.compute.internal",
                            "targetRef": {
                                "kind": "Pod",
                                "name": "prometheus-k8s-1",
                                "namespace": "openshift-monitoring",
                                "resourceVersion": "18164",
                                "uid": "f5e55efa-8c6c-4b33-b29f-9c18547fd7b3"
                            }

                            "ip": "10.129.2.10",
                            "nodeName": "ip-10-0-140-150.us-west-2.compute.internal",
                            "targetRef": {
                                "kind": "Pod",
                                "name": "prometheus-k8s-0",
                                "namespace": "openshift-monitoring",
                                "resourceVersion": "18346",
                                "uid": "d00ff064-f0bd-4545-a163-e9327daab087"
                            }
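
(The mapping above came from the endpoints JSON in the CI artifacts; on a live cluster the same lookup could be done roughly like this, assuming jq is available.)

> oc get pods -A -o wide | grep -E '10\.131\.0\.9|10\.129\.2\.10'
> oc -n openshift-monitoring get endpoints prometheus-k8s -o json \
      | jq '.subsets[].addresses[] | {ip, targetRef}'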


The serving-certs-ca-bundle configmap used by prometheus-k8s was updated only once, well before the timestamp of the first occurrence of the error above:


> grep serving-certs-ca-bundle ./* -ri
./pods/openshift-service-ca_service-ca-57cf89d54d-z2w4r_service-ca-controller.log:I0402 16:36:07.890633       1 configmap.go:53] updating configmap openshift-monitoring/serving-certs-ca-bundle with the service signing CA bundle
./pods/openshift-service-ca_service-ca-57cf89d54d-z2w4r_service-ca-controller.log:I0402 16:36:08.170876       1 configmap.go:53] updating configmap openshift-monitoring/telemeter-client-serving-certs-ca-bundle with the service signing CA bundle
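
(A hedged way to check whether the scrape clients are rejecting the operator's current serving certificate: fetch the CA bundle that prometheus-k8s mounts and verify the serving cert against it. The service name and port below are assumed from the standard service-ca setup, and the openssl check would need to run from inside the cluster.)

> oc -n openshift-monitoring get configmap serving-certs-ca-bundle \
      -o jsonpath='{.data.service-ca\.crt}' > service-ca.crt
> openssl s_client -connect metrics.openshift-kube-apiserver-operator.svc:443 \
      -CAfile service-ca.crt </dev/null 2>/dev/null | openssl x509 -noout -issuer -dates

If verification fails here while the CA bundle is unchanged, that would point at the serving certificate having been regenerated after the bundle was published.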

Comment 2 Venkata Siva Teja Areti 2020-04-03 18:45:51 UTC
Feel free to re-assign this if you think it is related to either monitoring or networking. It could also just be a one-off error.

Comment 3 Standa Laznicka 2020-04-06 08:13:45 UTC
This looks like an issue observed some time ago. There is a race in library-go; closing as a duplicate.

*** This bug has been marked as a duplicate of bug 1779438 ***

