Created attachment 1513908 [details]
"TLS handshake error" in grafana-proxy container logs

Description of problem:
This bug is cloned from https://jira.coreos.com/browse/MON-495. Filing it again so the QE team can track the monitoring issue in Bugzilla.

After deploying cluster monitoring with the new installer on AWS, the grafana-proxy container logs many TLS errors such as:

2018/12/13 06:05:52 server.go:2753: http: TLS handshake error from 10.131.0.1:34500: EOF
2018/12/13 06:06:02 server.go:2753: http: TLS handshake error from 10.131.0.1:34540: EOF
2018/12/13 06:06:12 server.go:2753: http: TLS handshake error from 10.131.0.1:34584: EOF

The errors do not appear to affect functionality:

$ oc -n openshift-monitoring get pod | grep grafana
grafana-58456d859d-hcmj2   2/2   Running   0   1h

Checked on the node where the grafana pod runs; the connections are in TIME_WAIT:

$ netstat -anlp | grep 3000
tcp   0   0   10.131.0.1:39376   10.131.0.4:3000   TIME_WAIT   -
tcp   0   0   10.131.0.1:39428   10.131.0.4:3000   TIME_WAIT   -
tcp   0   0   10.131.0.1:39556   10.131.0.4:3000   TIME_WAIT   -
tcp   0   0   10.131.0.1:39596   10.131.0.4:3000   TIME_WAIT   -
tcp   0   0   10.131.0.1:39470   10.131.0.4:3000   TIME_WAIT   -
tcp   0   0   10.131.0.1:39510   10.131.0.4:3000   TIME_WAIT   -

Version-Release number of selected component (if applicable):
docker.io/grafana/grafana:5.2.4
docker.io/openshift/oauth-proxy:v1.1.0
docker.io/openshift/prometheus-alertmanager:v0.15.2
docker.io/openshift/prometheus-node-exporter:v0.16.0
docker.io/openshift/prometheus:v2.5.0
quay.io/coreos/configmap-reload:v0.0.1
quay.io/coreos/kube-rbac-proxy:v0.4.0
quay.io/coreos/kube-state-metrics:v1.4.0
quay.io/coreos/prom-label-proxy:v0.1.0
quay.io/coreos/prometheus-config-reloader:v0.26.0
quay.io/coreos/prometheus-operator:v0.26.0
quay.io/openshift/origin-configmap-reload:v3.11
quay.io/openshift/origin-telemeter:v4.0
quay.io/surbania/k8s-prometheus-adapter-amd64:326bf3c
quay.io/openshift-release-dev/ocp-v4.0@sha256:4f94db8849ed915994678726680fc39bdb47722d3dd570af47b666b0160602e5

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring with the new installer on AWS

Actual results:
The grafana-proxy container logs a "TLS handshake error ... EOF" line every 10 seconds.

Expected results:
No recurring TLS handshake errors in the grafana-proxy logs.

Additional info:
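The log entries arrive at 10-second intervals and always come from the node address (10.131.0.1), which is consistent with a periodic TCP-level health check (for example a kubelet tcpSocket probe or a load-balancer check) opening a connection to the proxy's TLS port and closing it without sending a ClientHello. A minimal, self-contained Go sketch (hypothetical code, not taken from oauth-proxy or the kubelet) that reproduces the exact "http: TLS handshake error ... EOF" log line:

package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"log"
	"math/big"
	"net"
	"net/http"
	"time"
)

// selfSignedCert builds a throwaway in-memory certificate so the
// example is self-contained.
func selfSignedCert() tls.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	tmpl := x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "localhost"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		log.Fatal(err)
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}
}

func main() {
	srv := &http.Server{
		Addr:      "127.0.0.1:3000",
		TLSConfig: &tls.Config{Certificates: []tls.Certificate{selfSignedCert()}},
	}
	go srv.ListenAndServeTLS("", "") // cert and key come from TLSConfig
	time.Sleep(200 * time.Millisecond)

	// 1. What a tcpSocket probe does: connect, then close immediately.
	//    The server logs "http: TLS handshake error from ...: EOF".
	if conn, err := net.Dial("tcp", "127.0.0.1:3000"); err == nil {
		conn.Close()
	}

	// 2. A client that completes the TLS handshake before closing
	//    produces no such log line.
	cfg := &tls.Config{InsecureSkipVerify: true} // self-signed cert
	if conn, err := tls.Dial("tcp", "127.0.0.1:3000", cfg); err == nil {
		conn.Close()
	}

	time.Sleep(200 * time.Millisecond) // give the server time to log
}

Running the sketch prints a single "http: TLS handshake error from 127.0.0.1:...: EOF" line to stderr for the bare TCP close, and nothing for the completed handshake, matching the pattern in the attached logs.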
I suspect this will be fixed by https://github.com/openshift/installer/pull/924
I can confirm that https://github.com/openshift/installer/pull/924 does not fix this. I installed a cluster with that change included (I used v0.12.0), and the error still occurs despite the ${var.cluster_name}-api-int LB target group having these health check parameters:

Protocol: HTTPS
Path: /healthz
Port: 6443
Healthy threshold: 3
Unhealthy threshold: 3
Timeout: 10
Interval: 10
Success codes: 200-399
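For reference, those parameters describe a probe equivalent to the following Go sketch (hypothetical code, not the installer's or AWS's implementation; the host name is a placeholder). Note that this check targets the API servers on port 6443, while the errors in this bug come from connections to grafana-proxy on port 3000:

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// ELB-style health checks do not validate the target's serving
	// certificate, hence InsecureSkipVerify here.
	client := &http.Client{
		Timeout: 10 * time.Second, // "Timeout: 10" from the target group
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	// "api-int.example.com" is a placeholder for the internal API endpoint.
	resp, err := client.Get("https://api-int.example.com:6443/healthz")
	if err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	defer resp.Body.Close()
	// "Success codes: 200-399" from the target group configuration.
	fmt.Println("healthy:", resp.StatusCode >= 200 && resp.StatusCode < 400)
}

Because this check completes a full TLS handshake (and a full HTTP request), it is not the kind of client that leaves "TLS handshake error ... EOF" behind.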
Not fixed, still seeing the error:

# oc -n openshift-monitoring logs grafana-78765ddcc7-7n8zz -c grafana-proxy
....................................................
2019/02/21 03:20:41 server.go:2923: http: TLS handshake error from 10.128.2.1:60838: EOF
2019/02/21 03:20:51 server.go:2923: http: TLS handshake error from 10.128.2.1:60892: EOF
2019/02/21 03:21:01 server.go:2923: http: TLS handshake error from 10.128.2.1:60946: EOF
2019/02/21 03:21:11 server.go:2923: http: TLS handshake error from 10.128.2.1:32874: EOF
2019/02/21 03:21:21 server.go:2923: http: TLS handshake error from 10.128.2.1:32928: EOF
2019/02/21 03:21:31 server.go:2923: http: TLS handshake error from 10.128.2.1:32984: EOF
2019/02/21 03:21:41 server.go:2923: http: TLS handshake error from 10.128.2.1:33036: EOF
2019/02/21 03:21:51 server.go:2923: http: TLS handshake error from 10.128.2.1:33088: EOF
2019/02/21 03:22:01 server.go:2923: http: TLS handshake error from 10.128.2.1:33170: EOF
2019/02/21 03:22:11 server.go:2923: http: TLS handshake error from 10.128.2.1:33224: EOF
2019/02/21 03:22:21 server.go:2923: http: TLS handshake error from 10.128.2.1:33276: EOF
2019/02/21 03:22:31 server.go:2923: http: TLS handshake error from 10.128.2.1:33342: EOF
2019/02/21 03:22:41 server.go:2923: http: TLS handshake error from 10.128.2.1:33542: EOF
2019/02/21 03:22:51 server.go:2923: http: TLS handshake error from 10.128.2.1:33626: EOF
2019/02/21 03:23:01 server.go:2923: http: TLS handshake error from 10.128.2.1:33710: EOF
2019/02/21 03:23:11 server.go:2923: http: TLS handshake error from 10.128.2.1:33784: EOF
2019/02/21 03:23:21 server.go:2923: http: TLS handshake error from 10.128.2.1:33868: EOF
2019/02/21 03:23:31 server.go:2923: http: TLS handshake error from 10.128.2.1:33940: EOF
2019/02/21 03:23:41 server.go:2923: http: TLS handshake error from 10.128.2.1:34010: EOF
2019/02/21 03:23:51 server.go:2923: http: TLS handshake error from 10.128.2.1:34084: EOF

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-20-194410   True        False         24h     Cluster version is 4.0.0-0.nightly-2019-02-20-194410
This was observed in 4.4. It is still worth investigating, in my opinion: why waste customers' storage on these log entries, when looking into the error might uncover more serious issues?
Tested with 4.4.0-0.nightly-2020-01-24-141203; the issue is fixed:

# oc -n openshift-monitoring logs grafana-bbb6fcc-qf2j4 -c grafana-proxy
2020/01/26 23:42:28 provider.go:118: Defaulting client-id to system:serviceaccount:openshift-monitoring:grafana
2020/01/26 23:42:28 provider.go:123: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token
2020/01/26 23:42:28 provider.go:311: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2020/01/26 23:42:28 oauthproxy.go:200: mapping path "/" => upstream "http://localhost:3001/"
2020/01/26 23:42:28 oauthproxy.go:221: compiled skip-auth-regex => "^/metrics"
2020/01/26 23:42:28 oauthproxy.go:227: OAuthProxy configured for Client ID: system:serviceaccount:openshift-monitoring:grafana
2020/01/26 23:42:28 oauthproxy.go:237: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2020/01/26 23:42:28 http.go:96: HTTPS: listening on [::]:3000
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581