Bug 2111670

Summary:	The kube-rbac-proxy-federate container reporting TLS handshake error
Product:	OpenShift Container Platform	Reporter:	samy <szemmour>
Component:	apiserver-auth	Assignee:	Krzysztof Ostrowski <kostrows>
Status:	CLOSED WONTFIX	QA Contact:	Xingxing Xia <xxia>
Severity:	low	Docs Contact:
Priority:	medium
Version:	4.10	CC:	anpicker, kostrows, mfojtik, spasquie, surbania, yuokada
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-01-17 11:44:15 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description samy 2022-07-27 19:14:29 UTC

Description of problem:
The kube-rbac-proxy-federate container inside the prometheus-user-workload-0 pod, deployed as part of the openshift-user-workload-monitoring namespace, throws TLS handshake errors.

Version-Release number of selected component (if applicable):
openshift 4.10.22

How reproducible:

Steps to Reproduce:
1. Deploy an OpenShift 4.10.22 cluster
2. Enable monitoring for user-defined projects
3. Check the logs of the kube-rbac-proxy-federate container:

oc logs prometheus-user-workload-0 -n openshift-user-workload-monitoring -c kube-rbac-proxy-federate

Actual results:
2022/07/27 17:06:53 http: TLS handshake error from 10.131.0.3:58912: write tcp 10.129.2.15:9092->10.131.0.3:58912: write: connection reset by peer
2022/07/27 17:06:53 http: TLS handshake error from 10.128.2.10:33522: write tcp 10.129.2.15:9092->10.128.2.10:33522: write: connection reset by peer
2022/07/27 17:06:58 http: TLS handshake error from 10.131.0.3:59010: write tcp 10.129.2.15:9092->10.131.0.3:59010: write: connection reset by peer
2022/07/27 17:06:58 http: TLS handshake error from 10.128.2.10:33596: write tcp 10.129.2.15:9092->10.128.2.10:33596: write: connection reset by peer
2022/07/27 17:07:03 http: TLS handshake error from 10.131.0.3:59084: write tcp 10.129.2.15:9092->10.131.0.3:59084: write: connection reset by peer

Expected results:
No TLS handshake error

Additional info:
This issue happens even if no application is configured to expose metrics.

Comment 1 Simon Pasquier 2022-07-28 12:11:18 UTC

IIUC these logs are triggered by the router health checks: the router will open a TCP connection to the proxy's port and close right away without proceeding with the TLS handshake.
We need to validate the assumption with the auth team that maintains kube-rbac-proxy and investigate whether we have ways to tweak the health checks performed by the router.
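
To make the suspected mechanism concrete, here is a minimal local sketch of a probe that opens a TCP connection and closes it without sending any TLS bytes. It is an illustration under assumptions, not the router's actual health check code: the function names are hypothetical, the listener is a plain TCP socket rather than kube-rbac-proxy, and the probe closes the connection normally, whereas the reported logs show a reset.

```python
import socket
import threading


def probe_without_handshake(host: str, port: int) -> None:
    """Open a TCP connection and close it immediately, sending no bytes."""
    with socket.create_connection((host, port), timeout=2):
        pass  # no TLS ClientHello is ever sent


def accept_one(listener: socket.socket, seen: list) -> None:
    """Accept one connection and record what the peer sent before closing."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(2)
        # An empty result means the peer closed without sending any data.
        seen.append(conn.recv(1024))


def demo() -> bytes:
    """Run one probe against a throwaway local listener and return what it received."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    seen = []
    worker = threading.Thread(target=accept_one, args=(listener, seen))
    worker.start()
    probe_without_handshake(*listener.getsockname())
    worker.join()
    listener.close()
    return seen[0]


if __name__ == "__main__":
    print(repr(demo()))
```

The listener receives nothing where a TLS server would expect a ClientHello, which is the situation a Go TLS server reports as a handshake error.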

Comment 2 Juan Rodriguez 2022-08-02 09:36:19 UTC

https://coreos.slack.com/archives/CCH60A77E/p1646915604267019 describes the same issue

Comment 3 Juan Rodriguez 2022-08-03 12:12:28 UTC

As discussed with the network edge team in https://coreos.slack.com/archives/CCH60A77E/p1659372630615569 and https://coreos.slack.com/archives/CCH60A77E/p1646915604267019, this is expected behavior: the health check probes from a Route close the TCP connection without performing a TLS handshake. But those messages are very frequent and might lead to customer questions, so we'd prefer to avoid them if possible nevertheless.

Filtering out those log messages would have to be done on the kube-rbac-proxy side. Per discussion with the Auth team in https://coreos.slack.com/archives/CB48XQ4KZ/p1659444053063079, BZ https://bugzilla.redhat.com/show_bug.cgi?id=2030726 might show a similar issue, so I'm reassigning this to the Auth team.
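
Until such filtering exists in kube-rbac-proxy, the messages can be separated out on the reader's side. The following is a rough sketch of that idea, not the proposed kube-rbac-proxy change: the function name is hypothetical, and the regular expression only assumes the message format shown in the description of this bug.

```python
import re

# Matches the message format seen in the reported logs and captures the source IP.
HANDSHAKE_ERROR = re.compile(
    r"http: TLS handshake error from (\d+\.\d+\.\d+\.\d+):\d+"
)


def split_handshake_noise(lines):
    """Return (kept_lines, counts_by_source_ip) for an iterable of log lines."""
    kept = []
    counts = {}
    for line in lines:
        match = HANDSHAKE_ERROR.search(line)
        if match:
            source_ip = match.group(1)
            counts[source_ip] = counts.get(source_ip, 0) + 1
        else:
            kept.append(line)
    return kept, counts
```

Feeding it the output of the oc logs command from the description, split into lines, leaves the remaining log lines in the first return value and shows per source IP how many handshake errors were dropped.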

Comment 6 Krzysztof Ostrowski 2022-08-24 14:08:26 UTC

So as it seems, this is not a bug. It looks like a feature :)
So haproxy performs its health checks by opening TCP connections and then resetting them.
We can contribute a "don't report empty connections"-flag to kube-rbac-proxy to get rid of the logging.
To reduce the logging in the meantime, we can make the health checks performed by the routes (haproxy) less frequent by increasing their interval: https://docs.openshift.com/container-platform/4.10/networking/routes/route-configuration.html#nw-route-specific-annotations_route-configuration.
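
As a back-of-the-envelope illustration of how the health check interval drives the log volume, here is a small sketch. The function name is hypothetical, and the inputs are read off the log excerpt in the description (two source IPs, each appearing about every 5 seconds); it assumes every probe produces exactly one log line. The route annotation that controls the interval should be router.openshift.io/haproxy.health.check.interval, but please verify that against the linked documentation.

```python
def handshake_log_lines_per_hour(probe_sources: int, interval_seconds: float) -> float:
    """Estimate log lines per hour, assuming one line per probe per source."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return probe_sources * 3600 / interval_seconds


# Two probe sources at a 5 second interval, as in the reported logs.
print(handshake_log_lines_per_hour(2, 5))   # 1440.0
# The same two sources with a hypothetical 60 second interval.
print(handshake_log_lines_per_hour(2, 60))  # 120.0
```

A longer interval cuts the noise proportionally, at the cost of the router noticing an unhealthy backend later.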

Do we need to keep this BZ open to track the progress on kube-rbac-proxy or can we close it?

Comment 7 Krzysztof Ostrowski 2022-09-01 07:26:41 UTC

I will close it for now.
If there is a request to keep this bug open until

1. the feature has been added to kube-rbac-proxy and
2. the feature is used within our components,

then please reopen the BZ.

Comment 9 samy 2022-09-23 14:01:45 UTC

Hello,

Is it possible to keep this case open?

I can see that this BZ is considered more of an RFE based on comment #6, so is there any RFE opened to reflect that?

Comment 10 Krzysztof Ostrowski 2022-10-04 07:42:03 UTC

I created an issue upstream for that: https://github.com/brancz/kube-rbac-proxy/issues/194

Comment 11 Michal Fojtik 2023-01-16 14:43:05 UTC

Dear reporter, we greatly appreciate the bug you have reported here. Unfortunately, due to the migration to a new issue-tracking system (https://issues.redhat.com/), we cannot continue triaging bugs reported in Bugzilla. Since this bug has been stale for multiple days, we have therefore decided to close it.
If you think this is a mistake, or that this bug has a higher priority or severity than is set today, please feel free to reopen it and tell us why. We are going to move every reopened bug to https://issues.redhat.com.

Thank you for your patience and understanding.