Bug 2030726

Summary: checkProxyConfig generating high number of requests sent through the proxy endpoints.
Product: OpenShift Container Platform
Reporter: German Parente <gparente>
Component: apiserver-auth
Assignee: Krzysztof Ostrowski <kostrows>
Status: CLOSED NOTABUG
QA Contact: Xingxing Xia <xxia>
Severity: medium
Priority: medium
Version: 4.8
CC: akanekar, aos-bugs, chris.bowles, kostrows, mfojtik, surbania, tkondvil
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-08-24 14:23:47 UTC
Type: Bug

Description German Parente 2021-12-09 15:19:12 UTC
Description of problem:

A customer observes a high number of requests going through the proxy even though the endpoint is defined in the noProxy settings.

The function checkProxyConfig should send a request once every 5 minutes.

Instead, it is claimed that the oauth route:

oauth-openshift.apps.<domain>:443

is going through the proxy roughly 4,000 times an hour.
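
For context, this is the behavior noProxy should produce on the client side. A minimal sketch, assuming the check resolves its proxy via Go's golang.org/x/net/http/httpproxy package (the actual call site in cluster-authentication-operator is not quoted here, and all hostnames are placeholders): a target listed in NoProxy resolves to no proxy at all, so a correctly behaving periodic check should never touch the proxy endpoint.

```go
package main

import (
	"fmt"
	"net/url"

	"golang.org/x/net/http/httpproxy"
)

func main() {
	// Hypothetical values mirroring the reported setup: a cluster-wide
	// proxy is configured, and the oauth route is excluded via noProxy.
	cfg := &httpproxy.Config{
		HTTPSProxy: "http://proxy.example.com:3128",
		NoProxy:    "oauth-openshift.apps.example.com",
	}
	proxyFor := cfg.ProxyFunc()

	target, _ := url.Parse("https://oauth-openshift.apps.example.com:443")
	proxyURL, err := proxyFor(target)
	if err != nil {
		panic(err)
	}
	// Prints "<nil>": the request bypasses the proxy entirely.
	fmt.Println(proxyURL)
}
```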


Version-Release number of selected component (if applicable): 4.8.5

I will give more details in private notes.

Comment 3 Krzysztof Ostrowski 2022-01-30 15:31:45 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 7 Krzysztof Ostrowski 2022-06-22 12:45:43 UTC
Hi,

I am starting to work on this bug; our team has been short on capacity.

Thank you for your patience.

Comment 8 Krzysztof Ostrowski 2022-06-22 14:53:01 UTC
Hey,


It would be lovely if I could get a pcap alongside a must-gather, plus any additional information that could help me understand the who-is-who.

From the pcap we see that a client is asking an intermediate server to proxy a connection, hence the proxy. First the client establishes the TCP connection, then it sends an HTTP CONNECT request. The proxy agrees and reports that it has established the connection to the target location, which is the oauth-openshift.apps domain. But once the client starts the TLS handshake, which the proxy should simply relay, the proxy kills the connection with a reset. This dance happens roughly once a second.
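
To make the sequence concrete, here is the same dance as a minimal Go sketch (proxy address and hostnames are hypothetical placeholders; this only illustrates the protocol steps visible in the pcap, not any actual cluster component):

```go
package main

import (
	"bufio"
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
)

func main() {
	proxyAddr := "proxy.example.com:3128"            // hypothetical proxy
	target := "oauth-openshift.apps.example.com:443" // the oauth route

	// Step 1: plain TCP connection to the proxy.
	conn, err := net.Dial("tcp", proxyAddr)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Step 2: HTTP CONNECT, asking the proxy to tunnel to the target.
	fmt.Fprintf(conn, "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n", target, target)
	resp, err := http.ReadResponse(bufio.NewReader(conn), nil)
	if err != nil {
		panic(err)
	}
	if resp.StatusCode != http.StatusOK {
		panic("proxy refused CONNECT: " + resp.Status)
	}

	// Step 3: TLS handshake through the tunnel. In the pcap, the proxy
	// resets the connection at exactly this point.
	tlsConn := tls.Client(conn, &tls.Config{
		ServerName: "oauth-openshift.apps.example.com",
	})
	if err := tlsConn.Handshake(); err != nil {
		fmt.Println("handshake failed:", err) // e.g. "connection reset by peer"
		return
	}
	fmt.Println("tunnel established, TLS handshake OK")
}
```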

It is hard for me to identify who is calling whom, and why the called side is killing the connection, without the appropriate logs.

I checked `checkProxyConfig` in cluster-authentication-operator, and it could have been the root cause, but the attached must-gather logs show no trace of that behavior (no error messages indicating connection errors and retries). So either the must-gather does not exhibit the same issue as the pcap, or it is not `checkProxyConfig` causing all that communication.

I am also not completely sure it is really an auth problem. Still looking into it; it is quite interesting.

Comment 9 German Parente 2022-06-22 15:54:58 UTC
Hi Krzysztof,

thanks a lot for taking care of this bug. I will ask the customer for a new set of data (pcap + must-gather with oauth operator pod logs).

Comment 21 Krzysztof Ostrowski 2022-08-15 14:43:31 UTC
There is a related bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111670

It seems like the router is making "health probe checks" by opening connections and then resetting them.
The proxy in front of the app seems to log them.

Usually it happens at a 5-second interval, but it can happen for every pod behind a route, leading to more than one probe per 5-second interval.

There are currently two solutions:

- increase the health-check interval via a route annotation, as described here: https://docs.openshift.com/container-platform/4.10/networking/routes/route-configuration.html#nw-route-specific-annotations_route-configuration (see the sketch below)

- tell the proxy not to log empty requests.

The latter would need to be added to the upstream proxy as a feature and then carried down to our downstream fork.
This might take some time.
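
As a sketch of the first option, and assuming the oauth route is named oauth-openshift in the openshift-authentication namespace (and 30s is a purely illustrative value, not a recommendation): the documented `router.openshift.io/haproxy.health.check.interval` annotation could be applied with client-go's dynamic client, though in practice a one-line `oc annotate route ...` does the same.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (default ~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	routes := schema.GroupVersionResource{
		Group: "route.openshift.io", Version: "v1", Resource: "routes",
	}
	// Merge-patch the health-check interval annotation onto the route.
	patch := []byte(`{"metadata":{"annotations":{"router.openshift.io/haproxy.health.check.interval":"30s"}}}`)

	_, err = client.Resource(routes).
		Namespace("openshift-authentication").
		Patch(context.TODO(), "oauth-openshift", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("annotation applied")
}
```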

Is this helpful, @gparente?

Kudos to @mmasters, who described that behavior in one of the chats.

Comment 22 Krzysztof Ostrowski 2022-08-15 14:44:22 UTC
The above is the current working assumption.

Comment 24 Krzysztof Ostrowski 2022-08-24 14:23:47 UTC
Ok, so I am closing this bug as of now. Thanks all for your support.