Description of problem: A customer observes a high number of requests to the proxy even though the endpoint is listed in the noProxy settings. The function checkProxyConfig should send a request once every 5 minutes; instead, the oauth route oauth-openshift.apps.<domain>:443 is reportedly going through the proxy roughly 4,000 times an hour. Version-Release number of selected component (if applicable): 4.8.5 I will give more details in private notes.
I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.
Hi, I am starting to work on this bug. Our team is currently short on capacity. Thank you for your patience.
Hey, it would be lovely if I could get a pcap alongside a must-gather, plus any additional information that could help me understand who-is-who.

From the pcap we see that a client is asking an intermediate server to proxy a connection, hence the proxy. First they establish the TCP connection, and then the client sends an HTTP CONNECT request. The proxy agrees and reports that it has established the connection to the target, which is the oauth-openshift.apps domain. But once the client starts the TLS handshake, which the proxy should simply pass through, the proxy kills the connection with a reset. This dance happens roughly once a second.

It is hard for me to identify who is calling whom, and why the callee is killing the connection, without the appropriate logs. I checked `checkProxyConfig` in cluster-authentication-operator, and it could be the root cause, but the attached must-gather logs show no trace of that behavior (no error messages indicating connection failures and retries). So either the must-gather was not taken while the issue from the pcap was occurring, or it is not `checkProxyConfig` causing all that traffic. I am also not completely sure this is really an auth problem. Still looking into it; it is quite interesting.
Hi Krzysztof, thanks a lot for taking care of this bug. I will try to request a new set of data (pcap + must-gather with oauth operator pod logs) from the customer.
There is a related bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111670 It seems the router performs health probe checks by opening connections and then resetting them, and the proxy in front of the app logs them. This usually happens at a 5-second interval, but it can happen for every pod behind a route, leading to more than one request per interval. There are currently two solutions: - increase the interval as described here: https://docs.openshift.com/container-platform/4.10/networking/routes/route-configuration.html#nw-route-specific-annotations_route-configuration - tell the proxy not to log empty requests. The latter would need to be added to the upstream proxy as a feature and then carried down into our downstream fork, which might take some time. Is this helpful, @gparente? Kudos to @mmasters, who described this behavior in one of the chats.
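For reference, the first workaround from the linked docs is applied per route via the documented `router.openshift.io/haproxy.health.check.interval` annotation. A sketch (the route name, namespace, and 30s value here are illustrative; adjust to the affected route):

```shell
# Raise the router's backend health-check interval for one route,
# reducing how often HAProxy opens and resets probe connections.
# Namespace/route below are assumptions for illustration.
oc -n openshift-authentication annotate route oauth-openshift \
  router.openshift.io/haproxy.health.check.interval=30s --overwrite
```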
The above is the current working assumption.
Ok, so I am closing this bug as of now. Thanks all for your support.