Description of problem:
Logging into the console results in constant redirection to a page showing the available APIs.
The preceding page content results from a 404 during the auth flow:
Request URL: https://console-openshift-console.apps.rhcos-cr.rhcos.sandbox.openshift.com/auth/callback?code=jS2....p4qc&state=90db818c
Request Method: GET
Status Code: 404
Remote Address: 18.104.22.168:443
Version-Release number of selected component (if applicable):
100% on two recent clusters
I now have reason to believe the underlying problem is https://bugzilla.redhat.com/show_bug.cgi?id=1688390 — need to do a bit more diagnosis to confirm.
Created https://bugzilla.redhat.com/show_bug.cgi?id=1690146 to track it in 4.0
Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1688390 and https://bugzilla.redhat.com/show_bug.cgi?id=1690146 — yesterday I was able to reproduce the console problem in a cluster which was NOT exhibiting the suspicious haproxy process issue, so the problems may not be related after all.
(In reply to Dan Mace from comment #5)
> Regarding https://bugzilla.redhat.com/show_bug.cgi?id=1688390 and
> https://bugzilla.redhat.com/show_bug.cgi?id=1690146 — yesterday I was able
> to reproduce the console problem in a cluster which was NOT exhibiting the
> suspicious haproxy process issue, so the problems may not be related after
> all.
Hi Dan, I tried but could not reproduce this issue with several recent nightly builds.
Could you please provide detailed steps if you are able to reproduce it?
We're only seeing this on certain clusters. Unfortunately, we don't have specific steps to reproduce.
We've identified the root issue as an edge case in our passthrough route handling of HTTP2 endpoints. Given the following conditions:
* A wildcard ingress certificate (eg. *.apps.openshift.example.com)
* A DNS wildcard (eg. *.apps.openshift.example.com) with A records resolving to a static set of ingress load balancer IPs
* A passthrough route to an HTTP2 server (eg. auth.apps.openshift.example.com)
* An edge or reencrypt route to a server in the same subdomain (eg. console.apps.openshift.example.com)
* An HTTP2 client which coalesces connections (eg. Chrome/Firefox)
It's possible for packets destined for a proxy-terminated route (eg. console) to be misdirected to the passthrough/HTTP2 route (eg. auth).
In brief, a connection to the passthrough/HTTP2 server may be reused by the client for packets destined for other servers for which the wildcard certificate is valid. Because both route host names are valid for the certificate and resolve to the same IPs through DNS, the client considers an existing HTTP2 server connection reusable for the other servers' packets. However, the HTTP2 connection at the proxy is coupled to the HTTP2 server through the SNI value sent in the initial TLS handshake, and the packets coming through are encrypted and opaque to the proxy. The proxy therefore cannot identify the packets as misdirected, and forwards all of them to the HTTP2 server.
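To make the reuse condition concrete, here is a minimal sketch of the decision a coalescing HTTP2 client makes, based on the conditions listed above. The function names, the sample IP (192.0.2.10), and the simplified wildcard matching are assumptions for illustration only; real browsers apply additional checks.

```python
def cert_covers(host, san_patterns):
    """Return True if host matches one of the certificate names.

    Simplified matching (an assumption of this sketch): a leading "*."
    covers exactly one leftmost label; anything else must match exactly.
    """
    host = host.lower()
    for pattern in san_patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            label, _, rest = host.partition(".")
            if label and rest == pattern[2:]:
                return True
        elif host == pattern:
            return True
    return False


def may_coalesce(new_host, new_host_ips, conn_ip, conn_cert_sans):
    """Would a client reuse an open HTTP2 connection for new_host?

    Both conditions from the description must hold: the certificate
    presented on the open connection is valid for new_host, and
    new_host resolves to the IP the connection is already using.
    """
    return cert_covers(new_host, conn_cert_sans) and conn_ip in new_host_ips


# A connection opened to auth.apps.openshift.example.com (passthrough),
# which presented the wildcard certificate.
sans = ["*.apps.openshift.example.com"]
reuse = may_coalesce(
    "console.apps.openshift.example.com",  # edge/reencrypt route
    {"192.0.2.10"},                        # same load balancer IP
    "192.0.2.10",
    sans,
)
print(reuse)
```

With a separate domain and certificate for auth (or a different set of IPs), `may_coalesce` returns False and the client opens a new connection instead.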
Solutions could be:
1. Discontinue use of HTTP2 for these services
2. Implement mutual TLS in the ingress controller to enable terminating TLS at the proxy for the auth server
3. Implement HTTP 421 misdirected request support at the auth server to hint clients to stop reusing the connection for the request authority
Our current recommendation in the short term is (3), and longer term we would like to implement mTLS (2) for ease of use.
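As a rough illustration of option (3), the check below sketches how a server could decide to answer with 421 when the request authority is not one it serves. This is not the code from the actual fix; `SERVED_HOSTS`, `normalize_authority`, and `status_for_authority` are hypothetical names, and IPv6 literals are not handled.

```python
from http import HTTPStatus

# Hypothetical: the host names this server is authoritative for.
SERVED_HOSTS = {"auth.apps.openshift.example.com"}


def normalize_authority(authority):
    """Lower-case the authority and strip a trailing numeric port."""
    host = authority.strip().lower()
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        host = name
    return host


def status_for_authority(authority, served_hosts=SERVED_HOSTS):
    """Return 421 if the request was sent to the wrong server, else None.

    None means the request should be handled normally. A 421 response
    hints to the client that it should retry the request on a different
    connection rather than reuse this one for that authority.
    """
    if normalize_authority(authority) in served_hosts:
        return None
    return HTTPStatus.MISDIRECTED_REQUEST


# A console request that arrived on the coalesced auth connection.
print(status_for_authority("console.apps.openshift.example.com"))
```

In a real server the authority would come from the HTTP2 `:authority` pseudo-header (or the `Host` header), and the check would run before any routing.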
Another viable solution Paul Weil presented is to use a separate ingresscontroller, domain, and cert for auth.
In the meantime, users can probably work around the issue by waiting for connections to expire (eg. ~30s) and reloading the console page (or using a new browser session).
https://github.com/openshift/origin/pull/22529 is merged
The issue was not found with the nightly build below, so moving to VERIFIED. Please re-open if the problem occurs again.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-04-18-210657   True        False         170m    Cluster version is 4.1.0-0.nightly-2019-04-18-210657
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.