Bug 1743657

Summary: ERR_TOO_MANY_RETRIES loop logging in to console
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Management Console
Assignee: Samuel Padgett <spadgett>
Status: CLOSED ERRATA
QA Contact: Yadan Pei <yapei>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2.0
CC: aos-bugs, ccoleman, fdeutsch, jokerman, lszaszki, mfojtik, mfranczy, tjelinek, ukalifon, walemark, yapei
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1748727
Environment:
Last Closed: 2019-10-16 06:36:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1748727
Attachments:
Description: before login, Flags: none
Description: after login, Flags: none

Description Mike Fiedler 2019-08-20 12:03:10 UTC
Description of problem:

@ccoleman, @lcosic, and I have all seen this recently. When hitting /auth/login for a console login, Chrome gets ERR_TOO_MANY_RETRIES and gives up.

In my case, Firefox worked fine on the same cluster and Chrome worked later, so it may be intermittent. See the 20 Aug discussion on #forum-apiserver about attempts to clear cookies, etc.


Version-Release number of selected component (if applicable): 4.2.0-0.ci-2019-08-20-075613


How reproducible: Unknown


Steps to Reproduce:
1. Log in to the console in Chrome on the latest CI build (20-Aug)

Comment 1 Lukasz Szaszkiewicz 2019-08-21 09:15:12 UTC
I couldn't reproduce the issue with a cluster in the following version:

 Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0+f931880-1318", GitCommit:"f931880eb3", GitTreeState:"clean", BuildDate:"2019-08-19T15:12:51Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
 Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+ee8999f", GitCommit:"ee8999f", GitTreeState:"clean", BuildDate:"2019-08-20T15:03:50Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
 OpenShift Version: 4.2.0-0.okd-2019-08-21-062313


Nevertheless, I decided to count the redirects. During and right after login to the console, the number of redirects is small (<4), but right after that the web app starts sending lots of requests to https://console-openshift-console.apps.cluster-gda.devcluster.openshift.com/api/kubernetes//apis/subresources.kubevirt.io/v1alpha3/healthz.
I think that all requests to that URL end with a 301 HTTP status code (Moved Permanently). The Location header of the responses points to "/api/kubernetes/apis/subresources.kubevirt.io/v1alpha3/healthz". That triggers GET requests to https://console-openshift-console.apps.cluster-gda.devcluster.openshift.com/api/kubernetes/apis/subresources.kubevirt.io/v1alpha3/healthz, which end with a 404 HTTP status code (Not Found). I can imagine that modern browsers (including Chrome) have a fuse that counts the number of redirects and warns if there are too many of them.
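The only difference between the redirected URL and its redirect target is the doubled slash after /api/kubernetes. A minimal sketch of how such a path can arise from plain string concatenation, and one way to avoid it, is shown below. It is written in Python for illustration only; `join_url_path`, `proxy_base`, and `health_path` are hypothetical names, and this is not necessarily how the console builds the URL or how it was later fixed.

```python
def join_url_path(base: str, *segments: str) -> str:
    """Join path segments so that exactly one '/' separates them."""
    parts = [base.rstrip("/")]
    # Drop leading/trailing slashes on each segment and skip empty ones.
    parts.extend(s.strip("/") for s in segments if s.strip("/"))
    return "/".join(parts)


# Hypothetical values mirroring the paths quoted in this comment.
proxy_base = "/api/kubernetes/"
health_path = "/apis/subresources.kubevirt.io/v1alpha3/healthz"

# Naive concatenation yields the doubled slash that was answered with 301.
naive = proxy_base + health_path
# Normalized join yields the path the Location header pointed to.
joined = join_url_path(proxy_base, health_path)

print(naive)
print(joined)
```

With a join like this, the request would go straight to the final path and skip the extra 301 round trip, although the 404 on the health check itself would remain.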

Comment 2 Lukasz Szaszkiewicz 2019-08-21 10:43:39 UTC
I tried with:
 Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0+f931880-1318", GitCommit:"f931880eb3", GitTreeState:"clean", BuildDate:"2019-08-19T15:12:51Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
 Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+0eda05f", GitCommit:"0eda05f", GitTreeState:"clean", BuildDate:"2019-08-19T22:55:10Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
 OpenShift Version: 4.2.0-0.ci-2019-08-20-075

It behaves exactly the same as with "4.2.0-0.okd-2019-08-21-062313" version - see my previous comment.

Comment 3 Lukasz Szaszkiewicz 2019-08-21 10:44:30 UTC
Created attachment 1606455 [details]
before login

Comment 4 Lukasz Szaszkiewicz 2019-08-21 10:44:56 UTC
Created attachment 1606456 [details]
after login

Comment 5 Clayton Coleman 2019-08-21 10:45:21 UTC
I see a ton of "cancelled" requests.  This may actually be a web console bug.  I think based on the info we have now we can move this to web console.

Comment 6 Fabian Deutsch 2019-08-21 11:29:06 UTC
Tomas, it's one thing to see why the health check fails, but it also seems that the console is requesting the wrong URL (leading to a redirect).

Comment 7 Fabian Deutsch 2019-08-21 11:29:39 UTC
Marcin, can you tell if the health check URL is correct?

Comment 8 Samuel Padgett 2019-08-21 11:57:29 UTC
The failed health checks are a symptom of Bug 1738292. I suspect it's a red herring, though. Here's the Chromium comment for TOO_MANY_RETRIES:

// An HTTP transaction was retried too many times due for authentication or
// invalid certificates. This may be due to a bug in the net stack that would
// otherwise infinite loop, or if the server or proxy continually requests fresh
// credentials or presents a fresh invalid certificate.
NET_ERROR(TOO_MANY_RETRIES, -375)

https://github.com/chromium/chromium/blob/966c0e95c915aba3b75eb432957cd421bac3ef86/net/base/net_error_list.h#L759-L763

Comment 9 Samuel Padgett 2019-08-21 12:06:44 UTC
Thinking more on this, it might be the problem.

I haven't been able to reproduce. A HAR with the network requests would be immensely helpful if anyone is able to reproduce.

1. Open Google Chrome
2. Open Developer Tools (Ctrl-Shift-I, or Cmd-Option-I on macOS)
3. Switch to the Network tab
4. Click the "Preserve log" checkbox
5. Reproduce the problem
6. Right click a row in the network tab and select "Save all as HAR with content"
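Once a HAR has been saved, a quick summary by status code can help separate redirects from requests that never got a response. The sketch below is an illustration in Python, assuming the standard HAR layout (`log.entries[].request.url` and `log.entries[].response.status`) and assuming that Chrome records a status of 0 for requests that failed or were canceled. The `sample` data and the example.com hostnames are made up, and `load_har` is a hypothetical helper for reading a file saved in step 6.

```python
import json
from collections import Counter


def load_har(path: str) -> dict:
    """Read a HAR file (plain JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_har(har: dict) -> Counter:
    """Count entries by HTTP status; 0 is assumed to mean 'no response'."""
    counts = Counter()
    for entry in har["log"]["entries"]:
        counts[entry["response"]["status"]] += 1
    return counts


def failed_urls(har: dict) -> list:
    """List the URLs of entries that have no real status code."""
    return [
        e["request"]["url"]
        for e in har["log"]["entries"]
        if e["response"]["status"] == 0
    ]


# Made-up sample standing in for a real capture.
sample = {"log": {"entries": [
    {"request": {"url": "https://console.example.com/auth/login"},
     "response": {"status": 302}},
    {"request": {"url": "https://oauth.example.com/oauth/authorize"},
     "response": {"status": 0}},
    {"request": {"url": "https://console.example.com/auth/logout"},
     "response": {"status": 0}},
    {"request": {"url": "https://console.example.com/api/kubernetes//apis/x"},
     "response": {"status": 301}},
    {"request": {"url": "https://console.example.com/api/kubernetes/apis/x"},
     "response": {"status": 404}},
]}}

summary = summarize_har(sample)
failed = failed_urls(sample)
print(summary)
print(failed)
```

For a real capture, `summarize_har(load_har("console.har"))` would produce the same kind of summary; the file name is only an example.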

Comment 10 Fabian Deutsch 2019-08-21 12:08:32 UTC
Samuel, bug 1738292 would indeed explain it.

Comment 12 Samuel Padgett 2019-08-21 12:27:26 UTC
This PR fixes the redirect, although it's unclear if that's the cause of the TOO_MANY_RETRIES error.

https://github.com/openshift/console/pull/2438

Comment 13 Tomas Jelinek 2019-08-22 08:53:52 UTC
*** Bug 1737423 has been marked as a duplicate of this bug. ***

Comment 16 Yadan Pei 2019-08-26 07:34:23 UTC
GET chrome-extension://fmkadmapgofadopljbjfkapdkoienihi/build/backend.js net::ERR_UNKNOWN_URL_SCHEME
(anonymous) @ VM54:7
(anonymous) @ VM54:9

I can see this error in the JS console.

Comment 18 Samuel Padgett 2019-08-26 13:13:27 UTC
It looks like the JS error is from an extension you have installed. I'm not sure if that is contributing to the problem or not.

I opened https://github.com/openshift/console/pull/2485 to make sure we're not trying to logout/login more than once due to multiple concurrent requests returning unauthorized. I believe that's the basic problem.
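The idea of logging out only once when several concurrent requests all come back unauthorized can be sketched as a "single-flight" guard. The example below is an illustration in Python with asyncio rather than the console's own code, and it is not taken from the PR; `LogoutGuard`, `fake_logout`, and `request_returning_401` are hypothetical names.

```python
import asyncio


class LogoutGuard:
    """Run the logout action at most once, however many callers ask for it."""

    def __init__(self, do_logout):
        self._do_logout = do_logout
        self._task = None

    async def logout_once(self):
        # The first caller starts the logout; later callers wait on the same task.
        if self._task is None:
            self._task = asyncio.ensure_future(self._do_logout())
        await self._task


async def demo(n_requests: int) -> int:
    calls = 0

    async def fake_logout():
        # Stand-in for the real logout/redirect; only counts invocations.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)

    guard = LogoutGuard(fake_logout)

    async def request_returning_401():
        # Every concurrent request sees "unauthorized" and asks for a logout.
        await guard.logout_once()

    await asyncio.gather(*(request_returning_401() for _ in range(n_requests)))
    return calls


logout_calls = asyncio.run(demo(5))
print(logout_calls)
```

Without the guard, each of the five simulated requests would trigger its own logout and, in turn, its own login redirect.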

Comment 19 Samuel Padgett 2019-08-26 19:52:53 UTC
We've fixed the unnecessary redirect and canceled logout requests. Looking at the screenshot from comment #14, I'm not convinced that we've fixed the underlying problem.

Console is redirecting to the `/authorize` endpoint on the OAuth server. The status is `(failed)` with no actual status code. I spoke with Mike, and he said he was not prompted to accept the certificate from the OAuth server. You can see it bouncing back and forth between `login` and `authorize` endpoints, which results in the TOO_MANY_RETRIES error. I believe the other failed requests are red herrings.

It seems like Chrome is blocking the request to the OAuth server, perhaps due to something in the OAuth server certificate or possibly an extension like an ad blocker? The console error referencing the extension is suspicious as well.

Assigning back to the Auth team for now. The console team has fixed the bad requests from comment #1 and comment #5, and the failing request based on the screenshots from Yadan in comment #14 is to the OAuth server.
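The back-and-forth between the `login` and `authorize` endpoints can be quantified from an ordered list of request URLs, for example one taken from a HAR capture. This is a rough sketch in Python under stated assumptions: the endpoint suffixes `/auth/login` and `/authorize` come from this bug, while `count_bounces`, the example.com hostnames, and the sample URL list are hypothetical.

```python
from urllib.parse import urlsplit


def count_bounces(urls, login_suffix="/auth/login", authorize_suffix="/authorize"):
    """Count switches between the login and authorize endpoints, in request order."""

    def kind(url):
        path = urlsplit(url).path  # query string is ignored
        if path.endswith(login_suffix):
            return "login"
        if path.endswith(authorize_suffix):
            return "authorize"
        return None  # unrelated request

    kinds = [k for k in map(kind, urls) if k is not None]
    # A bounce is any adjacent pair that differs (login -> authorize or back).
    return sum(1 for a, b in zip(kinds, kinds[1:]) if a != b)


# Made-up request sequence resembling the loop described above.
sample_urls = [
    "https://console.example.com/auth/login",
    "https://oauth.example.com/oauth/authorize?client_id=console",
    "https://console.example.com/auth/login",
    "https://oauth.example.com/oauth/authorize?client_id=console",
    "https://console.example.com/api/kubernetes/version",
]

bounces = count_bounces(sample_urls)
print(bounces)
```

A normal login would be expected to show only a small number of such switches; a count that keeps growing is consistent with the loop that ends in TOO_MANY_RETRIES.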

Comment 20 Lukasz Szaszkiewicz 2019-08-28 20:31:09 UTC
I think that I finally know how to reproduce the issue and what caused it. tl;dr: I think that we have already fixed it with https://github.com/openshift/console/pull/2485

I think that all failed requests without a status are indeed blocked or canceled by the browser and never reach the server. In general, browsers are allowed to block or cancel requests; one example is too many concurrent connections to the same origin (browsers have limits). In our case, some requests are blocked because they are coming from an untrusted source (net::ERR_CERT_AUTHORITY_INVALID), and I suspect that the browser prefers to warn the user and get consent before proceeding.

After analysing the network traffic and the *.har file attached to this issue (fixing the unnecessary redirect helped a lot, as it made the pattern more apparent), I realised that we send multiple logout requests. I suspect these trigger login requests, which are redirected to the /authorize endpoint and blocked by the browser (net::ERR_CERT_AUTHORITY_INVALID). The only issue with that theory was that I couldn't reproduce it. I needed more login requests. How can you increase the number of requests? You can ask the browser to resend them :). So I decided to enable throttling, and that was it: I started seeing the net::ERR_TOO_MANY_RETRIES error.


Steps to reproduce:

1. Take a version before https://github.com/openshift/console/pull/2485
2. Make sure you are not logged in
3. Re-enable the security warnings for the site if you have already disabled them (that is, if you have accepted the cert); I'm not sure you can reproduce the issue with the cert accepted
4. Enable network throttling (Fast 3G or Slow 3G)
5. Try to log in, but do not trust the certificate

You might need to repeat step 5 a few times before you run into ERR_TOO_MANY_RETRIES

WDYT, does it make sense?

Comment 22 Samuel Padgett 2019-08-29 12:33:54 UTC
Moving to modified based on comment #20

Comment 24 Yadan Pei 2019-09-03 05:52:10 UTC
I didn't hit the issue on several builds that include #2485.

@Mike, is it ok to verify it?

Comment 25 Mike Fiedler 2019-09-03 12:23:21 UTC
Marking this verified on the most recent nightly. I think there is still some suspicion that issues in this area remain, but we can work them under a new bz if needed.

Comment 26 errata-xmlrpc 2019-10-16 06:36:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 27 walemark 2019-12-04 04:23:49 UTC
There's a long-standing bug in Chromium regarding how links without protocols are handled. To date, this error has no single solution because it arises for a multitude of reasons. The ERR_UNKNOWN_URL_SCHEME error is commonly a browser issue: there's no application on your device that can handle that particular action. In Chrome version 40 and up, this bug has resurfaced, but only if you manually enter the URL of the redirect page in the address bar. Every time a patch is added to fix it, the error finds a new way to resurface. The issue is on the Chromium issue tracker here: https://bugs.chromium.org/p/chromium/issues/detail?id=459156 More info: http://net-informations.com/q/mis/scheme.html


Common solutions:

Prefixing your links with http:// (or https://) should resolve the issue in some cases.

If the ERR_UNKNOWN_URL_SCHEME error occurs for mailto: or tel: links inside an iframe, you can try adding target="_blank" to the link.