Bug 1785732

Summary: Upgrade from 4.2.0 to 4.2.10 logged the user out repeatedly on Safari
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: openshift-apiserver
Assignee: Standa Laznicka <slaznick>
Status: CLOSED WORKSFORME
QA Contact: Xingxing Xia <xxia>
Severity: urgent
Priority: unspecified
Version: 4.2.0
CC: aos-bugs, eparis, mfojtik, slaznick, wking
Target Milestone: ---
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-02-10 14:23:53 UTC

Description Clayton Coleman 2019-12-20 19:34:47 UTC
We performed an upgrade of a 4.2.0 CI test cluster to 4.2.10. I was logged in via Safari to Grafana, Prometheus, and the console. I was logged out three times during the upgrade and once after the upgrade (about 15 minutes later).

Expected:

During an upgrade, no user sessions should be disrupted (users should remain logged in).

Must-gather coming soon

Comment 1 W. Trevor King 2019-12-20 19:44:32 UTC
I got a:

{"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.","state":"1cbcdd44"}

out of https://oauth-openshift.apps.build01.ci.devcluster.openshift.com/oauth/authorize slightly before 2019-12-20 18:36Z, in case that's related.

Comment 5 Standa Laznicka 2020-01-02 12:51:51 UTC
Some observations:

One of the oauth-servers, most probably the one handling the login, logs:
```
osinserver.go:91] internal error: resource name may not be empty
```
That suggests we are ignoring an error returned by an unavailable KAS/OAS somewhere, and this should be fixed, but unfortunately from this log alone it's really hard to tell where that might be.
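
For illustration, here is a minimal sketch of the kind of failure mode described above - an error from an unavailable API server gets dropped, and only the follow-up lookup fails with "resource name may not be empty". All names here (clientLookup, getByName) are hypothetical placeholders, not the actual oauth-server code paths:
```
package main

import (
	"context"
	"fmt"
)

type oauthClient struct{ Name string }

// clientLookup stands in for a call that can fail while the API server is
// unavailable during the upgrade (e.g. a 503 from the aggregated API).
func clientLookup(ctx context.Context, id string) (*oauthClient, error) {
	return nil, fmt.Errorf("the server is currently unable to handle the request")
}

// getByName stands in for the later lookup that surfaces in osinserver.go:
// by this point the real cause (the 503 above) has already been swallowed.
func getByName(name string) error {
	if name == "" {
		return fmt.Errorf("internal error: resource name may not be empty")
	}
	return nil
}

func main() {
	c, _ := clientLookup(context.Background(), "console") // BUG: error ignored

	name := ""
	if c != nil {
		name = c.Name
	}
	if err := getByName(name); err != nil {
		fmt.Println(err) // internal error: resource name may not be empty
	}
}
```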


It also appears that no openshift-apiserver was available for a certain period of time during the upgrade - from the authentication-operator logs:
```
failed with: the server is currently unable to handle the request (post oauthclients.oauth.openshift.io)
```

This would cause the session disruption, as the KAS pods need to access the oauthaccesstoken resource in order to validate the tokens sent to them by the console. Indeed, the OAS-o reports 503 on "oauth.openshift.io.v1" for a brief moment; I don't think that's desired.
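
For context, a minimal sketch (hypothetical types, not the actual kube-apiserver authenticator code) of why a brief 503 from the aggregated oauth.openshift.io API shows up to the user as a logout - if the token record cannot be read, the bearer token is rejected even though it is still valid:
```
package main

import (
	"errors"
	"fmt"
)

type accessToken struct {
	UserName string
	Expired  bool
}

// tokenStore stands in for the oauthaccesstokens API that is served by
// openshift-apiserver; during the upgrade window it answers with a 503.
type tokenStore interface {
	Get(name string) (*accessToken, error)
}

type unavailableStore struct{}

func (unavailableStore) Get(string) (*accessToken, error) {
	return nil, errors.New("the server is currently unable to handle the request")
}

// authenticateToken mirrors the shape of the check: any failure to read the
// token record means the bearer token cannot be accepted.
func authenticateToken(store tokenStore, token string) (string, error) {
	t, err := store.Get(token)
	if err != nil {
		return "", fmt.Errorf("token validation failed: %w", err)
	}
	if t.Expired {
		return "", errors.New("token expired")
	}
	return t.UserName, nil
}

func main() {
	if _, err := authenticateToken(unavailableStore{}, "example-token"); err != nil {
		fmt.Println(err) // the console sees a 401 and drops the session
	}
}
```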

Moving to openshift-apiserver component for now.

Comment 7 Maciej Szulik 2020-02-10 14:23:53 UTC
We were not able to reproduce this problem in extensive testing. Given this has already been moved to a later release twice, I'm going to close it. Feel free to reopen if it reappears.