We upgraded the 4.2.0 CI test cluster to 4.2.10. During the upgrade I was logged in to Grafana, Prometheus, and the console in Safari. I was logged out three times during the upgrade and once more after it (about 15 minutes later).
Expected: user sessions should not be disrupted during an upgrade (the user should remain logged in). Must-gather coming soon.
Slightly before 2019-12-20 18:36Z I got the following response from https://oauth-openshift.apps.build01.ci.devcluster.openshift.com/oauth/authorize, in case it's related: {"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.","state":"1cbcdd44"}
Some observations:

One of the oauth-servers, most likely the one that handled the login, logs:

```
osinserver.go:91] internal error: resource name may not be empty
```

This suggests that somewhere we ignore an error returned by an unavailable kube-apiserver (KAS) or openshift-apiserver (OAS) and fail later on an empty resource name. That should be fixed, but unfortunately this log alone makes it very hard to tell where.

It also appears that no openshift-apiserver was available for some period during the upgrade. From the authentication-operator logs:

```
failed with: the server is currently unable to handle the request (post oauthclients.oauth.openshift.io)
```

This would explain the session disruption: KAS pods need to access the oauthaccesstoken resource, which is served by openshift-apiserver, to validate the tokens the console sends them. Indeed, the openshift-apiserver-operator (OAS-o) briefly reports a 503 for "oauth.openshift.io.v1"; I don't think that's desired. Moving to the openshift-apiserver component for now.
We were not able to reproduce this problem in extensive testing, and since its target release has already been moved twice, I'm going to close this. Feel free to reopen if it reappears.