Description of problem:

Many clusters that upgraded from 4.3 to 4.4 are reporting a degraded samples-operator with "APIServerError" as the condition name. I believe we should make this condition name more specific. I propose:

  APIServerTimeoutError
  APIServerConnectionRefusedError
  APIServerNoRouteToHostError
  UnknownError (?) if we don't know what the error is
Some highlights from the discussion in https://coreos.slack.com/archives/CB48XQ4KZ/p1590074085229800

1) Michal encountered instances of https://bugzilla.redhat.com/show_bug.cgi?id=1835995 when looking at upgrades to 4.4.4
2) the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1835995 is not out yet in 4.4.x (still in verified as of this posting)
3) the fact that we even cited an APIServer error was part of the problem with https://bugzilla.redhat.com/show_bug.cgi?id=1835995
4) the problem stemmed from accessing content in the payload

All that said, for "real" issues on API server related calls, I'll look into changes to augment the samples operator reason code along the lines Michal has articulated.
https://bugzilla.redhat.com/show_bug.cgi?id=1832344 is the other upgrade bug that was leading to what Michal saw
set priority vs. severity in my haste
Verification for this will be tricky, @XiuJuan.

Causing a disruption to the API server while samples tries to install would be needed. I'm not 100% confident this will fly, but:

1) mark samples operator removed
2) scale down / kill the 3 openshift api server pods
3) mark samples operator managed ASAP after 2) and see if errors occur specific to trying to create imagestreams / templates
4) then catch the openshift-samples clusteroperator being in degraded status and see what the reason is

My thinking is to take a pass or two at that and see what results. Otherwise, claim due diligence and mark as verified.
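Step 4 can be scripted rather than eyeballed. A minimal Go sketch, assuming the clusteroperator has been dumped as JSON (for example with `oc get clusteroperator openshift-samples -o json`): `degradedReason` and the `sample` document are hypothetical names introduced here for illustration, and the sample is a trimmed, made-up input in the shape of a ClusterOperator status, not captured cluster output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition mirrors the fields of a clusteroperator status condition
// that matter for this check.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

type clusterOperator struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// degradedReason (hypothetical helper) returns the reason of the Degraded
// condition if that condition is currently "True".
func degradedReason(raw []byte) (string, bool, error) {
	var co clusterOperator
	if err := json.Unmarshal(raw, &co); err != nil {
		return "", false, err
	}
	for _, c := range co.Status.Conditions {
		if c.Type == "Degraded" && c.Status == "True" {
			return c.Reason, true, nil
		}
	}
	return "", false, nil
}

// sample is an illustrative, trimmed input, not real cluster output.
const sample = `{"status":{"conditions":[
  {"type":"Degraded","status":"True","reason":"APIServerServiceUnavailableError"},
  {"type":"Available","status":"False"}
]}}`

func main() {
	reason, degraded, err := degradedReason([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("degraded:", degraded, "reason:", reason)
}
```

Running such a check in a loop right after step 3 would make it easier to catch the degraded window before the API server pods recover.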
I got APIServerServiceUnavailableError in the openshift-samples clusteroperator after deleting the three openshift-apiserver pods.

Gabe, we could mark this as verified against version 4.5.0-0.nightly-2020-05-31-230932.

Following comment #8:

A.
1) mark samples operator removed
2) scale down / kill the 3 openshift api server pods
3) then catch the openshift-samples clusteroperator being in degraded status and see what the reason is

The openshift-samples clusteroperator reports APIServerServiceUnavailableError when deleting a template or imagestream fails while interacting with the apiserver:

  conditions:
  - lastTransitionTime: "2020-06-01T09:28:58Z"
    message: The error the server is currently unable to handle the request (delete templates.template.openshift.io jws31-tomcat7-postgresql-s2i) during openshift namespace cleanup has left the samples in an unknown state;
    reason: APIServerServiceUnavailableError
    status: "True"
    type: Degraded
  - lastTransitionTime: "2020-06-01T09:28:58Z"
    status: "False"
    type: Available
  - lastTransitionTime: "2020-06-01T09:28:58Z"
    message: 'Samples installation in error at 4.5.0-0.nightly-2020-05-31-230932: APIServerServiceUnavailableError'
    status: "True"
    type: Progressing

B.
1) mark samples operator removed
2) wait until the samples are removed, then mark samples as Managed
3) kill the 3 openshift api server pods
4) then catch the openshift-samples clusteroperator being in degraded status and see what the reason is

The openshift-samples clusteroperator reports APIServerServiceUnavailableError when creating templates or imagestreams fails while interacting with the apiserver:

  status:
    conditions:
    - lastTransitionTime: "2020-06-01T09:57:05Z"
      message: 'error creating samples: the server is currently unable to handle the request (put imagestreams.image.openshift.io fis-karaf-openshift);imagestream update error: the server is currently unable to handle the request (put imagestreams.image.openshift.io fis-karaf-openshift);'
      reason: APIServerServiceUnavailableError
      status: "True"
      type: Degraded
    - lastTransitionTime: "2020-06-01T09:56:55Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2020-06-01T09:56:55Z"
      message: 'Samples installation in error at 4.5.0-0.nightly-2020-05-31-230932: APIServerServiceUnavailableError'
      status: "True"
      type: Progressing
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409