Bug 1854857
Summary: | APIServerServiceUnavailableErrorjava error keeps ImageChangesInProgress true, which blocks the upgrade from proceeding | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | XiuJuan Wang <xiuwang> |
Component: | Samples | Assignee: | Gabe Montero <gmontero> |
Status: | CLOSED ERRATA | QA Contact: | XiuJuan Wang <xiuwang> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: Intermittent API server errors were reported on the wrong condition (ImageChangesInProgress instead of SamplesExist) of the cluster operator config object.
Consequence: When API server communication was restored and all the samples were installed, the samples operator failed to switch Progressing to false because its ImageChangesInProgress condition contained unexpected data, and upgrades were incorrectly marked as incomplete.
Fix: The code was changed so that errors in API server communication are reported on the SamplesExist condition.
Result: Upgrades are no longer blocked if intermittent API server errors occur while the samples operator is upgrading.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:12:56 UTC | Type: | Bug |
Bug Blocks: | 1857201 |
Description
XiuJuan Wang
2020-07-08 11:26:07 UTC
@XiuJuan I understand the logs are no longer available. But since you reported this:

    message: 'error creating samples: the server is currently unable to handle the request (put imagestreams.image.openshift.io jboss-fuse70-eap-openshift)'
    reason: 'APIServerServiceUnavailableErrorjava '
    status: "True"
    type: ImageChangesInProgress

I suspect you had access to the entire `oc get configs.samples cluster -o yaml` output. Any chance you still have the whole object's yaml, and not just that subset? In the interim I'm still trying to reverse engineer how we got into this state, but the entire yaml could prove helpful.

I think I have a simple fix for this (that intermittent APIServer error on the initial create should go to SamplesExist, not ImageChangesInProgress), but I'd like to examine the full samples operator config yaml if @XiuJuan has it.

I've looked at this enough now that I don't need the additional fields from the config object yaml. Getting actual API server errors is pretty rare, so we uncovered an issue that has been there for a while. Setting the error report for samples creation on the ImageChangesInProgress condition breaks that condition's transition from true to false, because a non-imagestream name ends up in the reason field. Will have a fix up shortly.

Gabe, thanks. The logs in comment #0 are all I got before the cluster was removed. Glad you have found a lead.

While the likelihood of this happening is remote, it will block an upgrade, so I'm bumping the severity.

I upgraded several clusters from 4.4 -> 4.5 -> 4.6. openshift-samples did not hit this bug, even when openshift-apiserver was degraded:

    openshift-apiserver            4.6.0-0.nightly-2020-07-14-224428   False   False   True    3h21m
    openshift-controller-manager   4.6.0-0.nightly-2020-07-14-224428   True    True    False   4h22m
    openshift-samples              4.6.0-0.nightly-2020-07-14-224428   True    False   False   3h29m

Then I deleted the apiserver pods while openshift-samples was processing.
The apiserver error APIServerConflictError was reported on SamplesExist instead. Hence I am marking this bug as verified.

    $ oc get configs.samples -o yaml
    apiVersion: v1
    items:
    - apiVersion: samples.operator.openshift.io/v1
      kind: Config
      metadata:
        creationTimestamp: "2020-07-15T02:53:08Z"
        finalizers:
        - samples.operator.openshift.io/finalizer
        generation: 3
        name: cluster
        resourceVersion: "253677"
        selfLink: /apis/samples.operator.openshift.io/v1/configs/cluster
        uid: f456399d-7938-412a-ba66-d40b3a69cc41
      spec:
        architectures:
        - x86_64
        managementState: Managed
      status:
        architectures:
        - x86_64
        conditions:
        - lastTransitionTime: "2020-07-15T02:53:08Z"
          lastUpdateTime: "2020-07-15T02:53:08Z"
          status: "True"
          type: ImportCredentialsExist
        - lastTransitionTime: "2020-07-15T02:53:13Z"
          lastUpdateTime: "2020-07-15T02:53:13Z"
          status: "True"
          type: ConfigurationValid
        - lastTransitionTime: "2020-07-15T08:33:00Z"
          lastUpdateTime: "2020-07-15T08:35:44Z"
          status: "False"
          type: ImportImageErrorsExist
        - lastTransitionTime: "2020-07-15T08:34:44Z"
          lastUpdateTime: "2020-07-15T08:34:44Z"
          status: "False"
          type: ImageChangesInProgress
        - lastTransitionTime: "2020-07-15T08:34:47Z"
          lastUpdateTime: "2020-07-15T08:35:44Z"
          message: 'error creating samples: Operation cannot be fulfilled on imagestreams.image.openshift.io "jboss-datagrid73-openshift": the object has been modified; please apply your changes to the latest version and try again'
          reason: APIServerConflictError
          status: Unknown
          type: SamplesExist
        - lastTransitionTime: "2020-07-15T08:33:03Z"
          lastUpdateTime: "2020-07-15T08:33:03Z"
          status: "False"
          type: RemovePending
        - lastTransitionTime: "2020-07-15T04:50:00Z"
          lastUpdateTime: "2020-07-15T04:50:00Z"
          status: "False"
          type: MigrationInProgress
        managementState: Managed
        version: 4.6.0-0.nightly-2020-07-14-224428
    kind: List
    metadata:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
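
The failure mode discussed in this thread can be illustrated with a small, self-contained model. This is only a sketch and not the samples operator's actual code: it assumes that the reason field of ImageChangesInProgress holds a space-separated list of imagestream names still being imported, and that the pre-fix code path left the error reason in that same field (which would explain the reported value 'APIServerServiceUnavailableErrorjava '). The names `Condition`, `mark_imported`, `report_create_error`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """Simplified stand-in for one entry in status.conditions."""
    type: str
    status: str = "False"
    reason: str = ""
    message: str = ""


def mark_imported(in_progress, stream):
    """Drop one imagestream from the pending list kept in `reason`.

    Assumption: pending imagestream names are stored space-separated in
    the reason field, and the condition turns False once none remain.
    """
    pending = [name for name in in_progress.reason.split() if name != stream]
    in_progress.reason = "".join(name + " " for name in pending)
    in_progress.status = "True" if pending else "False"


def report_create_error(conditions, err_reason, message, fixed):
    """Record an API server error raised while creating samples."""
    if fixed:
        # Post-fix behaviour described in this bug: report on SamplesExist.
        target = conditions["SamplesExist"]
        target.status = "Unknown"
        target.reason = err_reason
    else:
        # Assumed pre-fix behaviour: the error reason ends up in the
        # field that is supposed to contain only imagestream names.
        target = conditions["ImageChangesInProgress"]
        target.reason = err_reason + target.reason
    target.message = message


def simulate(fixed):
    """Return the final ImageChangesInProgress status after one import."""
    conditions = {
        "SamplesExist": Condition("SamplesExist", "True"),
        "ImageChangesInProgress": Condition(
            "ImageChangesInProgress", "True", "java "
        ),
    }
    report_create_error(
        conditions,
        "APIServerServiceUnavailableError",
        "error creating samples: the server is currently unable to handle the request",
        fixed,
    )
    # The only pending imagestream finishes importing.
    mark_imported(conditions["ImageChangesInProgress"], "java")
    return conditions["ImageChangesInProgress"].status


if __name__ == "__main__":
    print("before fix:", simulate(fixed=False))  # stays "True"
    print("after fix: ", simulate(fixed=True))   # becomes "False"
```

Under these assumptions, the pre-fix path never clears ImageChangesInProgress because the polluted token never matches an imagestream name, while the post-fix path leaves the pending list untouched and lets the condition transition to false.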