Bug 1882394
Summary: | CSO stuck on message "the cluster operator storage has not yet successfully rolled out" while downgrading from 4.6 -> 4.5 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Fabio Bertinatto <fbertina> |
Component: | Storage | Assignee: | Fabio Bertinatto <fbertina> |
Storage sub component: | Operators | QA Contact: | Wei Duan <wduan> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | medium | CC: | aos-bugs, eparis, fbertina, jsafrane, piqin, pmali, sdodson, tsze, wduan, wking |
Version: | 4.5 | Keywords: | Reopened, TestBlocker |
Target Milestone: | --- | ||
Target Release: | 4.5.z | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | 1877316 | Environment: | |
Last Closed: | 2020-10-26 15:11:50 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1877316 | ||
Bug Blocks: |
Comment 3
Fabio Bertinatto
2020-09-29 07:17:24 UTC
Hi Fabio,

I performed a downgrade from 4.6.0-rc.0 to 4.5.0-0.nightly-2020-10-07-231808 (which should contain the fix), but reproduced this problem. The storage CO did not roll out:

```
[wduan@MINT 01_general]$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-rc.0   True        True          40m     Unable to apply 4.5.0-0.nightly-2020-10-07-231808: the cluster operator storage has not yet successfully rolled out

[wduan@MINT verification-tests]$ oc get co storage
NAME      VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.6.0-rc.0   True        False         False      3h34m
```

From the cluster-storage-operator log, the ConfigMap lock looks like it was deleted (?), but the operator was not able to become the leader:

```
[wduan@MINT 01_general]$ oc -n openshift-cluster-storage-operator logs pod/cluster-storage-operator-86d6fbc996-7l8z7
{"level":"info","ts":1602214988.450045,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1602214988.450071,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1602214988.4500754,"logger":"cmd","msg":"Version of operator-sdk: v0.4.0"}
{"level":"info","ts":1602214988.897236,"logger":"cmd","msg":"Found ConfigMap lock without metadata.ownerReferences, deleting"}
{"level":"info","ts":1602214988.950875,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1602214989.1881082,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1602214990.3181906,"logger":"leader","msg":"Not the leader. Waiting."}
...
{"level":"info","ts":1602226999.4777808,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1602227016.1669917,"logger":"leader","msg":"Not the leader. Waiting."}
```

The cluster-storage-operator-lock ConfigMap:

```
[wduan@MINT verification-tests]$ oc -n openshift-cluster-storage-operator get cm cluster-storage-operator-lock -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"","leaseDurationSeconds":0,"acquireTime":null,"renewTime":null,"leaderTransitions":0}'
  creationTimestamp: "2020-10-09T03:43:08Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:control-plane.alpha.kubernetes.io/leader: {}
    manager: cluster-storage-operator
    operation: Update
    time: "2020-10-09T03:43:09Z"
  name: cluster-storage-operator-lock
  namespace: openshift-cluster-storage-operator
  resourceVersion: "54649"
  selfLink: /api/v1/namespaces/openshift-cluster-storage-operator/configmaps/cluster-storage-operator-lock
  uid: 59894bf2-5b37-48be-9006-6cf08b427e2c
```

CSO 4.5 deleted the ConfigMap here:
> {"level":"info","ts":1602214988.897236,"logger":"cmd","msg":"Found ConfigMap lock without metadata.ownerReferences, deleting"}
This epoch translates to Friday, October 9, 2020, 3:43:08 AM (UTC).
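For reference, a minimal Go snippet to double-check that conversion; the timestamp is taken from the "Found ConfigMap lock ... deleting" log entry quoted above:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Unix timestamp from the CSO 4.5 log line that deleted the lock ConfigMap.
	const deletedAt = 1602214988
	fmt.Println(time.Unix(deletedAt, 0).UTC()) // 2020-10-09 03:43:08 +0000 UTC
}
```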
However, the ConfigMap from the description was created on "2020-10-09T03:43:08Z".
This means that something else (CSO 4.6) created the ConfigMap right after it was deleted by CSO 4.5.
Apparently both CSOs were running at the same time for a short period. This is possible because they use different leader-election approaches.
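For context: the 4.5 operator uses the operator-sdk "leader for life" pattern, where leadership is acquired by creating the lock ConfigMap with an ownerReference pointing at the operator's own Pod, and followers keep retrying ("Not the leader. Waiting.") until that ConfigMap is gone. The 4.6 operator instead records its leadership in the control-plane.alpha.kubernetes.io/leader annotation, which is what appears on the ConfigMap above. Below is a rough Go sketch of the 4.5-style acquisition loop; the names, retry interval, and error handling are illustrative only, not the operator's actual code:

```go
// Rough sketch of a "leader for life" acquisition loop: whoever creates the
// lock ConfigMap is the leader, and the lock is garbage-collected together
// with its owning Pod.
package main

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// becomeLeader blocks until this instance owns the lock ConfigMap.
func becomeLeader(ctx context.Context, ns, lockName string, ownPod metav1.OwnerReference) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	lock := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      lockName,
			Namespace: ns,
			// Owned by this Pod, so the lock disappears when the Pod does.
			OwnerReferences: []metav1.OwnerReference{ownPod},
		},
	}

	for {
		_, err := client.CoreV1().ConfigMaps(ns).Create(ctx, lock, metav1.CreateOptions{})
		switch {
		case err == nil:
			return nil // we created the lock, so we are the leader
		case apierrors.IsAlreadyExists(err):
			// Someone else holds the lock: "Not the leader. Waiting."
			time.Sleep(time.Second)
		default:
			return err
		}
	}
}

func main() {
	// Hypothetical owner reference for this operator Pod (values illustrative).
	self := metav1.OwnerReference{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       "cluster-storage-operator-example",
		UID:        "00000000-0000-0000-0000-000000000000",
	}
	if err := becomeLeader(context.Background(),
		"openshift-cluster-storage-operator", "cluster-storage-operator-lock", self); err != nil {
		panic(err)
	}
}
```

A lock created by the annotation-based approach has no ownerReferences, so an operator waiting in a loop like the one above never sees it go away, which matches the repeated "Not the leader. Waiting." messages in the log.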
Hi Fabio,

With https://github.com/openshift/cluster-storage-operator/pull/96, we verified that the upgrade/downgrade path 4.5 -> 4.6 -> 4.5 passes.

*** Bug 1877899 has been marked as a duplicate of this bug. ***

Verified pass. Performed upgrade/downgrade successfully for 4.5 <-> 4.5 and 4.5 <-> 4.6, and also checked that the CI upgrade from 4.4 succeeds, so changing the status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.16 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4268