Bug 1873299
Summary: Storage operator stops reconciling when going Upgradeable=False on v1alpha1 CRDs
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Storage
Assignee: Christian Huffman <chuffman>
Storage sub component: Operators
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: aos-bugs, chuffman, lmohanty, sdodson
Version: 4.3.z
Keywords: Upgrades
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: If v1alpha1 VolumeSnapshot CRDs were detected, no further reconcile actions were taken.
Consequence: The Cluster Storage Operator could not perform z-stream upgrades if these CRDs were ever detected on the cluster.
Fix: Moved the v1alpha1 CRD check to later in the Reconcile loop (see the sketch after the metadata fields below).
Result: Z-stream upgrades now complete successfully, and v1alpha1 CRDs are detected without issue.
Story Points: ---
Clone Of:
: 1874873 (view as bug list)
Environment:
Last Closed: 2020-10-27 16:35:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1874873
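
As a reading aid for the Doc Text above, here is a minimal sketch, in Go, of the reordered sync flow. The type and helper names (controller, syncOperands, findAlphaSnapshotCRDs, setUpgradeable) are hypothetical stand-ins rather than the operator's actual code; the point is only the ordering: the normal reconcile work runs first, and the v1alpha1 check only sets the Upgradeable condition instead of returning early.

package main

import (
	"context"
	"fmt"
	"strings"
)

// controller stands in for the storage operator's sync-loop state; the real
// operator wires in informers and a status client instead of these stubs.
type controller struct{}

// syncOperands stands in for the operator's ordinary reconcile work.
func (c *controller) syncOperands(ctx context.Context) error { return nil }

// findAlphaSnapshotCRDs stands in for a lookup of user-installed v1alpha1
// VolumeSnapshot* CRDs; it returns the names of any that are found.
func (c *controller) findAlphaSnapshotCRDs(ctx context.Context) ([]string, error) {
	return nil, nil
}

// setUpgradeable stands in for updating the ClusterOperator Upgradeable condition.
func (c *controller) setUpgradeable(ctx context.Context, upgradeable bool, reason, message string) error {
	fmt.Printf("Upgradeable=%v reason=%q message=%q\n", upgradeable, reason, message)
	return nil
}

// sync shows the reordered flow: do the normal reconcile work first, then use
// the alpha-CRD check only to set Upgradeable, never to return early. Before
// the fix the check ran first and aborted the loop, which is what froze
// z-stream upgrades once v1alpha1 CRDs had been seen.
func (c *controller) sync(ctx context.Context) error {
	if err := c.syncOperands(ctx); err != nil {
		return err
	}
	alphaCRDs, err := c.findAlphaSnapshotCRDs(ctx)
	if err != nil {
		return err
	}
	if len(alphaCRDs) > 0 {
		msg := fmt.Sprintf("v1alpha1 version of %s is detected. Remove these CRDs to allow the upgrade to proceed.", strings.Join(alphaCRDs, ", "))
		return c.setUpgradeable(ctx, false, "AlphaDetected", msg)
	}
	return c.setUpgradeable(ctx, true, "AsExpected", "")
}

func main() {
	_ = (&controller{}).sync(context.Background())
}

With this ordering, Upgradeable=False still blocks minor-version updates, but z-stream updates and day-to-day reconciliation continue.
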
Description
W. Trevor King
2020-08-27 19:11:33 UTC
We disabled the snapshot co, then uninstalled the v1beta1 VolumeSnapshot* CRDs and installed the v1alpha1 VolumeSnapshot* CRDs. But the upgrade did not start.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-25-204643   True        False         27h     Cluster version is 4.6.0-0.nightly-2020-08-25-204643

For clusterversion version:

"spec": {
  "channel": "stable-4.6",
  "clusterID": "35c2b12d-7440-4c99-ac50-60f3aff0a059",
  "desiredUpdate": {
    "force": false,
    "image": "registry.svc.ci.openshift.org/ocp/release@sha256:c4059816df4d67ff5dc2356dd4d278833d1e57282e16b9fef59554936e5562ed",
    "version": "4.6.0-0.nightly-2020-08-25-234625"
  },
  "upstream": "https://openshift-release.svc.ci.openshift.org/graph"
},
"status": {
  "availableUpdates": [
    {
      "image": "registry.svc.ci.openshift.org/ocp/release@sha256:c4059816df4d67ff5dc2356dd4d278833d1e57282e16b9fef59554936e5562ed",
      "version": "4.6.0-0.nightly-2020-08-25-234625"
    }
  ],
  "conditions": [
    {
      "lastTransitionTime": "2020-08-28T03:33:44Z",
      "message": "Done applying 4.6.0-0.nightly-2020-08-25-204643",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2020-08-28T05:57:35Z",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2020-08-28T05:59:20Z",
      "message": "Cluster version is 4.6.0-0.nightly-2020-08-25-204643",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2020-08-28T14:06:11Z",
      "status": "True",
      "type": "RetrievedUpdates"
    },
    {
      "lastTransitionTime": "2020-08-28T06:11:26Z",
      "message": "Cluster operator storage cannot be upgraded between minor versions: SnapshotCRDControllerUpgradeable: Unable to update cluster as v1alpha1 version of VolumeSnapshot, VolumeSnapshotContent is detected. Remove these CRDs to allow the upgrade to proceed.",
      "reason": "SnapshotCRDController_AlphaDetected",
      "status": "False",
      "type": "Upgradeable"
    }
  ],

From the status, only the SnapshotCRDController_AlphaDetected condition set Upgradeable to False. I tried to remove the v1alpha1 VolumeSnapshot* CRDs, but only the VolumeSnapshotClass CRD was removed; the VolumeSnapshot and VolumeSnapshotContent CRDs could not be deleted. Still working on it. But in any case, in my opinion we also hit this issue on 4.6.

Looks like my previous conclusion was not correct. After I managed to delete all of the v1alpha1 VolumeSnapshot* CRDs, the upgrade still did not start. I pasted the clusterversion here; maybe we need to check with the upgrade team.
$ oc get clusterversion version
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-25-204643   True        False         28h     Cluster version is 4.6.0-0.nightly-2020-08-25-204643

[wduan@MINT kubernetes-1.16]$ oc get clusterversion version -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: "2020-08-28T03:00:49Z"
  generation: 18
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:channel: {}
        f:clusterID: {}
    manager: cluster-bootstrap
    operation: Update
    time: "2020-08-28T03:00:49Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:desiredUpdate:
          .: {}
          f:force: {}
          f:image: {}
          f:version: {}
    manager: oc
    operation: Update
    time: "2020-08-29T05:04:46Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:upstream: {}
    manager: kubectl-patch
    operation: Update
    time: "2020-08-29T08:43:35Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:availableUpdates: {}
        f:conditions: {}
        f:desired:
          .: {}
          f:image: {}
          f:version: {}
        f:history: {}
        f:observedGeneration: {}
        f:versionHash: {}
    manager: cluster-version-operator
    operation: Update
    time: "2020-08-29T09:45:06Z"
  name: version
  resourceVersion: "1325193"
  selfLink: /apis/config.openshift.io/v1/clusterversions/version
  uid: 80d09500-7c10-46a0-aa74-8cbd2edec283
spec:
  channel: stable-4.6
  clusterID: 35c2b12d-7440-4c99-ac50-60f3aff0a059
  desiredUpdate:
    force: false
    image: registry.svc.ci.openshift.org/ocp/release@sha256:c4059816df4d67ff5dc2356dd4d278833d1e57282e16b9fef59554936e5562ed
    version: 4.6.0-0.nightly-2020-08-25-234625
  upstream: https://openshift-release.svc.ci.openshift.org/graph
status:
  availableUpdates:
  - image: registry.svc.ci.openshift.org/ocp/release@sha256:c4059816df4d67ff5dc2356dd4d278833d1e57282e16b9fef59554936e5562ed
    version: 4.6.0-0.nightly-2020-08-25-234625
  conditions:
  - lastTransitionTime: "2020-08-28T03:33:44Z"
    message: Done applying 4.6.0-0.nightly-2020-08-25-204643
    status: "True"
    type: Available
  - lastTransitionTime: "2020-08-28T05:57:35Z"
    status: "False"
    type: Failing
  - lastTransitionTime: "2020-08-28T05:59:20Z"
    message: Cluster version is 4.6.0-0.nightly-2020-08-25-204643
    status: "False"
    type: Progressing
  - lastTransitionTime: "2020-08-28T14:06:11Z"
    status: "True"
    type: RetrievedUpdates
  desired:
    image: registry.svc.ci.openshift.org/ocp/release@sha256:56945dc7218d758e25ffe990374668890e8c77d72132c98e0cc8f6272c063cc7
    version: 4.6.0-0.nightly-2020-08-25-204643
  history:
  - completionTime: "2020-08-28T03:33:44Z"
    image: registry.svc.ci.openshift.org/ocp/release@sha256:56945dc7218d758e25ffe990374668890e8c77d72132c98e0cc8f6272c063cc7
    startedTime: "2020-08-28T03:00:49Z"
    state: Completed
    verified: false
    version: 4.6.0-0.nightly-2020-08-25-204643
  observedGeneration: 7
  versionHash: VdMtCylIGgw=

I uploaded the must-gather to http://virt-openshift-05.lab.eng.nay.redhat.com/wduan/logs/must-gather.local.8306168597559322770_0829.tar.gz

> ...upgrade still did not start

I've spun this off into bug 1873900. It seems orthogonal to this bug's storage issue.

I tried another upgrade. This time the upgrade was triggered but was blocked by the csi-snapshot-controller co, which means this will still block the upgrade.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-26-202109   True        True          47m     Unable to apply 4.6.0-0.nightly-2020-08-27-005538: the cluster operator csi-snapshot-controller has not yet successfully rolled out

$ oc get co csi-snapshot-controller -ojson | jq .status.conditions
[
  {
    "lastTransitionTime": "2020-08-31T08:14:41Z",
    "message": "Degraded: failed to sync CRDs: cluster-csi-snapshot-controller-operator does not support v1alpha1 version of snapshot CRDs volumesnapshots.snapshot.storage.k8s.io, volumesnapshotcontents.snapshot.storage.k8s.io, volumesnapshotclasses.snapshot.storage.k8s.io installed by user or 3rd party controller",
    "reason": "_AlphaCRDsExist",
    "status": "True",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2020-08-31T00:50:50Z",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2020-08-31T00:50:50Z",
    "reason": "AsExpected",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2020-08-31T00:44:49Z",
    "reason": "AsExpected",
    "status": "True",
    "type": "Upgradeable"
  }
]

> In both cases, the 'storage' co was successfully upgraded; however, the 'csi-snapshot-controller' was not blocked due to the presence of the v1alpha1 CRDs.
I think that means "this storage-operator bug can be VERIFIED on 4.6, and we may need a new bug for the snapshot controller".
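
For context on how conditions like the Upgradeable and Degraded messages above can arise, here is a minimal Go sketch of detecting user-installed v1alpha1 snapshot CRDs through the apiextensions API. This is an illustrative sketch assuming a standard client-go/apiextensions clientset, not the actual code of either operator; the CRD names come from the Degraded message quoted above.

package main

import (
	"context"
	"fmt"

	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

// snapshotCRDNames are the CRDs named in the Degraded message above.
var snapshotCRDNames = []string{
	"volumesnapshots.snapshot.storage.k8s.io",
	"volumesnapshotcontents.snapshot.storage.k8s.io",
	"volumesnapshotclasses.snapshot.storage.k8s.io",
}

// findAlphaSnapshotCRDs returns the snapshot CRDs on the cluster that still
// serve the v1alpha1 API version. A missing CRD is not an error; any other
// lookup failure is returned to the caller.
func findAlphaSnapshotCRDs(ctx context.Context, client apiextensionsclient.Interface) ([]string, error) {
	var alpha []string
	for _, name := range snapshotCRDNames {
		crd, err := client.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			continue
		}
		if err != nil {
			return nil, err
		}
		for _, v := range crd.Spec.Versions {
			if v.Name == "v1alpha1" && v.Served {
				alpha = append(alpha, name)
				break
			}
		}
	}
	return alpha, nil
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes this runs inside the cluster
	if err != nil {
		panic(err)
	}
	client := apiextensionsclient.NewForConfigOrDie(cfg)
	alpha, err := findAlphaSnapshotCRDs(context.Background(), client)
	if err != nil {
		panic(err)
	}
	fmt.Println("v1alpha1 snapshot CRDs detected:", alpha)
}
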
First, let me correct my typo; fortunately it did not mislead you.

> In both cases, the 'storage' co was successfully upgraded; however, the 'csi-snapshot-controller' was not blocked due to the presence of the v1alpha1 CRDs.

Should be:

In both cases, the 'storage' co was successfully upgraded; however, the 'csi-snapshot-controller' was blocked due to the presence of the v1alpha1 CRDs.

@Huffman, as I asked in comment 10, I'd like to confirm whether the 'storage-operator' upgrade is enough for "VERIFIED"; we also tried several scenarios for 'csi-snapshot-controller' to see what happened. Actually, in case B, where the 'csisnapshotcontrollers' CRD and the 'csi-snapshot-controller' co were deleted, is this case more similar to the upgrade from 4.3 to 4.4?

> In other words, I think we need to ensure that this doesn't block upgrades in the storage operator in 4.3, as the CSI Snapshot Controller Operator shouldn't ever encounter this state (being present alongside the v1alpha1 CRDs).

I agree with your concern; it's OK for me to mark this BZ "VERIFIED" and switch to testing/verifying the upgrades from 4.3 -> 4.3 and 4.3 -> 4.4. Thanks a lot for the explanation. Marking it as VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196