Bug 1835869

Summary: Upgrade from OCP 4.3.z to 4.4.0 with v1alpha1 snapshot CRDs fails.
Product: OpenShift Container Platform
Reporter: Jan Safranek <jsafrane>
Component: Storage
Assignee: Christian Huffman <chuffman>
Storage sub component: Operators
QA Contact: Chao Yang <chaoyang>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, chaoyang, lmohanty, mfuruta, oarribas, sdodson, wking
Version: 4.3.z
Keywords: Upgrades
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: In OCP 4.4 the v1beta1 VolumeSnapshot CRDs were introduced; however, some clusters had manually installed the v1alpha1 CRDs, or had them installed by a dependent CSI driver. The v1beta1 CRDs are not compatible with the v1alpha1 CRDs.
Consequence: The upgrade from 4.3 to 4.4 was blocked, as the v1beta1 CRDs could not be installed.
Fix: The Cluster Storage Operator now checks for the presence of v1alpha1 VolumeSnapshot* CRDs and flags the cluster as Unupgradeable if any are detected.
Result: Any cluster with v1alpha1 CRDs is marked as Unupgradeable, and a message is provided indicating that the v1alpha1 CRDs must be removed for the upgrade to proceed.
Story Points: ---
Clone Of:
: 1843959 Environment:
Last Closed: 2020-10-27 15:59:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1843959, 1846112    

Description Jan Safranek 2020-05-14 16:13:07 UTC
Description of problem:

A customer installed the upstream version of the Cinder CSI driver, including volume snapshot support. It installs the v1alpha1 version of the snapshot CRDs, such as volumesnapshots.snapshot.storage.k8s.io.

When upgrading to 4.4.0, where OCP introduces volume snapshots as a technical preview, the upgrade got stuck at "the cluster operator csi-snapshot-controller has not yet successfully rolled out".

$ oc describe co csi-snapshot-controller
Name:         csi-snapshot-controller
[...]
Status:
  Conditions:
    Last Transition Time:  2020-02-26T16:57:17Z
    Message:               Degraded: failed to sync CRDs: CustomResourceDefinition.apiextensions.k8s.io "volumesnapshots.snapshot.storage.k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions
    Reason:                _OperatorSync
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Progressing
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Available
    Last Transition Time:  2020-02-26T16:55:17Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Upgradeable
[...] 

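The reason is that csi-snapshot-controller-operator wants to install the v1beta1 version of the snapshot CRDs, which is not compatible with v1alpha1: the existing CRD still lists v1alpha1 in status.storedVersions, and the new CRD spec does not include that version.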
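
To illustrate the validation rule quoted in the Degraded message above, here is a minimal sketch in Python. It is only a model of the rule "every entry in status.storedVersions must appear in spec.versions", not the actual API server code; the function name missing_stored_versions is made up for this example.

```python
def missing_stored_versions(stored_versions, spec_versions):
    """Return stored versions that the proposed CRD spec no longer lists.

    Hypothetical helper: models the rule from the error message, it is
    not taken from Kubernetes or from the operator.
    """
    return [v for v in stored_versions if v not in spec_versions]


# The CRD installed by the upstream driver has stored objects as v1alpha1.
stored = ["v1alpha1"]
# The CRD that the operator tries to apply lists only v1beta1.
proposed = ["v1beta1"]

missing = missing_stored_versions(stored, proposed)
for index, version in enumerate(stored):
    if version in missing:
        # Same shape as the field error shown in the operator condition.
        print(
            f'status.storedVersions[{index}]: Invalid value: '
            f'"{version}": must appear in spec.versions'
        )
```

As long as v1alpha1 stays in storedVersions, an update whose spec lists only v1beta1 is rejected, which is why the operator cannot sync its CRDs.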

Version-Release number of selected component (if applicable):
4.3.18

How reproducible:
always

Comment 1 Jan Safranek 2020-05-14 16:17:15 UTC
We cannot remove the v1alpha1 CRDs during the upgrade - that would remove all snapshot CRs, i.e. data loss.
We cannot convert the v1alpha1 CRDs to v1beta1 - the CSI driver (not supported by us) understands only v1alpha1 and there is no automatic conversion to v1alpha1.

What we can do is mark the 4.3 cluster as not upgradeable. cluster-storage-operator in 4.3 can monitor for the presence of v1alpha1 snapshot CRDs and set its own condition to Upgradeable=false, with a description / link explaining why and what to do about it.
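
A rough sketch of that check, in Python for brevity (the operator itself is not written in Python, and this is not its implementation). It assumes CRD objects are available as plain dicts in the apiextensions.k8s.io/v1beta1 shape, where a CRD may carry a single spec.version, a spec.versions list, or both. The function names, the reason string and the message wording are all made up for illustration.

```python
SNAPSHOT_GROUP = "snapshot.storage.k8s.io"


def listed_versions(crd):
    """Collect version names from spec.versions and the older spec.version."""
    spec = crd.get("spec", {})
    names = [v["name"] for v in spec.get("versions", [])]
    if spec.get("version"):
        names.append(spec["version"])
    return names


def upgradeable_condition(crds):
    """Build an Upgradeable condition from a list of CRD dicts (hypothetical)."""
    blocking = sorted(
        crd["metadata"]["name"]
        for crd in crds
        if crd.get("spec", {}).get("group") == SNAPSHOT_GROUP
        and "v1alpha1" in listed_versions(crd)
    )
    if blocking:
        return {
            "type": "Upgradeable",
            "status": "False",
            # Reason and message are illustrative, not the operator's strings.
            "reason": "AlphaSnapshotCRDsDetected",
            "message": "v1alpha1 snapshot CRDs detected: " + ", ".join(blocking),
        }
    return {"type": "Upgradeable", "status": "True", "reason": "AsExpected", "message": ""}


# Example: one CRD as the upstream driver might have created it.
example = [{
    "metadata": {"name": "volumesnapshots.snapshot.storage.k8s.io"},
    "spec": {"group": SNAPSHOT_GROUP, "version": "v1alpha1"},
}]
print(upgradeable_condition(example)["status"])
```

The check only reads the CRDs and reports; it never deletes or converts anything, which matches the constraints listed above.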

Comment 2 Chao Yang 2020-05-25 07:12:30 UTC
Submitted bz https://bugzilla.redhat.com/show_bug.cgi?id=1839639 for the release notes doc

Comment 18 Chao Yang 2020-07-10 07:47:35 UTC
Verification passed.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-07-233934   True        False         5h     Cluster version is 4.6.0-0.nightly-2020-07-07-233934


    Last Transition Time:  2020-07-10T07:38:59Z
    Message:               Unable to update cluster as v1alpha1 version of volumesnapshotcontents.snapshot.storage.k8s.io, volumesnapshots.snapshot.storage.k8s.io, volumesnapshotclasses.snapshot.storage.k8s.iois detected. Remove these CRDs to allow the upgrade to proceed.
    Reason:                AsExpected
    Status:                False
    Type:                  Upgradeable
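
For scripted verification, the same condition can be picked out of the JSON form of a ClusterOperator object. A minimal Python sketch follows; the sample document is hand-written and trimmed to the fields used here, its message is a placeholder, and the helper name find_condition is made up.

```python
import json


def find_condition(cluster_operator, condition_type):
    """Return the status condition of the given type, or None if absent."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == condition_type:
            return condition
    return None


# Hand-written sample shaped like a ClusterOperator object in JSON form.
sample = json.loads("""
{
  "status": {
    "conditions": [
      {"type": "Upgradeable", "status": "False", "reason": "AsExpected",
       "message": "placeholder message"}
    ]
  }
}
""")

upgradeable = find_condition(sample, "Upgradeable")
# The upgrade is considered blocked only when the condition is explicitly False.
blocked = upgradeable is not None and upgradeable["status"] == "False"
print("upgrade blocked:", blocked)
```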

Comment 20 errata-xmlrpc 2020-10-27 15:59:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196