Description of problem:
VMSnapshot breaks after OCP Upgrade 4.10.z -> 4.11.z

Version-Release number of selected component (if applicable):
We saw this:
OCP Upgrade 4.10.47 -> 4.11.22, CNV v4.10.8-8
OCP Upgrade 4.10.45 -> 4.11.20, CNV v4.10.8-2
OCP Upgrade 4.10.44 -> 4.11.18, CNV v4.10.7-24

How reproducible:
Only on OCP upgrade, not on CNV upgrade

Steps to Reproduce:
1. Create a VMSnapshot before the upgrade
2. Upgrade OCP
3. Try to create a VMRestore from the VMSnapshot that was created before the upgrade

Actual results:
"status":"Failure","message":"admission webhook \"virtualmachinerestore-validator.snapshot.kubevirt.io\" denied the request: VirtualMachineSnapshot \"snapshot-vm-snapshot-upgrade-b-1673173618-7427866\" is not ready to use","reason":"Invalid","details":{"causes":[{"reason":"FieldValueInvalid","message":"VirtualMachineSnapshot \"snapshot-vm-snapshot-upgrade-b-1673173618-7427866\" is not ready to use","field":"spec.virtualMachineSnapshotName"}]},"code":422}

VMSnapshot status:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: '2023-01-08T10:27:54Z'
    reason: Operation complete
    status: 'False'
    type: Progressing
  - lastProbeTime: null
    lastTransitionTime: '2023-01-08T11:29:53Z'
    reason: Error
    status: 'False'
    type: Ready
  creationTime: '2023-01-08T10:27:54Z'
  error:
    message: VolumeSnapshots (vmsnapshot-722e79ce-26a5-47b9-ac37-002ee08afe2c-volume-dv-disk)
      skipped because in error state
    time: '2023-01-08T11:57:39Z'
  phase: Succeeded
  readyToUse: false
  sourceUID: 43f28874-5226-4a1d-bf26-efb29a886242
  virtualMachineSnapshotContentName: vmsnapshot-content-722e79ce-26a5-47b9-ac37-002ee08afe2c

VM status after upgrade (vm-snapshot-upgrade-b-1673173618-7427866.yaml):
volumeSnapshotStatuses:
- enabled: false
  name: dv-disk
  reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
    StorageClass [ocs-storagecluster-ceph-rbd] [dv-disk]'
- enabled: false
  name: cloudinitdisk
  reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]

Expected results:
VMRestore Completed

Additional info:
Full VM yaml after upgrade:

ResourceInstance[VirtualMachine]:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubemacpool.io/transaction-timestamp: '2023-01-08T10:28:13.574614151Z'
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
  creationTimestamp: '2023-01-08T10:26:58Z'
  generation: 5
  managedFields:
  - apiVersion: kubevirt.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:dataVolumeTemplates: {}
        f:template:
          .: {}
          f:metadata:
            .: {}
            f:labels:
              .: {}
              f:debugLogs: {}
              f:kubevirt.io/domain: {}
              f:kubevirt.io/vm: {}
          f:spec:
            .: {}
            f:domain:
              .: {}
              f:devices:
                .: {}
                f:disks: {}
                f:rng: {}
              f:resources:
                .: {}
                f:requests:
                  .: {}
                  f:memory: {}
            f:volumes: {}
    manager: OpenAPI-Generator
    operation: Update
    time: '2023-01-08T10:26:58Z'
  - apiVersion: kubevirt.io/v1alpha3
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
        f:printableStatus: {}
        f:volumeSnapshotStatuses: {}
    manager: Go-http-client
    operation: Update
    subresource: status
    time: '2023-01-08T11:29:53Z'
  name: vm-snapshot-upgrade-b-1673173618-7427866
  namespace: test-upgrade-namespace
  resourceVersion: '293697'
  uid: 43f28874-5226-4a1d-bf26-efb29a886242
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: dv-snapshot-upgrade-b
      namespace: test-upgrade-namespace
    spec:
      contentType: kubevirt
      pvc:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 3Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      source:
        http:
          url: <cirros-0.4.0-x86_64-disk.qcow2>
  running: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        debugLogs: 'true'
        kubevirt.io/domain: vm-snapshot-upgrade-b-1673173618-7427866
        kubevirt.io/vm: vm-snapshot-upgrade-b-1673173618-7427866
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: dv-disk
          - disk:
              bus: virtio
            name: cloudinitdisk
          rng: {}
        machine:
          type: pc-q35-rhel8.4.0
        resources:
          requests:
            memory: 64M
      volumes:
      - dataVolume:
          name: dv-snapshot-upgrade-b
        name: dv-disk
      - cloudInitNoCloud:
          userData: "#cloud-config\nchpasswd:\n expire: false\npassword: password\n user: fedora\n\nssh_authorized_keys:\n [ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCj47ubVnxR16JU7ZfDli3N5QVBAwJBRh2xMryyjk5dtfugo5JIPGB2cyXTqEDdzuRmI+Vkb/A5duJyBRlA+9RndGGmhhMnj8and3wu5/cEb7DkF6ZJ25QV4LQx3K/i57LStUHXRTvruHOZ2nCuVXWqi7wSvz5YcvEv7O8pNF5uGmqHlShBdxQxcjurXACZ1YY0YDJDr3AJai1KF9zehVJODuSbrnOYpThVWGjFuFAnNxbtuZ8EOSougN2aYTf2qr/KFGDHtewIkzZmP6cjzKO5bN3pVbXxmb2Gces/BYHntY4MXBTUqwsmsCRC5SAz14bEP/vsLtrNhjq9vCS+BjMT root]\nruncmd: ['grep ssh-rsa /etc/crypto-policies/back-ends/opensshserver.config || sudo update-crypto-policies --set LEGACY || true', \"sudo sed -i 's/^#\\\\?PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config\", 'sudo systemctl enable sshd', 'sudo systemctl restart sshd']"
        name: cloudinitdisk
status:
  conditions:
  - lastProbeTime: '2023-01-08T10:28:26Z'
    lastTransitionTime: '2023-01-08T10:28:26Z'
    message: VMI does not exist
    reason: VMINotExists
    status: 'False'
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: 'True'
    type: LiveMigratable
  printableStatus: Stopped
  volumeSnapshotStatuses:
  - enabled: false
    name: dv-disk
    reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
      StorageClass [ocs-storagecluster-ceph-rbd] [dv-disk]'
  - enabled: false
    name: cloudinitdisk
    reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]
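For anyone triaging a similar report: the webhook rejection above traces directly back to `status.readyToUse` on the VirtualMachineSnapshot. A minimal sketch of checking it follows; the heredoc is a trimmed stand-in for the status dump above, and on a live cluster you would instead query something like `oc get vmsnapshot <name> -o jsonpath='{.status.readyToUse}'` (command shown for illustration).

```shell
# Sketch: decide whether a VMRestore is worth attempting, based on the
# snapshot's readyToUse flag. The YAML below is a trimmed stand-in for the
# VMSnapshot status captured in this report.
cat > /tmp/vmsnapshot-status.yaml <<'EOF'
status:
  phase: Succeeded
  readyToUse: false
EOF
ready=$(awk '/^  readyToUse:/ {print $2}' /tmp/vmsnapshot-status.yaml)
echo "readyToUse=$ready"
if [ "$ready" != "true" ]; then
  # This is exactly the state the validating webhook rejects with
  # "is not ready to use".
  echo "snapshot not ready; a VMRestore would be rejected"
fi
```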
There's no VolumeSnapshotClass for storage class 'ocs-storagecluster-ceph-rbd' with provisioner 'openshift-storage.rbd.csi.ceph.com' (but there was one before the upgrade):

[cnv-qe-jenkins@c01-ocp411-upg-bl777-executor ~]$ oc get volumesnapshotclass
NAME                                        DRIVER                                  DELETIONPOLICY   AGE
ocs-storagecluster-cephfsplugin-snapclass   openshift-storage.cephfs.csi.ceph.com   Delete           30h
ocs-storagecluster-rbdplugin-snapclass      openshift-storage.rbd.csi.ceph.com      Delete           30h
standard-csi                                cinder.csi.openstack.org                Delete           31h

[cnv-qe-jenkins@c01-ocp411-upg-bl777-executor ~]$ oc get sc | grep ocs
local-block-ocs               kubernetes.io/no-provisioner            Delete   WaitForFirstConsumer   false   30h
ocs-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete   Immediate              true    30h
ocs-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete   Immediate              true    30h
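To tie the two listings together, compare the StorageClass `provisioner` with each VolumeSnapshotClass `DRIVER`; a match means volume snapshots should be configured for that class. A rough sketch (the live `oc` commands in the comments are illustrative; the offline part just replays the captured output above):

```shell
# Sketch: check whether any VolumeSnapshotClass driver matches a StorageClass
# provisioner. Live-cluster form (illustrative):
#   prov=$(oc get sc ocs-storagecluster-ceph-rbd -o jsonpath='{.provisioner}')
#   oc get volumesnapshotclass \
#     -o jsonpath='{range .items[*]}{.driver}{"\n"}{end}' | grep -qx "$prov"
# Offline demonstration against the drivers listed above:
prov="openshift-storage.rbd.csi.ceph.com"
drivers="openshift-storage.cephfs.csi.ceph.com
openshift-storage.rbd.csi.ceph.com
cinder.csi.openstack.org"
if printf '%s\n' "$drivers" | grep -qx "$prov"; then
  echo "found matching VolumeSnapshotClass driver for $prov"
else
  echo "no VolumeSnapshotClass matches provisioner $prov"
fi
```

Here the match exists, which is what makes the "No VolumeSnapshotClass" reason in the VM status so surprising at first glance.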
I mean THERE IS a VolumeSnapshotClass, please don't mind my previous comment :)
Shelly, could you please take a look?
The bug happens when upgrading OCP to 4.11 without also upgrading CNV. In that case, the CNV informers expect the VolumeSnapshot API to be at v1beta1, while the OCP upgrade has moved it to v1. This discrepancy prevents the informers from being enabled in virt-controller, which makes the VM snapshot appear not ready and the VM's volumeSnapshotStatuses report that no VolumeSnapshotClass is available. The snapshot itself is still valid, and upgrading CNV to 4.11 should solve the issue.
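A hedged sketch of the version mismatch described above. The real check on a cluster is `oc api-versions`; the shell below only illustrates the comparison, and the exact group/version strings are an assumption derived from this comment (v1beta1 expected by the not-yet-upgraded CNV, v1 served after the OCP upgrade):

```shell
# Sketch: illustrate the informer/API version mismatch. Live-cluster check
# (illustrative):
#   oc api-versions | grep snapshot.storage.k8s.io
served="snapshot.storage.k8s.io/v1"           # what OCP 4.11 serves (per this comment)
expected="snapshot.storage.k8s.io/v1beta1"    # what CNV 4.10 informers watch (per this comment)
if [ "$served" != "$expected" ]; then
  echo "mismatch: cluster serves $served but CNV informers expect $expected"
  echo "-> snapshot informers stay disabled in virt-controller"
fi
```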
This issue can be solved by upgrading CNV to an already available version, so there is nothing more we can do except recommend that upgrade. OCP and OCP-Virt y-stream versions should always match anyway.
Verified that upgrading CNV solves the issue:
1. Created a VM and a VMSnapshot
2. Upgraded OCP 4.10 -> 4.11
3. Upgraded CNV 4.10 -> 4.11
4. The VMSnapshot became readyToUse: true
5. Created the VirtualMachineRestore; it became Completed, and the restored VM was Running
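For reference, the readiness re-check in step 4 can be sketched as follows. The `oc` commands in the comments assume the resource names from this report and need a live cluster; the offline part parses a stand-in for the post-upgrade status, where the Ready condition has flipped to True.

```shell
# Sketch: after upgrading CNV, confirm the snapshot's Ready condition is True
# before creating the restore. Live form (illustrative):
#   oc get vmsnapshot snapshot-vm-snapshot-upgrade-b-1673173618-7427866 \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
# Offline demonstration against a post-upgrade status stand-in:
cat > /tmp/post-upgrade-status.yaml <<'EOF'
conditions:
- type: Progressing
  status: "False"
- type: Ready
  status: "True"
EOF
# Print the status of the Ready condition (the line following "type: Ready").
awk '/type: Ready/ {getline; print $2}' /tmp/post-upgrade-status.yaml
```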