Bug 2159442 - VMSnapshot breaks after OCP Upgrade 4.10.z -> 4.11.z
Summary: VMSnapshot breaks after OCP Upgrade 4.10.z -> 4.11.z
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.10.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.11.3
Assignee: skagan
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-01-09 15:46 UTC by Jenia Peimer
Modified: 2023-01-25 14:48 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: OCP is upgraded to 4.11 without upgrading OCP-Virt.
Consequence: OCP expects API version v1 for snapshot objects, but OCP-Virt is still using version v1beta1. This causes a failure when attempting to restore a VM snapshot.
Workaround (if any): Upgrade OCP-Virt to 4.11.
Result: VM Snapshot restore will work as expected.
Clone Of:
Environment:
Last Closed: 2023-01-18 14:32:07 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNV-24131 0 None None None 2023-01-09 15:47:26 UTC
Red Hat Issue Tracker CNV-24584 0 None None None 2023-01-22 11:37:48 UTC

Description Jenia Peimer 2023-01-09 15:46:55 UTC
Description of problem:
VMSnapshot breaks after OCP Upgrade 4.10.z -> 4.11.z

Version-Release number of selected component (if applicable):
We saw this:
OCP Upgrade 4.10.47 -> 4.11.22, CNV v4.10.8-8
OCP Upgrade 4.10.45 -> 4.11.20, CNV v4.10.8-2
OCP Upgrade 4.10.44 -> 4.11.18, CNV v4.10.7-24

How reproducible:
Only on OCP upgrade, not on CNV upgrade

Steps to Reproduce:
1. Create a VMSnapshot before the upgrade
2. Upgrade OCP
3. Try to create a VMRestore from the VMSnapshot that was created before the upgrade
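
For reference, a minimal snapshot/restore pair along these lines drives steps 1 and 3. This is a sketch: the snapshot/restore object names are illustrative, and snapshot.kubevirt.io/v1alpha1 is assumed as the snapshot API version for this CNV release.

apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: snapshot-vm-snapshot-upgrade-b   # illustrative name
  namespace: test-upgrade-namespace
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm-snapshot-upgrade-b-1673173618-7427866
---
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-vm-snapshot-upgrade-b    # illustrative name
  namespace: test-upgrade-namespace
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm-snapshot-upgrade-b-1673173618-7427866
  virtualMachineSnapshotName: snapshot-vm-snapshot-upgrade-b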

Actual results:
"status":"Failure","message":"admission webhook \\"virtualmachinerestore-validator.snapshot.kubevirt.io\\" denied the request: VirtualMachineSnapshot \\"snapshot-vm-snapshot-upgrade-b-1673173618-7427866\\" is not ready to use","reason":"Invalid","details":{"causes":[{"reason":"FieldValueInvalid","message":"VirtualMachineSnapshot \\"snapshot-vm-snapshot-upgrade-b-1673173618-7427866\\" is not ready to use","field":"spec.virtualMachineSnapshotName"}]},"code":422}\n'

VMSnapshot status:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: '2023-01-08T10:27:54Z'
    reason: Operation complete
    status: 'False'
    type: Progressing
  - lastProbeTime: null
    lastTransitionTime: '2023-01-08T11:29:53Z'
    reason: Error
    status: 'False'
    type: Ready
  creationTime: '2023-01-08T10:27:54Z'
  error:
    message: VolumeSnapshots (vmsnapshot-722e79ce-26a5-47b9-ac37-002ee08afe2c-volume-dv-disk) skipped because in error state
    time: '2023-01-08T11:57:39Z'
  phase: Succeeded
  readyToUse: false
  sourceUID: 43f28874-5226-4a1d-bf26-efb29a886242
  virtualMachineSnapshotContentName: vmsnapshot-content-722e79ce-26a5-47b9-ac37-002ee08afe2c
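
A quick way to read the same readiness flag from the CLI (the vmsnapshot short name is assumed to be registered by the CNV CRDs):

$ oc get vmsnapshot snapshot-vm-snapshot-upgrade-b-1673173618-7427866 \
    -n test-upgrade-namespace -o jsonpath='{.status.readyToUse}'
false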

VM status after upgrade:
vm-snapshot-upgrade-b-1673173618-7427866.yaml

    volumeSnapshotStatuses:
    - enabled: false
      name: dv-disk
      reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this StorageClass [ocs-storagecluster-ceph-rbd] [dv-disk]'
    - enabled: false
      name: cloudinitdisk
      reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]

Expected results:
VMRestore Completed

Additional info:

Full VM yaml after upgrade:
ResourceInstance[VirtualMachine]:
  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    annotations:
      kubemacpool.io/transaction-timestamp: '2023-01-08T10:28:13.574614151Z'
      kubevirt.io/latest-observed-api-version: v1
      kubevirt.io/storage-observed-api-version: v1alpha3
    creationTimestamp: '2023-01-08T10:26:58Z'
    generation: 5
    managedFields:
    - apiVersion: kubevirt.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:dataVolumeTemplates: {}
          f:template:
            .: {}
            f:metadata:
              .: {}
              f:labels:
                .: {}
                f:debugLogs: {}
                f:kubevirt.io/domain: {}
                f:kubevirt.io/vm: {}
            f:spec:
              .: {}
              f:domain:
                .: {}
                f:devices:
                  .: {}
                  f:disks: {}
                  f:rng: {}
                f:resources:
                  .: {}
                  f:requests:
                    .: {}
                    f:memory: {}
              f:volumes: {}
      manager: OpenAPI-Generator
      operation: Update
      time: '2023-01-08T10:26:58Z'
    - apiVersion: kubevirt.io/v1alpha3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions: {}
          f:printableStatus: {}
          f:volumeSnapshotStatuses: {}
      manager: Go-http-client
      operation: Update
      subresource: status
      time: '2023-01-08T11:29:53Z'
    name: vm-snapshot-upgrade-b-1673173618-7427866
    namespace: test-upgrade-namespace
    resourceVersion: '293697'
    uid: 43f28874-5226-4a1d-bf26-efb29a886242
  spec:
    dataVolumeTemplates:
    - metadata:
        creationTimestamp: null
        name: dv-snapshot-upgrade-b
        namespace: test-upgrade-namespace
      spec:
        contentType: kubevirt
        pvc:
          accessModes:
          - ReadWriteMany
          resources:
            requests:
              storage: 3Gi
          storageClassName: ocs-storagecluster-ceph-rbd
          volumeMode: Block
        source:
          http:
            url: <cirros-0.4.0-x86_64-disk.qcow2>
    running: false
    template:
      metadata:
        creationTimestamp: null
        labels:
          debugLogs: 'true'
          kubevirt.io/domain: vm-snapshot-upgrade-b-1673173618-7427866
          kubevirt.io/vm: vm-snapshot-upgrade-b-1673173618-7427866
      spec:
        domain:
          devices:
            disks:
            - disk:
                bus: virtio
              name: dv-disk
            - disk:
                bus: virtio
              name: cloudinitdisk
            rng: {}
          machine:
            type: pc-q35-rhel8.4.0
          resources:
            requests:
              memory: 64M
        volumes:
        - dataVolume:
            name: dv-snapshot-upgrade-b
          name: dv-disk
        - cloudInitNoCloud:
            userData: "#cloud-config\nchpasswd:\n  expire: false\npassword: password\n\
              user: fedora\n\nssh_authorized_keys:\n [ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCj47ubVnxR16JU7ZfDli3N5QVBAwJBRh2xMryyjk5dtfugo5JIPGB2cyXTqEDdzuRmI+Vkb/A5duJyBRlA+9RndGGmhhMnj8and3wu5/cEb7DkF6ZJ25QV4LQx3K/i57LStUHXRTvruHOZ2nCuVXWqi7wSvz5YcvEv7O8pNF5uGmqHlShBdxQxcjurXACZ1YY0YDJDr3AJai1KF9zehVJODuSbrnOYpThVWGjFuFAnNxbtuZ8EOSougN2aYTf2qr/KFGDHtewIkzZmP6cjzKO5bN3pVbXxmb2Gces/BYHntY4MXBTUqwsmsCRC5SAz14bEP/vsLtrNhjq9vCS+BjMT\
              \ root]\nruncmd: ['grep ssh-rsa /etc/crypto-policies/back-ends/opensshserver.config\
              \ || sudo update-crypto-policies --set LEGACY || true', \"sudo sed -i\
              \ 's/^#\\\\?PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config\"\
              , 'sudo systemctl enable sshd', 'sudo systemctl restart sshd']"
          name: cloudinitdisk
  status:
    conditions:
    - lastProbeTime: '2023-01-08T10:28:26Z'
      lastTransitionTime: '2023-01-08T10:28:26Z'
      message: VMI does not exist
      reason: VMINotExists
      status: 'False'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: null
      status: 'True'
      type: LiveMigratable
    printableStatus: Stopped
    volumeSnapshotStatuses:
    - enabled: false
      name: dv-disk
      reason: 'No VolumeSnapshotClass: Volume snapshots are not configured for this
        StorageClass [ocs-storagecluster-ceph-rbd] [dv-disk]'
    - enabled: false
      name: cloudinitdisk
      reason: Snapshot is not supported for this volumeSource type [cloudinitdisk]

Comment 2 Jenia Peimer 2023-01-09 16:00:40 UTC
There's no VolumeSnapshotClass for storage class 'ocs-storagecluster-ceph-rbd' with provisioner 'openshift-storage.rbd.csi.ceph.com' (but there was one before the upgrade).

[cnv-qe-jenkins@c01-ocp411-upg-bl777-executor ~]$ oc get volumesnapshotclass
NAME                                        DRIVER                                  DELETIONPOLICY   AGE
ocs-storagecluster-cephfsplugin-snapclass   openshift-storage.cephfs.csi.ceph.com   Delete           30h
ocs-storagecluster-rbdplugin-snapclass      openshift-storage.rbd.csi.ceph.com      Delete           30h
standard-csi                                cinder.csi.openstack.org                Delete           31h

[cnv-qe-jenkins@c01-ocp411-upg-bl777-executor ~]$ oc get sc | grep ocs
local-block-ocs                kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  30h
ocs-storagecluster-ceph-rbd    openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   30h
ocs-storagecluster-cephfs      openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   30h
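
A check along these lines (plain oc + jsonpath, nothing CNV-specific) confirms whether the StorageClass provisioner has a matching VolumeSnapshotClass driver:

$ oc get sc ocs-storagecluster-ceph-rbd -o jsonpath='{.provisioner}'
openshift-storage.rbd.csi.ceph.com
$ oc get volumesnapshotclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.driver}{"\n"}{end}'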

Comment 3 Jenia Peimer 2023-01-09 16:08:01 UTC
I mean THERE IS a VolumeSnapshotClass, please don't mind my previous comment :)

Comment 4 Yan Du 2023-01-11 13:28:18 UTC
Shelly, could you please take a look?

Comment 5 skagan 2023-01-17 07:38:05 UTC
The bug happens when upgrading OCP to 4.11 without also upgrading CNV. In that case the informers in CNV still expect the v1beta1 version of the volume snapshot objects, while the OCP upgrade moved them to v1. This discrepancy prevents the informers from being enabled in virt-controller, which makes the VM snapshot appear as not ready and the vmsnapshotstatus report no available VolumeSnapshotClass. The snapshot itself is still valid, and upgrading CNV to 4.11 should solve the issue.
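
One way to see the version discrepancy from the cluster side is to list which versions the volume snapshot CRD actually serves after the OCP upgrade (a sketch using standard oc jsonpath):

$ oc get crd volumesnapshots.snapshot.storage.k8s.io \
    -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{"\n"}{end}'

On the upgraded cluster only v1 should show as served, while the 4.10 virt-controller informers still watch v1beta1.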

Comment 7 Adam Litke 2023-01-18 14:32:07 UTC
This issue can be solved by upgrading CNV to an already available version. Therefore, there is nothing more we can do except recommend this upgrade. OCP and OCP-Virt y-stream versions should always match anyway.
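
For completeness, the installed CNV version and its OLM update path can be inspected before recommending the upgrade (openshift-cnv is assumed as the install namespace, which is the default; names may differ per cluster):

$ oc get csv -n openshift-cnv
$ oc get subscription -n openshift-cnv \
    -o jsonpath='{range .items[*]}{.metadata.name}{" channel="}{.spec.channel}{"\n"}{end}'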

Comment 13 Jenia Peimer 2023-01-25 09:26:30 UTC
Verified that upgrading CNV solves the issue.

1. Created a VM and a VMSnapshot
2. Upgraded OCP 4.10 -> 4.11
3. Upgraded CNV 4.10 -> 4.11
4. The VMSnapshot became readyToUse: true
5. Created the VirtualMachineRestore; it reached Completed, and the restored VM was Running.
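
The same verification can be read from the CLI (vmsnapshot/vmrestore short names assumed, as registered by the CNV CRDs):

$ oc get vmsnapshot -n test-upgrade-namespace \
    -o jsonpath='{range .items[*]}{.metadata.name}{" readyToUse="}{.status.readyToUse}{"\n"}{end}'
$ oc get vmrestore -n test-upgrade-namespace \
    -o jsonpath='{range .items[*]}{.metadata.name}{" complete="}{.status.complete}{"\n"}{end}'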

