Description of problem:
The new "restore" PVC appears to be waiting on a CDI upload server pod to finish, but work is never sent to that pod.

Version-Release number of selected component (if applicable): 4.10.0

How reproducible: Always

Steps to Reproduce:
1. Take a snapshot of a (in my case Running) VM
2. Shut down the VM
3. Use the UI to restore to the snapshot

Actual results:
The VM stays in the Pending state indefinitely, and there is no log activity in the CDI upload pod.

Expected results:
The VM is recreated with a PVC referencing the VolumeSnapshot, the back-end snapshot class handles the restore, and the VM starts quickly.

Additional info:
This issue should only affect DataVolumes created via network clone operations. It is a regression introduced in this PR: https://github.com/kubevirt/containerized-data-importer/pull/1922
To verify it, one needs a DV that was created by a network clone. Example preparation steps:
1. Create a DV by cloning from one storageClass to another storageClass (the target one should be on something that supports snapshots)
2. Create a VM from this DV

Then repeat the steps as reported by Chandler:

Steps to Reproduce:
1. Take a snapshot of a (in my case Running) VM
2. Shut down the VM
3. Use the UI to restore to the snapshot
Tested with the following code:
-------------------------------------------
oc version
Client Version: 4.10.0-202206211856.p0.g45460a5.assembly.stream-45460a5
Server Version: 4.10.20
Kubernetes Version: v1.23.5+3afdacb

[cnv-qe-jenkins@localhost ~]$ oc get csv -n openshift-cnv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.10.3   OpenShift Virtualization   4.10.3    kubevirt-hyperconverged-operator.v4.10.2   Succeeded

Tested with the following scenario:
-------------------------------------------
1. Created source dv on nfs sc
2. Cloned source dv to ceph-rbd target dv
3. Created a VM from the target dv
4. Started the VM via the UI
5. Created a snapshot of the running VM
6. Shut down the VM via the UI
7. Restored the snapshot via the UI
8. VM is created referencing the restored pvc
9. Started the VM, but the VM remains in 'Starting' status

Additional information:
-----------------------------------------
oc get dv
NAME                       PHASE       PROGRESS   RESTARTS   AGE
source-default1-nfs        Succeeded   100.0%                3h11m
target-default-ceph-rbd1   Succeeded   100.0%                129m

oc get pvc
NAME                                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
restore-9fe8e7c7-557e-45b8-a1f4-15405294aa07-root   Bound    pvc-cb00bf19-af52-412a-8c25-1d2f82cf0a64   6Gi        RWO            ocs-storagecluster-ceph-rbd   114m
source-default1-nfs                                 Bound    nfs-pv-05                                  5Gi        RWO,RWX        nfs                           3h11m
target-default-ceph-rbd1                            Bound    pvc-532bf771-0622-47f5-ab81-80ce37224d92   6Gi        RWO            ocs-storagecluster-ceph-rbd   129m

oc get vm
NAME                          AGE    STATUS     READY
vm-target-default-ceph-rbd1   130m   Starting   False   >>>>>>>

oc get volumesnapshots
NAME                                                          READYTOUSE   SOURCEPVC                  SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                            SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vmsnapshot-1a3acd71-dfcf-4ddf-88c7-c59c4dc6c585-volume-root   true         target-default-ceph-rbd1                           6Gi           ocs-storagecluster-rbdplugin-snapclass   snapcontent-005c15d3-aa3d-44c2-b321-4db03e2bf380   127m           127m

Restored VM - yaml:
----------------------
oc get vm -oyaml
apiVersion: v1
items:
- apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    annotations:
      kubemacpool.io/transaction-timestamp: "2022-07-03T13:40:30.861826914Z"
      kubevirt.io/latest-observed-api-version: v1
      kubevirt.io/storage-observed-api-version: v1alpha3
      restore.kubevirt.io/lastRestoreUID: vm-target-default-ceph-rbd1-2022-7-3-0-restore-ugvktw-9fe8e7c7-557e-45b8-a1f4-15405294aa07
    creationTimestamp: "2022-07-03T13:13:44Z"
    generation: 5
    labels:
      kubevirt.io/vm: vm-my-data-volume
    name: vm-target-default-ceph-rbd1
    namespace: default
    resourceVersion: "5523189"
    uid: 5718c66c-9a9b-486a-a272-524650b0844d
  spec:
    running: true
    template:
      metadata:
        creationTimestamp: null
        labels:
          kubevirt.io/vm: vm-my-data-volume
      spec:
        domain:
          cpu:
            cores: 1
            model: host-model
            sockets: 1
            threads: 1
          devices:
            disks:
            - disk:
                bus: virtio
              name: root
            interfaces:
            - macAddress: 02:b7:63:00:00:00
              masquerade: {}
              name: default
          features:
            acpi:
              enabled: true
          firmware:
            uuid: 024157c1-af88-594a-a0d8-d178ee87e843
          machine:
            type: pc-q35-rhel8.4.0
          resources:
            requests:
              memory: 64M
        networks:
        - name: default
          pod: {}
        volumes:
        - name: root
          persistentVolumeClaim:
            claimName: restore-9fe8e7c7-557e-45b8-a1f4-15405294aa07-root
  status:
    conditions:
    - lastProbeTime: "2022-07-03T13:40:31Z"
      lastTransitionTime: "2022-07-03T13:40:31Z"
      message: Guest VM is not reported as running
      reason: GuestNotRunning
      status: "False"
      type: Ready
    created: true
    printableStatus: Starting
    volumeSnapshotStatuses:
    - enabled: true
      name: root
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Restored PVC
-------------------
oc get pvc restore-9fe8e7c7-557e-45b8-a1f4-15405294aa07-root -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.condition.running: "true"
    cdi.kubevirt.io/storage.condition.running.message: ""
    cdi.kubevirt.io/storage.condition.running.reason: Pod is running
    cdi.kubevirt.io/storage.pod.phase: Running
    cdi.kubevirt.io/storage.pod.ready: "true"
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.uploadPodName: cdi-upload-restore-9fe8e7c7-557e-45b8-a1f4-15405294aa07-root
    cdi.kubevirt.io/uploadClientName: default/source-default1-nfs-default/restore-9fe8e7c7-557e-45b8-a1f4-15405294aa07-root
    k8s.io/CloneOf: "true"
    k8s.io/CloneRequest: default/source-default1-nfs
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    restore.kubevirt.io/name: vm-target-default-ceph-rbd1-2022-7-3-0-restore-ugvktw
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2022-07-03T13:27:17Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.10.3
  name: restore-9fe8e7c7-557e-45b8-a1f4-15405294aa07-root
  namespace: default
  ownerReferences:
  - apiVersion: kubevirt.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachine
    name: vm-target-default-ceph-rbd1
    uid: 5718c66c-9a9b-486a-a272-524650b0844d
  resourceVersion: "5510789"
  uid: cb00bf19-af52-412a-8c25-1d2f82cf0a64
spec:
  accessModes:
  - ReadWriteOnce
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: vmsnapshot-1a3acd71-dfcf-4ddf-88c7-c59c4dc6c585-volume-root
  resources:
    requests:
      storage: 6Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
  volumeName: pvc-cb00bf19-af52-412a-8c25-1d2f82cf0a64
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 6Gi
  phase: Bound

oc get volumesnapshots vmsnapshot-1a3acd71-dfcf-4ddf-88c7-c59c4dc6c585-volume-root -oyaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2022-07-03T13:24:46Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  generation: 1
  name: vmsnapshot-1a3acd71-dfcf-4ddf-88c7-c59c4dc6c585-volume-root
  namespace: default
  ownerReferences:
  - apiVersion: snapshot.kubevirt.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: VirtualMachineSnapshotContent
    name: vmsnapshot-content-1a3acd71-dfcf-4ddf-88c7-c59c4dc6c585
    uid: 5c375ba4-5e9d-4ab1-8175-472fb3000c4d
  resourceVersion: "5508070"
  uid: 005c15d3-aa3d-44c2-b321-4db03e2bf380
spec:
  source:
    persistentVolumeClaimName: target-default-ceph-rbd1
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
status:
  boundVolumeSnapshotContentName: snapcontent-005c15d3-aa3d-44c2-b321-4db03e2bf380
  creationTime: "2022-07-03T13:24:48Z"
  readyToUse: true
  restoreSize: 6Gi

Source dv
----------------
oc get dv source-default1-nfs -oyaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/storage.pod.retainAfterCompletion: "true"
  creationTimestamp: "2022-07-03T12:10:40Z"
  generation: 21
  name: source-default1-nfs
  namespace: default
  resourceVersion: "5437248"
  uid: 4c0ef836-f1c9-4675-9822-35bb3b86a829
spec:
  source:
    http:
      url: https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
  storage:
    resources:
      requests:
        storage: 500Mi
    storageClassName: nfs
status:
  claimName: source-default1-nfs
  conditions:
  - lastHeartbeatTime: "2022-07-03T12:10:40Z"
    lastTransitionTime: "2022-07-03T12:10:40Z"
    message: PVC source-default1-nfs Bound
    reason: Bound
    status: "True"
    type: Bound
  - lastHeartbeatTime: "2022-07-03T12:11:12Z"
    lastTransitionTime: "2022-07-03T12:11:12Z"
    status: "True"
    type: Ready
  - lastHeartbeatTime: "2022-07-03T12:11:10Z"
    lastTransitionTime: "2022-07-03T12:11:10Z"
    message: Import Complete
    reason: Completed
    status: "False"
    type: Running
  phase: Succeeded
  progress: 100.0%

Target/cloned dv
--------------------
oc get dv target-default-ceph-rbd1 -oyaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/cloneType: network
    cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE2NTY4NTQyNTgsImlhdCI6MTY1Njg1Mzk1OCwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJzb3VyY2UtZGVmYXVsdDEtbmZzIiwibmFtZXNwYWNlIjoiZGVmYXVsdCIsIm5iZiI6MTY1Njg1Mzk1OCwib3BlcnRhdGlvbiI6IkNsb25lIiwicGFyYW1zIjp7InRhcmdldE5hbWUiOiJ0YXJnZXQtZGVmYXVsdC1jZXBoLXJiZDEiLCJ0YXJnZXROYW1lc3BhY2UiOiJkZWZhdWx0In0sInJlc291cmNlIjp7Imdyb3VwIjoiIiwicmVzb3VyY2UiOiJwZXJzaXN0ZW50dm9sdW1lY2xhaW1zIiwidmVyc2lvbiI6InYxIn19.boK_heSwAfWYAhwHNQJTBOybuZXVlaq7cKYL87xI0NycZstTlfpzwVg8NZCAlF4TxU_LvLb-Rxd9nX9OB-g7a3YG_S4qF86RFEkXM-stFiCK-ApOnOeafiodcIt8C3B3nMXURpaudK9-2jP4YZ3KLWb8g6KSSxeyR5SnY8afCJzmQqkf8nbzaittrI_04oVk3UM7qGTLd9LEHCDjyq9jal0-fT-6uaV3Swwytxpgyb3JSYFCvSIZzxodEt2EFkNLZn1gFPwKIZHG4WQdy8u91Da38a_K_CsqaPz2Rx23ghaiARTADghwGnVa_VT4EJH1dX8oUy5TxF19chh2fMREVQ
  creationTimestamp: "2022-07-03T13:12:38Z"
  generation: 26
  name: target-default-ceph-rbd1
  namespace: default
  resourceVersion: "5496921"
  uid: ccfcd020-e9f7-4208-9500-73a47d7f9e2c
spec:
  source:
    pvc:
      name: source-default1-nfs
      namespace: default
  storage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 6Gi
    storageClassName: ocs-storagecluster-ceph-rbd
status:
  claimName: target-default-ceph-rbd1
  conditions:
  - lastHeartbeatTime: "2022-07-03T13:12:42Z"
    lastTransitionTime: "2022-07-03T13:12:42Z"
    message: PVC target-default-ceph-rbd1 Bound
    reason: Bound
    status: "True"
    type: Bound
  - lastHeartbeatTime: "2022-07-03T13:13:17Z"
    lastTransitionTime: "2022-07-03T13:13:17Z"
    status: "True"
    type: Ready
  - lastHeartbeatTime: "2022-07-03T13:13:15Z"
    lastTransitionTime: "2022-07-03T13:13:15Z"
    message: Clone Complete
    reason: Completed
    status: "False"
    type: Running
  phase: Succeeded
  progress: 100.0%
I do not like your results. They look suspicious. I need to confirm that the clone operation was not attempted by the CDI.
I mean: *I need to confirm* that the restore was clean and that the clone operation was not run again. If it was run again, it would mean we fixed one problem and uncovered another.
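One quick way to check for a "clean" restore is to look for leftover CDI clone markers on the restored PVC. The following sketch is not from the report; it is a minimal illustration using the annotation keys visible in the PVC dump later in this bug (`k8s.io/CloneRequest`, `k8s.io/CloneOf`, `cdi.kubevirt.io/storage.uploadPodName`):

```python
# Sketch: flag leftover CDI clone markers on a restored PVC's
# annotations. The marker keys are taken from the PVC dump in this
# bug; the function itself is illustrative, not CDI code.
CLONE_MARKERS = (
    "k8s.io/CloneRequest",
    "k8s.io/CloneOf",
    "cdi.kubevirt.io/storage.uploadPodName",
)

def leftover_clone_markers(annotations: dict) -> list:
    """Return the clone-related annotation keys present on the PVC."""
    return [k for k in CLONE_MARKERS if k in annotations]

# Annotations copied from the restored PVC in this report:
restored_pvc_annotations = {
    "k8s.io/CloneOf": "true",
    "k8s.io/CloneRequest": "default/source-default1-nfs",
    "cdi.kubevirt.io/storage.pod.phase": "Running",
}
print(leftover_clone_markers(restored_pvc_annotations))
# A clean restore should report an empty list; here two markers survive.
```

In practice the annotations could be fetched with `oc get pvc <name> -o jsonpath='{.metadata.annotations}'` and inspected the same way.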
Reproduced the original bug with the following additional information:
-----------------------------------------------------------------------------
Reopening the bug:

Upload pod:
--------------
oc describe pod cdi-upload-restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
Name:         cdi-upload-restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
Namespace:    default
Priority:     0
Node:         stg03-kevin-mqcc4-worker-0-zffw4/192.168.1.39
Start Time:   Wed, 06 Jul 2022 10:47:21 +0000
Labels:       app=containerized-data-importer
              app.kubernetes.io/component=storage
              app.kubernetes.io/managed-by=cdi-controller
              app.kubernetes.io/part-of=hyperconverged-cluster
              app.kubernetes.io/version=4.10.3
              cdi.kubevirt.io=cdi-upload-server
              cdi.kubevirt.io/uploadTarget=e4c71394-6671-4e64-88ee-fcafadec8cc1
              service=cdi-upload-restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
Annotations:  cdi.kubevirt.io/storage.createdByUploadController: yes
              k8s.v1.cni.cncf.io/network-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.128.2.143" ], "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.128.2.143" ], "default": true, "dns": {} }]
              sidecar.istio.io/inject: false
Status:       Running
IP:           10.128.2.143
IPs:
  IP:  10.128.2.143
Controlled By:  PersistentVolumeClaim/restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
Containers:
  cdi-upload-server:
    Container ID:  cri-o://f4284a52256aef3cea8c3b813b88fdb461754fa17944c9f9b3a2d2627ccdcfce
    Image:         registry.redhat.io/container-native-virtualization/virt-cdi-uploadserver@sha256:f824df36b2e080af35a7d86adb0fb58289300d00f97688b245428eb9035fe08e
    Image ID:      registry.redhat.io/container-native-virtualization/virt-cdi-uploadserver@sha256:2e634b958b8d91c5d7281af707597fccead0e8a1373d2e0df005a59cbf4ad9cf
    Port:          <none>
    Host Port:     <none>
    Args:
      -v=1
    State:          Running
      Started:      Wed, 06 Jul 2022 10:47:34 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     750m
      memory:  600M
    Requests:
      cpu:     100m
      memory:  60M
    Readiness:  http-get http://:8080/healthz delay=2s timeout=1s period=5s #success=1 #failure=3
    Environment:
      TLS_KEY:      <set to the key 'tls.key' in secret 'cdi-upload-restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root'>  Optional: false
      TLS_CERT:     <set to the key 'tls.crt' in secret 'cdi-upload-restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root'>  Optional: false
      CLIENT_CERT:  -----BEGIN CERTIFICATE-----
                    MIIDaTCCAlGgAwIBAgIIUFtgr/UcO9cwDQYJKoZIhvcNAQELBQAwQjFAMD4GA1UE
                    Aww3b3BlbnNoaWZ0LWNudl9jZGktdXBsb2Fkc2VydmVyLWNsaWVudC1zaWduZXJA
                    MTY1NjUxNTc3MjAeFw0yMjA2MjkxNTE2MTJaFw0zMjA2MjYxNTE2MTNaMEIxQDA+
                    BgNVBAMMN29wZW5zaGlmdC1jbnZfY2RpLXVwbG9hZHNlcnZlci1jbGllbnQtc2ln
                    bmVyQDE2NTY1MTU3NzIwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDZ
                    iGnknyFuiKv9d1zkurmtBtkD0L3qtZnErsJk/YUyWkoV5qDCHqX6f/vMZY2r8mF2
                    fm/NFRJycraFay7iykifWrhm6iiduiDJ796wpISecdGQV9WYS9FrVjmBmzf49OBj
                    OfK96jEOK0G9tsm9utINQCQKsgmVveHsuZqzQbVdAnxOw6DJ3uAxJAGT+OxkDKy7
                    t+LUGCy5F2Wp6nUvERMpps4WNL4tfu2uVHJ3KIuSv2R10v/Tv5z+mqRxVFPc99LH
                    9BIxLH0I4rz1IMnkPWP1S/a9eb738RMfxHl2ggkv4aT9ZwOH/FYA1iESb7VOEJzO
                    2jwS9GdROec7hqRh2Na5AgMBAAGjYzBhMA4GA1UdDwEB/wQEAwICpDAPBgNVHRMB
                    Af8EBTADAQH/MB0GA1UdDgQWBBRBS9AN+heQC3bUGA6y/FhMFcrAsjAfBgNVHSME
                    GDAWgBRBS9AN+heQC3bUGA6y/FhMFcrAsjANBgkqhkiG9w0BAQsFAAOCAQEAGpoJ
                    MU5YviGC4CzBZ9c4lypPOGn3bjbiszE9CFXRFre7ohC44kr4hBrqAsg2d4gFjjBO
                    vFgPS8JHfnAt+Ue/f89moxnqc4jpUk5Db1EzTLo2mHA3E6/UAWPbqQwk8dwCdGTw
                    DAOOWTApUgJhQEqNd2Lewgr0hWTvmK9E+yrN640MRJCtI4JkBXUaZTBf3bBkJxS7
                    nkliFr2GYE8nMt2Yz+0YSNtUb07QA8PaqLGRMhvCdjfK29LoZc7fktlYQdC17/Is
                    apLeOfTd5+GawyOCS1V1jlqQCGf+xWpAKtAiemdovbCikMhxAHpviOCzNj2knkhK
                    N8ZJhUtHjQIHZP2P/w==
                    -----END CERTIFICATE-----
      FILESYSTEM_OVERHEAD:  0
      UPLOAD_IMAGE_SIZE:    6Gi
      CLIENT_NAME:          default/source-default-nfs1-default/restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
      PREALLOCATION:        false
      DESTINATION:          /dev/cdi-block-volume
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zl7lv (ro)
    Devices:
      /dev/cdi-block-volume from cdi-data-vol
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  cdi-data-vol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
    ReadOnly:   false
  kube-api-access-zl7lv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason                  Age    From                     Message
  ----    ------                  ----   ----                     -------
  Normal  Scheduled               4m19s  default-scheduler        Successfully assigned default/cdi-upload-restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root to stg03-kevin-mqcc4-worker-0-zffw4
  Normal  SuccessfulAttachVolume  4m18s  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-e4c71394-6671-4e64-88ee-fcafadec8cc1"
  Normal  SuccessfulMountVolume   4m9s   kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-e4c71394-6671-4e64-88ee-fcafadec8cc1" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-e4c71394-6671-4e64-88ee-fcafadec8cc1/dev"
  Normal  SuccessfulMountVolume   4m9s   kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-e4c71394-6671-4e64-88ee-fcafadec8cc1" volumeMapPath "/var/lib/kubelet/pods/138a753e-8518-44b6-955d-1106d8cd427d/volumeDevices/kubernetes.io~csi"
  Normal  AddedInterface          4m6s   multus                   Add eth0 [10.128.2.143/23] from openshift-sdn
  Normal  Pulled                  4m6s   kubelet                  Container image "registry.redhat.io/container-native-virtualization/virt-cdi-uploadserver@sha256:f824df36b2e080af35a7d86adb0fb58289300d00f97688b245428eb9035fe08e" already present on machine
  Normal  Created                 4m6s   kubelet                  Created container cdi-upload-server
  Normal  Started                 4m5s   kubelet                  Started container cdi-upload-server

The vmrestore:
----------------
oc describe vm-target-default-ceph-rbd1-2022-7-6-0-restore-9sjknm
error: the server doesn't have a resource type "vm-target-default-ceph-rbd1-2022-7-6-0-restore-9sjknm"
[cnv-qe-jenkins@localhost ~]$ oc describe vmrestore vm-target-default-ceph-rbd1-2022-7-6-0-restore-9sjknm
Name:         vm-target-default-ceph-rbd1-2022-7-6-0-restore-9sjknm
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.kubevirt.io/v1alpha1
Kind:         VirtualMachineRestore
Metadata:
  Creation Timestamp:  2022-07-06T10:47:18Z
  Generation:          5
  Managed Fields:
    API Version:  snapshot.kubevirt.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:target:
          .:
          f:apiGroup:
          f:kind:
          f:name:
        f:virtualMachineSnapshotName:
    Manager:      Mozilla
    Operation:    Update
    Time:         2022-07-06T10:47:18Z
    API Version:  snapshot.kubevirt.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"b1346918-e2a2-4c4b-b784-8d1b27762ea9"}:
      f:status:
        .:
        f:complete:
        f:conditions:
        f:restoreTime:
        f:restores:
    Manager:      virt-controller
    Operation:    Update
    Time:         2022-07-06T10:47:21Z
  Owner References:
    API Version:           kubevirt.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  VirtualMachine
    Name:                  vm-target-default-ceph-rbd1
    UID:                   b1346918-e2a2-4c4b-b784-8d1b27762ea9
  Resource Version:        9548820
  UID:                     b4101e84-b2dd-41a0-b439-9cf1222c401a
Spec:
  Target:
    API Group:                    kubevirt.io
    Kind:                         VirtualMachine
    Name:                         vm-target-default-ceph-rbd1
  Virtual Machine Snapshot Name:  vm-target-default-ceph-rbd1-2022-7-6-0
Status:
  Complete:  true
  Conditions:
    Last Probe Time:       <nil>
    Last Transition Time:  2022-07-06T10:47:21Z
    Reason:                Operation complete
    Status:                False
    Type:                  Progressing
    Last Probe Time:       <nil>
    Last Transition Time:  2022-07-06T10:47:21Z
    Reason:                Operation complete
    Status:                True
    Type:                  Ready
  Restore Time:  2022-07-06T10:47:21Z
  Restores:
    Persistent Volume Claim:  restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
    Volume Name:              root
    Volume Snapshot Name:     vmsnapshot-d994cb03-99ef-4ee9-85ce-3563775aab14-volume-root
Events:
  Type    Reason                         Age                From                Message
  ----    ------                         ----               ----                -------
  Normal  VirtualMachineRestoreComplete  10m (x3 over 10m)  restore-controller  Successfully completed VirtualMachineRestore vm-target-default-ceph-rbd1-2022-7-6-0-restore-9sjknm

The vm launcher pod:
--------------------------
oc describe pod virt-launcher-vm-target-default-ceph-rbd1-xqnxl
Name:         virt-launcher-vm-target-default-ceph-rbd1-xqnxl
Namespace:    default
Priority:     0
Node:         stg03-kevin-mqcc4-worker-0-x6lln/192.168.2.208
Start Time:   Wed, 06 Jul 2022 10:49:35 +0000
Labels:       kubevirt.io=virt-launcher
              kubevirt.io/created-by=43cab089-04ea-4de2-b2d8-6a7dbe8fd5ff
              kubevirt.io/vm=vm-my-data-volume
              vm.kubevirt.io/name=vm-target-default-ceph-rbd1
Annotations:  kubevirt.io/domain: vm-target-default-ceph-rbd1
              kubevirt.io/migrationTransportUnix: true
              post.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--unfreeze", "--name", "vm-target-default-ceph-rbd1", "--namespace", "default"]
              post.hook.backup.velero.io/container: compute
              pre.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--freeze", "--name", "vm-target-default-ceph-rbd1", "--namespace", "default"]
              pre.hook.backup.velero.io/container: compute
              traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
Status:       Pending
IP:
IPs:          <none>
Controlled By:  VirtualMachineInstance/vm-target-default-ceph-rbd1
Containers:
  compute:
    Container ID:
    Image:         registry.redhat.io/container-native-virtualization/virt-launcher@sha256:5e44e85ff8da6edf26c4942ca61a83bda92896ebeddde8de528ac313dfdda557
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/virt-launcher --qemu-timeout 294s --name vm-target-default-ceph-rbd1 --uid 43cab089-04ea-4de2-b2d8-6a7dbe8fd5ff --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/OVMF
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      devices.kubevirt.io/kvm:        1
      devices.kubevirt.io/tun:        1
      devices.kubevirt.io/vhost-net:  1
    Requests:
      cpu:                            100m
      devices.kubevirt.io/kvm:        1
      devices.kubevirt.io/tun:        1
      devices.kubevirt.io/vhost-net:  1
      ephemeral-storage:              50M
      memory:                         434757920
    Environment:
      POD_NAME:  virt-launcher-vm-target-default-ceph-rbd1-xqnxl (v1:metadata.name)
    Mounts:
      /var/run/kubevirt from public (rw)
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
      /var/run/kubevirt-private from private (rw)
      /var/run/kubevirt/container-disks from container-disks (rw)
      /var/run/kubevirt/hotplug-disks from hotplug-disks (rw)
      /var/run/kubevirt/sockets from sockets (rw)
      /var/run/libvirt from libvirt-runtime (rw)
    Devices:
      /dev/root from root
Readiness Gates:
  Type                                  Status
  kubevirt.io/virtual-machine-unpaused  True
Conditions:
  Type                                  Status
  Initialized                           True
  Ready                                 False
  ContainersReady                       False
  PodScheduled                          True
  kubevirt.io/virtual-machine-unpaused  True
Volumes:
  private:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  public:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  sockets:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  root:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
    ReadOnly:   false
  virt-bin-share-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  libvirt-runtime:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  ephemeral-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  container-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  hotplug-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
QoS Class:       Burstable
Node-Selectors:  kubevirt.io/schedulable=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age    From                     Message
  ----     ------              ----   ----                     -------
  Normal   Scheduled           9m12s  default-scheduler        Successfully assigned default/virt-launcher-vm-target-default-ceph-rbd1-xqnxl to stg03-kevin-mqcc4-worker-0-x6lln
  Warning  FailedAttachVolume  9m13s  attachdetach-controller  Multi-Attach error for volume "pvc-e4c71394-6671-4e64-88ee-fcafadec8cc1" Volume is already used by pod(s) cdi-upload-restore-b4101e84-b2dd-41a0-b439-9cf1222c401a-root
  Warning  FailedMount         7m10s  kubelet                  Unable to attach or mount volumes: unmounted volumes=[root], unattached volumes=[libvirt-runtime root sockets private public ephemeral-disks container-disks hotplug-disks]: timed out waiting for the condition
  Warning  FailedMount         4m55s  kubelet                  Unable to attach or mount volumes: unmounted volumes=[root], unattached volumes=[ephemeral-disks root container-disks hotplug-disks libvirt-runtime sockets private public]: timed out waiting for the condition
  Warning  FailedMount         2m37s  kubelet                  Unable to attach or mount volumes: unmounted volumes=[root], unattached volumes=[container-disks hotplug-disks root libvirt-runtime sockets private public ephemeral-disks]: timed out waiting for the condition
  Warning  FailedMount         23s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[root], unattached volumes=[public root ephemeral-disks container-disks hotplug-disks libvirt-runtime sockets private]: timed out waiting for the condition

source dv yaml:
---------------------
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: source-default1-nfs
  annotations:
    cdi.kubevirt.io/storage.pod.retainAfterCompletion: "true"
spec:
  source:
    http:
      url: "https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img"
  storage:
    storageClassName: nfs
    resources:
      requests:
        storage: 500Mi

target dv yaml:
------------------
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: target-default-ceph-rbd1
spec:
  source:
    pvc:
      namespace: default
      name: source-default-nfs1
  storage:
    storageClassName: ocs-storagecluster-ceph-rbd
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 6Gi

vm yaml:
-----------
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-my-data-volume
  name: vm-target-default-ceph-rbd1
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-my-data-volume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: root
        resources:
          requests:
            memory: 256M
      volumes:
      - dataVolume:
          name: target-default-ceph-rbd1
        name: root
The first PR fixed the problem on the CDI side with handling the prePopulated volumes, but I did not see the second problem at the time. The second problem is that the PVC created by the snapshot restore does not have the prePopulated annotation.
Looks like the issue is that the restore controller is not stripping the "k8s.io/CloneRequest" annotation from the PVC that is created from the snapshot. These are the annotations (prefixes) that are currently removed [1]. Also, the prePopulated annotation is not applicable in this situation since the VM was referring to a stand-alone DataVolume and not a dataVolumeTemplate. When stand-alone DataVolumes are snapshotted, the DataVolume definition is not saved. At restore time, only the pvc is created and the VolumeSource is converted from DataVolume to PersistentVolumeClaim [2] As-is, this problem can lead to data loss if the VM has DataVolumeTemplates and is not snapshotted before a restore. [1] https://github.com/kubevirt/kubevirt/blob/v0.53.2/pkg/virt-controller/watch/snapshot/restore.go#L73-L78 [2] https://github.com/kubevirt/kubevirt/blob/v0.53.2/pkg/virt-controller/watch/snapshot/restore.go#L473-L484
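The failure mode described above can be sketched as a prefix-filtering problem. The prefix list below is illustrative only (the actual list is in restore.go, linked at [1]); the point is that `k8s.io/CloneRequest` matches none of the stripped prefixes, so it survives onto the restored PVC and CDI treats the PVC as a pending clone target again:

```python
# Sketch of prefix-based annotation stripping as described above.
# STRIPPED_PREFIXES is an assumed, illustrative list, not the actual
# list from kubevirt's restore.go.
STRIPPED_PREFIXES = [
    "pv.kubernetes.io/",           # assumed example prefix
    "volume.beta.kubernetes.io/",  # assumed example prefix
]

def filter_annotations(annotations: dict) -> dict:
    """Drop annotations whose key starts with a stripped prefix."""
    return {
        k: v
        for k, v in annotations.items()
        if not any(k.startswith(p) for p in STRIPPED_PREFIXES)
    }

snapshotted = {
    "pv.kubernetes.io/bind-completed": "yes",
    "k8s.io/CloneRequest": "default/source-default1-nfs",
}
restored = filter_annotations(snapshotted)
# The bind annotation is stripped, but the stale clone request leaks
# through onto the restored PVC and retriggers the clone/upload flow.
print("k8s.io/CloneRequest" in restored)
```

Adding `k8s.io/` (or the specific clone keys) to the stripped set would prevent the leak; the actual fix belongs in the restore controller linked at [1].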
Workaround for this issue if encountered:
1. Get the name of the VirtualMachineSnapshotContent resource from `status.virtualMachineSnapshotContentName` of the VirtualMachineSnapshot
2. Edit the VirtualMachineSnapshotContent resource and remove all lines containing `k8s.io/cloneRequest`
3. Do another restore from the same VirtualMachineSnapshot
4. (optional) Check the namespace for any DataVolumes/PersistentVolumeClaims with the `restore-` prefix that are not referenced by any VMs. They can be deleted. This should only be required if the VM is not using `spec.dataVolumeTemplates`.
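Step 2 of the workaround can be sketched as follows. This is not official tooling, just an illustration of what "remove all lines containing `k8s.io/cloneRequest`" means applied to the resource object (here a plain dict standing in for the VirtualMachineSnapshotContent; in practice you would use `oc edit` or a patch):

```python
# Illustrative sketch: recursively drop every mapping key that
# mentions "cloneRequest" (case-insensitive) from a resource object.
def strip_clone_request(obj):
    """Return a copy of obj with cloneRequest-related keys removed."""
    if isinstance(obj, dict):
        return {
            k: strip_clone_request(v)
            for k, v in obj.items()
            if "clonerequest" not in k.lower()
        }
    if isinstance(obj, list):
        return [strip_clone_request(item) for item in obj]
    return obj

content = {
    "metadata": {
        "annotations": {
            "k8s.io/cloneRequest": "default/source-default1-nfs",
            "restore.kubevirt.io/name": "example-restore",  # unrelated key, kept
        }
    }
}
cleaned = strip_clone_request(content)
print(cleaned["metadata"]["annotations"])
```

After this edit the subsequent restore creates a PVC without the stale clone request, so no upload pod is spawned.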
In light of the given workaround I am removing blocker+ and pushing to the next z-stream.
Added known issue to the 4.11 release notes. @kgoldbla Please review: https://github.com/openshift/openshift-docs/pull/48328. Thank you.
Assigning back to you Bartosz. Can you handle the backports please?
I need to backport to 4.10 - for kubevirt it means upstream 0.49 - in progress
Backport merged and the D/S build is ready for testing
Verified with the following code:
----------------------------------------------
oc version
Client Version: 4.10.0-202207291637.p0.ge29d58e.assembly.stream-e29d58e
Server Version: 4.10.26
Kubernetes Version: v1.23.5+012e945

oc get csv -n openshift-cnv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.10.5   OpenShift Virtualization   4.10.5    kubevirt-hyperconverged-operator.v4.10.3   Succeeded
volsync-product.v0.4.1                     VolSync                    0.4.1     volsync-product.v0.4.0                     Succeeded

Verified with the following scenario:
---------------------------------------------
1. Created source dv on nfs sc
2. Cloned source dv to ceph-rbd target dv
3. Created a VM from the target dv
4. Started the VM via the UI
5. Created a snapshot of the running VM
6. Shut down the VM via the UI
7. Restored the snapshot via the UI
8. VM is created referencing the restored pvc
9. Started the VM successfully
10. Accessed the VM successfully
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.10.5 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6351