Description of problem:
VM import from RHV to CNV, using storage class Ceph-RBD and volumeMode: Filesystem (the default on CNV-2.4), is either stuck pending PVC bind or failing, depending on the source VM disk type (Preallocated or Thin provision):
1. For a preallocated source VM disk, the PVC remains pending forever.
2. For a thin-provisioned source VM disk, the disk is copied, but at the end of the copy the VM and DV are removed, and the UI VMs view displays this error: "The virtual machine could not be imported. DataVolumeCreationFailed: Error while importing disk image: fedora32-b870c429-11e0-4630-b3df-21da551a48c0"

Version-Release number of selected component (if applicable):
OCP-4.5/CNV-2.4

Expected results:
VM import from RHV to CNV should work well for Ceph-RBD/Filesystem
Ilanit, can you please supply some info:

oc describe pvc <pvc_name> -n <namespace_name>
oc get pvc -oyaml -n <namespace_name>
oc get pods -n <namespace_name>
oc get events -n <namespace_name>
No CNV environment to test on right now. Will retest to provide inputs once I have one.
On CNV-2.4.1, VM import from RHV of a Fedora32/RHEL8 VM, with either a thin-provisioned or preallocated disk, using Ceph-RBD/Filesystem, is stuck pending PVC bind. For thin provision, this is different behavior than the one reported in the bug description for CNV-2.4.0.

$ oc describe pvc shows the same error for __both__ cases:

$ oc describe pvc v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
Name:          v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
Namespace:     default
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Pending
Volume:
Labels:        app=containerized-data-importer
Annotations:   cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.io2j878
               cdi.kubevirt.io/storage.import.diskId: 8737b4f7-2b6b-4801-abc9-e307f838b337
               cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
               cdi.kubevirt.io/storage.import.importPodName: importer-v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
               cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iossbwr
               cdi.kubevirt.io/storage.import.source: imageio
               cdi.kubevirt.io/storage.pod.phase: Pending
               cdi.kubevirt.io/storage.pod.restarts: 0
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    importer-v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
Events:
  Type     Reason                Age                 From                                                                                                               Message
  ----     ------                ----                ----                                                                                                               -------
  Normal   Provisioning          96s (x14 over 25m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ppt5j_74c6c775-5839-4035-b4fc-60c11a76557d  External provisioner is provisioning volume for claim "default/v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337"
  Warning  ProvisioningFailed    96s (x14 over 25m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ppt5j_74c6c775-5839-4035-b4fc-60c11a76557d  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = multi node access modes are only supported on rbd `block` type volumes
  Normal   ExternalProvisioning  8s (x105 over 25m)  persistentvolume-controller                                                                                        waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator

** Notice that Access Modes is empty.

@Adam, do you see any reason for the different behavior seen on CNV-2.4.1 for thin provision? Is there anything blocking this bug from getting fixed for CNV-2.5, please?
This is the problem:

  Warning  ProvisioningFailed  96s (x14 over 25m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ppt5j_74c6c775-5839-4035-b4fc-60c11a76557d  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = multi node access modes are only supported on rbd `block` type volumes

You are using RWX with Filesystem volume mode, which is not allowed in ceph; either use RWO with Filesystem or RWX with Block.
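For illustration, here is a minimal DataVolume sketch showing a combination ceph-rbd can actually provision. Every name, endpoint, ID, and size below is a placeholder for illustration, not taken from this report:

apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: example-import                                 # placeholder name
spec:
  source:
    imageio:
      url: "https://rhv.example.com/ovirt-engine/api"  # placeholder endpoint
      diskId: "00000000-0000-0000-0000-000000000000"   # placeholder disk ID
      secretRef: example-rhv-credentials               # placeholder secret
      certConfigMap: example-rhv-certs                 # placeholder configmap
  pvc:
    accessModes:
    - ReadWriteOnce              # RWO works with Filesystem on ceph-rbd
    volumeMode: Filesystem       # RWX would require volumeMode: Block instead
    storageClassName: ocs-storagecluster-ceph-rbd
    resources:
      requests:
        storage: 4Gi             # placeholder size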
To make sure I say it correctly: when I say "not allowed in ceph", I mean not allowed with ceph rbd; other backends do allow it.
Thanks Alexander Wels. $ oc get pvc <pvc name> -o yaml (output below) shows that the access mode in the request is ReadWriteMany, while the UI VM import wizard, under Disk advanced, shows it is set to RWO ("Edit_disk" screenshot attached). The ReadWriteMany setting likely comes from the source VM.

We seem to have 2 "VM import" issues here:
1. The problem Alex indicated: we try to do RWX with Filesystem volume mode on ceph-rbd, and this cannot be done.
2. The UI access mode setting (RWO) is not honored, as the actual access mode in the request is different (RWX).

$ oc get pvc v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.io2j878
    cdi.kubevirt.io/storage.import.diskId: 8737b4f7-2b6b-4801-abc9-e307f838b337
    cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
    cdi.kubevirt.io/storage.import.importPodName: importer-v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
    cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iossbwr
    cdi.kubevirt.io/storage.import.source: imageio
    cdi.kubevirt.io/storage.pod.phase: Pending
    cdi.kubevirt.io/storage.pod.restarts: "0"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2020-08-26T05:59:35Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: containerized-data-importer
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:cdi.kubevirt.io/storage.import.certConfigMap: {}
          f:cdi.kubevirt.io/storage.import.diskId: {}
          f:cdi.kubevirt.io/storage.import.endpoint: {}
          f:cdi.kubevirt.io/storage.import.importPodName: {}
          f:cdi.kubevirt.io/storage.import.secretName: {}
          f:cdi.kubevirt.io/storage.import.source: {}
          f:cdi.kubevirt.io/storage.pod.phase: {}
          f:cdi.kubevirt.io/storage.pod.restarts: {}
        f:labels:
          .: {}
          f:app: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"f883836f-09ee-42d2-9858-b015a5af69b6"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:accessModes: {}
        f:resources:
          f:requests:
            .: {}
            f:storage: {}
        f:storageClassName: {}
        f:volumeMode: {}
      f:status:
        f:phase: {}
    manager: virt-cdi-controller
    operation: Update
    time: "2020-08-26T05:59:35Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:volume.beta.kubernetes.io/storage-provisioner: {}
    manager: kube-controller-manager
    operation: Update
    time: "2020-08-26T05:59:39Z"
  name: v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
  namespace: default
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
    uid: f883836f-09ee-42d2-9858-b015a5af69b6
  resourceVersion: "837470"
  selfLink: /api/v1/namespaces/default/persistentvolumeclaims/v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
  uid: 4548a434-c4f9-4748-83cb-00734b5a7dae
spec:
  accessModes:
  - ReadWriteMany    <------
  resources:
    requests:
      storage: 4Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Filesystem
status:
  phase: Pending
Created attachment 1712866 [details] Edit_Disk screenshot
@Alex, we set accessMode based on the source VM settings:
https://github.com/kubevirt/vm-import-operator/blob/f1e0efa6c53a71fe629ad4e64f1a45a1b34aa57e/pkg/providers/ovirt/mapper/mapper.go#L235
We use ReadWriteMany only when the placement policy is set to VMAFFINITY_MIGRATABLE. I think ReadWriteOnce would prevent us from live migrating VMs in the future. Am I right?
That is correct, RWX is required for live migration. So with ceph rbd the volume mode has to be block in order for live migration to work.
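In other words, for a migratable VM on ceph-rbd the pvc section of the DataVolume would need to look roughly like this sketch (the size is a placeholder, not from this report):

pvc:
  accessModes:
  - ReadWriteMany              # RWX: required for live migration
  volumeMode: Block            # ceph-rbd only allows RWX on block-mode volumes
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 4Gi             # placeholder size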
Thank you. I think we can move this BZ to vm-import and block such VMs.
Opened this bug for removing the "Access Mode" option from the UI: Bug 1873779 - [v2v][RHV to CNV VM import] Remove "Edit Disk": "Access Mode"
I tried on two different envs to import a non-migratable VM from RHV to CNV, i.e. the VM placement_policy was set to <affinity>pinned</affinity>. The import failed with this error:

The virtual machine could not be imported. DataVolumeCreationFailed: Error while importing disk image: v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314. pod CrashLoopBackoff restart exceeded

The import appears to start, but fails at around 10% conversion; the PVC and conversion pod are automatically deleted (the command output below was gathered before the import failed).

Test scenario:
1. Edit the RHV source VM with Migration mode: Do not allow migration (in the 'Host' tab). This option prevents VM migration between RHV hosts and, via the REST API, sets placement_policy to 'pinned'.
2. Import this non-migratable VM with the wizard using Ceph-RBD. Filesystem is the default in 2.4.1, and accessMode should be set to RWO as the VM is non-migratable.
* Source VM disk type: thin provision

More info:

$ oc describe pvc v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
Name:          v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
Namespace:     default
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Bound
Volume:        pvc-c4982245-709f-424f-bed6-3470fbfb4d8d
Labels:        app=containerized-data-importer
Annotations:   cdi.kubevirt.io/storage.condition.running: false
               cdi.kubevirt.io/storage.condition.running.message: back-off 20s restarting failed container=importer pod=importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314_default(a8596...
               cdi.kubevirt.io/storage.condition.running.reason: CrashLoopBackOff
               cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.iokkvrb
               cdi.kubevirt.io/storage.import.diskId: 4d0fd178-a83b-4d32-ad53-da560c410314
               cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
               cdi.kubevirt.io/storage.import.importPodName: importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
               cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iowbkzs
               cdi.kubevirt.io/storage.import.source: imageio
               cdi.kubevirt.io/storage.pod.phase: Running
               cdi.kubevirt.io/storage.pod.restarts: 2
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      40Mi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
Events:
  Type     Reason                 Age                From                         Message
  ----     ------                 ----               ----                         -------
  Normal   ExternalProvisioning   49s (x2 over 49s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning           49s                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ktlwf_b5e957a8-497c-4149-a63d-d2d5b8f4b72d  External provisioner is provisioning volume for claim "default/v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314"
  Normal   ProvisioningSucceeded  48s                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ktlwf_b5e957a8-497c-4149-a63d-d2d5b8f4b72d  Successfully provisioned volume pvc-c4982245-709f-424f-bed6-3470fbfb4d8d
  Warning  ErrImportFailed        30s (x4 over 32s)  import-controller            Unable to process data: write /data/disk.img: no space left on device
  Warning  ErrImportFailed        2s (x6 over 17s)   import-controller            Unable to connect to imageio data source: Fault reason is "Operation Failed". Fault detail is "[Cannot transfer Virtual Disk: The following disks are locked: GlanceDisk-aa51d20_v2v_cirros_vm_non_migratable. Please try again in a few minutes.]". HTTP response code is "409". HTTP response message is "409 Conflict".

$ oc get pvc -oyaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      cdi.kubevirt.io/storage.condition.running: "false"
      cdi.kubevirt.io/storage.condition.running.message: back-off 40s restarting failed container=importer pod=importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314_default(a8596ada-a6a8-4dc6-b3b4-999cadb3f095)
      cdi.kubevirt.io/storage.condition.running.reason: CrashLoopBackOff
      cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.iokkvrb
      cdi.kubevirt.io/storage.import.diskId: 4d0fd178-a83b-4d32-ad53-da560c410314
      cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
      cdi.kubevirt.io/storage.import.importPodName: importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
      cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iowbkzs
      cdi.kubevirt.io/storage.import.source: imageio
      cdi.kubevirt.io/storage.pod.phase: Running
      cdi.kubevirt.io/storage.pod.restarts: "3"
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
      volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    creationTimestamp: "2020-09-02T11:50:16Z"
    finalizers:
    - kubernetes.io/pvc-protection
    labels:
      app: containerized-data-importer
    managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:pv.kubernetes.io/bind-completed: {}
            f:pv.kubernetes.io/bound-by-controller: {}
            f:volume.beta.kubernetes.io/storage-provisioner: {}
        f:spec:
          f:volumeName: {}
        f:status:
          f:accessModes: {}
          f:capacity:
            .: {}
            f:storage: {}
          f:phase: {}
      manager: kube-controller-manager
      operation: Update
      time: "2020-09-02T11:50:19Z"
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:cdi.kubevirt.io/storage.condition.running: {}
            f:cdi.kubevirt.io/storage.condition.running.message: {}
            f:cdi.kubevirt.io/storage.condition.running.reason: {}
            f:cdi.kubevirt.io/storage.import.certConfigMap: {}
            f:cdi.kubevirt.io/storage.import.diskId: {}
            f:cdi.kubevirt.io/storage.import.endpoint: {}
            f:cdi.kubevirt.io/storage.import.importPodName: {}
            f:cdi.kubevirt.io/storage.import.secretName: {}
            f:cdi.kubevirt.io/storage.import.source: {}
            f:cdi.kubevirt.io/storage.pod.phase: {}
            f:cdi.kubevirt.io/storage.pod.restarts: {}
          f:labels:
            .: {}
            f:app: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"78ca742c-2e4f-4f33-9f5c-35ab5e5a57be"}:
              .: {}
              f:apiVersion: {}
              f:blockOwnerDeletion: {}
              f:controller: {}
              f:kind: {}
              f:name: {}
              f:uid: {}
        f:spec:
          f:accessModes: {}
          f:resources:
            f:requests:
              .: {}
              f:storage: {}
          f:volumeMode: {}
      manager: virt-cdi-controller
      operation: Update
      time: "2020-09-02T11:51:32Z"
    name: v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
    namespace: default
    ownerReferences:
    - apiVersion: cdi.kubevirt.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: DataVolume
      name: v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
      uid: 78ca742c-2e4f-4f33-9f5c-35ab5e5a57be
    resourceVersion: "13111878"
    selfLink: /api/v1/namespaces/default/persistentvolumeclaims/v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
    uid: c4982245-709f-424f-bed6-3470fbfb4d8d
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: "41127936"
    storageClassName: ocs-storagecluster-ceph-rbd
    volumeMode: Filesystem
    volumeName: pvc-c4982245-709f-424f-bed6-3470fbfb4d8d
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 40Mi
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@Piotr, @Alex, can you please shed some light on this?
The PVC size seems suspiciously small: 40Mi.

Capacity:      40Mi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
(In reply to Alexander Wels from comment #13)
> The PVC size seems suspiciously small: 40Mi.
>
> Capacity:      40Mi
> Access Modes:  RWO
> VolumeMode:    Filesystem
> Mounted By:    importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314

We use this VM in our v2v tests. I can retry with a bigger one.
So the original source VM disk is only 40Mi? The import is failing on "Unable to process data: write /data/disk.img: no space left on device", which indicates there is no space left to import into. With CDI we have had a long struggle with file system overhead on imports. For instance, with ceph rbd an xfs file system is put on the PV: the PV is created on a block device of the requested size, and the FS then takes some overhead of its own. On a really small PV like that, the FS overhead is likely a significant percentage of the total size, which is probably why you are running out of space.
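To make the overhead concrete, here is an illustrative shell sketch (run on any Linux machine, not on the cluster; the 1G size is an arbitrary example) showing that a freshly formatted xfs filesystem exposes less usable space than the raw device underneath it:

# Illustrative only: format a file-backed device and compare usable vs. raw size.
truncate -s 1G /tmp/backing.img          # stand-in for a 1Gi block device
mkfs.xfs -q /tmp/backing.img             # same fs type ceph-rbd puts on the PV
mkdir -p /tmp/demo-mnt
sudo mount -o loop /tmp/backing.img /tmp/demo-mnt
df -h /tmp/demo-mnt                      # reported size/avail is below 1G: fs metadata overhead
sudo umount /tmp/demo-mnt && rm /tmp/backing.img

On a 40Mi device that same fixed metadata cost is a much larger fraction of the total, so a 40Mi disk.img simply cannot fit.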
Retested with two other non-migratable VMs, using ceph-rbd, on CNV 2.4.1 (the scenario described in comment #12 was also on CNV 2.4.1):
- VM disk size: 4 GiB, thin provision - same import error (pod CrashLoopBackoff restart exceeded) and same messages in the "describe pvc" output as described in comment #12.
- VM disk size: 25 GiB, preallocated - same.
As far as I understand, Filesystem with ceph-rbd is not supported in 2.5. I think we should target this bug to a future release, when it will be supported, so we can make sure vm-import handles it correctly. In terms of the PVC size, I am not sure whether we can/should do anything about it; it seems like a more generic issue that should be solved by CDI. @Alex, do you agree?
I don't think RWX with ceph-rbd and Filesystem will ever be supported; that would require a file system that allows simultaneous writes from different locations on the block device, like cephfs. CDI cannot solve the problem of the PVC being smaller than the size of the data being written. In general we recommend making the PVC 5-10% larger than the actual data you are writing to it, for overhead purposes. I am not entirely sure where the size of the PVC is determined, I am assuming from the source VM definition? We are building a mechanism for cluster admins to specify the overhead per storage class; once that is available, you should be able to read the value from the CDIConfig object. Until then, I suggest the importer just add 5-10% extra to the size compared to how it determines it now.
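As a rough worked example of that 5-10% recommendation (the numbers are illustrative, not taken from this report):

# Illustrative: pad a 4Gi source disk by 10% before requesting the DV/PVC size.
DISK_BYTES=4294967296                            # 4Gi source disk
PADDED_BYTES=$(( DISK_BYTES + DISK_BYTES / 10 ))
echo "request at least ${PADDED_BYTES} bytes"    # ~4.4Gi instead of 4Gi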
With this, anyone who is using CDI would need to add the overhead themselves. We are unable to query storage capabilities to figure out whether to add overhead or not, so we have no ability to make this decision.
CDI doesn't have the ability either; the user tells us "write data from this source into this PVC of this size", and then the Kubernetes storage system creates the PVC based on the requested size. The problem is that not every provisioner does this the same way. The issue happens when the provisioner makes a block device of the requested size and then puts a file system on top of it: the actual available size is then less than the requested size, and we are left struggling with how to handle that. We can't do anything but ask the user to make a bigger PVC.
We are told to import a VM with the disk sizes specified in the source infrastructure (RHV or VMware). We can't offload this to the user; this time the user does not allocate any PVCs.
From CDI's perspective, the VM import is the user creating the PVCs. If you create a PVC of the exact disk size and, due to fs overhead, there is not enough space available, I am not sure what we can do to solve it.
vm-import creates DVs, not PVCs, and it doesn't know the storage class capabilities needed to decide whether to add overhead or not. It does not have enough information to make this decision; we only know what is reported by the source infrastructure and use that to create the DV(s).
I filed an RFE: https://bugzilla.redhat.com/show_bug.cgi?id=1883908 - [RFE][v2v][RHV to CNV VM import] VM Importer should request a slightly larger DV in order to overcome Filesystem overhead when using Ceph-RBD storage class

Thanks, Alexander Wels, for the detailed explanation of exactly what issue we are facing.
We have no ability to tell when to add overhead. This BZ should be verified once the RFE is implemented; that is why it was targeted to a future release. Updating dependency.
*** Bug 1885304 has been marked as a duplicate of this bug. ***
@awels do you have the link to the CDI PR that takes care of the filesystem overhead? IIRC, this is already done, and this BZ should be moved to ON_QA.
We have a PR that allows an admin to specify the overhead on a global or per storage class basis. The VM importer could use this information once it determines which storage class it is going to use to increase the requested PV size by that percentage. https://github.com/kubevirt/containerized-data-importer/pull/1319 is the PR implementing the overhead logic. The CDIConfig object will contain the information you guys need.
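For reference, a hedged sketch of what the overhead configuration from that PR looks like in the CDIConfig object; the values and the per-storage-class entry below are made up for illustration, so check the PR for the authoritative field layout:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDIConfig
metadata:
  name: config
spec:
  filesystemOverhead:
    global: "0.055"                        # assumed default fraction reserved for fs overhead
    storageClass:
      ocs-storagecluster-ceph-rbd: "0.08"  # hypothetical per-storage-class override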
https://github.com/kubevirt/vm-import-operator/pull/469
The PR has been merged and the d/s automated checks are green. The change should be in hco-bundle-registry-container-v2.6.0-532 and onwards.
Migratable VM + Filesystem + Ceph-RBD is not allowed. We need to block it and fail with an error telling the user to either change the source VM to be non-migratable or use volumeMode: Block.
What do you mean by "is not allowed"? The problem reported in this BZ was that when using Ceph-RBD + Filesystem, the overhead for the filesystem was not accounted for, which made the import fail. MTV will not let you use it though, since it was failing in VMIO. We have implemented Provisioner CRs exactly for that purpose, and the only allowed mode today is Ceph-RBD + Block. But that's only in MTV.
This bug includes 2 issues:
1) Not accounting for the filesystem overhead.
2) Allowing a live-migratable VM to be imported with Ceph-RBD + Filesystem.

As the last fix is for 1), changing the bug title to match it.
Import from RHV to CNV using Ceph-RBD/Filesystem with RWO works (issue 1).
Filed this bug for issue 2) in comment #35: Bug 1928767 - [VM import from RHV to CNV] Migratable VM import (RWX) to Ceph-RBD/Filesystem should be blocked
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0799