Description of problem:
VM import from RHV to CNV, using storage class Ceph-RBD and volumeMode: Filesystem (the default on CNV-2.4), is either stuck pending PVC bind or failing, depending on the source VM disk type (Preallocated or Thin provision):
1. For a preallocated source VM disk, the PVC remains pending forever.
2. For a thin-provisioned source VM disk, the disk is copied, but at the end of the copy the VM and DV are removed, and the UI VMs view displays this error: "The virtual machine could not be imported. DataVolumeCreationFailed: Error while importing disk image: fedora32-b870c429-11e0-4630-b3df-21da551a48c0"

Version-Release number of selected component (if applicable):
OCP-4.5/CNV-2.4

Expected results:
VM import from RHV to CNV should work well for Ceph-RBD/Filesystem
Ilanit, can you please supply some info:

oc describe pvc <pvc_name> -n <namespace_name>
oc get pvc -oyaml -n <namespace_name>
oc get pods -n <namespace_name>
oc get events -n <namespace_name>
No CNV environment to test on right now. Will retest to provide inputs once I have one.
On CNV-2.4.1, VM import from RHV of a Fedora32/RHEL8 VM, with either a thin-provisioned or preallocated disk, using Ceph-RBD/Filesystem, is stuck pending PVC bind. For thin provision, this is different behavior than the one reported in the bug description for CNV-2.4.0.

$ oc describe pvc shows the same error for __both__ cases:

$ oc describe pvc v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
Name:          v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
Namespace:     default
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Pending
Volume:
Labels:        app=containerized-data-importer
Annotations:   cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.io2j878
               cdi.kubevirt.io/storage.import.diskId: 8737b4f7-2b6b-4801-abc9-e307f838b337
               cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
               cdi.kubevirt.io/storage.import.importPodName: importer-v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
               cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iossbwr
               cdi.kubevirt.io/storage.import.source: imageio
               cdi.kubevirt.io/storage.pod.phase: Pending
               cdi.kubevirt.io/storage.pod.restarts: 0
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    importer-v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
Events:
  Type     Reason                Age                 From                                                                                                               Message
  ----     ------                ----                ----                                                                                                               -------
  Normal   Provisioning          96s (x14 over 25m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ppt5j_74c6c775-5839-4035-b4fc-60c11a76557d  External provisioner is provisioning volume for claim "default/v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337"
  Warning  ProvisioningFailed    96s (x14 over 25m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ppt5j_74c6c775-5839-4035-b4fc-60c11a76557d  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = multi node access modes are only supported on rbd `block` type volumes
  Normal   ExternalProvisioning  8s (x105 over 25m)  persistentvolume-controller                                                                                        waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator

** Notice that Access Modes is empty.

@Adam, do you see any reason for the different behavior seen on CNV-2.4.1 for thin provision? Is there anything blocking this bug from getting fixed for CNV-2.5, please?
This is the problem:

  Warning  ProvisioningFailed  96s (x14 over 25m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ppt5j_74c6c775-5839-4035-b4fc-60c11a76557d  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = multi node access modes are only supported on rbd `block` type volumes

You are using RWX with Filesystem volume mode, which is not allowed in ceph; either use RWO with Filesystem or RWX with Block.
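For illustration, here is a minimal DataVolume sketch showing a combination ceph-rbd can actually provision. Every name, endpoint, ID, and size below is a placeholder for illustration, not taken from this report:

apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: example-import                                 # placeholder name
spec:
  source:
    imageio:
      url: "https://rhv.example.com/ovirt-engine/api"  # placeholder endpoint
      diskId: "00000000-0000-0000-0000-000000000000"   # placeholder disk ID
      secretRef: example-rhv-credentials               # placeholder secret
      certConfigMap: example-rhv-certs                 # placeholder configmap
  pvc:
    accessModes:
    - ReadWriteOnce              # RWO works with Filesystem on ceph-rbd
    volumeMode: Filesystem       # RWX would require volumeMode: Block instead
    storageClassName: ocs-storagecluster-ceph-rbd
    resources:
      requests:
        storage: 4Gi             # placeholder size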
To make sure I say it correctly: when I say "not allowed in ceph", I mean not allowed with ceph rbd; other backends do allow it.
Thanks Alexander Wels. $ oc get pvc <pvc name> -o yaml (output below) shows that the access mode in the request is ReadWriteMany, while the UI VM import wizard, under Disk advanced, shows it is set to RWO ("Edit_disk" screenshot attached). The ReadWriteMany setting likely comes from the source VM.

We seem to have 2 "VM import" issues here:
1. The problem Alex indicated: we try to do RWX with Filesystem volume mode on ceph-rbd, and this cannot be done.
2. The UI access mode setting (RWO) is not honored, as the actual access mode in the request is different (RWX).

$ oc get pvc v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.io2j878
    cdi.kubevirt.io/storage.import.diskId: 8737b4f7-2b6b-4801-abc9-e307f838b337
    cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
    cdi.kubevirt.io/storage.import.importPodName: importer-v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
    cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iossbwr
    cdi.kubevirt.io/storage.import.source: imageio
    cdi.kubevirt.io/storage.pod.phase: Pending
    cdi.kubevirt.io/storage.pod.restarts: "0"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2020-08-26T05:59:35Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: containerized-data-importer
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:cdi.kubevirt.io/storage.import.certConfigMap: {}
          f:cdi.kubevirt.io/storage.import.diskId: {}
          f:cdi.kubevirt.io/storage.import.endpoint: {}
          f:cdi.kubevirt.io/storage.import.importPodName: {}
          f:cdi.kubevirt.io/storage.import.secretName: {}
          f:cdi.kubevirt.io/storage.import.source: {}
          f:cdi.kubevirt.io/storage.pod.phase: {}
          f:cdi.kubevirt.io/storage.pod.restarts: {}
        f:labels:
          .: {}
          f:app: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"f883836f-09ee-42d2-9858-b015a5af69b6"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:accessModes: {}
        f:resources:
          f:requests:
            .: {}
            f:storage: {}
        f:storageClassName: {}
        f:volumeMode: {}
      f:status:
        f:phase: {}
    manager: virt-cdi-controller
    operation: Update
    time: "2020-08-26T05:59:35Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:volume.beta.kubernetes.io/storage-provisioner: {}
    manager: kube-controller-manager
    operation: Update
    time: "2020-08-26T05:59:39Z"
  name: v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
  namespace: default
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
    uid: f883836f-09ee-42d2-9858-b015a5af69b6
  resourceVersion: "837470"
  selfLink: /api/v1/namespaces/default/persistentvolumeclaims/v2v-fedora32-8737b4f7-2b6b-4801-abc9-e307f838b337
  uid: 4548a434-c4f9-4748-83cb-00734b5a7dae
spec:
  accessModes:
  - ReadWriteMany    <------
  resources:
    requests:
      storage: 4Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Filesystem
status:
  phase: Pending
Created attachment 1712866 [details] Edit_Disk screenshot
@Alex, we set accessMode based on the source VM settings:
https://github.com/kubevirt/vm-import-operator/blob/f1e0efa6c53a71fe629ad4e64f1a45a1b34aa57e/pkg/providers/ovirt/mapper/mapper.go#L235
We use ReadWriteMany only when the placement policy is set to VMAFFINITY_MIGRATABLE. I think ReadWriteOnce would prevent us from live migrating VMs in the future. Am I right?
That is correct, RWX is required for live migration. So with ceph rbd the volume mode has to be block in order for live migration to work.
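In other words, for a migratable VM on ceph-rbd the pvc section of the DataVolume would need to look roughly like this sketch (the size is a placeholder, not from this report):

pvc:
  accessModes:
  - ReadWriteMany              # RWX: required for live migration
  volumeMode: Block            # ceph-rbd only allows RWX on block-mode volumes
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 4Gi             # placeholder size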
Thank you. I think we can move this BZ to vm-import and block such VMs.
Opened this bug for removing the "Access Mode" option from the UI: Bug 1873779 - [v2v][RHV to CNV VM import] Remove "Edit Disk": "Access Mode"
I tried on two different envs to import a non-migratable VM from RHV to CNV, i.e. the VM placement_policy was set to <affinity>pinned</affinity>. The import failed with this error:

The virtual machine could not be imported. DataVolumeCreationFailed: Error while importing disk image: v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314. pod CrashLoopBackoff restart exceeded

The import appears to start, but fails at around 10% conversion; the PVC and conversion pod are automatically deleted (the command output below was gathered before the import failed).

Test scenario:
1. Edit the RHV source VM with Migration mode: Do not allow migration (in the 'Host' tab). This option prevents VM migration between RHV hosts and, via the REST API, sets placement_policy to 'pinned'.
2. Import this non-migratable VM with the wizard using Ceph-RBD. Filesystem is the default in 2.4.1, and accessMode should be set to RWO as the VM is non-migratable.
* Source VM disk type: thin provision

More info:

$ oc describe pvc v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
Name:          v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
Namespace:     default
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Bound
Volume:        pvc-c4982245-709f-424f-bed6-3470fbfb4d8d
Labels:        app=containerized-data-importer
Annotations:   cdi.kubevirt.io/storage.condition.running: false
               cdi.kubevirt.io/storage.condition.running.message: back-off 20s restarting failed container=importer pod=importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314_default(a8596...
               cdi.kubevirt.io/storage.condition.running.reason: CrashLoopBackOff
               cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.iokkvrb
               cdi.kubevirt.io/storage.import.diskId: 4d0fd178-a83b-4d32-ad53-da560c410314
               cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
               cdi.kubevirt.io/storage.import.importPodName: importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
               cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iowbkzs
               cdi.kubevirt.io/storage.import.source: imageio
               cdi.kubevirt.io/storage.pod.phase: Running
               cdi.kubevirt.io/storage.pod.restarts: 2
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      40Mi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
Events:
  Type     Reason                 Age                From                         Message
  ----     ------                 ----               ----                         -------
  Normal   ExternalProvisioning   49s (x2 over 49s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning           49s                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ktlwf_b5e957a8-497c-4149-a63d-d2d5b8f4b72d  External provisioner is provisioning volume for claim "default/v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314"
  Normal   ProvisioningSucceeded  48s                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-8c87b76ff-ktlwf_b5e957a8-497c-4149-a63d-d2d5b8f4b72d  Successfully provisioned volume pvc-c4982245-709f-424f-bed6-3470fbfb4d8d
  Warning  ErrImportFailed        30s (x4 over 32s)  import-controller            Unable to process data: write /data/disk.img: no space left on device
  Warning  ErrImportFailed        2s (x6 over 17s)   import-controller            Unable to connect to imageio data source: Fault reason is "Operation Failed". Fault detail is "[Cannot transfer Virtual Disk: The following disks are locked: GlanceDisk-aa51d20_v2v_cirros_vm_non_migratable. Please try again in a few minutes.]". HTTP response code is "409". HTTP response message is "409 Conflict".

$ oc get pvc -oyaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      cdi.kubevirt.io/storage.condition.running: "false"
      cdi.kubevirt.io/storage.condition.running.message: back-off 40s restarting failed container=importer pod=importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314_default(a8596ada-a6a8-4dc6-b3b4-999cadb3f095)
      cdi.kubevirt.io/storage.condition.running.reason: CrashLoopBackOff
      cdi.kubevirt.io/storage.import.certConfigMap: vmimport.v2v.kubevirt.iokkvrb
      cdi.kubevirt.io/storage.import.diskId: 4d0fd178-a83b-4d32-ad53-da560c410314
      cdi.kubevirt.io/storage.import.endpoint: https://rhev-blue-01.rdu2.scalelab.redhat.com/ovirt-engine/api
      cdi.kubevirt.io/storage.import.importPodName: importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
      cdi.kubevirt.io/storage.import.secretName: vmimport.v2v.kubevirt.iowbkzs
      cdi.kubevirt.io/storage.import.source: imageio
      cdi.kubevirt.io/storage.pod.phase: Running
      cdi.kubevirt.io/storage.pod.restarts: "3"
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
      volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    creationTimestamp: "2020-09-02T11:50:16Z"
    finalizers:
    - kubernetes.io/pvc-protection
    labels:
      app: containerized-data-importer
    managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:pv.kubernetes.io/bind-completed: {}
            f:pv.kubernetes.io/bound-by-controller: {}
            f:volume.beta.kubernetes.io/storage-provisioner: {}
        f:spec:
          f:volumeName: {}
        f:status:
          f:accessModes: {}
          f:capacity:
            .: {}
            f:storage: {}
          f:phase: {}
      manager: kube-controller-manager
      operation: Update
      time: "2020-09-02T11:50:19Z"
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:cdi.kubevirt.io/storage.condition.running: {}
            f:cdi.kubevirt.io/storage.condition.running.message: {}
            f:cdi.kubevirt.io/storage.condition.running.reason: {}
            f:cdi.kubevirt.io/storage.import.certConfigMap: {}
            f:cdi.kubevirt.io/storage.import.diskId: {}
            f:cdi.kubevirt.io/storage.import.endpoint: {}
            f:cdi.kubevirt.io/storage.import.importPodName: {}
            f:cdi.kubevirt.io/storage.import.secretName: {}
            f:cdi.kubevirt.io/storage.import.source: {}
            f:cdi.kubevirt.io/storage.pod.phase: {}
            f:cdi.kubevirt.io/storage.pod.restarts: {}
          f:labels:
            .: {}
            f:app: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"78ca742c-2e4f-4f33-9f5c-35ab5e5a57be"}:
              .: {}
              f:apiVersion: {}
              f:blockOwnerDeletion: {}
              f:controller: {}
              f:kind: {}
              f:name: {}
              f:uid: {}
        f:spec:
          f:accessModes: {}
          f:resources:
            f:requests:
              .: {}
              f:storage: {}
          f:volumeMode: {}
      manager: virt-cdi-controller
      operation: Update
      time: "2020-09-02T11:51:32Z"
    name: v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
    namespace: default
    ownerReferences:
    - apiVersion: cdi.kubevirt.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: DataVolume
      name: v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
      uid: 78ca742c-2e4f-4f33-9f5c-35ab5e5a57be
    resourceVersion: "13111878"
    selfLink: /api/v1/namespaces/default/persistentvolumeclaims/v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
    uid: c4982245-709f-424f-bed6-3470fbfb4d8d
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: "41127936"
    storageClassName: ocs-storagecluster-ceph-rbd
    volumeMode: Filesystem
    volumeName: pvc-c4982245-709f-424f-bed6-3470fbfb4d8d
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 40Mi
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@Piotr, @Alex, can you please shed some light on this?
The PVC size seems suspiciously small: 40Mi.

Capacity:      40Mi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314
(In reply to Alexander Wels from comment #13)
> The PVC size seems suspiciously small: 40Mi.
>
> Capacity:      40Mi
> Access Modes:  RWO
> VolumeMode:    Filesystem
> Mounted By:    importer-v2vcirrosvmnonmigratable-4d0fd178-a83b-4d32-ad53-da560c410314

We use this VM in our v2v tests. I can retry with a bigger one.
So the original source VM disk is only 40Mi? The import is failing on "Unable to process data: write /data/disk.img: no space left on device", which indicates there is no space left to import into. With CDI we have had a long struggle with file system overhead on imports. For instance, with ceph rbd an xfs file system is put on the PV: the PV is created on a block device of the requested size, and the FS then takes some overhead of its own. On a really small PV like that, the FS overhead is likely a significant percentage of the total size, which is probably why you are running out of space.
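To make the overhead concrete, here is an illustrative shell sketch (run on any Linux machine, not on the cluster; the 1G size is an arbitrary example) showing that a freshly formatted xfs filesystem exposes less usable space than the raw device underneath it:

# Illustrative only: format a file-backed device and compare usable vs. raw size.
truncate -s 1G /tmp/backing.img          # stand-in for a 1Gi block device
mkfs.xfs -q /tmp/backing.img             # same fs type ceph-rbd puts on the PV
mkdir -p /tmp/demo-mnt
sudo mount -o loop /tmp/backing.img /tmp/demo-mnt
df -h /tmp/demo-mnt                      # reported size/avail is below 1G: fs metadata overhead
sudo umount /tmp/demo-mnt && rm /tmp/backing.img

On a 40Mi device that same fixed metadata cost is a much larger fraction of the total, so a 40Mi disk.img simply cannot fit.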
Retested with two other non-migratable VMs, using ceph-rbd, on CNV 2.4.1 (the scenario described in comment #12 was also on CNV 2.4.1):
- VM disk size: 4 GiB, thin provision - same import error (pod CrashLoopBackoff restart exceeded) and same messages in the "describe pvc" output as described in comment #12.
- VM disk size: 25 GiB, preallocated - same.
As far as I understand, Filesystem with ceph-rbd is not supported in 2.5. I think we should target this bug to a future release, when it will be supported, so we can make sure vm-import handles it correctly. In terms of the PVC size, I am not sure whether we can/should do anything about it; it seems like a more generic issue that should be solved by CDI. @Alex, do you agree?
I don't think RWX with ceph-rbd and Filesystem will ever be supported; that would require a file system that allows simultaneous writes from different locations on the block device, like cephfs. CDI cannot solve the problem of the PVC being smaller than the size of the data being written. In general we recommend making the PVC 5-10% larger than the actual data you are writing to it, for overhead purposes. I am not entirely sure where the size of the PVC is determined, I am assuming from the source VM definition? We are building a mechanism for cluster admins to specify the overhead per storage class; once that is available, you should be able to read the value from the CDIConfig object. Until then, I suggest the importer just add 5-10% extra to the size compared to how it determines it now.
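As a rough worked example of that 5-10% recommendation (the numbers are illustrative, not taken from this report):

# Illustrative: pad a 4Gi source disk by 10% before requesting the DV/PVC size.
DISK_BYTES=4294967296                            # 4Gi source disk
PADDED_BYTES=$(( DISK_BYTES + DISK_BYTES / 10 ))
echo "request at least ${PADDED_BYTES} bytes"    # ~4.4Gi instead of 4Gi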
With this, anyone who is using CDI would need to add the overhead themselves. We are unable to query storage capabilities to figure out whether to add overhead or not, so we have no ability to make this decision.
CDI doesn't have the ability either; the user tells us "write data from this source into this PVC of this size", and then the Kubernetes storage system creates the PVC based on the requested size. The problem is that not every provisioner does this the same way. The issue happens when the provisioner makes a block device of the requested size and then puts a file system on top of it: the actual available size is then less than the requested size, and we are left struggling with how to handle that. We can't do anything but ask the user to make a bigger PVC.
We are told to import a VM with the disk sizes specified in the source infrastructure (RHV or VMware). We can't offload this to the user; this time the user does not allocate any PVCs.
From CDI's perspective, the VM import is the user creating the PVCs. If you create a PVC of the exact disk size and, due to fs overhead, there is not enough space available, I am not sure what we can do to solve it.
vm-import creates DVs, not PVCs, and it doesn't know the storage class capabilities needed to decide whether to add overhead or not. It does not have enough information to make this decision; we only know what is reported by the source infrastructure and use that to create the DV(s).
I filed an RFE: https://bugzilla.redhat.com/show_bug.cgi?id=1883908 - [RFE][v2v][RHV to CNV VM import] VM Importer should request a slightly larger DV in order to overcome Filesystem overhead when using Ceph-RBD storage class

Thanks, Alexander Wels, for the detailed explanation of exactly what issue we are facing.
We have no ability to tell when to add overhead. This BZ should be verified once the RFE is implemented; that is why it was targeted to a future release. Updating dependency.
*** Bug 1885304 has been marked as a duplicate of this bug. ***
@awels do you have the link to the CDI PR that takes care of the filesystem overhead? IIRC, this is already done, and this BZ should be moved to ON_QA.
We have a PR that allows an admin to specify the overhead on a global or per storage class basis. The VM importer could use this information once it determines which storage class it is going to use to increase the requested PV size by that percentage. https://github.com/kubevirt/containerized-data-importer/pull/1319 is the PR implementing the overhead logic. The CDIConfig object will contain the information you guys need.
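For reference, a hedged sketch of what the overhead configuration from that PR looks like in the CDIConfig object; the values and the per-storage-class entry below are made up for illustration, so check the PR for the authoritative field layout:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDIConfig
metadata:
  name: config
spec:
  filesystemOverhead:
    global: "0.055"                        # assumed default fraction reserved for fs overhead
    storageClass:
      ocs-storagecluster-ceph-rbd: "0.08"  # hypothetical per-storage-class override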
https://github.com/kubevirt/vm-import-operator/pull/469
The PR has been merged and the d/s automated checks are green. The change should be in hco-bundle-registry-container-v2.6.0-532 and onwards.
Migratable VM + Filesystem + Ceph-RBD is not allowed. We need to block it and fail with an error telling the user to either change the source VM to be non-migratable or use volumeMode: Block.
What do you mean by "is not allowed"? The problem reported in this BZ was that when using Ceph-RBD + Filesystem, the overhead for the filesystem was not accounted for, which made the import fail. MTV will not let you use it though, since it was failing in VMIO. We have implemented Provisioner CRs exactly for that purpose, and the only allowed mode today is Ceph-RBD + Block. But that's only in MTV.
This bug includes 2 issues:
1) Not accounting for the filesystem overhead.
2) Allowing a live-migratable VM to be imported with Ceph-RBD + Filesystem.

As the last fix is for 1), changing the bug title to match it.
Import from RHV to CNV using Ceph-RBD/Filesystem with RWO works (issue 1).
Filed this bug for issue 2) in comment #35: Bug 1928767 - [VM import from RHV to CNV] Migratable VM import (RWX) to Ceph-RBD/Filesystem should be blocked
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0799