Bug 2059057

Summary: Volume import from RHV fails with error "Virtual image size is larger than available size" with Dell unity CSI driver
Product: Container Native Virtualization (CNV) Reporter: nijin ashok <nashok>
Component: Storage Assignee: Bartosz Rybacki <brybacki>
Status: CLOSED DUPLICATE QA Contact: Natalie Gavrielov <ngavrilo>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.9.2 CC: alitke, cnv-qe-bugs, julien.rouxel, mrashish, yadu
Target Milestone: ---   
Target Release: 4.9.5   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: v4.9.4-5 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-31 15:51:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description nijin ashok 2022-02-28 05:13:28 UTC
Description of problem:

With "volumeMode: Filesystem", the Dell Unity CSI driver formats the volume as ext4. By default, ext4 reserves some blocks, so the space available on the filesystem is less than the PVC size.

On a test pod with a 100Gi PVC, only about 92.9 GiB is available, and the reserved blocks account for 5 GiB.

~~~
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

bash-4.4$ df /mnt
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sde       102687672 61464  97366944   1% /mnt

# tune2fs -l /dev/sde 
tune2fs 1.45.6 (20-Mar-2020)
....
...
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              6553600
Block count:              26214400
Reserved block count:     1310720    <<<
Free blocks:              25656552
Free inodes:              6553589
First block:              0
Block size:               4096
Fragment size:            4096
~~~
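The figures in the output above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the numbers printed by `df` and `tune2fs`; the variable names are illustrative and not taken from any tool.

```python
# Values copied from the tune2fs and df output above.
BLOCK_SIZE = 4096            # bytes per ext4 block
BLOCK_COUNT = 26214400       # total blocks in the filesystem
RESERVED_BLOCKS = 1310720    # blocks reserved by ext4
DF_AVAILABLE_KIB = 97366944  # "Available" column of df, in 1K blocks

GIB = 1024 ** 3

total_gib = BLOCK_COUNT * BLOCK_SIZE / GIB
reserved_gib = RESERVED_BLOCKS * BLOCK_SIZE / GIB
reserved_ratio = RESERVED_BLOCKS / BLOCK_COUNT
available_gib = DF_AVAILABLE_KIB * 1024 / GIB

print(f"total:     {total_gib:.2f} GiB")
print(f"reserved:  {reserved_gib:.2f} GiB ({reserved_ratio:.0%} of all blocks)")
print(f"available: {available_gib:.2f} GiB")
```

The reserved blocks come to exactly 5 GiB, which is 5% of the block count. The rest of the gap between the 100 GiB volume and the roughly 92.86 GiB reported as available is not itemized in the output shown, and is presumably filesystem metadata.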

During the resize step, CDI checks whether the "available" size of the filesystem is large enough to hold the virtual size of the image. Because of the ext4 overhead, the available space is always less than the PVC size, so the check fails with the error below.

~~~
2022-02-24T14:35:11.327622639Z I0224 14:35:11.327552       1 util.go:604] Saving VDDK annotations from pod status message: messageUnable to process data: Virtual image size 107374182400 is larger than available size 99736776437 (PVC size 107374182400, reserved overhead 0.055000%). A larger PVC is required. Unable to resize disk image to requested size kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause    /remote-source/app/pkg/importer/data-processor.go:223 kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData     /remote-source/app/pkg/importer/data-processor.go:166 main.main     /remote-source/app/cmd/cdi-importer/importer.go:189 runtime.main    /usr/lib/golang/src/runtime/proc.go:225 runtime.goexit  /usr/lib/golang/src/runtime/asm_amd64.s:1371
~~~
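The comparison behind this message can be illustrated with the two sizes from the log. This is only a sketch of the idea, not CDI's code: `check_fits` is a hypothetical helper, and how CDI arrives at the "available size" figure is not shown here.

```python
GIB = 1024 ** 3


def check_fits(virtual_size: int, available_size: int) -> None:
    """Hypothetical helper: raise ValueError when the image does not fit."""
    if virtual_size > available_size:
        raise ValueError(
            f"Virtual image size {virtual_size} is larger than "
            f"available size {available_size}"
        )


# Values copied from the importer log above.
virtual_size = 107374182400
available_size = 99736776437

shortfall = virtual_size - available_size
print(f"shortfall: {shortfall} bytes ({shortfall / GIB:.2f} GiB)")

try:
    check_fits(virtual_size, available_size)
except ValueError as err:
    print(err)
```

With these numbers the image is about 7.11 GiB larger than the space reported as available.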

For the above case, the DV size was 100 Gi and the PVC was 105.82 Gi (adding the filesystem overhead).

~~~
cat dv.yaml | yq -y '.spec.storage'
accessModes:
  - ReadWriteOnce
resources:
  requests:
    storage: 100Gi   <<<<<
volumeMode: Filesystem

cat pvc.yaml | yq -y '.spec'
accessModes:
  - ReadWriteOnce
resources:
  requests:
    storage: '113623473440'   <<<<  (105.82 Gi).
volumeMode: Filesystem
~~~
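The PVC size above is consistent with dividing the DV size by (1 - overhead) for the default overhead of 0.055 and rounding up. The sketch below reproduces that arithmetic; it is an assumption that matches the numbers in this report, not a copy of CDI's implementation, and `pvc_size_with_overhead` is a hypothetical name.

```python
import math
from fractions import Fraction

GIB = 1024 ** 3


def pvc_size_with_overhead(requested_bytes: int, overhead: str) -> int:
    """Smallest whole byte count whose (1 - overhead) share covers the request."""
    # Fraction keeps the division exact, so the rounding up is not
    # affected by floating-point error.
    usable_fraction = 1 - Fraction(overhead)  # e.g. 1 - 0.055 = 0.945
    return math.ceil(requested_bytes / usable_fraction)


dv_bytes = 100 * GIB
pvc_bytes = pvc_size_with_overhead(dv_bytes, "0.055")
print(pvc_bytes)
print(f"{pvc_bytes / GIB:.2f} GiB")
```

For the 100Gi DV this yields 113623473440 bytes, or about 105.82 GiB, which is the value in the PVC spec.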

However, the available space was only around 93 Gi while the virtual image size was 100 Gi, so the import failed.


Version-Release number of selected component (if applicable):

v4.9.2

How reproducible:

100% in the customer environment.

Steps to Reproduce:

1. Import a VM from RHV using MTV into a Dell Unity storage class with volumeMode: Filesystem.
2. The import fails with the error "Virtual image size is larger than available size".

Actual results:

Volume import from RHV fails with error "Virtual image size is larger than available size" with Dell unity CSI driver

Expected results:

I am not sure whether this is a bug in CDI or in the Dell Unity CSI driver. A volume provisioned by the Dell CSI driver reports less available space than the PVC size, so is CDI actually reporting and blocking it correctly?

Or do we need another knob that can be customized by the user to add this additional overhead to support these kinds of storage? Opening this bug to get a wider opinion.

Additional info:

Comment 1 Yan Du 2022-03-02 13:24:31 UTC
Maya, could you please take a look?

Comment 2 Adam Litke 2022-03-08 12:47:51 UTC
I wonder if this environment has unusually high filesystem overhead.  Could you try the following command to override the default fs overhead value and see if it helps the situation?

oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'containerizeddataimporter.kubevirt.io/jsonpatch=[{"op": "add", "path": "/spec/config/filesystemOverhead", "value": {}}, {"op": "add", "path": "/spec/config/filesystemOverhead/global", "value": "0.10"}]'


This will change the default value "0.055" (5.5%) to "0.10" (10%).
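The JSON patch in the annotation is easy to mistype by hand. As a minimal sketch, the annotation value can be generated in Python; `overhead_jsonpatch` is a hypothetical helper, and the paths and the string-typed value are copied from the command above.

```python
import json


def overhead_jsonpatch(global_overhead: str) -> str:
    """Hypothetical helper: build the JSON patch used as the annotation value."""
    patch = [
        # Create the filesystemOverhead object first, then set its global value.
        {"op": "add", "path": "/spec/config/filesystemOverhead", "value": {}},
        {
            "op": "add",
            "path": "/spec/config/filesystemOverhead/global",
            "value": global_overhead,  # passed as a string, as in the command
        },
    ]
    return json.dumps(patch)


annotation = (
    "containerizeddataimporter.kubevirt.io/jsonpatch="
    + overhead_jsonpatch("0.10")
)
print(annotation)
```

The printed string is what goes between the single quotes of the `oc annotate` command.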

Comment 3 Maya Rashish 2022-03-20 20:54:19 UTC
Setting the fs overhead to 0.0 should be a workaround that requires no modification:

oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'containerizeddataimporter.kubevirt.io/jsonpatch=[{"op": "add", "path": "/spec/config/filesystemOverhead", "value": {}}, {"op": "add", "path": "/spec/config/filesystemOverhead/global", "value": "0.0"}]'

Comment 4 Maya Rashish 2022-03-21 15:47:00 UTC
Please disregard my comment, I had misunderstood the problem.

Comment 5 julien.rouxel 2022-03-24 13:23:44 UTC
Hi,

I'm the customer who is affected by this issue.

I tested the solution from Adam, but the result is worse than before.

The error log from the migration is:

~~~
Unable to process data: Virtual image size 32212254720 is larger than available size 802433433 (PVC size 32212254720, reserved overhead 0.100000%). A larger PVC is required. Unable to resize disk image to requested size kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:223 kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:166 main.main /remote-source/app/cmd/cdi-importer/importer.go:189 runtime.main /usr/lib/golang/src/runtime/proc.go:225 runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1371
~~~

The value 0.100000% looks strange.

I tried changing the command line to specify 10, but I got this error:
~~~
$ oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged 'containerizeddataimporter.kubevirt.io/jsonpatch=[{"op": "add", "path": "/spec/config/filesystemOverhead", "value": {}}, {"op": "add", "path": "/spec/config/filesystemOverhead/global", "value": "10"}]'
The CDI "cdi-kubevirt-hyperconverged" is invalid: spec.config.filesystemOverhead.global: Invalid value: "10": spec.config.filesystemOverhead.global in body should match '^(0(?:\.\d{1,3})?|1)$' 
~~~
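The validation pattern in this error only accepts a fraction between 0 and 1 with at most three decimal places, so 10% has to be written as "0.10" rather than "10". Judging from the two log excerpts in this report, the importer prints that raw fraction followed by a percent sign, which would explain "0.100000%" for a 10% setting. A small sketch for checking a value before applying it (the helper names are hypothetical; the pattern is copied from the error message):

```python
import re

# Pattern copied from the validation error above.
OVERHEAD_PATTERN = re.compile(r"^(0(?:\.\d{1,3})?|1)$")


def is_valid_overhead(value: str) -> bool:
    """True when the value would pass the pattern from the error message."""
    return OVERHEAD_PATTERN.match(value) is not None


def as_percent(value: str) -> str:
    """Express an accepted fraction as a percentage, e.g. "0.10" becomes "10%"."""
    return f"{float(value) * 100:g}%"


for candidate in ["0.055", "0.10", "0.0", "1", "10", "0.1000"]:
    if is_valid_overhead(candidate):
        print(f"{candidate!r}: accepted, means {as_percent(candidate)}")
    else:
        print(f"{candidate!r}: rejected by the pattern")
```

Of these candidates, "10" and "0.1000" are rejected; the others are accepted.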

Thanks

Comment 6 Adam Litke 2022-03-31 15:51:44 UTC

*** This bug has been marked as a duplicate of bug 2064936 ***