Description of problem:

I think this is a bug, so I'm raising one with what we have, as the customer has not replied with the additional info we requested for troubleshooting.

The customer is using IBM Spectrum Scale storage with a 1MiB block size. If they create a new disk on that storage and attach it to a VM, the VM fails to start because the image size is not aligned:

{"component":"virt-launcher","level":"error","msg":"internal error: qemu unexpectedly closed the monitor: 2021-06-15T15:16:06.788431Z qemu-kvm: -device virtio-blk-pci-non-transitional,bus=pci.6,addr=0x0,drive=libvirt-1-format,id=ua-disk-0,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment","pos":"qemuProcessReportLogError:2111","subcomponent":"libvirt","thread":"290","timestamp":"2021-06-15T15:16:06.811000Z"}

The qemu check is here: https://github.com/qemu/qemu/blob/8e6dad2028d01b7f9ec76cf3b83457fab57fa1eb/block.c#L2309

Our current theory is that the "fs overhead" logic does a round down based on a hardcoded block size (512), here: https://github.com/kubevirt/containerized-data-importer/blob/2014ddecfd39ba69fe75970ce613f31c32d2b182/pkg/importer/data-processor.go#L364

It reduces the size by the overhead percentage and then rounds the result down to a multiple of 512. This may fail to set the file size correctly when the storage block size is not 512; for this customer it is 1MiB.

How reproducible:
Unknown (I tried on NFS but it worked; that is not the right scenario, though)
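For illustration, here is a minimal Go sketch of that theory (the function name, flow, and the 5.5% overhead value are assumptions for this example, not the actual data-processor.go code): reducing by the overhead and rounding down to 512 easily produces a size that is not a multiple of a larger block size such as 1MiB.

~~~
package main

import "fmt"

// Hypothetical sketch of the suspected CDI sizing logic: reduce the
// requested size by the filesystem overhead, then round DOWN to a
// multiple of a hardcoded 512-byte block.
const hardcodedBlockSize = 512

func usableSize(requested int64, overhead float64) int64 {
	reduced := int64(float64(requested) * (1 - overhead))
	return reduced - reduced%hardcodedBlockSize
}

func main() {
	size := usableSize(50*1024*1024*1024, 0.055) // 50GiB request, 5.5% overhead (illustrative)
	// The result is a multiple of 512, but not necessarily of the storage
	// block size; with 1MiB (1048576-byte) blocks qemu rejects the image.
	fmt.Println(size, size%1048576 == 0) // prints false for these values
}
~~~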
I ran into this issue with OCP 4.7 + ODF 4.7 + OCP Virt ?. I am using Ceph RBD from ODF/OCS and requested 50GiB.

(combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2021-07-08T20:19:47.296918Z qemu-kvm: -device ide-hd,bus=ide.2,drive=libvirt-1-format,id=ua-rootdisk-install,bootindex=2,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment')"
Report from 02964811 mentions that adjusting the filesystemOverhead to zero did not work. Getting more data.
(In reply to Nate Revo from comment #2)
> I ran into this issue with OCP 4.7 + ODF 4.7 + OCP Virt ?. I am using Ceph
> RBD from ODF/OCS and requested 50GiB.
>
> (combined from similar events): server error. command SyncVMI failed:
> "LibvirtError(Code=1, Domain=10, Message='internal error: process exited
> while connecting to monitor: 2021-07-08T20:19:47.296918Z qemu-kvm: -device
> ide-hd,bus=ide.2,drive=libvirt-1-format,id=ua-rootdisk-install,bootindex=2,
> write-cache=on: Cannot get 'write' permission without 'resize': Image size
> is not a multiple of request alignment')"

Do you know what the IO size of the underlying storage is? It's not 512B, is it?

You can try to recreate the disk manually as below, just to confirm it's kubevirt creating a disk with an incorrect size for your storage. Make sure at step 10 you use a size that aligns with the IO size of your storage (I think with Ceph it should be a multiple of 4194304 bytes).

~~~
1. Find the VM:

$ oc get vmi
NAME                   AGE   PHASE     IP             NODENAME
rhel7-blushing-aphid   39h   Running   10.129.2.47    openshift-worker-0
rhel7-usual-weasel     9s    Running   10.129.3.247   openshift-worker-0

2. Find the pod. It may fail quickly (~5m) as the VM fails to start, so we probably need to be quick.

$ oc get pods | grep weasel
virt-launcher-rhel7-usual-weasel-4llrq   1/1   Running   0   19s

3. Debug the pod (this creates a carbon copy of it):

$ oc debug virt-launcher-rhel7-usual-weasel-4llrq

4. Go to the UI and stop/shut down the VM (the debug pod should still be alive after that; we do this to avoid simultaneous access to the disks and locking failures).

5. Back on the debug pod, enter this directory:

sh-4.4# cd /var/run/kubevirt-private/vmi-disks/

6. Stat and qemu-img all disks (each in its own directory), and also collect the mount output:

sh-4.4# for d in `ls`; do echo $d; qemu-img info $d/disk.img; stat $d/disk.img; done
sh-4.4# mount

7. Manually run qemu-io against each disk (replace the path below for each file); the error should reproduce on the problematic disk.

sh-4.4# qemu-io -f raw -t none /var/run/kubevirt-private/vmi-disks/disk-0/disk.img
qemu-io> quit

NOTE: we just want to know if the prompt opens; the above is a successful run against the disk, move to the next.

8. If it fails to open, continue with the rest of the plan below.

9. cd into the directory of the disk that failed to open above, and run qemu-img info:

sh-4.4# cd <directory of the failed disk, i.e. disk-0>
sh-4.4# qemu-img info disk.img

10. Delete the file and create a new one manually, using the size in bytes from the output of the command above, adjusted for alignment with the underlying storage (see the sketch after this comment for one way to compute the aligned size).

sh-4.4# rm disk.img
sh-4.4# qemu-img create -f raw disk.img <ALIGNED SIZE HERE>
~~~
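For step 10, one way to compute an aligned size is to round the current image size up to the next multiple of the storage's IO size. A minimal Go sketch (the helper is mine, not part of any kubevirt tooling; pick the align value that matches your storage):

~~~
package main

import "fmt"

// roundUp returns the smallest multiple of align that is >= size.
func roundUp(size, align int64) int64 {
	return (size + align - 1) / align * align
}

func main() {
	// Example sizes: replace 8117488128 with the byte size reported by
	// qemu-img info, and the alignment with your storage's IO size.
	fmt.Println(roundUp(8117488128, 4096))    // 4K blocks  -> 8117489664
	fmt.Println(roundUp(8117488128, 4194304)) // Ceph 4MiB  -> 8120172544
}
~~~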
How reproducible:
Always

Steps to Reproduce:

1. Prepare 4K-aligned fake block storage:

# fallocate -l 10GiB /exports/iscsi
# losetup -f -b 4096 /exports/iscsi

2. Add it as an iSCSI target:

o- / ......................................................................... [...]
  o- backstores .............................................................. [...]
  | o- block ..................................................... [Storage Objects: 1]
  | | o- iscsi ........................... [/dev/loop0 (10.0GiB) write-thru activated]
  | |   o- alua ...................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp .......................... [ALUA state: Active/optimized]
  o- iscsi ............................................................. [Targets: 2]
  | o- iqn.2003-01.org.linux-iscsi.fedora.x8664:sn.7898909edacd ............ [TPGs: 1]
  | | o- tpg1 .................................................. [no-gen-acls, no-auth]
  | |   o- acls ............................................................. [ACLs: 1]
  | |   | o- iqn.2003-01.org.linux-iscsi:shift ....................... [Mapped LUNs: 1]
  | |   |   o- mapped_lun0 .................................... [lun1 block/iscsi (rw)]
  | |   o- luns ............................................................. [LUNs: 2]
  | |   | o- lun0 ....................... [block/iscsi (/dev/loop0) (default_tg_pt_gp)]
  | |   o- portals ..................................................... [Portals: 1]

3. Use it as a PV on OCP:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: iscsi-4k
spec:
  capacity:
    storage: 10Gi
  iscsi:
    targetPortal: '192.168.200.254:3260'
    iqn: 'iqn.2003-01.org.linux-iscsi.fedora.x8664:sn.7898909edacd'
    lun: 0
    iscsiInterface: default
    fsType: ext4
    initiatorName: 'iqn.2003-01.org.linux-iscsi:shift'
  accessModes:
    - ReadWriteOnce

4. Create a VM disk with it.

5. Run the VM:

2021-07-14T01:12:26.055417Z qemu-kvm: -device virtio-blk-pci-non-transitional,bus=pci.6,addr=0x0,drive=libvirt-1-format,id=ua-disk-0,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment

6. Inside the pod, we see the disk as sda; note that it is 4K:

sh-4.4# lsblk -o name,phy-sec,log-sec,alignment,min-io,opt-io,mountpoint
NAME   PHY-SEC   LOG-SEC   ALIGNMENT   MIN-IO     OPT-IO   MOUNTPOINT
sda       4096      4096           0     4096   10485760   /run/kubevirt-private/vmi-disks/disk-0

7. Reproduce the error:

sh-4.4# qemu-io -f raw -t none /run/kubevirt-private/vmi-disks/disk-0/disk.img
qemu-io: can't open device /run/kubevirt-private/vmi-disks/disk-0/disk.img: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment

8. Check the size:

sh-4.4# qemu-img info /run/kubevirt-private/vmi-disks/disk-0/disk.img
image: /run/kubevirt-private/vmi-disks/disk-0/disk.img
file format: raw
virtual size: 7.56 GiB (8117488128 bytes)
disk size: 4 KiB

9. The size does not end on a storage alignment boundary: 8117488128 / 4096 = 1981808.625

10. Delete disk.img and create a new file with the correct size for alignment = the VM runs.
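(For reference, my arithmetic, not from the original report: rounding that size up to the next 4096-byte boundary gives 1981809 * 4096 = 8117489664 bytes, which satisfies the alignment check that fails in step 7.)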
Can I get a copy of the ENTIRE virt-launcher pod? We have seen the resize message after other failures in certain cases, and I want to rule that out first.
ENTIRE virt-launcher pod logs, I mean.
Yup, attaching it now. By the way, the latest customer report suggests that setting filesystemOverhead to zero worked for newly created disks.
Okay, looking at the logs, the first thing that happens is the alignment error, so no conclusion can be drawn other than that being the problem. As you noted, this is hardcoded, and we would want to somehow make it configurable, as there seems to be a large variety of block sizes. Or maybe we can look it up with fstat.
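For reference, a minimal Linux-only Go sketch of the fstat idea (the path is just an example from this bug; an actual CDI change may look different), querying the filesystem's preferred I/O block size instead of assuming 512:

~~~
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// blockSize returns the preferred block size for filesystem I/O
// (st_blksize) of the filesystem holding path.
func blockSize(path string) (int64, error) {
	var st unix.Stat_t
	if err := unix.Stat(path, &st); err != nil {
		return 0, err
	}
	return int64(st.Blksize), nil
}

func main() {
	bs, err := blockSize("/var/run/kubevirt-private/vmi-disks/disk-0")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("preferred I/O block size:", bs) // e.g. 4096 on the 4K reproducer
}
~~~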
One more thing that might be the problem: is the VM using cloud-init? There is a bug that got fixed regarding the alignment of the generated cloud-init image.
(In reply to Alexander Wels from comment #11)
> Okay, looking at the logs, the first thing that happens is the alignment
> error, so no conclusion can be drawn other than that being the problem.
> As you noted, this is hardcoded, and we would want to somehow make it
> configurable, as there seems to be a large variety of block sizes. Or
> maybe we can look it up with fstat.

Or just do the rounding based on an arbitrary big block size (e.g. 32M) instead of 512, so that all smaller block sizes are automatically aligned too, as sketched below.
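A sketch of that suggestion (names and the overhead value are illustrative; 32MiB is the arbitrary large alignment from the comment above): because 512, 4096, and 1048576 all divide 32MiB, rounding down to a 32MiB boundary automatically aligns the size for any of those block sizes.

~~~
package main

import "fmt"

// Round the overhead-reduced size down to one large alignment so the
// result is also a multiple of every power-of-two block size below it.
const bigAlign = 32 * 1024 * 1024 // 32MiB

func roundDown(size, align int64) int64 {
	return size / align * align
}

func main() {
	requested := int64(50 * 1024 * 1024 * 1024)        // 50GiB
	reduced := int64(float64(requested) * (1 - 0.055)) // illustrative overhead
	aligned := roundDown(reduced, bigAlign)
	fmt.Println(aligned, aligned%4096 == 0, aligned%1048576 == 0) // true true
}
~~~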
Verified with the following versions:
---------------------------------------------
oc version
Client Version: 4.9.0-202107292313.p0.git.1557476.assembly.stream-1557476
Server Version: 4.9.0-0.nightly-2021-08-04-025616
Kubernetes Version: v1.21.1+8268f88

[cnv-qe-jenkins@stg10-kevin-jdtkw-executor kev]$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.0   OpenShift Virtualization   4.9.0     kubevirt-hyperconverged-operator.v2.6.5   Succeeded

Verified with the following scenario:
---------------------------------------------
1. Connect to a host on the cluster:

oc debug node/stg10-kevin-jdtkw-worker-0-7f69w
sh-4.4# chroot /host/

2. Prepare 4K-aligned fake block storage:

fallocate -l 10GiB /var/iscsi
losetup -f -b 4096 /var/iscsi
losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE  DIO LOG-SEC
/dev/loop1         0      0         0  0 /var/iscsi   0    4096

3. Create the filesystem and mount the device:

mkfs.ext4 -b 4096 /var/iscsi
mkdir /var/hpvolumes/4k
mount /dev/loop1 /var/hpvolumes/4k/

4. Label the device:

ls -ltrZ /var/hpvolumes/4k
total 16
drwx------. 2 root root system_u:object_r:unlabeled_t:s0 16384 Aug 19 15:30 lost+found

sudo chcon -t container_file_t -R /var/hpvolumes/4k

ls -ltrZ /var/hpvolumes/4k
total 16
drwx------. 2 root root system_u:object_r:container_file_t:s0 16384 Aug 19 15:30 lost+found

5. Create a special PV pointing to the loopback device created (see yaml below).

6. Create the VM (see yaml below):

oc create -f cirros-vm.yaml
virtualmachine.kubevirt.io/vm-cirros-dv-2 created

7. Watch the DataVolume:

oc get dv -w
NAME          PHASE       PROGRESS   RESTARTS   AGE
cirros-dv-2   Succeeded   100.0%                9s

8. Start the VM:

oc get vm
NAME             AGE   STATUS    READY
vm-cirros-dv-2   28s   Stopped   False

virtctl start vm-cirros-dv-2
VM vm-cirros-dv-2 was scheduled to start

oc get vmi
NAME             AGE   PHASE     IP             NODENAME                           READY
vm-cirros-dv-2   17s   Running   10.128.2.178   stg10-kevin-jdtkw-worker-0-7f69w   True

[cnv-qe-jenkins@stg10-kevin-jdtkw-executor kev]$ virtctl console vm-cirros-dv-2
Successfully connected to vm-cirros-dv-2 console. The escape sequence is ^]

>>>>> The VMI is running and the VM can be accessed.

Moving to verified!

Additional information:
---------------------------------------------------------------------------------
special pv yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-pv-volume
  labels:
    type: local
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - stg10-kevin-jdtkw-worker-0-7f69w
  storageClassName: test-sc
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/var/hpvolumes/4k"

vm yaml:

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros-dv-2
  name: vm-cirros-dv-2
spec:
  dataVolumeTemplates:
  - metadata:
      name: cirros-dv-2
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100M
        storageClassName: test-sc
      source:
        http:
          url: "http://xxx/xxx/xxx/cirros-0.4.0-x86_64-disk.qcow2"
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-datavolume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumevolume
        machine:
          type: ""
        resources:
          requests:
            memory: 64M
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv-2
        name: datavolumevolume
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104