Bug 1976730
| Summary: | Disk is not usable due to incorrect size for proper alignment | |||
|---|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Germano Veit Michel <gveitmic> | |
| Component: | Storage | Assignee: | Alexander Wels <awels> | |
| Status: | CLOSED ERRATA | QA Contact: | Kevin Alon Goldblatt <kgoldbla> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 2.6.5 | CC: | awels, cnv-qe-bugs, dholler, fdeutsch, kgershon, mrashish, nrevo, tmicheli, yadu | |
| Target Milestone: | --- | |||
| Target Release: | 4.9.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | virt-cdi-operator:v4.9.0-14, CNV v4.9.0-86 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1990559 1990560 (view as bug list) | Environment: | ||
| Last Closed: | 2021-11-02 15:59:33 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1990559, 1990560 | |||
I ran into this issue with OCP 4.7 + ODF 4.7 + OCP Virt ?. I am using Ceph RBD from ODF/OCS and requested 50GiB.

(combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2021-07-08T20:19:47.296918Z qemu-kvm: -device ide-hd,bus=ide.2,drive=libvirt-1-format,id=ua-rootdisk-install,bootindex=2,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment')"

Report from 02964811 mentions that adjusting the filesystemOverhead to zero did not work. Getting more data.

(In reply to Nate Revo from comment #2)
> I ran into this issue with OCP 4.7 + ODF 4.7 + OCP Virt ?. I am using Ceph
> RBD from ODF/OCS and requested 50GiB.
>
> (combined from similar events): server error. command SyncVMI failed:
> "LibvirtError(Code=1, Domain=10, Message='internal error: process exited
> while connecting to monitor: 2021-07-08T20:19:47.296918Z qemu-kvm: -device
> ide-hd,bus=ide.2,drive=libvirt-1-format,id=ua-rootdisk-install,bootindex=2,
> write-cache=on: Cannot get 'write' permission without 'resize': Image size
> is not a multiple of request alignment')"

Do you know what the IO size of the underlying storage is? It's not 512B, is it? You can try to recreate the disk manually as below, just to confirm it is kubevirt creating a disk with an incorrect size for your storage. Make sure at step 10 you use a size that aligns with the IO size of your storage (I think with Ceph it should be a multiple of 4194304 bytes).

~~~
1. Find the VM:

$ oc get vmi
NAME                   AGE   PHASE     IP             NODENAME
rhel7-blushing-aphid   39h   Running   10.129.2.47    openshift-worker-0
rhel7-usual-weasel     9s    Running   10.129.3.247   openshift-worker-0

2. Find the pod. It may fail quickly (~5m) as the VM fails to start, so we probably need to be quick.

$ oc get pods | grep weasel
virt-launcher-rhel7-usual-weasel-4llrq   1/1   Running   0   19s

3. Debug the pod (this creates a copy of it):

$ oc debug virt-launcher-rhel7-usual-weasel-4llrq

4. Go to the UI and stop/shut down the VM (the debug pod should still be alive after that; we do this to avoid simultaneous access to the disks and locking failures).

5. Back on the debug pod, enter this directory:

sh-4.4# cd /var/run/kubevirt-private/vmi-disks/

6. Stat and qemu-img all disks (each in a directory), and also collect mount output:

sh-4.4# for d in `ls`; do echo $d; qemu-img info $d/disk.img; stat $d/disk.img; done
sh-4.4# mount

7. Manually run qemu-io against each disk (replace the path below for each file); the error should reproduce on the problematic disk.

sh-4.4# qemu-io -f raw -t none /var/run/kubevirt-private/vmi-disks/disk-0/disk.img
qemu-io> quit

NOTE: we just want to know if the prompt opens. The above is a successful run against the disk; move on to the next one.

8. If it fails to open, continue with the rest of the plan below.

9. cd into the directory of the disk that failed to open above, and run qemu-img info:

sh-4.4# cd <directory of the failed disk, i.e. disk-0>
sh-4.4# qemu-img info disk.img

10. Delete the file and create a new one manually, using the size in bytes from the output of the command above, adjusted for alignment with the underlying storage.

sh-4.4# rm disk.img
sh-4.4# qemu-img create -f raw disk.img <ALIGNED SIZE HERE>
~~~

How reproducible:
Always
Steps to Reproduce:
1. Prepare 4K aligned fake block storage
# fallocate -l 10GiB /exports/iscsi
# losetup -f -b 4096 /exports/iscsi
2. Add it as an iscsi target
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 1]
| | o- iscsi ......................................................................... [/dev/loop0 (10.0GiB) write-thru activated]
| | o- alua ................................................................................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
o- iscsi ............................................................................................................ [Targets: 2]
| o- iqn.2003-01.org.linux-iscsi.fedora.x8664:sn.7898909edacd .......................................................... [TPGs: 1]
| | o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| | o- acls .......................................................................................................... [ACLs: 1]
| | | o- iqn.2003-01.org.linux-iscsi:shift .................................................................... [Mapped LUNs: 1]
| | | o- mapped_lun0 ................................................................................. [lun1 block/iscsi (rw)]
| | o- luns .......................................................................................................... [LUNs: 2]
| | | o- lun0 .................................................................... [block/iscsi (/dev/loop0) (default_tg_pt_gp)]
| | o- portals .................................................................................................... [Portals: 1]
3. Use it as a PV on OCP
kind: PersistentVolume
apiVersion: v1
metadata:
name: iscsi-4k
spec:
capacity:
storage: 10Gi
iscsi:
targetPortal: '192.168.200.254:3260'
iqn: 'iqn.2003-01.org.linux-iscsi.fedora.x8664:sn.7898909edacd'
lun: 0
iscsiInterface: default
fsType: ext4
initiatorName: 'iqn.2003-01.org.linux-iscsi:shift'
accessModes:
- ReadWriteOnce
4. Create VM disk with it
5. Run VM
2021-07-14T01:12:26.055417Z qemu-kvm: -device virtio-blk-pci-non-transitional,bus=pci.6,addr=0x0,drive=libvirt-1-format,id=ua-disk-0,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment
6. Inside the pod, we see the disk as sda; note it is 4K:
sh-4.4# lsblk -o name,phy-sec,log-sec,alignment,min-io,opt-io,mountpoint
NAME PHY-SEC LOG-SEC ALIGNMENT MIN-IO OPT-IO MOUNTPOINT
sda 4096 4096 0 4096 10485760 /run/kubevirt-private/vmi-disks/disk-0
7. Reproduce the error:
sh-4.4# qemu-io -f raw -t none /run/kubevirt-private/vmi-disks/disk-0/disk.img
qemu-io: can't open device /run/kubevirt-private/vmi-disks/disk-0/disk.img: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment
8. Check size
sh-4.4# qemu-img info /run/kubevirt-private/vmi-disks/disk-0/disk.img
image: /run/kubevirt-private/vmi-disks/disk-0/disk.img
file format: raw
virtual size: 7.56 GiB (8117488128 bytes)
disk size: 4 KiB
9. The size does not fall on a storage alignment boundary:
8117488128 / 4096 = 1981808.625
10. Delete disk.img and create a new file with the correct size for alignment; the VM then runs.
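The "correct size" in step 10 is plain round-up arithmetic; a minimal sketch using the numbers from the qemu-img and lsblk output above (the round-up-to-block expression is a generic idiom, not code taken from kubevirt):

```shell
# Round the image's virtual size up to the next multiple of the storage
# block size; numbers taken from the qemu-img/lsblk output above.
size=8117488128                                   # virtual size in bytes
block=4096                                        # PHY-SEC reported by lsblk
aligned=$(( (size + block - 1) / block * block ))
echo "$aligned"                                   # 8117489664
echo $(( aligned % block ))                       # 0 -> multiple of 4096
```

Recreating the file with this size (qemu-img create -f raw disk.img 8117489664) should then pass qemu's alignment check.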
Can I get a copy of the ENTIRE virt-launcher pod? We have seen the resize message after other failures in certain cases, and I want to rule that out first.

ENTIRE virt-launcher pod logs, I mean.

Yup, attaching it now. By the way, the latest customer report suggests setting filesystemOverhead to zero worked for newly created disks.

Okay, looking at the logs, the first thing that happens is the error with the alignment, so no conclusion can be drawn other than that being the problem. As you noted, this is hard-coded, and we would want to somehow make this configurable, as there seems to be a large variety of block sizes. Or maybe we can look it up with fstat.

One more thing that might be the problem: is the VM using cloud-init? There is a bug that got fixed involving the alignment of the generated cloud-init image.

(In reply to Alexander Wels from comment #11)
> Okay, looking at the logs, the first thing that happens is the error with
> the alignment. So no other conclusion can be drawn than that is the problem.
> As you noted this is hard coded, and we would want to somehow make this
> configureable as there seems to be a large variety of block sizes. or maybe
> we can look it up with fstat.

Or just do the rounding based on an arbitrarily big block size (e.g. 32M) instead of 512, so that all smaller multiples are automatically aligned too.

Verified with the following code:
---------------------------------------------
oc version
Client Version: 4.9.0-202107292313.p0.git.1557476.assembly.stream-1557476
Server Version: 4.9.0-0.nightly-2021-08-04-025616
Kubernetes Version: v1.21.1+8268f88
[cnv-qe-jenkins@stg10-kevin-jdtkw-executor kev]$ oc get csv -n openshift-cnv
NAME DISPLAY VERSION REPLACES PHASE
kubevirt-hyperconverged-operator.v4.9.0 OpenShift Virtualization 4.9.0 kubevirt-hyperconverged-operator.v2.6.5 Succeeded
Verified with the following scenario:
---------------------------------------------
1. Connect to a host on the cluster:
oc debug node/stg10-kevin-jdtkw-worker-0-7f69w
sh-4.4# chroot /host/
2. Prepare 4K aligned fake block storage
fallocate -l 10GiB /var/iscsi
losetup -f -b 4096 /var/iscsi
losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE  DIO LOG-SEC
/dev/loop1         0      0         0  0 /var/iscsi   0    4096
3. Install the filesystem and mount the device:
mkfs.ext4 -b 4096 /var/iscsi
mkdir /var/hpvolumes/4k
mount /dev/loop1 /var/hpvolumes/4k/
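As a sanity check (a hypothetical helper, not part of the original verification steps), the mounted filesystem's block size can be read back via statfs; the mount point in the example is the one created above:

```shell
# Report the block size of the filesystem containing a given path, using
# stat's filesystem mode (statfs). For the ext4 volume formatted with
# -b 4096 above, this should print 4096.
fs_block_size() {
    stat -f -c %s "$1"
}
# Example: fs_block_size /var/hpvolumes/4k
```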
4. Label the device:
ls -ltrZ /var/hpvolumes/4k
total 16
drwx------. 2 root root system_u:object_r:unlabeled_t:s0 16384 Aug 19 15:30 lost+found
sudo chcon -t container_file_t -R /var/hpvolumes/4k
ls -ltrZ /var/hpvolumes/4k
total 16
drwx------. 2 root root system_u:object_r:container_file_t:s0 16384 Aug 19 15:30 lost+found
5. Create a special pv pointing to the loopback device created: (see yaml below)
6. Create the vm: (see yaml below)
oc create -f cirros-vm.yaml
virtualmachine.kubevirt.io/vm-cirros-dv-2 created
7. oc get dv -w
NAME PHASE PROGRESS RESTARTS AGE
cirros-dv-2 Succeeded 100.0% 9s
8. Start vm:
oc get vm
NAME AGE STATUS READY
vm-cirros-dv-2 28s Stopped False
virtctl start vm-cirros-dv-2
VM vm-cirros-dv-2 was scheduled to start
oc get vmi
NAME AGE PHASE IP NODENAME READY
vm-cirros-dv-2 17s Running 10.128.2.178 stg10-kevin-jdtkw-worker-0-7f69w True
[cnv-qe-jenkins@stg10-kevin-jdtkw-executor kev]$ virtctl console vm-cirros-dv-2
Successfully connected to vm-cirros-dv-2 console. The escape sequence is ^]
>>>>> VMI is running and vm can be accessed
Moving to verified!
Additional information:
---------------------------------------------------------------------------------
special pv yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: test-pv-volume
labels:
type: local
spec:
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- stg10-kevin-jdtkw-worker-0-7f69w
storageClassName: test-sc
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/var/hpvolumes/4k"
vm yaml:
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm: vm-cirros-dv-2
name: vm-cirros-dv-2
spec:
dataVolumeTemplates:
- metadata:
name: cirros-dv-2
spec:
pvc:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100M
storageClassName: test-sc
source:
http:
url: "http://xxx/xxx/xxx/cirros-0.4.0-x86_64-disk.qcow2"
running: false
template:
metadata:
labels:
kubevirt.io/vm: vm-datavolume
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: datavolumevolume
machine:
type: ""
resources:
requests:
memory: 64M
terminationGracePeriodSeconds: 0
volumes:
- dataVolume:
name: cirros-dv-2
name: datavolumevolume
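Once the DataVolume import succeeds, the resulting disk image can be checked for alignment directly. This helper is hypothetical (the path in the example comes from the earlier debug steps, and 4096 is the block size used in this scenario):

```shell
# Report whether a file's size is a multiple of the given block size.
is_aligned() {
    size=$(stat -c %s "$1")    # file size in bytes
    if [ $(( size % $2 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}
# Example: is_aligned /run/kubevirt-private/vmi-disks/disk-0/disk.img 4096
```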
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4104
Description of problem:

I think this is a bug, so raising one with what we have, as the customer has not replied with the additional info we requested for troubleshooting.

The customer is using IBM Spectrum Scale storage, with a 1MiB block size. If they create a new disk and attach it to a VM using that storage, the VM fails to start because the disk is not aligned.

{"component":"virt-launcher","level":"error","msg":"internal error: qemu unexpectedly closed the monitor: 2021-06-15T15:16:06.788431Z qemu-kvm: -device virtio-blk-pci-non-transitional,bus=pci.6,addr=0x0,drive=libvirt-1-format,id=ua-disk-0,write-cache=on: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment","pos":"qemuProcessReportLogError:2111","subcomponent":"libvirt","thread":"290","timestamp":"2021-06-15T15:16:06.811000Z"}

Here: https://github.com/qemu/qemu/blob/8e6dad2028d01b7f9ec76cf3b83457fab57fa1eb/block.c#L2309

Our current theory is that the "fs overhead" code does a round down based on a hardcoded block size (512), here: https://github.com/kubevirt/containerized-data-importer/blob/2014ddecfd39ba69fe75970ce613f31c32d2b182/pkg/importer/data-processor.go#L364

So it reduces the size by the overhead percentage and then rounds it down to a multiple of 512. This may fail to set the file size correctly when the storage block size is not 512; for this customer it was 1MiB.

How reproducible: Unknown (I tried on NFS but it worked; not the right scenario though)
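The theory can be illustrated with shell arithmetic. This is a sketch under assumed numbers (a 10Gi request and a 5.5% filesystem overhead, CDI's default), not the actual CDI code: shrinking the request by the overhead and rounding down to the hard-coded 512 bytes yields a size that is fine for 512-byte storage but misaligned for a 1MiB block size.

```shell
# Assumed inputs: 10Gi requested, 5.5% filesystem overhead (CDI default).
request=$(( 10 * 1024 * 1024 * 1024 ))   # 10737418240 bytes
usable=$(( request * 945 / 1000 ))       # subtract ~5.5% overhead
rounded=$(( usable / 512 * 512 ))        # hard-coded 512-byte round down
echo $(( rounded % 512 ))      # 0      -> aligned for 512B-sector storage
echo $(( rounded % 1048576 ))  # 838656 -> misaligned for 1MiB blocks
```

Rounding to a generous common multiple instead (the 32M suggested in the comments above) would keep all smaller power-of-two block sizes aligned, since any multiple of 32MiB is also a multiple of 4KiB, 1MiB, and so on.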