+++ This bug was initially created as a clone of Bug #1834281 +++

Description of problem:

# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

Version-Release number of selected component (if applicable):
kernel-4.18.0-194.el8.x86_64
qemu-kvm-4.2.0-19.module+el8.3.0+6478+69f490bb
qemu-img-4.2.0-19.module+el8.3.0+6478+69f490bb

How reproducible:
100%

Steps to Reproduce:
1. # mount -t nfs -o soft,vers=4.2 10.66.61.132:/home/nfs_server/ /home/kvm_autotest_root/images/
2. # cd /home/kvm_autotest_root/images/
3. # truncate -s 11136 test.img
4. # qemu-io -c 'write -P 1 0 10K' test.img -f raw
5. # qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc

Actual results:
After step 5:
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

Expected results:
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)

Additional info:
It works well on rhel8.1.0 slow train and rhel8.1.0 fast train.
It doesn't work on rhel8.2.0 slow train and rhel8.2.0 fast train.

Details:

1. According to https://bugzilla.redhat.com/show_bug.cgi?id=1678979#c19, it had been fixed on rhel8.1.0 slow train.
Tested with qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0, it works well.
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0346 sec (288.218 KiB/sec and 28.8218 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)

2. According to https://bugzilla.redhat.com/show_bug.cgi?id=1588356#c16, it had been fixed on rhel8.1.0 fast train.
Tested with qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf, it works well.
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0278 sec (359.557 KiB/sec and 35.9557 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)

3. Tested on rhel8.2.0 slow train, it doesn't work.
Tested with qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c, qemu core dumped.
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0453 sec (220.386 KiB/sec and 22.0386 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1646: bdrv_aligned_pwritev: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

4. Tested on rhel8.2.0 fast train, it doesn't work.
Tested with qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950, qemu core dumped.
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.04 sec (236.864 KiB/sec and 23.6864 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

--- Additional comment from Xueqiang Wei on 2020-05-11 13:27:22 UTC ---

core dumped log:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1834281/gdb.txt

core dumped file:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1834281/core.qemu-img.0.5b438c5e682d41f9ab1da8d09025a9eb.135324.1589199068000000.lz4
Hit it on rhel8.3 fast train.

Versions:
kernel-4.18.0-194.el8.x86_64
qemu-kvm-5.0.0-0.scrmod+el8.3.0+6495+1936fa11.wrb200506

# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.04 sec (247.384 KiB/sec and 24.7384 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: /builddir/build/BUILD/qemu-5.0.0/block/io.c:1887: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

Also hit it on rhel8.2.1 fast train - qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9

# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.04 sec (266.294 KiB/sec and 26.6294 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)
Adjusting dependency order - fix goes into RHEL AV first, then RHEL.
Assigned to Ademar for initial triage per bz process and age of bug created or assigned to virt-maint without triage. Not sure if this would be Kevin or Max.

If the fix goes into RHEL AV 8.2.1, then bug 1834281 would pick up the change when the rebase occurs.
Also hit it on the latest package.

Versions:
kernel-4.18.0-200.el8.x86_64
qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420
This regression seems to be related to upstream commit a6b257a08e3 ('file-posix: Handle undetectable alignment') by Nir.

It seems that the algorithm used for detecting the required O_DIRECT alignment fails on NFS (because NFS is fine with byte alignment) and therefore chooses the safe default of 4k. Creating the target image rounds the image size up to a multiple of 512 bytes, but accessing the image in 4k granularity means that we'll still write past the end of the created image file.

We need to find a way to deal with this situation: Either automatically round the target image size up to a multiple of the request alignment (though then we wouldn't be creating an exact copy any more!) or error out.

For NFS specifically, it would be good to find a way to figure out that byte alignment is actually what is needed. Without a kernel interface that just tells us the right alignment, I'm afraid our probing code can never be 100% reliable.
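The size arithmetic behind the assertion can be sketched as follows. This is only an illustration of the explanation above, not QEMU code: the function names and constants are hypothetical, and it assumes the final write is padded up to the request alignment while the image length stays at the 512-byte-rounded size.

```python
SECTOR = 512           # image length is tracked in 512-byte sectors
FALLBACK_ALIGN = 4096  # assumed safe default when alignment cannot be probed


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


def write_overruns_image(image_bytes: int, request_align: int) -> bool:
    """True if an aligned write covering the last byte ends past the image."""
    created_size = round_up(image_bytes, SECTOR)        # size of the created target
    aligned_end = round_up(image_bytes, request_align)  # end of the padded write
    return aligned_end > created_size


# The reproducer uses an 11136-byte image.
print(round_up(11136, SECTOR))                      # 11264
print(round_up(11136, FALLBACK_ALIGN))              # 12288
print(write_overruns_image(11136, FALLBACK_ALIGN))  # True
print(write_overruns_image(11136, 1))               # False
```

With byte alignment, which is what NFS actually needs, the padded write never extends past the created file, so the condition checked by the assertion holds.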
(In reply to Kevin Wolf from comment #6)

In oVirt this is not an issue, since we enforce 4k alignment in all volumes. There is no way to create a volume which is not aligned to 4k, regardless of the storage.

> # truncate -s 11136 test.img

What is the use case for creating a volume with size < 4k?

This change was added to allow 4k storage in RHHI. The chance that we will have a kernel interface reporting the required alignment, and that it will be exposed via fuse/gluster (or other storage), is low, and it will take years to get there.

I think rounding the image up to request_alignment is the way to go, making such errors impossible.
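A minimal sketch of the rounding proposed here, assuming the target is simply created at the next multiple of the request alignment. The helper name is hypothetical and this is not necessarily how the issue was ultimately resolved in QEMU; it only shows the trade-off raised in the previous comment, namely that the target is no longer a byte-exact copy of the source.

```python
from typing import Tuple


def padded_target_size(source_bytes: int, request_align: int) -> Tuple[int, int]:
    """Return (target size, extra padding bytes) after rounding up."""
    target = -(-source_bytes // request_align) * request_align
    return target, target - source_bytes


# For the 11136-byte reproducer image and a 4k request alignment:
size, padding = padded_target_size(11136, 4096)
print(size, padding)  # 12288 1152
```

A source that is already 4k-aligned, as oVirt enforces, gets no padding at all, which is why the problem does not show up there.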
Tested with qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901, did not hit this issue, so setting status to VERIFIED.

Versions:
kernel-4.18.0-232.el8.x86_64
qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901

1. # mount -t nfs -o soft,vers=4.2 10.66.61.132:/home/nfs_server/ /home/nfs_test
2. # cd /home/nfs_test
3. # truncate -s 11136 test.img
4. # qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.05 sec (207.017 KiB/sec and 20.7017 ops/sec)
5. # qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)

Automation result:
(1/1) Host_RHEL.m8.u3.product_av.raw.virtio_blk.up.virtio_net.Guest.RHEL.8.3.0.x86_64.io-github-autotest-qemu.qemu_img_convert_image_with_unaligned_size.q35: PASS (2.89 s)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137
Hi,

Does this bug also affect the disk access path of KVM at run time? See https://access.redhat.com/support/cases/#/case/03068601

We have a raw file on NFS as shown below, and KVM opens the file with the O_DIRECT flag.

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source file='/var/lib/nova/mnt/ef05afb0863beae8492ff11d98795fc7/volume-fd0f5fd7-f115-4863-9b72-de4b766ac0b3'/>
  <target dev='vdc' bus='virtio'/>
  <shareable/>
  <serial>fd0f5fd7-f115-4863-9b72-de4b766ac0b3</serial>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>

We found that when running a fio sequential write test with a 5k block size in the VM, the host sees extra read I/O, equal to or a little more than the write IOPS.

[root@overcloud-novacompute-20 ~]# nfsiostat 10 /var/lib/nova/mnt/ef05afb0863beae8492ff11d98795fc7

xxx.xx.xx.xx:/vol1_dedup mounted on /var/lib/nova/mnt/ef05afb0863beae8492ff11d98795fc7:

   ops/s    rpc bklog
2166.800        0.000

read:      ops/s      kB/s   kB/op   retrans    avg RTT (ms)   avg exe (ms)   avg queue (ms)   errors
        1177.400  7643.735   6.492   0 (0.0%)          0.176          0.232            0.039   0 (0.0%)
write:     ops/s      kB/s   kB/op   retrans    avg RTT (ms)   avg exe (ms)   avg queue (ms)   errors
         979.000  8504.083   8.686   0 (0.0%)          0.227          0.273            0.032   0 (0.0%)

Per the behavior described in this bug, QEMU sets a 4K alignment for an O_DIRECT file on NFS. So when the VM does not write in 4K blocks, QEMU has to read the 4K block back from the NFS server, modify it, and then write the 4K block back, right?

Thanks
Yes, this is precisely what happens when QEMU incorrectly infers a 4k alignment for NFS shares. I expect that this problem goes away when you update to a more recent version that contains the fix for this bug report.
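To make the read-modify-write cost concrete, the following sketch computes, for a single guest write, the aligned range that has to be written and how many bytes at its head and tail must first be read back from the server. It is an illustration under the assumption of a 4096-byte request alignment; `rmw_plan` is a hypothetical helper, not a QEMU function.

```python
def rmw_plan(offset: int, length: int, align: int):
    """Return (aligned start, aligned end, head bytes, tail bytes).

    Head and tail are the parts of the aligned range that the guest did not
    write; they must be read back before the padded write can be issued.
    """
    start = offset - offset % align
    end = -(-(offset + length) // align) * align
    head = offset - start
    tail = end - (offset + length)
    return start, end, head, tail


# Two consecutive 5k (5120-byte) sequential writes with 4k alignment:
print(rmw_plan(0, 5120, 4096))     # (0, 8192, 0, 3072)
print(rmw_plan(5120, 5120, 4096))  # (4096, 12288, 1024, 2048)

# The same write with byte alignment needs no padding and therefore no reads:
print(rmw_plan(5120, 5120, 1))     # (5120, 10240, 0, 0)
```

Every 5k write in this pattern has a non-empty head or tail, so each one needs at least one padding read, which is consistent with the read rate in the nfsiostat output tracking the write rate.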
Thanks for the confirmation.

We hit the problem on qemu-kvm-4.2.0-29.module+el8.2.1+11280+70ae3d73.8.x86_64, and it is gone after upgrading to qemu-kvm-5.1.0-14.module+el8.3.0+8438+644aff69.x86_64.