1834646 – qemu-img convert abort when converting image with unaligned size

Summary: qemu-img convert abort when converting image with unaligned size

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux Advanced Virtualization
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	8.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	8.3
Assignee:	Kevin Wolf
QA Contact:	Xueqiang Wei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1834281

Reported:	2020-05-12 06:47 UTC by Xueqiang Wei
Modified:	2023-02-27 14:28 UTC
CC List:	10 users
Fixed In Version:	qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1834281
Environment:
Last Closed:	2020-11-17 17:48:34 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Description Xueqiang Wei 2020-05-12 06:47:17 UTC

+++ This bug was initially created as a clone of Bug #1834281 +++

Description of problem:

# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)



Version-Release number of selected component (if applicable):

kernel-4.18.0-194.el8.x86_64
qemu-kvm-4.2.0-19.module+el8.3.0+6478+69f490bb
qemu-img-4.2.0-19.module+el8.3.0+6478+69f490bb


How reproducible:
100%

Steps to Reproduce:
1. # mount -t nfs -o soft,vers=4.2 10.66.61.132:/home/nfs_server/ /home/kvm_autotest_root/images/
2. # cd /home/kvm_autotest_root/images/
3. # truncate -s 11136 test.img
4. # qemu-io -c 'write -P 1 0 10K' test.img -f raw
5. # qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc


Actual results:
After step 5:
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

Expected results:
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)



Additional info:
It works well on rhel8.1.0 slow train and rhel8.1.0 fast train.
It doesn't work on rhel8.2.0 slow train and rhel8.2.0 fast train.

Details:
1. According to https://bugzilla.redhat.com/show_bug.cgi?id=1678979#c19, this had been fixed on rhel8.1.0 slow train.

Tested with qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0, it works well.
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0346 sec (288.218 KiB/sec and 28.8218 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)

2. According to https://bugzilla.redhat.com/show_bug.cgi?id=1588356#c16, this had been fixed on rhel8.1.0 fast train.

Tested with qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf, it works well.
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0278 sec (359.557 KiB/sec and 35.9557 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)

3. Tested on rhel8.2.0 slow train, it doesn't work.

Tested with qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c, qemu core dumped.

# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0453 sec (220.386 KiB/sec and 22.0386 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1646: bdrv_aligned_pwritev: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

4. Tested on rhel8.2.0 fast train, it doesn't work.

Tested with qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950, qemu core dumped

# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.04 sec (236.864 KiB/sec and 23.6864 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

--- Additional comment from Xueqiang Wei on 2020-05-11 13:27:22 UTC ---

core dumped log:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1834281/gdb.txt

core dumped file:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1834281/core.qemu-img.0.5b438c5e682d41f9ab1da8d09025a9eb.135324.1589199068000000.lz4

Comment 1 Xueqiang Wei 2020-05-12 06:54:09 UTC

Hit it on rhel8.3 fast train

Versions:
kernel-4.18.0-194.el8.x86_64
qemu-kvm-5.0.0-0.scrmod+el8.3.0+6495+1936fa11.wrb200506

# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.04 sec (247.384 KiB/sec and 24.7384 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: /builddir/build/BUILD/qemu-5.0.0/block/io.c:1887: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)



Also hit it on rhel8.2.1 fast train - qemu-kvm-4.2.0-21.module+el8.2.1+6586+8b7713b9
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.04 sec (266.294 KiB/sec and 26.6294 ops/sec)
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
Aborted (core dumped)

Comment 2 John Ferlan 2020-05-15 14:37:55 UTC

Adjusting dependency order - fix goes into RHEL AV first, then RHEL.

Comment 3 John Ferlan 2020-05-19 20:43:36 UTC

Assigned to Ademar for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Not sure if this would be Kevin or Max. If fix goes into RHEL AV 8.2.1, then bug 1834281 would pick up change when rebase occurs

Comment 4 Xueqiang Wei 2020-05-20 07:29:38 UTC

Also hit it on the latest package.

Versions:
kernel-4.18.0-200.el8.x86_64
qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420

Comment 6 Kevin Wolf 2020-05-25 14:51:38 UTC

This regression seems to be related to upstream commit a6b257a08e3 ('file-posix: Handle undetectable alignment') by Nir.

It seems that the algorithm used for detecting the required O_DIRECT alignment fails on NFS (because NFS is fine with byte alignment) and therefore chooses the safe default of 4k. Creating the target image rounds the image size up to a multiple of 512 bytes, but accessing the image in 4k granularity means that we'll still write past the end of the created image file.
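
The arithmetic behind the failed assertion can be sketched as follows. This is a back-of-the-envelope illustration, not QEMU code: the helper name round_up is hypothetical, the 11136-byte size comes from the reproducer, and the sketch assumes that the last write request is padded up to the 4k boundary.

```shell
# Hypothetical helper: round $1 up to the next multiple of $2.
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

size=11136                          # source image size from the reproducer
created=$(round_up "$size" 512)     # target size after rounding to 512-byte sectors
last_end=$(round_up "$size" 4096)   # assumed end of the last request at 4k granularity

echo "created size:   $created bytes ($((created / 512)) sectors)"
echo "last write end: $last_end bytes ($((last_end / 512)) sectors)"
echo "overrun:        $((last_end - created)) bytes"
```

With these numbers the last request would end at sector 24 while the image only has 22 sectors, which is the end_sector <= bs->total_sectors condition that the assertion rejects when BLK_PERM_RESIZE is not held.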

We need to find a way to deal with this situation: Either automatically round the target image size up to a multiple of the request alignment (though then we wouldn't be creating an exact copy any more!) or error out. For NFS specifically, it would be good to find a way to figure out that byte alignment is actually what is needed. Without a kernel interface that just tells us the right alignment, I'm afraid our probing code can never be 100% reliable.

Comment 7 Nir Soffer 2020-05-25 15:36:47 UTC

(In reply to Kevin Wolf from comment #6)
In oVirt this is not an issue, since we enforce 4k alignment in all volumes.
There is no way to create a volume which is not aligned to 4k, regardless of
the storage.

> # truncate -s 11136 test.img

What is the use case for creating a volume with size < 4k?

This change was added to allow 4k storage in RHHI.

The chance that we will have kernel interface reporting the required alignment
and that it will be exposed via fuse/gluster (or other storage) is low, and it
will take years to get there.

I think rounding the image up to request_alignment is the way to go, making
such errors impossible.
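
A minimal sketch of what rounding an image up to the request alignment looks like, using a throwaway file in a temporary directory rather than an NFS mount. It assumes GNU coreutils truncate (whose '%' size prefix rounds up to a multiple) and a 4096-byte alignment; whether padding an image this way avoids the abort on affected builds was not tested in this report.

```shell
# Sketch only: pad an image up to a multiple of the request alignment.
align=4096                       # assumed fallback request alignment
workdir=$(mktemp -d)
img="$workdir/test.img"

truncate -s 11136 "$img"         # same unaligned size as the reproducer
before=$(wc -c < "$img")

truncate -s %"$align" "$img"     # GNU truncate: '%' rounds the size up to a multiple
after=$(wc -c < "$img")

echo "before: $before bytes, after: $after bytes, padding: $((after - before)) bytes"
rm -rf "$workdir"
```

The padded file is 1152 bytes larger than the original, which is the trade-off raised in comment #6: the result is aligned, but it is no longer an exact copy of the source.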

Comment 10 Xueqiang Wei 2020-08-13 13:23:53 UTC

Tested with qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901 and did not hit this issue, so setting status to VERIFIED.

Versions:
kernel-4.18.0-232.el8.x86_64
qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901


1. # mount -t nfs -o soft,vers=4.2 10.66.61.132:/home/nfs_server/ /home/nfs_test

2. # cd /home/nfs_test

3. # truncate -s 11136 test.img

4. # qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 00.05 sec (207.017 KiB/sec and 20.7017 ops/sec)

5. # qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
(100.00/100%)


Automation result:
(1/1) Host_RHEL.m8.u3.product_av.raw.virtio_blk.up.virtio_net.Guest.RHEL.8.3.0.x86_64.io-github-autotest-qemu.qemu_img_convert_image_with_unaligned_size.q35: PASS (2.89 s)

Comment 13 errata-xmlrpc 2020-11-17 17:48:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

Comment 14 Yi Song 2023-02-26 07:48:43 UTC

Hi,

Does this bug also affect how KVM accesses disks at run time?

As described in https://access.redhat.com/support/cases/#/case/03068601, we have a raw file on NFS, configured as below, and KVM opens the file with the O_DIRECT flag.
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/ef05afb0863beae8492ff11d98795fc7/volume-fd0f5fd7-f115-4863-9b72-de4b766ac0b3'/>
      <target dev='vdc' bus='virtio'/>
      <shareable/>
      <serial>fd0f5fd7-f115-4863-9b72-de4b766ac0b3</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

We found that when running a fio sequential write test with a 5k block size in the VM, the host sees extra read I/O, equal to or slightly more than the write IOPS.

[root@overcloud-novacompute-20 ~]# nfsiostat 10 /var/lib/nova/mnt/ef05afb0863beae8492ff11d98795fc7
xxx.xx.xx.xx:/vol1_dedup mounted on /var/lib/nova/mnt/ef05afb0863beae8492ff11d98795fc7:

           ops/s       rpc bklog
        2166.800           0.000

read:              ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)  avg queue (ms)          errors
                1177.400        7643.735           6.492        0 (0.0%)           0.176           0.232           0.039        0 (0.0%)
write:             ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)  avg queue (ms)          errors
                 979.000        8504.083           8.686        0 (0.0%)           0.227           0.273           0.032        0 (0.0%)

Given the behavior in this bug, QEMU assumes a 4k alignment for an O_DIRECT file on NFS. Then, when the VM does not write in 4k blocks, QEMU has to read the 4k block back from the NFS server, modify it, and write the 4k block back, right?


Thanks

Comment 15 Kevin Wolf 2023-02-27 10:42:51 UTC

Yes, this is precisely what happens when QEMU incorrectly infers a 4k alignment for NFS shares. I expect that this problem goes away when you update to a more recent version that contains the fix for this bug report.
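
The read-modify-write pattern can be illustrated with a simplified model. The helper rmw_span is hypothetical and only does the offset arithmetic; it assumes a 4096-byte alignment and the 5k (5120-byte) sequential writes from the fio test, and it ignores any merging or caching that QEMU or the host may do.

```shell
# Hypothetical helper: print "start end head tail" for a write of $2 bytes at
# offset $1 under alignment $3. start..end is the aligned span the write is
# widened to; head and tail are the bytes before and after the guest data
# that have to be read back first.
rmw_span() {
    offset=$1; len=$2; align=$3
    start=$(( offset / align * align ))
    end=$(( (offset + len + align - 1) / align * align ))
    echo "$start $end $(( offset - start )) $(( end - offset - len ))"
}

# Four sequential 5k writes with a 4k alignment.
i=0
while [ "$i" -lt 4 ]; do
    set -- $(rmw_span $(( i * 5120 )) 5120 4096)
    echo "write $i: span $1..$2, head read $3 bytes, tail read $4 bytes"
    i=$(( i + 1 ))
done
```

In this model every 5k write is widened to an 8k span and needs some data read back first, whereas a write that is already 4k-aligned needs none. That is in line with the read ops/s being close to the write ops/s in the nfsiostat output above.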

Comment 16 Yi Song 2023-02-27 14:28:41 UTC

Thanks for the confirmation.

We hit the problem on qemu-kvm-4.2.0-29.module+el8.2.1+11280+70ae3d73.8.x86_64.
The problem is gone after upgrading to qemu-kvm-5.1.0-14.module+el8.3.0+8438+644aff69.x86_64.
