Bug 2234374

Summary: Qemu Core Dumped When Writing Larger Size Than The Size of A Data Disk
Product: Red Hat Enterprise Linux 9
Reporter: Tingting Mao <timao>
Component: qemu-kvm
Assignee: Hanna Czenczek <hreitz>
qemu-kvm sub component: virtio-blk,scsi
QA Contact: qing.wang <qinwang>
Status: CLOSED MIGRATED
Docs Contact:
Severity: high
Priority: high
CC: aliang, chayang, coli, hreitz, jinzhao, juzhang, kwolf, mrezanin, qinwang, vgoyal, virt-maint, yfu, ymankad
Version: 9.4
Keywords: CustomerScenariosInitiative, MigratedToJIRA, Regression, TestBlocker, Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-22 16:31:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Tingting Mao 2023-08-24 08:14:29 UTC
Description of problem:
As subject.


Version-Release number of selected component (if applicable):
qemu-kvm-8.1.0-0.rc4.el9.preview
kernel-5.14.0-340.el9.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Prepare an LV block device (a 1024M logical volume formatted as qcow2 with a 60G virtual size)
# qemu-img create -f raw /home/lvm.img 60G
# losetup /dev/loop1 /home/lvm.img
# pvcreate /dev/loop1 --metadatasize=1m --metadatacopies=2 --metadataignore=y
# vgcreate vg /dev/loop1 --physicalextentsize=1m
# lvcreate --autobackup n --contiguous n  --size 1024M -n lv1 vg
# qemu-img create -f qcow2 /dev/vg/lv1 60G

2. Boot up a guest with the LV as a data disk
# /usr/libexec/qemu-kvm \
-S  \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-blockdev '{"node-name": "file_ovmf_code", "driver": "file", "filename": "/usr/share/OVMF/OVMF_CODE.secboot.fd", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_ovmf_code", "driver": "raw", "read-only": true, "file": "file_ovmf_code"}' \
-blockdev '{"node-name": "file_ovmf_vars", "driver": "file", "filename": "/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel930-64-virtio-scsi-ovmf_qcow2_filesystem_VARS.raw", "auto-read-only": true, "discard": "unmap"}' \
-blockdev '{"node-name": "drive_ovmf_vars", "driver": "raw", "read-only": false, "file": "file_ovmf_vars"}' \
-machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars,memory-backend=mem-machine_mem \
-device '{"id": "pcie-root-port-0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x1", "chassis": 1}' \
-device '{"id": "pcie-pci-bridge-0", "driver": "pcie-pci-bridge", "addr": "0x0", "bus": "pcie-root-port-0"}'  \
-nodefaults \
-device '{"driver": "VGA", "bus": "pcie.0", "addr": "0x2"}' \
-m 30720 \
-object '{"size": 32212254720, "id": "mem-machine_mem", "qom-type": "memory-backend-ram"}'  \
-smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
-cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
-chardev socket,wait=off,server=on,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_r9ni1_l3/monitor-qmpmonitor1-20230824-030703-TgoNPdRk  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,wait=off,server=on,id=qmp_id_catch_monitor,path=/var/tmp/avocado_r9ni1_l3/monitor-catch_monitor-20230824-030703-TgoNPdRk  \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device '{"ioport": 1285, "driver": "pvpanic", "id": "idSpXmJe"}' \
-chardev socket,wait=off,server=on,id=chardev_serial0,path=/var/tmp/avocado_r9ni1_l3/serial-serial0-20230824-030703-TgoNPdRk \
-device '{"id": "serial0", "driver": "isa-serial", "chardev": "chardev_serial0"}'  \
-chardev socket,id=seabioslog_id_20230824-030703-TgoNPdRk,path=/var/tmp/avocado_r9ni1_l3/seabios-20230824-030703-TgoNPdRk,server=on,wait=off \
-device isa-debugcon,chardev=seabioslog_id_20230824-030703-TgoNPdRk,iobase=0x402 \
-device '{"id": "pcie-root-port-1", "port": 1, "driver": "pcie-root-port", "addr": "0x1.0x1", "bus": "pcie.0", "chassis": 2}' \
-device '{"driver": "qemu-xhci", "id": "usb1", "bus": "pcie-root-port-1", "addr": "0x0"}' \
-device '{"driver": "usb-tablet", "id": "usb-tablet1", "bus": "usb1.0", "port": "1"}' \
-device '{"id": "pcie-root-port-2", "port": 2, "driver": "pcie-root-port", "addr": "0x1.0x2", "bus": "pcie.0", "chassis": 3}' \
-device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-2", "addr": "0x0"}' \
-blockdev '{"node-name": "file_image1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel930-64-virtio-scsi-ovmf.qcow2", "cache": {"direct": true, "no-flush": false}}' \
-blockdev '{"node-name": "drive_image1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_image1"}' \
-device '{"driver": "scsi-hd", "id": "image1", "drive": "drive_image1", "write-cache": "on"}' \
-blockdev '{"node-name": "file_stg1", "driver": "host_device", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/dev/vg/lv1", "cache": {"direct": true, "no-flush": false}}' \
-blockdev '{"node-name": "drive_stg1", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg1"}' \
-device '{"driver": "scsi-hd", "id": "stg1", "drive": "drive_stg1", "write-cache": "on", "rerror": "stop", "werror": "stop", "serial": "TARGET_DISK0"}' \
-device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
-device '{"driver": "virtio-net-pci", "mac": "9a:c4:b4:6c:7c:ec", "id": "id9xdoY8", "netdev": "idXoIpNF", "bus": "pcie-root-port-3", "addr": "0x0"}'  \
-netdev tap,id=idXoIpNF,vhost=on  \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio \
-device '{"id": "pcie_extra_root_port_0", "driver": "pcie-root-port", "multifunction": true, "bus": "pcie.0", "addr": "0x3", "chassis": 5}'

3. In the guest, write more data to the data disk than the LV can hold
(guest)# dd if=/dev/urandom of=/dev/sdb bs=1M count=50000 oflag=direct


Actual results:
After several seconds to about a minute, QEMU dumps core:
(qemu) qemu.sh: line 46: 20707 Floating point exception(core dumped) /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on [... remaining arguments identical to the command line in step 2 ...]

                
Expected results:
No core dump.


Additional info:
Stack trace of thread 20707:
                #0  0x0000557f25964609 get_zones_wp (qemu-kvm + 0x86b609)
                #1  0x0000557f25964b27 raw_co_prw (qemu-kvm + 0x86bb27)
                #2  0x0000557f2590a914 bdrv_driver_pwritev (qemu-kvm + 0x811914)
                #3  0x0000557f2590527d bdrv_aligned_pwritev (qemu-kvm + 0x80c27d)
                #4  0x0000557f259048d3 bdrv_co_pwritev_part (qemu-kvm + 0x80b8d3)
                #5  0x0000557f25942389 qcow2_co_pwritev_task (qemu-kvm + 0x849389)
                #6  0x0000557f25941e63 qcow2_co_pwritev_task_entry (qemu-kvm + 0x848e63)
                #7  0x0000557f2594163a qcow2_add_task (qemu-kvm + 0x84863a)
                #8  0x0000557f2593978c qcow2_co_pwritev_part (qemu-kvm + 0x84078c)
                #9  0x0000557f2590a7ea bdrv_driver_pwritev (qemu-kvm + 0x8117ea)
                #10 0x0000557f2590527d bdrv_aligned_pwritev (qemu-kvm + 0x80c27d)
                #11 0x0000557f259048d3 bdrv_co_pwritev_part (qemu-kvm + 0x80b8d3)
                #12 0x0000557f258ef8a6 blk_co_do_pwritev_part.llvm.4925601650272942953 (qemu-kvm + 0x7f68a6)
                #13 0x0000557f258f01d2 blk_aio_write_entry.llvm.4925601650272942953 (qemu-kvm + 0x7f71d2)
                #14 0x0000557f25ae0006 coroutine_trampoline.llvm.17812621179749689021 (qemu-kvm + 0x9e7006)
                #15 0x00007f4aa1e2a360 n/a (libc.so.6 + 0x2a360)
                #16 0x40a47d7e8f1e8300 n/a (n/a + 0x0)
                ELF object binary architecture: AMD x86-64

Comment 1 Tingting Mao 2023-08-24 08:18:47 UTC
With qemu-kvm-8.0.0-12.el9, QEMU does not dump core; the guest only reports a 'No space left' error.
Marking this bug as a regression.

Comment 2 aihua liang 2023-08-24 09:54:59 UTC
Tested with qemu-kvm-8.1.0-0.el9.preview and hit the same issue, with the same core dump information:
  Message: Process 30795 (qemu-kvm) of user 0 dumped core.
                
                Stack trace of thread 30795:
                #0  0x00005635ba703909 get_zones_wp (qemu-kvm + 0x86b909)
                #1  0x00005635ba703e27 raw_co_prw (qemu-kvm + 0x86be27)
                #2  0x00005635ba6a9c14 bdrv_driver_pwritev (qemu-kvm + 0x811c14)
                #3  0x00005635ba6a457d bdrv_aligned_pwritev (qemu-kvm + 0x80c57d)
                #4  0x00005635ba6a3bd3 bdrv_co_pwritev_part (qemu-kvm + 0x80bbd3)
                #5  0x00005635ba6e1689 qcow2_co_pwritev_task (qemu-kvm + 0x849689)
                #6  0x00005635ba6e1163 qcow2_co_pwritev_task_entry (qemu-kvm + 0x849163)
                #7  0x00005635ba6e093a qcow2_add_task (qemu-kvm + 0x84893a)
                #8  0x00005635ba6d8a8c qcow2_co_pwritev_part (qemu-kvm + 0x840a8c)
                #9  0x00005635ba6a9aea bdrv_driver_pwritev (qemu-kvm + 0x811aea)
                #10 0x00005635ba6a457d bdrv_aligned_pwritev (qemu-kvm + 0x80c57d)
                #11 0x00005635ba6a3bd3 bdrv_co_pwritev_part (qemu-kvm + 0x80bbd3)
                #12 0x00005635ba68eba6 blk_co_do_pwritev_part.llvm.8165632186031058405 (qemu-kvm + 0x7f6ba6)
                #13 0x00005635ba68f4d2 blk_aio_write_entry.llvm.8165632186031058405 (qemu-kvm + 0x7f74d2)
                #14 0x00005635ba87f306 coroutine_trampoline.llvm.6566130761695863925 (qemu-kvm + 0x9e7306)
                #15 0x00007fcbe6c2a360 n/a (libc.so.6 + 0x2a360)
                #16 0x0000000000000000 n/a (n/a + 0x0)
                ELF object binary architecture: AMD x86-64

Comment 3 Hanna Czenczek 2023-08-24 12:26:06 UTC
The problem is that get_zones_wp() and by extension update_zones_wp() expect zoning information to be present, but raw_co_prw()’s error path does not check whether that information is present before calling into them.

There is an upstream patch for this problem, but I’m not sure what its status is: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg01742.html – it reads like the author intended to send a different version, but I don’t think there ever was one (compare https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg05163.html).  I think I’ll send a separate version myself to get things going again, not least because there are other issues here:

There are various ways in which the presence of zoning information is checked (bs->wps != NULL and/or bs->bl.zone_size != 0), but judging from how raw_refresh_zoned_limits() is constructed, the only flag that is reliable is whether bs->bl.zoned != BLK_Z_NONE, so that should be changed everywhere, too.  (raw_refresh_zoned_limits() never clears bs->wps or bs->bl.zone_size if there is no zoning information on refresh, but it does set bs->bl.zoned to BLK_Z_NONE.)

Also, raw_refresh_zoned_limits() never clears anything on error, which does not seem right.  It should at least reset bs->bl.zoned to BLK_Z_NONE.
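
The three points above (one reliable flag, a guarded error path, and a reset on refresh failure) can be illustrated with a small standalone sketch. It is not the upstream patch and does not use QEMU's real types: every Sk* and sk_* name below is hypothetical, and the structure only mirrors the fields mentioned in this comment. The sketch also assumes that the crash is an integer division by a zero zone size, which on x86-64 Linux is reported as SIGFPE ("Floating point exception"); the report itself does not state that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the fields named in this comment. */
typedef enum { SK_Z_NONE = 0, SK_Z_HM, SK_Z_HA } SkZoneModel;

typedef struct {
    SkZoneModel zoned;   /* treated as the only authoritative flag */
    uint64_t zone_size;  /* bytes; meaningless while zoned == SK_Z_NONE */
    uint32_t nr_zones;
} SkLimits;

/* One predicate for "zone information is present". */
bool sk_has_zone_info(const SkLimits *bl)
{
    return bl->zoned != SK_Z_NONE;
}

/*
 * Refresh the limits from a probe result. Any failure or non-zoned result
 * resets every field, so stale values cannot survive a refresh.
 */
int sk_refresh_zoned_limits(SkLimits *bl, int probe_ret, SkZoneModel model,
                            uint64_t zone_size, uint32_t nr_zones)
{
    if (probe_ret < 0 || model == SK_Z_NONE || zone_size == 0) {
        bl->zoned = SK_Z_NONE;
        bl->zone_size = 0;
        bl->nr_zones = 0;
        return probe_ret < 0 ? probe_ret : 0;
    }
    bl->zoned = model;
    bl->zone_size = zone_size;
    bl->nr_zones = nr_zones;
    return 0;
}

/* Map a byte offset to a zone index; refuses to divide without zone info. */
int sk_zone_index(const SkLimits *bl, uint64_t offset, uint64_t *index)
{
    if (!sk_has_zone_info(bl)) {
        return -ENOTSUP;
    }
    /* Holds as long as the limits were set by sk_refresh_zoned_limits(). */
    assert(bl->zone_size != 0);
    *index = offset / bl->zone_size;
    return 0;
}

/*
 * Completion handler for a write. On I/O error, zone state is only
 * looked at when the device actually has zones; the I/O result is
 * returned unchanged either way.
 */
int sk_write_complete(const SkLimits *bl, int io_ret, uint64_t offset)
{
    if (io_ret < 0 && sk_has_zone_info(bl)) {
        uint64_t index;
        (void)sk_zone_index(bl, offset, &index);
        /* A real driver would refresh the write pointer of zone `index`. */
    }
    return io_ret;
}
```

With this shape, a failed write to a non-zoned device such as the LV in this report would simply propagate its error (for example -ENOSPC) instead of reaching the zone arithmetic.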

Comment 4 Hanna Czenczek 2023-08-25 09:06:32 UTC
Sent https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg04283.html upstream

Comment 9 Yanan Fu 2023-09-05 05:40:12 UTC
Update:
The issue also exists in the official downstream build: qemu-kvm-8.1.0-1.el9.
As this scenario is part of the qemu-kvm component gating test, adding 'TestBlocker' accordingly.
Thanks!

Comment 14 Tingting Mao 2023-09-14 03:19:55 UTC
An easier way to reproduce the issue:

Tested with:
qemu-kvm-8.1.0-1.el9
kernel-5.14.0-362.el9


Steps:
# qemu-img create -f raw test.img 400M
# losetup /dev/loop0 test.img
# pvcreate /dev/loop0
# vgcreate test /dev/loop0
# lvcreate -n top --size 128M test
# qemu-img create -f qcow2 /dev/test/top 128M
# qemu-img create -f qcow2 top.img -F qcow2 -b /dev/test/top
# qemu-io -f qcow2 top.img -c "write 0 128M"
# qemu-img commit -f qcow2 -t none -b /dev/test/top -d -p top.img
Floating point exception (core dumped)


Stack trace of thread 903888:
                #0  0x0000557af5b28aa9 get_zones_wp (qemu-img + 0x117aa9)
                #1  0x0000557af5b28fc7 raw_co_prw (qemu-img + 0x117fc7)
                #2  0x0000557af5acf484 bdrv_driver_pwritev (qemu-img + 0xbe484)
                #3  0x0000557af5ac9dad bdrv_aligned_pwritev (qemu-img + 0xb8dad)
                #4  0x0000557af5ac9403 bdrv_co_pwritev_part (qemu-img + 0xb8403)
                #5  0x0000557af5b06e29 qcow2_co_pwritev_task (qemu-img + 0xf5e29)
                #6  0x0000557af5b06903 qcow2_co_pwritev_task_entry (qemu-img + 0xf5903)
                #7  0x0000557af5b06107 qcow2_add_task (qemu-img + 0xf5107)
                #8  0x0000557af5afe22c qcow2_co_pwritev_part (qemu-img + 0xed22c)
                #9  0x0000557af5acf35a bdrv_driver_pwritev (qemu-img + 0xbe35a)
                #10 0x0000557af5ac9dad bdrv_aligned_pwritev (qemu-img + 0xb8dad)
                #11 0x0000557af5ac9403 bdrv_co_pwritev_part (qemu-img + 0xb8403)
                #12 0x0000557af5ab4aa6 blk_co_do_pwritev_part.llvm.8165632186031058405 (qemu-img + 0xa3aa6)
                #13 0x0000557af5ad5e59 mirror_read_complete (qemu-img + 0xc4e59)
                #14 0x0000557af5ad57ae mirror_co_read (qemu-img + 0xc47ae)
                #15 0x0000557af5bdcd16 coroutine_trampoline.llvm.6566130761695863925 (qemu-img + 0x1cbd16)
                #16 0x00007f012242a360 n/a (libc.so.6 + 0x2a360)
                #17 0x00007f0100000002 n/a (n/a + 0x0)
                ELF object binary architecture: AMD x86-64

Comment 16 RHEL Program Management 2023-09-22 16:29:05 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.