Bug 2072932

Summary: Qemu coredump when refreshing block limits on an actively used iothread block device [rhel.8.7]
Product: Red Hat Enterprise Linux 8 Reporter: qing.wang <qinwang>
Component: qemu-kvm Assignee: Hanna Czenczek <hreitz>
qemu-kvm sub component: virtio-blk,scsi QA Contact: qing.wang <qinwang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: medium CC: aliang, coli, hreitz, jinzhao, juzhang, kkiwi, ngu, qinwang, qzhang, virt-maint, yfu, zhencliu
Version: 8.7 Keywords: Triaged
Target Milestone: rc Flags: pm-rhel: mirror+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-15.module+el8.7.0+15644+189a21f6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1879437 Environment:
Last Closed: 2022-11-08 09:19:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1879437    
Bug Blocks:    

Comment 1 qing.wang 2022-04-07 10:17:12 UTC
There are two test scenarios that reproduce this issue, based on
https://bugzilla.redhat.com/show_bug.cgi?id=1879437#c19

Test scenario 1: run the test script based on QSD (qemu-storage-daemon).


#!/bin/sh

QEMU_IMG=qemu-img
QSD=qemu-storage-daemon

rm -f /tmp/qsd.pid

"$QEMU_IMG" create -f qcow2 -F raw -b null-co:// /tmp/top.qcow2

(echo '{"execute": "qmp_capabilities"}'
 sleep 1
 while true; do
     echo '{"execute": "blockdev-add", "arguments": {"driver": "qcow2", "node-name": "tmp", "backing": "node0", "file": {"driver": "file", "filename": "/tmp/top.qcow2"}}}'
     echo '{"execute": "blockdev-del", "arguments": {"node-name": "tmp"}}'
 done) | \
"$QSD" \
    --chardev stdio,id=stdio \
    --monitor mon0,chardev=stdio \
    --object iothread,id=iothread0 \
    --blockdev null-co,node-name=node0,read-zeroes=true \
    --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock \
    --export nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true \
    --pidfile /tmp/qsd.pid \
    &

while [ ! -f /tmp/qsd.pid ]; do
    true
done

"$QEMU_IMG" bench -f raw -c 2000000 nbd+unix:///node0\?socket=/tmp/nbd.sock
ret=$?

kill %1

exit $ret
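
The script above polls for the pidfile in a tight loop, which spins forever if QSD fails to start. A minimal sketch of a bounded wait follows; the helper name wait_for_file and the 10-second default are assumptions for illustration and are not part of the original reproducer.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): wait until a file
# exists, giving up after a timeout instead of spinning forever.
# Usage: wait_for_file PATH [TIMEOUT_SECONDS]
wait_for_file() {
    path=$1
    timeout=${2:-10}   # assumed default of 10 seconds
    waited=0
    while [ ! -f "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1   # file never appeared
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the reproducer this could replace the busy loop, for example: wait_for_file /tmp/qsd.pid 10 || exit 1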

==================================================================
QSD crashes with the following backtrace:

#0  0x000055bfbe32f3c0 in bdrv_co_block_status (bs=bs@entry=0x55bfbf4c2bf0, 
    want_zero=want_zero@entry=true, offset=offset@entry=30031872, bytes=bytes@entry=4096, 
    pnum=pnum@entry=0x7f31eadedf18, map=map@entry=0x0, file=0x0) at ../block/io.c:2457
#1  0x000055bfbe332191 in bdrv_co_common_block_status_above (bs=bs@entry=0x55bfbf4c2bf0, 
    base=base@entry=0x0, include_base=include_base@entry=false, 
    want_zero=want_zero@entry=true, offset=offset@entry=30031872, bytes=bytes@entry=4096, 
    pnum=0x7f31eadedf18, map=0x0, file=0x0, depth=0x7f31eadedd34) at ../block/io.c:2655
#2  0x000055bfbe305522 in bdrv_common_block_status_above (bs=0x55bfbf4c2bf0, 
    base=base@entry=0x0, include_base=include_base@entry=false, 
    want_zero=want_zero@entry=true, offset=offset@entry=30031872, bytes=bytes@entry=4096, 
    pnum=0x7f31eadedf18, map=0x0, file=0x0, depth=0x0) at block/block-gen.c:444
#3  0x000055bfbe332414 in bdrv_block_status_above (bs=<optimized out>, base=base@entry=0x0, 
    offset=offset@entry=30031872, bytes=bytes@entry=4096, pnum=pnum@entry=0x7f31eadedf18, 
    map=map@entry=0x0, file=0x0) at ../block/io.c:2731
#4  0x000055bfbe2f8055 in nbd_co_send_sparse_read (errp=0x7f31eadedf10, size=4096, 
    data=0x7f31f001c000 "", offset=30031872, handle=<optimized out>, client=0x55bfbf4c9400)
    at ../nbd/server.c:1991
#5  nbd_do_cmd_read (errp=0x7f31eadedf10, data=<optimized out>, request=<synthetic pointer>, 
    client=0x55bfbf4c9400) at ../nbd/server.c:2425
#6  nbd_handle_request (errp=0x7f31eadedf10, data=<optimized out>,

coredump file:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/2072932/2022-04-07/core.qemu-storage-da.0.f61d9cfc23da45f0a1891909d845e815.69705.1649319528000000.lz4



Test scenario 2: run qemu while repeatedly executing blockdev-add/blockdev-del.

1. Create the backing file and its overlay image
qemu-img create -f qcow2 /home/kvm_autotest_root/images/stg1.qcow2 2G
qemu-img create -f qcow2 -b /home/kvm_autotest_root/images/stg1.qcow2 -F qcow2 /home/kvm_autotest_root/images/scratch1.img

2. Boot the VM
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -device pvpanic,ioport=0x505,id=idZcGD6F  \
    -object iothread,id=iothread0 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-4,addr=0x0,iothread=iothread0 \
    \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel860-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3,iothread=iothread0,bootindex=0,werror=stop,rerror=stop \
    \
    -blockdev node-name=data_image1,driver=file,cache.direct=on,cache.no-flush=off,filename=/home/kvm_autotest_root/images/stg1.qcow2,aio=threads \
    -blockdev node-name=data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=data_image1 \
    -device virtio-blk-pci,id=disk1,drive=data1,write-cache=on,bus=pcie.0-root-port-7,iothread=iothread0,werror=stop,rerror=stop \
    \
    \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:55:56:57:58:59,id=id18Xcuo,netdev=idGRsMas,bus=pcie.0-root-port-5,addr=0x0  \
    -netdev tap,id=idGRsMas,vhost=on \
    -m 4096  \
    -smp 24,maxcpus=24,cores=12,threads=1,sockets=2  \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :5  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:5955,server,nowait


3. Run fio on the data disk (in the guest)
fio --direct=1 --name=x --filename=/dev/vdb --size=2g --rw=randrw

4. Repeatedly execute blockdev-add/blockdev-del for a node that uses data1 as its backing node

{"execute":"blockdev-add","arguments":{"driver":"qcow2","node-name":"tmp1","file":{"driver":"file","filename":"/home/kvm_autotest_root/images/scratch1.img"},"backing":"data1"}}

{"execute": "blockdev-del", "arguments": {"node-name": "tmp1"}}


An event like the following is emitted and the VM is paused:
{"timestamp": {"seconds": 1649317868, "microseconds": 197464}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "data1", "reason": "Invalid argument", "operation": "write", "action": "stop"}}
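
Step 4 can be automated rather than typed by hand. The following is a minimal sketch, assuming the QMP socket from step 2 (tcp port 5955) and that a tool such as nc is available on the host. The helper name emit_qmp_cycles and the cycle count are assumptions for illustration; the two QMP commands are copied from step 4.

```shell
#!/bin/sh
# Hypothetical helper: print the QMP handshake followed by COUNT
# blockdev-add/blockdev-del cycles, one JSON command per line.
# Usage: emit_qmp_cycles COUNT
emit_qmp_cycles() {
    count=$1
    echo '{"execute": "qmp_capabilities"}'
    i=0
    while [ "$i" -lt "$count" ]; do
        echo '{"execute":"blockdev-add","arguments":{"driver":"qcow2","node-name":"tmp1","file":{"driver":"file","filename":"/home/kvm_autotest_root/images/scratch1.img"},"backing":"data1"}}'
        echo '{"execute": "blockdev-del", "arguments": {"node-name": "tmp1"}}'
        i=$((i + 1))
    done
}

# Example (assumption: nc is installed and the VM from step 2 is running):
#   emit_qmp_cycles 1000 | nc localhost 5955
```

Depending on the nc variant, the connection may close as soon as the input ends, so responses to the last commands might not be printed; that does not matter for reproducing the BLOCK_IO_ERROR event, which can be watched on another monitor connection.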

Comment 3 Hanna Czenczek 2022-05-30 12:15:23 UTC
Fix is upstream as 4d378bbd831bdd2f6e6adcd4ea5b77b6effaa627 (“block: Make bdrv_refresh_limits() non-recursive”).

Comment 7 qing.wang 2022-06-21 06:19:18 UTC
Verification passed on:
Red Hat Enterprise Linux release 8.7 Beta (Ootpa)
4.18.0-398.el8.x86_64
qemu-kvm-6.2.0-15.module+el8.7.0+15644+189a21f6.x86_64
seabios-bin-1.16.0-2.module+el8.7.0+15506+033991b0.noarch
edk2-ovmf-20220126gitbb1bba3d77-2.el8.noarch
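
To check whether another host already carries the fix, the installed qemu-kvm version can be compared against the fixed version listed above. This is a rough sketch only: the helper name version_at_least is an assumption, and GNU sort -V merely approximates RPM version ordering, so treat the result as a hint rather than an authoritative comparison.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2 according to GNU "sort -V" ordering (an approximation of
# RPM version comparison, not an exact equivalent).
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example (assumption: run on an RPM-based host with qemu-kvm installed):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' qemu-kvm)
#   version_at_least "$installed" "6.2.0-15.module+el8.7.0+15644+189a21f6" \
#       && echo "fix present" || echo "fix missing"
```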


Test script:

#!/bin/sh

QEMU_IMG=qemu-img
QSD=qemu-storage-daemon

TMPD=/tmp/p291748
mkdir -p $TMPD
# Abort early if the storage daemon binary is not in PATH
if ! command -v "$QSD" >/dev/null; then echo "$QSD does not exist"; exit 1; fi

rm -f ${TMPD}/qsd.pid

"$QEMU_IMG" create -f qcow2 -F raw -b null-co:// ${TMPD}/top.qcow2

(echo '{"execute": "qmp_capabilities"}'
 sleep 1
 while true; do
     echo '{"execute": "blockdev-add", "arguments": {"driver": "qcow2", "node-name": "tmp", "backing": "node0", "file": {"driver": "file", "filename": "/tmp/p291748/top.qcow2"}}}'
     echo '{"execute": "blockdev-del", "arguments": {"node-name": "tmp"}}'
 done) | \
"$QSD" \
    --chardev stdio,id=stdio \
    --monitor mon0,chardev=stdio \
    --object iothread,id=iothread0 \
    --blockdev null-co,node-name=node0,read-zeroes=true \
    --nbd-server addr.type=unix,addr.path=${TMPD}/nbd.sock \
    --export nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true \
    --pidfile ${TMPD}/qsd.pid \
    &

while [ ! -f ${TMPD}/qsd.pid ]; do
    true
done

"$QEMU_IMG" bench -f raw -c 4000000 nbd+unix:///node0\?socket=${TMPD}/nbd.sock
ret=$?

kill %1

rm -rf ${TMPD}
exit $ret

Comment 12 errata-xmlrpc 2022-11-08 09:19:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7472