Bug 1997934

Summary: VM remains in paused state when trying to write on a disk resides on iscsi [rhel.9]
Product: Red Hat Enterprise Linux 9 Reporter: qing.wang <qinwang>
Component: qemu-kvm Assignee: Kevin Wolf <kwolf>
qemu-kvm sub component: virtio-blk,scsi QA Contact: qing.wang <qinwang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: ahadas, berrange, bugs, bzlotnik, coli, eshames, hreitz, jinzhao, juzhang, kkiwi, kwolf, lsvaty, mrezanin, mtessun, nsoffer, pkrempa, qinwang, sshmulev, stefanha, virt-maint, ymankad
Version: 9.0 Keywords: Automation, AutomationBlocker, Regression, Triaged
Target Milestone: rc Flags: pm-rhel: mirror+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-1.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1994494 Environment:
Last Closed: 2022-05-17 12:23:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1994494    
Bug Blocks: 1996131, 2000568    

Comment 1 qing.wang 2021-08-26 06:26:55 UTC
Reproduced this bug on:

Red Hat Enterprise Linux release 9.0 Beta (Plow)
5.14.0-0.rc6.46.el9.x86_64
qemu-kvm-common-6.0.0-12.el9.x86_64

It should have the same cause as https://bugzilla.redhat.com/show_bug.cgi?id=1994494


http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/1997934/2021-08-26/strace.log

Sometimes QEMU crashes while writing to a qcow2 disk:

#0  0x00007f5a1d9b8763 in pthread_kill.5 () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f5a19733640 (LWP 54119))]
(gdb) bt
#0  0x00007f5a1d9b8763 in pthread_kill.5 () from /lib64/libc.so.6
#1  0x00007f5a1d96b686 in raise () from /lib64/libc.so.6
#2  0x00007f5a1d9557d3 in abort () from /lib64/libc.so.6
#3  0x00007f5a1d9556fb in __assert_fail_base.cold () from /lib64/libc.so.6
#4  0x00007f5a1d9643a6 in __assert_fail () from /lib64/libc.so.6
#5  0x00005611e45a7634 in qemu_iovec_init_extended (qiov=<optimized out>, 
    head_buf=<optimized out>, head_len=<optimized out>, mid_qiov=<optimized out>, 
    mid_offset=140015594093200, mid_len=0, tail_buf=0x0, tail_len=0) at ../util/iov.c:428
#6  0x00005611e4422abe in qemu_iovec_init_slice (qiov=<optimized out>, source=0x7f5a10025e18, 
    offset=0, len=36864) at ../util/iov.c:515
#7  bdrv_driver_pwritev (bs=0x5611e5a0f400, offset=888500224, bytes=36864, qiov=0x7f5a10025e18, 
    qiov_offset=<optimized out>, flags=0) at ../block/io.c:1227
#8  0x00005611e442435c in bdrv_aligned_pwritev (child=0x5611e58c1c80, req=0x7f57ebafaf58, 
    offset=888500224, bytes=<optimized out>, align=<optimized out>, qiov=0x7f5a10025e18, 
    qiov_offset=0, flags=<optimized out>) at ../block/io.c:2089
#9  0x00005611e4423793 in bdrv_co_pwritev_part (child=<optimized out>, offset=<optimized out>, 
    bytes=<optimized out>, qiov=<optimized out>, qiov_offset=<optimized out>, 
    flags=<optimized out>) at ../block/io.c:2273
#10 0x00005611e441297a in qcow2_co_pwritev_task (bs=0x5611e5a136b0, host_offset=888500224, 
    offset=<optimized out>, bytes=<optimized out>, qiov=0x7f5a10025e18, qiov_offset=0, l2meta=0x0)
    at ../block/qcow2.c:2535
#11 qcow2_co_pwritev_task_entry (task=<optimized out>) at ../block/qcow2.c:2565
#12 0x00005611e4428225 in aio_task_co (opaque=0x7f5a100116a0) at ../block/aio_task.c:45
#13 0x00005611e457b346 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173


Test steps:

1. Build the iSCSI target server
root@dell-per440-07 /home/tmp $ targetcli ls
o- / ........................................................................................... [...]
  o- backstores ................................................................................ [...]
  | o- block .................................................................... [Storage Objects: 0]
  | o- fileio ................................................................... [Storage Objects: 1]
  | | o- one ................................... [/home/iscsi/onex.img (15.0GiB) write-back activated]
  | |   o- alua ..................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ......................................... [ALUA state: Active/optimized]
  | o- pscsi .................................................................... [Storage Objects: 0]
  | o- ramdisk .................................................................. [Storage Objects: 0]
  o- iscsi .............................................................................. [Targets: 1]
  | o- iqn.2016-06.one.server:one-a ........................................................ [TPGs: 1]
  |   o- tpg1 ................................................................. [no-gen-acls, no-auth]
  |     o- acls ............................................................................ [ACLs: 2]
  |     | o- iqn.1994-05.com.redhat:clienta ......................................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 .................................................... [lun0 fileio/one (rw)]
  |     | o- iqn.1994-05.com.redhat:clientb ......................................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 .................................................... [lun0 fileio/one (rw)]
  |     o- luns ............................................................................ [LUNs: 1]
  |     | o- lun0 ............................. [fileio/one (/home/iscsi/onex.img) (default_tg_pt_gp)]
  |     o- portals ...................................................................... [Portals: 1]
  |       o- 0.0.0.0:3260 ....................................................................... [OK]
  o- loopback ........................................................................... [Targets: 0]

2. Attach the iSCSI disk on the host
iscsiadm -m discovery -t st -p 127.0.0.1
iscsiadm -m node -T iqn.2016-06.one.server:one-a  -p 127.0.0.1:3260 -l

3. Set max_sectors_kb to 64 on the attached disk

echo 64 > /sys/block/sdd/queue/max_sectors_kb
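
For context on step 3: max_sectors_kb caps the size of a single request that the host block layer issues to the device. The sketch below only does the arithmetic for what a 64 KiB limit means for a contiguous 1 MiB write reaching the host device. The helper name min_requests is hypothetical, and it does not inspect the device or claim anything about the root cause of this bug.

```shell
# min_requests WRITE_KIB LIMIT_KIB
# Prints the minimum number of requests needed to cover a write of
# WRITE_KIB KiB when a single request may carry at most LIMIT_KIB KiB.
min_requests() {
    local write_kib=$1 limit_kib=$2
    # Ceiling division in integer arithmetic.
    echo $(( (write_kib + limit_kib - 1) / limit_kib ))
}

# One 1 MiB block (the dd block size used in step 6) with a 64 KiB limit.
min_requests 1024 64   # prints 16
```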

4. Create LVM logical volumes on the disk
pvcreate /dev/sdd
vgcreate vg /dev/sdd

lvcreate -L 2560M -n lv1 vg;lvcreate -L 2560M -n lv2 vg
lvcreate -L 2560M -n lv3 vg;lvcreate -L 2560M -n lv4 vg


qemu-img create -f qcow2 /dev/vg/lv1 2G;qemu-img create -f qcow2 /dev/vg/lv2 2G

qemu-img create -f qcow2 /dev/vg/lv3 2G;qemu-img create -f qcow2 /dev/vg/lv4 2G
 
5. Boot the VM with the LVs attached as virtio-blk and scsi-hd devices



/usr/libexec/qemu-kvm \
  -name src_vm1 \
  -machine pc-q35-rhel8.4.0,accel=kvm,usb=off,dump-guest-core=off \
  -m 8g \
  \
  -device pcie-root-port,id=pcie.0-root-port-2,slot=2,bus=pcie.0,multifunction=on \
  -device pcie-root-port,id=pcie.0-root-port-3,slot=3,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-4,slot=4,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-5,slot=5,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-6,slot=6,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-7,slot=7,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-8,slot=8,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-9,slot=9,bus=pcie.0 \
  -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -object iothread,id=iothread0 \
  -object iothread,id=iothread1 \
  -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0,iothread=iothread0 \
  -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \
  -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,node-name=drive_image1 \
  -device scsi-hd,id=os1,drive=drive_image1,bootindex=0 \
  \
   -blockdev node-name=host_device_stg,driver=host_device,aio=native,filename=/dev/vg/lv1,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg \
  -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-4,addr=0x0,write-cache=on,id=stg,drive=drive_stg,rerror=stop,werror=stop \
\
 -blockdev node-name=host_device_stg2,driver=host_device,aio=native,filename=/dev/vg/lv2,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=host_device_stg2 \
  -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-5,addr=0x0,write-cache=on,id=stg2,drive=drive_stg2,rerror=stop,werror=stop \
\
 -blockdev node-name=host_device_stg3,driver=host_device,aio=native,filename=/dev/vg/lv3,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg3,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg3 \
  -device scsi-hd,write-cache=on,id=stg3,drive=drive_stg3,rerror=stop,werror=stop \
\
 -blockdev node-name=host_device_stg4,driver=host_device,aio=native,filename=/dev/vg/lv4,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg4,driver=qcow2,cache.direct=on,cache.no-flush=off,file=host_device_stg4 \
  -device scsi-hd,write-cache=on,id=stg4,drive=drive_stg4,rerror=stop,werror=stop \
\
  \
  -vnc :5 \
  -qmp tcp:0:5955,server,nowait \
  -monitor stdio \
  -device virtio-net-pci,mac=9a:b5:b6:b1:b4:b5,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-6,addr=0x0 \
  -netdev tap,id=idxgXAlm
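
The data disks above use rerror=stop,werror=stop, so an I/O error pauses the guest instead of being reported to it, which matches the paused state named in the summary. The command line also opens a QMP socket on TCP port 5955. A minimal sketch of reading the run state from a query-status reply follows. The function name qmp_status is hypothetical, the sample reply is illustrative rather than captured from this VM, and the sed pattern is a shortcut that assumes the reply arrives on one line (a JSON-aware tool would be more robust).

```shell
# qmp_status: read a QMP query-status reply on stdin and print the
# value of its "status" field (for example "running" or "io-error").
qmp_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Illustrative reply only; not captured from the VM in this report.
sample='{"return": {"status": "io-error", "singlestep": false, "running": false}}'
printf '%s\n' "$sample" | qmp_status   # prints io-error
```

Against the running VM, the reply would come from sending {"execute":"qmp_capabilities"} followed by {"execute":"query-status"} to port 5955; the exact client invocation depends on which netcat-style tool is installed on the host.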


6. Run I/O on the disks in the guest

root@bootp-73-199-218 /home $ cat test.sh 
for i in $(seq 100); do
    printf "pass %03d ... " $i
    dd if=/dev/zero bs=1M count=2048 of=$1 conv=fsync status=none
    if [[ "$?" != "0" ]];then break;fi
    echo "ok"
  done

./test.sh /dev/vda
./test.sh /dev/vdb
./test.sh /dev/sdb
./test.sh /dev/sdc
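
A slightly more defensive variant of the loop in step 6 is sketched below. It quotes the target path, reports which pass failed, and returns a non-zero status so that a harness can detect the failure. The function name write_passes and its optional pass-count and size arguments are additions for illustration; the dd invocation is the one from test.sh.

```shell
# write_passes TARGET [PASSES] [COUNT_MIB]
# Overwrites TARGET with zeros PASSES times (default 100), writing
# COUNT_MIB blocks of 1 MiB per pass (default 2048), and stops at the
# first failing pass.
write_passes() {
    local target=$1 passes=${2:-100} count=${3:-2048} i
    for i in $(seq "$passes"); do
        printf 'pass %03d ... ' "$i"
        if ! dd if=/dev/zero bs=1M count="$count" of="$target" \
                conv=fsync status=none; then
            echo "failed"
            return 1
        fi
        echo "ok"
    done
}

# Usage with the defaults from the report (destroys data on the target):
#   write_passes /dev/vda
```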

Comment 4 Nir Soffer 2021-09-01 09:25:18 UTC
(In reply to qing.wang from comment #1)
...
> Sometimes it will crash on writing qcow2 disk
> 
> #0  0x00007f5a1d9b8763 in pthread_kill.5 () from /lib64/libc.so.6
> [Current thread is 1 (Thread 0x7f5a19733640 (LWP 54119))]
> (gdb) bt
> #0  0x00007f5a1d9b8763 in pthread_kill.5 () from /lib64/libc.so.6
> #1  0x00007f5a1d96b686 in raise () from /lib64/libc.so.6
> #2  0x00007f5a1d9557d3 in abort () from /lib64/libc.so.6
> #3  0x00007f5a1d9556fb in __assert_fail_base.cold () from /lib64/libc.so.6
> #4  0x00007f5a1d9643a6 in __assert_fail () from /lib64/libc.so.6
> #5  0x00005611e45a7634 in qemu_iovec_init_extended (qiov=<optimized out>, 
>     head_buf=<optimized out>, head_len=<optimized out>, mid_qiov=<optimized

This is an assert() failure in QEMU; it is not the same issue as bug 1994494.
Why does this bug depend on bug 1994494?

...
> root@bootp-73-199-218 /home $ cat test.sh 
> for i in $(seq 100); do
>     printf "pass %03d ... " $i
>     dd if=/dev/zero bs=1M count=2048 of=$1 conv=fsync status=none
>     if [[ "$?" != "0" ]];then break;fi
>     echo "ok"
>   done
> 
> ./test.sh /dev/vda
> ./test.sh /dev/vdb
> ./test.sh /dev/sdb
> ./test.sh /dev/sdc

The fact that the same test script reproduces this bug does not mean that the
root cause is the same.

Comment 5 qing.wang 2021-09-01 10:19:19 UTC
(In reply to Nir Soffer from comment #4)
> (In reply to qing.wang from comment #1)
> ...
> > Sometimes it will crash on writing qcow2 disk
> > 
> > #0  0x00007f5a1d9b8763 in pthread_kill.5 () from /lib64/libc.so.6
> > [Current thread is 1 (Thread 0x7f5a19733640 (LWP 54119))]
> > (gdb) bt
> > #0  0x00007f5a1d9b8763 in pthread_kill.5 () from /lib64/libc.so.6
> > #1  0x00007f5a1d96b686 in raise () from /lib64/libc.so.6
> > #2  0x00007f5a1d9557d3 in abort () from /lib64/libc.so.6
> > #3  0x00007f5a1d9556fb in __assert_fail_base.cold () from /lib64/libc.so.6
> > #4  0x00007f5a1d9643a6 in __assert_fail () from /lib64/libc.so.6
> > #5  0x00005611e45a7634 in qemu_iovec_init_extended (qiov=<optimized out>, 
> >     head_buf=<optimized out>, head_len=<optimized out>, mid_qiov=<optimized
> 
> This is assert() failure in qemu, it is not the same issue as bug 1994494.
> Why this bug depends on bug 1994494?
> 
> ...
> > root@bootp-73-199-218 /home $ cat test.sh 
> > for i in $(seq 100); do
> >     printf "pass %03d ... " $i
> >     dd if=/dev/zero bs=1M count=2048 of=$1 conv=fsync status=none
> >     if [[ "$?" != "0" ]];then break;fi
> >     echo "ok"
> >   done
> > 
> > ./test.sh /dev/vda
> > ./test.sh /dev/vdb
> > ./test.sh /dev/sdb
> > ./test.sh /dev/sdc
> 
> The fact that the same test script reproduce this bug does not mean that the
> root cause is the same.

Bug 1994494 comment 0 only reports the raw format; the qcow2 issue is reported in https://bugzilla.redhat.com/show_bug.cgi?id=1994494#c66.
I discussed this with Kevin: both should have the same cause, and it has been verified in bug 1994494.

Comment 8 Kevin Wolf 2021-11-08 16:14:33 UTC
We'll get the fix with the 6.2 rebase.

Comment 11 Yanan Fu 2021-12-20 12:44:47 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 12 qing.wang 2021-12-21 05:59:52 UTC
Test passed on:
Red Hat Enterprise Linux release 9.0 Beta (Plow)
5.14.0-30.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64
seabios-bin-1.14.0-7.el9.noarch
edk2-ovmf-20210527gite1999b264f1f-7.el9.noarch

Test steps:

1. Build the iSCSI target server
root@dell-per440-07 /home/tmp $ targetcli ls
o- / ........................................................................................... [...]
  o- backstores ................................................................................ [...]
  | o- block .................................................................... [Storage Objects: 0]
  | o- fileio ................................................................... [Storage Objects: 1]
  | | o- one ................................... [/home/iscsi/onex.img (20.0GiB) write-back activated]
  | |   o- alua ..................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ......................................... [ALUA state: Active/optimized]
  | o- pscsi .................................................................... [Storage Objects: 0]
  | o- ramdisk .................................................................. [Storage Objects: 0]
  o- iscsi .............................................................................. [Targets: 1]
  | o- iqn.2016-06.one.server:one-a ........................................................ [TPGs: 1]
  |   o- tpg1 ................................................................. [no-gen-acls, no-auth]
  |     o- acls ............................................................................ [ACLs: 2]
  |     | o- iqn.1994-05.com.redhat:clienta ......................................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 .................................................... [lun0 fileio/one (rw)]
  |     | o- iqn.1994-05.com.redhat:clientb ......................................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 .................................................... [lun0 fileio/one (rw)]
  |     o- luns ............................................................................ [LUNs: 1]
  |     | o- lun0 ............................. [fileio/one (/home/iscsi/onex.img) (default_tg_pt_gp)]
  |     o- portals ...................................................................... [Portals: 1]
  |       o- 0.0.0.0:3260 ....................................................................... [OK]
  o- loopback ........................................................................... [Targets: 0]

2. Attach the iSCSI disk on the host
iscsiadm -m discovery -t st -p 127.0.0.1
iscsiadm -m node -T iqn.2016-06.one.server:one-a  -p 127.0.0.1:3260 -l

3. Set max_sectors_kb to 64 on the attached disk

echo 64 > /sys/block/sdd/queue/max_sectors_kb

4. Create LVM logical volumes on the disk
pvcreate /dev/sdd
vgcreate vg /dev/sdd

lvcreate -L 3G -n lv1 vg;lvcreate -L 3G -n lv2 vg
lvcreate -L 3G -n lv3 vg;lvcreate -L 3G -n lv4 vg


qemu-img create -f qcow2 /dev/vg/lv1 3G;qemu-img create -f qcow2 /dev/vg/lv2 3G

qemu-img create -f qcow2 /dev/vg/lv3 3G;qemu-img create -f qcow2 /dev/vg/lv4 3G
 
5. Boot the VM with the LVs attached as virtio-blk and scsi-hd devices



/usr/libexec/qemu-kvm \
  -name src_vm1 \
  -machine pc-q35-rhel8.4.0,accel=kvm,usb=off,dump-guest-core=off \
  -m 8g \
  \
  -device pcie-root-port,id=pcie.0-root-port-2,slot=2,bus=pcie.0,multifunction=on \
  -device pcie-root-port,id=pcie.0-root-port-3,slot=3,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-4,slot=4,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-5,slot=5,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-6,slot=6,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-7,slot=7,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-8,slot=8,bus=pcie.0 \
  -device pcie-root-port,id=pcie.0-root-port-9,slot=9,bus=pcie.0 \
  -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -object iothread,id=iothread0 \
  -object iothread,id=iothread1 \
  -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0,iothread=iothread0 \
  -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \
  -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,node-name=drive_image1 \
  -device scsi-hd,id=os1,drive=drive_image1,bootindex=0 \
  \
   -blockdev node-name=host_device_stg,driver=host_device,aio=native,filename=/dev/vg/lv1,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg \
  -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-4,addr=0x0,write-cache=on,id=stg,drive=drive_stg,rerror=stop,werror=stop \
\
 -blockdev node-name=host_device_stg2,driver=host_device,aio=native,filename=/dev/vg/lv2,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=host_device_stg2 \
  -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-5,addr=0x0,write-cache=on,id=stg2,drive=drive_stg2,rerror=stop,werror=stop \
\
 -blockdev node-name=host_device_stg3,driver=host_device,aio=native,filename=/dev/vg/lv3,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg3,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg3 \
  -device scsi-hd,write-cache=on,id=stg3,drive=drive_stg3,rerror=stop,werror=stop \
\
 -blockdev node-name=host_device_stg4,driver=host_device,aio=native,filename=/dev/vg/lv4,cache.direct=on,cache.no-flush=off,discard=unmap \
  -blockdev node-name=drive_stg4,driver=qcow2,cache.direct=on,cache.no-flush=off,file=host_device_stg4 \
  -device scsi-hd,write-cache=on,id=stg4,drive=drive_stg4,rerror=stop,werror=stop \
\
  \
  -vnc :5 \
  -qmp tcp:0:5955,server,nowait \
  -monitor stdio \
  -device virtio-net-pci,mac=9a:b5:b6:b1:b4:b5,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-6,addr=0x0 \
  -netdev tap,id=idxgXAlm


6. Run I/O on the disks in the guest

root@bootp-73-199-218 /home $ cat test.sh 
for i in $(seq 100); do
    dd if=/dev/zero bs=1M count=2048 of=$1 conv=fsync status=none
done

./test.sh /dev/vda
./test.sh /dev/vdb
./test.sh /dev/sdb
./test.sh /dev/sdc

Comment 15 qing.wang 2021-12-23 06:02:00 UTC
Test passed; refer to
https://bugzilla.redhat.com/show_bug.cgi?id=1997934#c12

Comment 18 errata-xmlrpc 2022-05-17 12:23:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307