Bug 1994041 - qemu-kvm scsi: change default passthrough timeout to non-infinite
Summary: qemu-kvm scsi: change default passthrough timeout to non-infinite
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.4
Hardware: x86_64
OS: All
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: qing.wang
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks: 2001587 2003071 2004334
 
Reported: 2021-08-16 14:51 UTC by Frank DeLorey
Modified: 2024-12-20 20:43 UTC
CC List: 18 users

Fixed In Version: qemu-kvm-4.2.0-59.module+el8.5.0+12817+cb650d43
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2001587 2003071 (view as bug list)
Environment:
Last Closed: 2021-11-09 18:02:58 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Gitlab redhat/rhel/src/qemu-kvm qemu-kvm merge_requests 40 0 None None None 2021-09-20 18:46:26 UTC
Red Hat Issue Tracker RHELPLAN-93712 0 None None None 2021-08-16 14:57:05 UTC
Red Hat Product Errata RHSA-2021:4191 0 None None None 2021-11-09 18:03:26 UTC

Description Frank DeLorey 2021-08-16 14:51:52 UTC
Description of problem:

A VM on a customer RHEL 7.9 server would not stop or restart from virt-manager. They then tried to force it off, but that failed as well.

Version-Release number of selected component (if applicable):

RHEL 7.9 
qemu-kvm-1.5.3-175.el7_9.3.x86_64                        

How reproducible:

Unknown

Steps to Reproduce:
1. Customer hit an FCoE problem
2. Attempted to shut down the VM
3. VM hung and became defunct

Actual results:

Cannot kill the VM, so it will not restart until the host is restarted.

Expected results:

VM should have died and restarted.

Additional info:

The problem with the zombie qemu not being reaped is that it is part of a multi-threaded group, and the other threads are all stuck in the UN state waiting on I/O completion that is never going to happen, because they are doing SG_IO TURs with an infinite timeout.  If you wish a copy of my analysis notes for this, I can provide them upon request.  The msgbuf is full of qedf FCoE aborts, so clearly there have been some issues. Remedying this requires two things:

1. customer to sort out their FCoE problems assuming we don't uncover any further problems with the CNA/driver/fw in use

2. RH to backport a qemu change or some variant of it.


For (2), we can see the customer is using:

$ grep qemu installed-rpms 
ipxe-roms-qemu-20180825-3.git133f4c.el7.noarch              Wed Mar  3 10:46:41 2021
libvirt-daemon-driver-qemu-4.5.0-36.el7_9.3.x86_64          Wed Mar  3 16:24:31 2021
qemu-guest-agent-2.12.0-3.el7.x86_64                        Wed Mar  3 10:50:31 2021
qemu-img-1.5.3-175.el7_9.3.x86_64                           Wed Mar  3 16:24:17 2021
qemu-kvm-1.5.3-175.el7_9.3.x86_64                           Wed Mar  3 16:24:34 2021
qemu-kvm-common-1.5.3-175.el7_9.3.x86_64                    Wed Mar  3 16:24:18 2021


"hw/scsi/scsi-generic.c" 520L, 14606C


    160 static int execute_command(BlockDriverState *bdrv,
    161                            SCSIGenericReq *r, int direction,
    162                            BlockDriverCompletionFunc *complete)
    163 {
    164     r->io_header.interface_id = 'S';
    165     r->io_header.dxfer_direction = direction;
    166     r->io_header.dxferp = r->buf;
    167     r->io_header.dxfer_len = r->buflen;
    168     r->io_header.cmdp = r->req.cmd.buf;
    169     r->io_header.cmd_len = r->req.cmd.len;
    170     r->io_header.mx_sb_len = sizeof(r->req.sense);
    171     r->io_header.sbp = r->req.sense;
    172     r->io_header.timeout = MAX_UINT;         <<<<<<-------
    173     r->io_header.usr_ptr = r;
    174     r->io_header.flags |= SG_FLAG_DIRECT_IO;
    175 
    176     r->req.aiocb = bdrv_aio_ioctl(bdrv, SG_IO, &r->io_header, complete, r);
    177     if (r->req.aiocb == NULL) {
    178         return -EIO;
    179     }
    180 
    181     return 0;
    182 }


Unfortunately we have no trace or user-space stack-trace of the problem IO threads but I'm assuming they came through here.


However, looking upstream, this is different:

https://gitlab.com/qemu-project/qemu/-/blob/master/hw/scsi/scsi-generic.c

     
static int execute_command(BlockBackend *blk,
                           SCSIGenericReq *r, int direction,
                           BlockCompletionFunc *complete)
{
     SCSIDevice *s = r->req.dev;
     r->io_header.interface_id = 'S';
     r->io_header.dxfer_direction = direction;
     r->io_header.dxferp = r->buf;
     r->io_header.dxfer_len = r->buflen;
     r->io_header.cmdp = r->req.cmd.buf;
     r->io_header.cmd_len = r->req.cmd.len;
     r->io_header.mx_sb_len = sizeof(r->req.sense);
     r->io_header.sbp = r->req.sense;
     r->io_header.timeout = s->io_timeout * 1000;    <<<<----
     r->io_header.usr_ptr = r;
     r->io_header.flags |= SG_FLAG_DIRECT_IO;

     trace_scsi_generic_aio_sgio_command(r->req.tag, r->req.cmd.buf[0],                                                              
                                         r->io_header.timeout);
     r->req.aiocb = blk_aio_ioctl(blk, SG_IO, &r->io_header, complete, r);
.........


A review of the commits shows the patch that addresses this, even down to explaining the problem with infinite stalls:

https://gitlab.com/qemu-project/qemu/-/commit/c9b6609b69facad0cc5425d4fa7934c33d7f2e91


I think we need a patch for qemu. You probably cannot take the upstream patch as-is because there are so many changes; instead, in RHEL 7 you could just make it 30 or 60 seconds, which is the typical default for most I/O, instead of MAX_UINT (effectively an infinite timeout - very bad).

So please look at providing a qemu patch for the above in the RHEL 7 z-stream.

I also see this was fixed upstream here:

https://patchew.org/QEMU/20201116183114.55703-1-hare@suse.de/20201116183114.55703-3-hare@suse.de/
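
For reference, the upstream series above appears to turn the hard-coded value into a
per-device "io_timeout" property (defaulting to 30 seconds upstream). A rough,
hypothetical way to check whether a given qemu-kvm build carries that change is to
list the device properties (a sketch, not taken from this report):

# On a build with the upstream change, the timeout shows up as a configurable
# property instead of the hard-coded MAX_UINT:
/usr/libexec/qemu-kvm -device scsi-block,help 2>&1 | grep -i timeout
# If present, it can also be overridden per device, e.g.:
#   -device scsi-block,bus=scsi0.0,drive=drive-scsi0-0-0-0,io_timeout=60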

Comment 7 qing.wang 2021-08-20 09:33:32 UTC
I cannot fully reproduce this issue, but I hit a similar one.
I am not sure they have the same cause; in my case the kill is postponed by about 2 minutes before the VM actually dies.

Environment preparation:
iSCSI target server

1.build iscsi server 

root@qing /home/vbugs $ targetcli ls
o- / ................................................................................................ [...]
  o- backstores ..................................................................................... [...]
  | o- block ......................................................................... [Storage Objects: 0]
  | o- fileio ........................................................................ [Storage Objects: 1]
  | | o- one ........................................ [/home/iscsi/onex.img (30.0GiB) write-back activated]
  | |   o- alua .......................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp .............................................. [ALUA state: Active/optimized]
  | o- pscsi ......................................................................... [Storage Objects: 0]
  | o- ramdisk ....................................................................... [Storage Objects: 0]
  o- iscsi ................................................................................... [Targets: 1]
  | o- iqn.2016-06.one.server:one-a ............................................................. [TPGs: 1]
  |   o- tpg1 ...................................................................... [no-gen-acls, no-auth]
  |     o- acls ................................................................................. [ACLs: 2]
  |     | o- iqn.1994-05.com.redhat:clienta .............................................. [Mapped LUNs: 1]
  |     | | o- mapped_lun0 ......................................................... [lun0 fileio/one (rw)]
  |     | o- iqn.1994-05.com.redhat:clientb .............................................. [Mapped LUNs: 1]
  |     |   o- mapped_lun0 ......................................................... [lun0 fileio/one (rw)]
  |     o- luns ................................................................................. [LUNs: 1]
  |     | o- lun0 .................................. [fileio/one (/home/iscsi/onex.img) (default_tg_pt_gp)]
  |     o- portals ........................................................................... [Portals: 1]
  |       o- 0.0.0.0:3260 ............................................................................ [OK]
  o- loopback ................................................................................ [Targets: 0]

2. connect the iscsi disk on host

iscsiadm -m discovery -t st -p qing
iscsiadm -m node -T iqn.2016-06.one.server:one-a  -p qing:3260 -l

root@dell-per440-07 ~ $ lsblk
...
sdd                              8:48   0   30G  0 disk 

3. boot vm with the lun (/dev/sdd)
/usr/libexec/qemu-kvm \
  -name testvm \
  -machine pc \
  -m 8G \
  -smp 8 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xa \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
  -drive file=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  \
  -drive file=/dev/sdd,format=raw,if=none,id=drive-scsi0-0-0-0,cache=none \
  -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \
  -vnc :5 \
  -qmp tcp:0:5955,server,nowait \
  -monitor stdio \
  -netdev \
  tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:dd:0a:82,bus=pci.0

4. Log in to the guest and execute I/O on the disk
sg_dd if=/dev/zero of=/dev/sda bs=4k count=7000000


5.stop the iscsi target server
systemctl stop target

6.find the qemu process and kill it
root@ibm-x3850x5-08 /home/vbugs $ pgrep qemu-kvm
5222
root@ibm-x3850x5-08 /home/vbugs $ kill -9 5222

The process stays in Sl+ state and does not exit.
(It is only truly killed after about 2 minutes.)

root@dell-per440-07 ~ $ ps  87602
    PID TTY      STAT   TIME COMMAND
  5222 pts/0    Sl+    1:27 /usr/libexec/qemu-kvm

If I skip step 5, the process is killed quickly.
If I recover the target service with "systemctl start target", the process is also killed quickly.
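
The roughly 2-minute delay above may simply be the iSCSI initiator's replacement
timeout expiring (open-iscsi defaults node.session.timeo.replacement_timeout to
120 seconds), after which the outstanding commands are failed and the process can
be reaped. A hypothetical check, reusing the target/portal names from the steps
above:

iscsiadm -m node -T iqn.2016-06.one.server:one-a -p qing:3260 -o show \
  | grep replacement_timeout
# node.session.timeo.replacement_timeout = 120    (the 120-second default)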

Comment 8 qing.wang 2021-08-20 09:34:29 UTC
(In reply to qing.wang from comment #7)
> [...]

I have the same result on qemu-kvm-common-6.0.0-28.module+el8.5.0+12271+fffa967b.x86_64.

Comment 21 Christian Horn 2021-09-13 01:22:38 UTC
This bz got cloned to the rhel-8.4.z stream as bz2001587.
(By default, permissions are not copied to z-stream clones, as technical
 discussion happens mostly in the main bz, i.e. this one in our case.)

Comment 23 qing.wang 2021-09-16 08:15:30 UTC
I also tried an FC backend and simulated a faulty disk, but that does not reproduce this issue.
I wonder what kind of operation results in an I/O error that waits forever.

1. Emulate bad blocks on FC

dmsetup create test << EOF
0 160000 linear /dev/sdb 0
160000 5 error
160005 80000 linear /dev/sdb 40005
EOF
 
2. Expose it with the target server
root@dell-per440-07 /home/vbugs/feature $ targetcli ls
o- / ...................................................................................... [...]
  o- backstores ........................................................................... [...]
  | o- block ............................................................... [Storage Objects: 1]
  | | o- disk0 ............................... [/dev/mapper/test (117.2MiB) write-thru activated]
  | |   o- alua ................................................................ [ALUA Groups: 1]
  | |     o- default_tg_pt_gp .................................... [ALUA state: Active/optimized]
  | o- fileio .............................................................. [Storage Objects: 0]
  | o- pscsi ............................................................... [Storage Objects: 0]
  | o- ramdisk ............................................................. [Storage Objects: 0]
  o- iscsi ......................................................................... [Targets: 1]
  | o- iqn.2016-06.one.server:block ................................................... [TPGs: 1]
  |   o- tpg1 ............................................................ [no-gen-acls, no-auth]
  |     o- acls ....................................................................... [ACLs: 1]
  |     | o- iqn.1994-05.com.redhat:clientb .................................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 .............................................. [lun0 block/disk0 (rw)]
  |     o- luns ....................................................................... [LUNs: 1]
  |     | o- lun0 ........................... [block/disk0 (/dev/mapper/test) (default_tg_pt_gp)]
  |     o- portals ................................................................. [Portals: 1]
  |       o- 0.0.0.0:3260 .................................................................. [OK]
  o- loopback ...................................................................... [Targets: 0]


3. boot vm with attached disk
/usr/libexec/qemu-kvm \
  -name testvm \
  -machine pc \
  -m 8G \
  -smp 8 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xa \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
  -drive file=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  \
  -drive file=/dev/sdd,format=raw,if=none,id=drive-scsi0-0-0-0 \
  -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=disk1,bootindex=2 \
  \
  -vnc :5 \
  -qmp tcp:0:5955,server,nowait \
  -monitor stdio \
  -netdev \
  tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:dd:0a:82,bus=pci.0

4.execute io in guest

sg_dd if=/dev/zero of=/dev/sda bs=1M count=100 oflag=direct

The I/O operation hits an I/O error immediately, but never enters a blocked wait state (the dm "error" target fails requests right away rather than stalling them).


So the question is: how do we make guest I/O block in a wait state, so that the kill behaviour can be verified?

Comment 24 Christian Horn 2021-09-16 08:18:49 UTC
FYI for HPE: iSCSI and nbd (network block device) backends were also tried.
Maybe HPE has further ideas...

Comment 25 qing.wang 2021-09-16 08:28:10 UTC
(In reply to Christian Horn from comment #24)
> FYI for HPE, also iSCSI and nbd (network block devices) as backend were
> tried.
> Maybe HPE has further ideas..

Thanks, but I think the backend is not the point.

If this is related to specific HW, we need to know what HW error makes the I/O block in a wait state.


-drive file=/dev/sdd,format=raw,if=none,id=drive-scsi0-0-0-0 \
-device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=disk1,bootindex=2 \

Comment 26 Darren Lavender 2021-09-16 16:53:37 UTC
> --- Comment #24 from Christian Horn <chorn> ---
> FYI for HPE, also iSCSI and nbd (network block devices) as backend were tried.
> Maybe HPE has further ideas..
> 



I believe there are two key things to reproducing this (somewhat artificially): 


a. the initiator needs to be sending periodic SG_IO requests to the target via the kvm execute_command() function below, ideally a TUR as we see in the vmcore. I think the TURs that we see in the vmcore probably originate from the guest's multipathd.  I assume the TURs get re-written/modified by kvm to add the infinite timeout, but honestly I never looked into how this originates or how we come through this kvm execute_command() function, so I am somewhat guessing....


b. the target needs to not respond to some of these SG_IO TURs




Can you try to configure multipathd in the guest?  You will need to do some tracing to see what happens to the I/O: does it get emitted from kvm with the infinite SG_IO timeout?  That is fundamental and key to the behaviour.  As mentioned, from the kvm side I don't know how we get to this function, but I suspect this is where the I/Os that get stuck originate:


"hw/scsi/scsi-generic.c" 520L, 14606C


    160 static int execute_command(BlockDriverState *bdrv,
    161                            SCSIGenericReq *r, int direction,
    162                            BlockDriverCompletionFunc *complete)
    163 {
    164     r->io_header.interface_id = 'S';
    165     r->io_header.dxfer_direction = direction;
    166     r->io_header.dxferp = r->buf;
    167     r->io_header.dxfer_len = r->buflen;
    168     r->io_header.cmdp = r->req.cmd.buf;
    169     r->io_header.cmd_len = r->req.cmd.len;
    170     r->io_header.mx_sb_len = sizeof(r->req.sense);
    171     r->io_header.sbp = r->req.sense;
    172     r->io_header.timeout = MAX_UINT;         <<<<<<-------
    173     r->io_header.usr_ptr = r;
    174     r->io_header.flags |= SG_FLAG_DIRECT_IO;
    175 
    176     r->req.aiocb = bdrv_aio_ioctl(bdrv, SG_IO, &r->io_header, complete, r);
    177     if (r->req.aiocb == NULL) {
    178         return -EIO;
    179     }
    180 
    181     return 0;
    182 }


You may need to set a gdb bp there to see whether a multipathd TUR uses this and has that TUR modified.  If you don't see this code exercised then you are not going to see the problem.
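
A rough sketch of that tracing setup (hypothetical commands; assumes
device-mapper-multipath is installed in the guest and qemu-kvm debuginfo is
available on the host):

# In the guest: enable multipathd so it sends periodic TUR (TEST UNIT READY)
# commands to the passthrough LUN.
mpathconf --enable --with_multipathd y

# On the host: break on the SG_IO submission path in qemu-kvm (assumes a single
# qemu-kvm process).
gdb -p "$(pgrep qemu-kvm)" -ex 'break execute_command' -ex 'continue'
# At the breakpoint:  (gdb) print/x r->req.cmd.buf[0]
#   0x0 is TEST UNIT READY; stepping a few lines with "next" and printing
#   r->io_header.timeout then shows the timeout handed to the kernel.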



The target part is a bit more involved since you are going to have to find a way of having the target periodically accept the request but not respond (complete) the IO request for a TUR.  You don't have to ignore all of them, just every N-th request, they'll soon stack up on the initiator side.  Remember for whatever reasons storage sometimes behaves like this, f/w bugs, busy-levels, who knows what, it just doesn't respond.... so you need to be able to simulate such behaviour.

At some point the queue on the initiator will get blocked. The IOs never get timed out by the initiator because they have infinite timeouts, so no recovery happens. The kvm process will not shut down/die because some threads will go into the UN state.  That is what we observed in the vmcores...

Comment 28 Germano Veit Michel 2021-09-17 03:24:52 UTC
What about using a dm-delay block device?
https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/delay.html

If the delay is not enough or cannot be high enough to test this, we can also suspend/resume the device:

dmsetup suspend <dev>
dmsetup resume <dev>
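
A hypothetical sequence along those lines (the device name "delayvol" is made up;
assumes the iSCSI backstore from comment #7 sits on that device-mapper volume):

dmsetup suspend delayvol     # new I/O to the device is queued indefinitely
# ...run guest I/O to the passthrough LUN, then:  kill -9 "$(pgrep qemu-kvm)"
# With an infinite SG_IO timeout the process should stay unreaped until:
dmsetup resume delayvol      # queued I/O completes and the process can exit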

Comment 29 Germano Veit Michel 2021-09-17 03:38:17 UTC
(In reply to Germano Veit Michel from comment #28)
> What about using a dm-delay block device?
> https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/delay.html
> 
> If the delay is not enough or cannot be high enough to test this, we can
> also suspend/resume the device:
> 
> dmsetup suspend <dev>
> dmsetup resume <dev>

Hmm, to be clearer: put an iSCSI target on top of it, like in comment #7.
That was using an LV; here we could delay and even stop the writes
to the actual device. If we use no caches it may reproduce...

Comment 30 qing.wang 2021-09-17 11:06:27 UTC
I emulated a slow-response scenario with a delay of about 10 minutes. It looks like this reproduces the issue.

Red Hat Enterprise Linux Server release 7.9 (Maipo)
3.10.0-1160.el7.x86_64
qemu-kvm-1.5.3-175.el7_9.3.x86_64


1.create scsi_debug disk 128
modprobe scsi_debug  dev_size_mb=128

2.create mapper device with 10 minutes delay on the disk
dmsetup create test2 << EOF
0 160000 linear /dev/sdb 0
160000 5 delay /dev/sdb 0 0 /dev/sdd 0 600000
160005 80000 linear /dev/sdb 40005
EOF

3.expose mapper device with iscsi target
o- / ...................................................................................... [...]
  o- backstores ........................................................................... [...]
  | o- block ............................................................... [Storage Objects: 1]
  | | o- disk0 .............................. [/dev/mapper/test2 (117.2MiB) write-thru activated]
  | |   o- alua ................................................................ [ALUA Groups: 1]
  | |     o- default_tg_pt_gp .................................... [ALUA state: Active/optimized]
  | o- fileio .............................................................. [Storage Objects: 0]
  | o- pscsi ............................................................... [Storage Objects: 0]
  | o- ramdisk ............................................................. [Storage Objects: 0]
  o- iscsi ......................................................................... [Targets: 1]
  | o- iqn.2016-06.one.server:block ................................................... [TPGs: 1]
  |   o- tpg1 ............................................................ [no-gen-acls, no-auth]
  |     o- acls ....................................................................... [ACLs: 2]
  |     | o- iqn.1994-05.com.redhat:clienta .................................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 .............................................. [lun0 block/disk0 (rw)]
  |     | o- iqn.1994-05.com.redhat:clientb .................................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 .............................................. [lun0 block/disk0 (rw)]
  |     o- luns ....................................................................... [LUNs: 1]
  |     | o- lun0 .......................... [block/disk0 (/dev/mapper/test2) (default_tg_pt_gp)]
  |     o- portals ................................................................. [Portals: 1]
  |       o- 0.0.0.0:3260 .................................................................. [OK]
  o- loopback ...................................................................... [Targets: 0]

4. attach disk on [other] host

sdd -> iscsi

5.boot vm with passthrough disk

/usr/libexec/qemu-kvm \
  -name testvm \
  -machine pc \
  -m 8G \
  -smp 8 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xa \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
  -drive file=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  \
  -drive file=/dev/sdd,format=raw,if=none,id=drive-scsi0-0-0-0 \
  -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=disk1,bootindex=2 \
  \
  -vnc :5 \
  -qmp tcp:0:5955,server,nowait \
  -monitor stdio \
  -netdev \
  tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:dd:0a:82,bus=pci.0


/usr/libexec/qemu-kvm \
  -name testvm \
  -machine pc \
  -m 8G \
  -smp 8 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xa \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
  -blockdev driver=qcow2,file.driver=file,file.filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,node-name=os_image1   \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=os_image1,id=virtio-disk0,bootindex=1 \
  \
  -blockdev driver=raw,file.driver=host_device,file.filename=/dev/sdd,node-name=data1   \
  -device scsi-block,bus=scsi0.0,drive=data1,id=disk1,bootindex=2 \
  \
  -vnc :5 \
  -qmp tcp:0:5955,server,nowait \
  -monitor stdio \
  -netdev \
  tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:dd:0a:82,bus=pci.0

6. Execute I/O in the guest so that it enters a slow-write state

dev=sda
if [ "x$1" != "x" ];then
dev=$1
fi
echo "$dev"
while true;do
sg_dd if=/dev/zero of=/dev/$dev bs=1M count=100 oflag=direct
echo "do dd"
done

7.find the qemu process and kill it
pid=`pgrep qemu-kvm`;echo $pid;  kill -9 $pid
time while true;do if ps $pid; then sleep 10;echo "active";else echo "exit";break;fi done


8. The real kill time is related to the delay time set in step 2 (see the annotated table below).
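
An annotated reading of the step-2 mapping table (a sketch, not from the original
comment; the dm "delay" target takes read parameters followed by optional write
parameters, with delays in milliseconds):

# The table from step 2, annotated:
#   0      160000 linear /dev/sdb 0
#       sectors 0-159999 map straight to /dev/sdb
#   160000 5      delay  /dev/sdb 0 0 /dev/sdd 0 600000
#       5 sectors where reads (/dev/sdb, offset 0) see no delay but writes
#       (/dev/sdd, offset 0) are delayed by 600000 ms = 10 minutes
#   160005 80000  linear /dev/sdb 40005
#       the remaining sectors map back to /dev/sdb at offset 40005
dmsetup table test2     # confirms the loaded table (devices shown as major:minor)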



But the kill took about 3 minutes on the following version:
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
4.18.0-339.el8.x86_64
qemu-kvm-6.0.0-30.module+el8.5.0+12586+476da3e1.x86_64



I checked the code of build 12586; it has the fix applied in scsi-generic.c.
If possible, could someone help to confirm it? Thanks.

Comment 31 Paolo Bonzini 2021-09-20 11:42:55 UTC
> But it cost 3 minutes on following version
> Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
> 4.18.0-339.el8.x86_64
> qemu-kvm-6.0.0-30.module+el8.5.0+12586+476da3e1.x86_64

Hi qing.wang, the bug is for RHEL 8, not AV; so you need to test with the 4.2 versions of QEMU.

Comment 34 qing.wang 2021-09-22 07:00:24 UTC
Reproduced this with the steps from comment 30 on
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
4.18.0-339.el8.x86_64
qemu-kvm-4.2.0-58.module+el8.5.0+12272+74ace547.x86_64
seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch

Comment 38 Yanan Fu 2021-10-08 01:18:22 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 39 qing.wang 2021-10-08 06:28:24 UTC
Passed test on

Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
4.18.0-348.el8.x86_64
qemu-kvm-4.2.0-59.module+el8.5.0+12817+cb650d43.x86_64
seabios-bin-1.13.0-2.module+el8.3.0+7353+9de0a3cc.noarch

The kill time is about 3m30s

1.create scsi_debug disk 128
modprobe scsi_debug  dev_size_mb=128

2.create mapper device with 5 minutes delay on the disk
disk=/dev/sdb
dmsetup create test << EOF
0 160000 linear ${disk} 0
160000 5 delay ${disk} 160000 0 ${disk} 160000 300000
160005 80000 linear ${disk} 160005
EOF

3.expose mapper device with iscsi target
o- / ..................................................................... [...]
  o- backstores .......................................................... [...]
  | o- block .............................................. [Storage Objects: 1]
  | | o- disk0 .............. [/dev/mapper/test (117.2MiB) write-thru activated]
  | |   o- alua ............................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ................... [ALUA state: Active/optimized]
  | o- fileio ............................................. [Storage Objects: 0]
  | o- pscsi .............................................. [Storage Objects: 0]
  | o- ramdisk ............................................ [Storage Objects: 0]
  o- iscsi ........................................................ [Targets: 1]
  | o- iqn.2016-06.one.server:block .................................. [TPGs: 1]
  |   o- tpg1 ........................................... [no-gen-acls, no-auth]
  |     o- acls ...................................................... [ACLs: 1]
  |     | o- iqn.1994-05.com.redhat:clientb ................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 ............................. [lun0 block/disk0 (rw)]
  |     o- luns ...................................................... [LUNs: 1]
  |     | o- lun0 .......... [block/disk0 (/dev/mapper/test) (default_tg_pt_gp)]
  |     o- portals ................................................ [Portals: 1]
  |       o- 0.0.0.0:3260 ................................................. [OK]
  o- loopback ..................................................... [Targets: 0]

4. attach disk on [other] host

sdc -> iscsi

5.boot vm with passthrough disk

/usr/libexec/qemu-kvm \
  -name testvm \
  -machine pc \
  -m 8G \
  -smp 8 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0xa \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
  -blockdev driver=qcow2,file.driver=file,file.filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,node-name=os_image1   \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=os_image1,id=virtio-disk0,bootindex=1 \
  \
  -blockdev driver=raw,file.driver=host_device,file.filename=/dev/sdc,node-name=data1   \
  -device scsi-block,bus=scsi0.0,drive=data1,id=disk1,bootindex=2 \
  \
  -vnc :5 \
  -qmp tcp:0:5955,server,nowait \
  -monitor stdio \
  -netdev \
  tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:dd:0a:82,bus=pci.0


6. Execute I/O in the guest so that it enters a slow-write state

dev=sda
if [ "x$1" != "x" ];then
dev=$1
fi
echo "$dev"
while true;do
sg_dd if=/dev/zero of=/dev/$dev bs=1M count=100 oflag=direct
echo "do dd"
done

7.find the qemu process and kill it
pid=`pgrep qemu-kvm`;echo $pid;  kill -9 $pid
time while true;do if ps $pid; then sleep 10;echo "active";else echo "exit";break;fi done

The kill time is about 3m30s

Comment 41 errata-xmlrpc 2021-11-09 18:02:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4191

