Bug 1711155 - Vm gets stuck when iscsi disk is not accessible
Summary: Vm gets stuck when iscsi disk is not accessible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Ademar Reis
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-17 06:29 UTC by Han Han
Modified: 2019-11-06 07:16 UTC (History)

Fixed In Version: qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-06 07:15:20 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3723 0 None None None 2019-11-06 07:16:14 UTC

Description Han Han 2019-05-17 06:29:48 UTC
Description of problem:
As subject

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-28.el7.x86_64
libiscsi-1.9.0-7.el7.x86_64
libvirt-4.5.0-17.virtcov.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an iSCSI server via targetcli
2. Start a VM with an iSCSI disk as the second disk, with the read/write error policies set to ignore
Disk xml:
<disk type="file" device="disk">
      <driver name="qemu" type="qcow2" />
      <source file="/var/lib/libvirt/images/pc.qcow2" />
      <backingStore />
      <target dev="sda" bus="scsi" />
      <alias name="scsi0-0-0-0" />
      <address type="drive" controller="0" bus="0" target="0" unit="0" />
    </disk>
<disk type="network" device="disk">
      <driver name="qemu" type="raw" error_policy="ignore" rerror_policy="ignore" />
      <source protocol="iscsi" name="iqn.2019-04.org.linux-iscsi-noauth/0">
        <host name="localhost4" port="3260" />
      </source>
      <target dev="sdb" bus="scsi" />
      <alias name="scsi0-0-0-1" />
      <address type="drive" controller="0" bus="0" target="0" unit="1" />
    </disk>

Qemu cmdline:
/usr/libexec/qemu-kvm -name guest=pc,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-pc/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid bd0c6823-3472-40a5-a17c-7e550889bce2 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=on,strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-ehci1,id=usb1,bus=pci.0,addr=0x3.0x7 -device qemu-xhci,id=usb2,bus=pci.0,addr=0x9 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x8 -device ahci,id=sata0,bus=pci.0,addr=0xe -device ahci,id=sata1,bus=pci.0,addr=0xc -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/pc.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file.driver=iscsi,file.portal=localhost4:3260,file.target=iqn.2019-04.org.linux-iscsi-noauth,file.lun=0,file.transport=tcp,format=raw,if=none,id=drive-scsi0-0-0-1,werror=ignore,rerror=ignore -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f8:14:ca,bus=pci.0,addr=0xa -netdev tap,fd=30,id=hostnet1,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:86:2f:66,bus=pci.0,addr=0xb -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=32,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

3. Destroy the iSCSI service. The VM then hangs, but its status is still running.
Moreover, there is no response from QMP:
3.1 Stop libvirtd service
3.2 Try to connect qmp monitor
# nc -U /var/lib/libvirt/qemu/domain-11-pc/monitor.sock
{ "execute": "qmp_capabilities" }

no return.
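
The manual check in steps 3.1 and 3.2 can also be scripted. The following is only a minimal sketch, not part of the original report: `qmp_probe` and `_read_line` are hypothetical helper names, and the 5-second default timeout is an arbitrary assumption. It waits for the QMP greeting, sends `qmp_capabilities`, and treats a timeout or a closed connection as "monitor not responding".

```python
import json
import socket


def _read_line(sock):
    """Read bytes from sock up to and including the first newline."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    return buf.decode("utf-8")


def qmp_probe(sock, timeout=5.0):
    """Return True if the QMP endpoint greets us and answers qmp_capabilities.

    sock must already be connected (UNIX or TCP). Returns False on timeout,
    connection error, or an unexpected reply.
    """
    sock.settimeout(timeout)
    try:
        greeting = json.loads(_read_line(sock))
        if "QMP" not in greeting:
            return False
        sock.sendall(b'{"execute": "qmp_capabilities"}\n')
        reply = json.loads(_read_line(sock))
        return "return" in reply
    except (socket.timeout, OSError, ValueError):
        # Timeout, socket error, or empty/invalid JSON: treat as unresponsive.
        return False
```

To use it against the monitor socket from this report, one would create a `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)`, connect it to `/var/lib/libvirt/qemu/domain-11-pc/monitor.sock`, and pass it to `qmp_probe`. A `False` result corresponds to the "no return" behaviour described above.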

Actual results:
As above

Expected results:
Vm doesn't get stuck.

Additional info:
From the backtrace, I found that the main process of qemu-kvm was hanging in iscsi_reconnect:
(gdb) bt
 #0  0x00007fa74010e91e in iscsi_reconnect (old_iscsi=old_iscsi@entry=0x559a20244000) at lib/connect.c:300
#1  0x00007fa74011a93c in iscsi_service (iscsi=<optimized out>) at lib/socket.c:684
#2  0x00007fa74011a93c in iscsi_service (iscsi=iscsi@entry=0x559a20244000, revents=revents@entry=1) at lib/socket.c:774
#3  0x0000559a1e6d8bcf in iscsi_process_read (arg=0x559a200210e0) at block/iscsi.c:373
#4  0x0000559a1e744f68 in aio_dispatch_handlers (ctx=ctx@entry=0x559a1ff737c0) at util/aio-posix.c:410
#5  0x0000559a1e7457f8 in aio_dispatch (ctx=0x559a1ff737c0) at util/aio-posix.c:441
#6  0x0000559a1e74261e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#7  0x00007fa741930049 in g_main_context_dispatch (context=0x559a1ff8cc60) at gmain.c:3175
#8  0x00007fa741930049 in g_main_context_dispatch (context=context@entry=0x559a1ff8cc60) at gmain.c:3828
#9  0x0000559a1e744ae7 in main_loop_wait () at util/main-loop.c:215
#10 0x0000559a1e744ae7 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
#11 0x0000559a1e744ae7 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
#12 0x0000559a1e3e5107 in main () at vl.c:1964
#13 0x0000559a1e3e5107 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4782
 
In the function iscsi_reconnect, libiscsi tries to connect to the iSCSI server again and again:
245 try_again:
246
247         iscsi = iscsi_create_context(old_iscsi->initiator_name);
248         if (!iscsi) {
249                 ISCSI_LOG(old_iscsi, 2, "failed to create new context for reconnection");
250                 return -1;
......
296                 ISCSI_LOG(old_iscsi, 1, "reconnect try %d failed, waiting %d seconds", retry, backoff);
297                 iscsi_destroy_context(iscsi);
298                 sleep(backoff);
299                 retry++;
300                 goto try_again;

It seems that the libiscsi reconnect function should work asynchronously, or QEMU shouldn't invoke that function in the main process...
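
The effect of a blocking retry loop inside an event loop can be illustrated outside of QEMU and libiscsi. The sketch below is not QEMU or libiscsi code; it is a small Python asyncio analogy with hypothetical names (`monitor`, `reconnect_blocking`, `reconnect_async`, `count_monitor_ticks`) and arbitrary timings. A "monitor" task stands in for QMP, and the two reconnect variants differ only in whether they sleep synchronously or yield to the loop between retries.

```python
import asyncio
import time


async def monitor(ticks, stop):
    # Stand-in for the QMP monitor: records a tick whenever the loop lets it run.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def reconnect_blocking(retries, backoff):
    # Analogous to sleep(backoff); goto try_again: nothing else can run meanwhile.
    for _ in range(retries):
        time.sleep(backoff)


async def reconnect_async(retries, backoff):
    # Yields to the event loop between retries, so other handlers keep running.
    for _ in range(retries):
        await asyncio.sleep(backoff)


async def _run(reconnect):
    ticks = []
    stop = asyncio.Event()
    mon = asyncio.create_task(monitor(ticks, stop))
    await asyncio.sleep(0)       # let the monitor start
    await reconnect(3, 0.1)      # three failed "reconnect attempts"
    stop.set()
    await mon
    return len(ticks)


def count_monitor_ticks(reconnect):
    """Return how many times the monitor got to run during the reconnect loop."""
    return asyncio.run(_run(reconnect))
```

Calling `count_monitor_ticks(reconnect_blocking)` should show the monitor running only around the start, while `count_monitor_ticks(reconnect_async)` should show it running many times during the same period. This mirrors the suggestion above only by analogy and says nothing about how the actual fix was implemented.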

BTW, everything works well on:
libvirt-5.3.0-1.module+el8.1.0+3164+94495c71.x86_64
qemu-kvm-4.0.0-1.module+el8.1.0+3216+7947b8cc.x86_64
libiscsi-1.18.0-6.module+el8+2603+0a5231c4.x86_64

Comment 3 Ademar Reis 2019-05-31 23:38:11 UTC
WONTFIX in RHEL7 due to capacity. Moving to RHEL8-AV, where this already seems to be fixed (will update it later).

Comment 4 Ademar Reis 2019-05-31 23:39:26 UTC
(In reply to Han Han from comment #0)
> BTW, all works well on 
> libvirt-5.3.0-1.module+el8.1.0+3164+94495c71.x86_64
> qemu-kvm-4.0.0-1.module+el8.1.0+3216+7947b8cc.x86_64
> libiscsi-1.18.0-6.module+el8+2603+0a5231c4.x86_64

Marking the BZ POST because the fixes are in QEMU-4.0.

Comment 6 Danilo de Paula 2019-06-04 23:26:45 UTC
We need QA_ACK+ for this.

Comment 9 Xueqiang Wei 2019-06-18 10:16:00 UTC
Reproduced it on RHEL 7.7 with the steps below:

Versions:
Host:
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7
libiscsi-1.9.0-7.el7.x86_64


1. Prepare an iSCSI server via targetcli
2. Start a VM with an iSCSI disk, with the read/write error policies set to ignore
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device pcie-root-port,id=pcie_root_port_1,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device pcie-root-port,id=pcie_root_port_2,slot=4,chassis=4,addr=0x4,bus=pcie.0  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_yvw268de/monitor-qmpmonitor1-20181017-004217-U4Tik3JV,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_yvw268de/monitor-catch_monitor-20181017-004217-U4Tik3JV,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idaVJ26s  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_yvw268de/serial-serial0-20181017-004217-U4Tik3JV,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20181017-004217-U4Tik3JV,path=/var/tmp/avocado_yvw268de/seabios-20181017-004217-U4Tik3JV,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20181017-004217-U4Tik3JV,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-5,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -object iothread,id=iothread0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-6,addr=0x0,iothread=iothread0 \
    -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2,node-name=my_file \
    -blockdev driver=qcow2,node-name=my,file=my_file \
    -device scsi-hd,drive=my,id=image1,bootindex=0 \
    -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
    -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \
    -blockdev driver=qcow2,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/data.qcow2,node-name=drive2,file.driver=file \
    -device scsi-hd,drive=drive2,id=data-disk1,bus=scsi1.0 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:82:83:84:85:86,id=idWBc2X6,vectors=4,netdev=idX17Mug,bus=pcie.0-root-port-7,addr=0x0  \
    -netdev tap,id=idX17Mug,vhost=on \
    -m 8G  \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2  \
    -cpu 'Opteron_G5',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off,strict=off  \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait \
    -blockdev driver=raw,file.driver=iscsi,file.transport=tcp,file.portal=10.66.10.36,file.initiator-name=iqn.1994-05.com.redhat:d399855229c,file.target=iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3,file.lun=0,cache.direct=off,cache.no-flush=on,node-name=drive3 \
    -device scsi-block,drive=drive3,id=data-disk2,bus=scsi1.0,werror=ignore,rerror=ignore

3. Disable access to the iSCSI server

# iptables -A OUTPUT -s 10.73.196.59  -p tcp --dport 3260 -j REJECT


After step 3, the VM hangs. Moreover, there is no response from QMP.


(qemu) qemu-kvm: iSCSI: NOP timeout. Reconnecting...

# telnet localhost 4444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

{"execute": "qmp_capabilities"}

no return.




Retested on RHEL 8.1.0 and did not hit this issue, so setting the status to VERIFIED.

Versions:
Host:
kernel-4.18.0-85.el8.x86_64
qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
libiscsi-1.18.0-6.module+el8.1.0+3258+4c45705b.x86_64

Guest:
kernel-4.18.0-85.el8.x86_64


After step 3, the guest doesn't hang and the QMP monitor responds.



(qemu) info block
my: /home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 (qcow2)
    Attached to:      image1
    Cache mode:       writeback

drive2: /home/kvm_autotest_root/images/data.qcow2 (qcow2)
    Attached to:      data-disk1
    Cache mode:       writeback, ignore flushes

drive3: json:{"driver": "raw", "file": {"lun": 0, "portal": "10.66.10.36", "initiator-name": "iqn.1994-05.com.redhat:d399855229c", "driver": "iscsi", "transport": "tcp", "target": "iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3"}} (raw)
    Attached to:      data-disk2
    Cache mode:       writeback, ignore flushes
(qemu) info status 
VM status: running

(qemu) qemu-kvm: iSCSI: NOP timeout. Reconnecting...
(qemu) info block
my: /home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 (qcow2)
    Attached to:      image1
    Cache mode:       writeback

drive2: /home/kvm_autotest_root/images/data.qcow2 (qcow2)
    Attached to:      data-disk1
    Cache mode:       writeback, ignore flushes

drive3: json:{"driver": "raw", "file": {"lun": 0, "portal": "10.66.10.36", "initiator-name": "iqn.1994-05.com.redhat:d399855229c", "driver": "iscsi", "transport": "tcp", "target": "iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3"}} (raw)
    Attached to:      data-disk2
    Cache mode:       writeback, ignore flushes
(qemu) info status 
VM status: running

# telnet localhost 4444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 4}, "package": "qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71"}, "capabilities": ["oob"]}}

{"execute": "qmp_capabilities"}

{"return": {}}

Comment 11 errata-xmlrpc 2019-11-06 07:15:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723

