Description of problem:
As subject

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-28.el7.x86_64
libiscsi-1.9.0-7.el7.x86_64
libvirt-4.5.0-17.virtcov.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an iSCSI server via targetcli.
2. Start a VM with an iSCSI disk as the 2nd disk, and set the read/write error policies to ignore.

Disk xml:
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2" />
  <source file="/var/lib/libvirt/images/pc.qcow2" />
  <backingStore />
  <target dev="sda" bus="scsi" />
  <alias name="scsi0-0-0-0" />
  <address type="drive" controller="0" bus="0" target="0" unit="0" />
</disk>
<disk type="network" device="disk">
  <driver name="qemu" type="raw" error_policy="ignore" rerror_policy="ignore" />
  <source protocol="iscsi" name="iqn.2019-04.org.linux-iscsi-noauth/0">
    <host name="localhost4" port="3260" />
  </source>
  <target dev="sdb" bus="scsi" />
  <alias name="scsi0-0-0-1" />
  <address type="drive" controller="0" bus="0" target="0" unit="1" />
</disk>

Qemu cmdline:
/usr/libexec/qemu-kvm -name guest=pc,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-pc/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid bd0c6823-3472-40a5-a17c-7e550889bce2 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=on,strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-ehci1,id=usb1,bus=pci.0,addr=0x3.0x7 -device qemu-xhci,id=usb2,bus=pci.0,addr=0x9 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x8 -device ahci,id=sata0,bus=pci.0,addr=0xe -device ahci,id=sata1,bus=pci.0,addr=0xc -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/pc.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file.driver=iscsi,file.portal=localhost4:3260,file.target=iqn.2019-04.org.linux-iscsi-noauth,file.lun=0,file.transport=tcp,format=raw,if=none,id=drive-scsi0-0-0-1,werror=ignore,rerror=ignore -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f8:14:ca,bus=pci.0,addr=0xa -netdev tap,fd=30,id=hostnet1,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:86:2f:66,bus=pci.0,addr=0xb -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=32,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

3. Destroy the iSCSI service. The VM then hangs, although its status is still reported as running. Moreover, there is no response from QMP:
3.1 Stop the libvirtd service.
3.2 Try to connect to the QMP monitor:
# nc -U /var/lib/libvirt/qemu/domain-11-pc/monitor.sock
{ "execute": "qmp_capabilities" }
No reply is returned.

Actual results:
As above

Expected results:
The VM doesn't get stuck.

Additional info:
From the backtrace, the main thread of qemu-kvm is hung in iscsi_reconnect:

(gdb) bt
#0 0x00007fa74010e91e in iscsi_reconnect (old_iscsi=old_iscsi@entry=0x559a20244000) at lib/connect.c:300
#1 0x00007fa74011a93c in iscsi_service (iscsi=<optimized out>) at lib/socket.c:684
#2 0x00007fa74011a93c in iscsi_service (iscsi=iscsi@entry=0x559a20244000, revents=revents@entry=1) at lib/socket.c:774
#3 0x0000559a1e6d8bcf in iscsi_process_read (arg=0x559a200210e0) at block/iscsi.c:373
#4 0x0000559a1e744f68 in aio_dispatch_handlers (ctx=ctx@entry=0x559a1ff737c0) at util/aio-posix.c:410
#5 0x0000559a1e7457f8 in aio_dispatch (ctx=0x559a1ff737c0) at util/aio-posix.c:441
#6 0x0000559a1e74261e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#7 0x00007fa741930049 in g_main_context_dispatch (context=0x559a1ff8cc60) at gmain.c:3175
#8 0x00007fa741930049 in g_main_context_dispatch (context=context@entry=0x559a1ff8cc60) at gmain.c:3828
#9 0x0000559a1e744ae7 in main_loop_wait () at util/main-loop.c:215
#10 0x0000559a1e744ae7 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
#11 0x0000559a1e744ae7 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
#12 0x0000559a1e3e5107 in main () at vl.c:1964
#13 0x0000559a1e3e5107 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4782

In the function iscsi_reconnect, libiscsi tries to connect to the iSCSI server again and again, sleeping between attempts:

245 try_again:
246
247     iscsi = iscsi_create_context(old_iscsi->initiator_name);
248     if (!iscsi) {
249         ISCSI_LOG(old_iscsi, 2, "failed to create new context for reconnection");
250         return -1;
......
296     ISCSI_LOG(old_iscsi, 1, "reconnect try %d failed, waiting %d seconds", retry, backoff);
297     iscsi_destroy_context(iscsi);
298     sleep(backoff);
299     retry++;
300     goto try_again;

Because this loop runs inside the QEMU main loop (frames #3 to #11 above), the sleep() and retry block the QMP monitor as well. It seems that the libiscsi reconnect function should work asynchronously, or QEMU shouldn't invoke it from the main thread...

BTW, everything works well with:
libvirt-5.3.0-1.module+el8.1.0+3164+94495c71.x86_64
qemu-kvm-4.0.0-1.module+el8.1.0+3216+7947b8cc.x86_64
libiscsi-1.18.0-6.module+el8+2603+0a5231c4.x86_64
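
To illustrate the asynchronous alternative suggested above, here is a minimal sketch of a reconnect step that never sleeps: the event loop calls it with the current time, and it only attempts a connection once the backoff deadline has passed. This is not the libiscsi or QEMU implementation. All names (reconnect_state, backoff_ms, reconnect_poll, fake_connect) are hypothetical, and the doubling backoff capped at 30 seconds is an assumption, since the real backoff calculation is elided in the excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reconnect bookkeeping; not a libiscsi structure. */
struct reconnect_state {
    int  retry;            /* failed attempts so far */
    long next_attempt_ms;  /* earliest time at which to try again */
};

/* Callback that performs one connection attempt without blocking. */
typedef bool (*connect_fn)(void *opaque);

/* Assumed backoff policy: 1s, 2s, 4s, ... capped at 30s. */
long backoff_ms(int retry)
{
    long ms = 1000L << (retry < 5 ? retry : 5);
    return ms > 30000L ? 30000L : ms;
}

/*
 * One non-blocking reconnect step. Returns true once connected.
 * If the next attempt is not due yet, it returns immediately so the
 * caller can keep servicing other events (for example, the monitor).
 */
bool reconnect_poll(struct reconnect_state *st, long now_ms,
                    connect_fn try_connect, void *opaque)
{
    assert(st != NULL && try_connect != NULL);

    if (now_ms < st->next_attempt_ms) {
        return false;               /* not due yet; do not sleep */
    }
    if (try_connect(opaque)) {
        st->retry = 0;              /* connected: reset the backoff */
        st->next_attempt_ms = 0;
        return true;
    }
    /* Failed: schedule the next attempt instead of calling sleep(). */
    st->next_attempt_ms = now_ms + backoff_ms(st->retry);
    st->retry++;
    return false;
}

/* Stand-in connector for experiments: fails until the counter hits zero. */
bool fake_connect(void *opaque)
{
    int *remaining_failures = opaque;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}
```

In a real event loop, next_attempt_ms would typically be used to arm a timer rather than being polled, but the point is the same: the retry delay is expressed as a deadline, so the thread that dispatches monitor commands is never parked in sleep().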
WONTFIX in RHEL7 due to capacity. Moving to RHEL8-AV, where this already seems to be fixed (I will update the bug later).
(In reply to Han Han from comment #0)
> BTW, all works well on
> libvirt-5.3.0-1.module+el8.1.0+3164+94495c71.x86_64
> qemu-kvm-4.0.0-1.module+el8.1.0+3216+7947b8cc.x86_64
> libiscsi-1.18.0-6.module+el8+2603+0a5231c4.x86_64

Marking the BZ POST because the fixes are in QEMU-4.0.
We need QA_ACK+ for this.
Reproduced it on RHEL 7.7 with the steps below.

Versions:
Host:
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7
libiscsi-1.9.0-7.el7.x86_64

1. Prepare an iSCSI server via targetcli.
2. Start a VM with an iSCSI disk, and set the read/write error policies to ignore:
/usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine q35 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device pcie-root-port,id=pcie_root_port_1,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device pcie-root-port,id=pcie_root_port_2,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_yvw268de/monitor-qmpmonitor1-20181017-004217-U4Tik3JV,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_yvw268de/monitor-catch_monitor-20181017-004217-U4Tik3JV,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idaVJ26s \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_yvw268de/serial-serial0-20181017-004217-U4Tik3JV,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20181017-004217-U4Tik3JV,path=/var/tmp/avocado_yvw268de/seabios-20181017-004217-U4Tik3JV,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20181017-004217-U4Tik3JV,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-5,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -object iothread,id=iothread0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-6,addr=0x0,iothread=iothread0 \
    -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2,node-name=my_file \
    -blockdev driver=qcow2,node-name=my,file=my_file \
    -device scsi-hd,drive=my,id=image1,bootindex=0 \
    -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
    -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \
    -blockdev driver=qcow2,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/data.qcow2,node-name=drive2,file.driver=file \
    -device scsi-hd,drive=drive2,id=data-disk1,bus=scsi1.0 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:82:83:84:85:86,id=idWBc2X6,vectors=4,netdev=idX17Mug,bus=pcie.0-root-port-7,addr=0x0 \
    -netdev tap,id=idX17Mug,vhost=on \
    -m 8G \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 \
    -cpu 'Opteron_G5',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=d,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait \
    -blockdev driver=raw,file.driver=iscsi,file.transport=tcp,file.portal=10.66.10.36,file.initiator-name=iqn.1994-05.com.redhat:d399855229c,file.target=iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3,file.lun=0,cache.direct=off,cache.no-flush=on,node-name=drive3 \
    -device scsi-block,drive=drive3,id=data-disk2,bus=scsi1.0,werror=ignore,rerror=ignore
3. Block access to the iSCSI server:
# iptables -A OUTPUT -s 10.73.196.59 -p tcp --dport 3260 -j REJECT

After step 3, the VM hangs. Moreover, there is no response from QMP.

(qemu) qemu-kvm: iSCSI: NOP timeout. Reconnecting...

# telnet localhost 4444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"execute": "qmp_capabilities"}
No reply is returned.

Retested on RHEL 8.1.0 and did not hit this issue, so setting the status to VERIFIED.

Versions:
Host:
kernel-4.18.0-85.el8.x86_64
qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
libiscsi-1.18.0-6.module+el8.1.0+3258+4c45705b.x86_64
Guest:
kernel-4.18.0-85.el8.x86_64

After step 3, the guest doesn't hang and the QMP monitor responds.

(qemu) info block
my: /home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 (qcow2)
    Attached to: image1
    Cache mode: writeback

drive2: /home/kvm_autotest_root/images/data.qcow2 (qcow2)
    Attached to: data-disk1
    Cache mode: writeback, ignore flushes

drive3: json:{"driver": "raw", "file": {"lun": 0, "portal": "10.66.10.36", "initiator-name": "iqn.1994-05.com.redhat:d399855229c", "driver": "iscsi", "transport": "tcp", "target": "iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3"}} (raw)
    Attached to: data-disk2
    Cache mode: writeback, ignore flushes
(qemu) info status
VM status: running
(qemu) qemu-kvm: iSCSI: NOP timeout. Reconnecting...
(qemu) info block
my: /home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 (qcow2)
    Attached to: image1
    Cache mode: writeback

drive2: /home/kvm_autotest_root/images/data.qcow2 (qcow2)
    Attached to: data-disk1
    Cache mode: writeback, ignore flushes

drive3: json:{"driver": "raw", "file": {"lun": 0, "portal": "10.66.10.36", "initiator-name": "iqn.1994-05.com.redhat:d399855229c", "driver": "iscsi", "transport": "tcp", "target": "iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3"}} (raw)
    Attached to: data-disk2
    Cache mode: writeback, ignore flushes
(qemu) info status
VM status: running

# telnet localhost 4444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 4}, "package": "qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
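
The manual telnet/nc check above can be automated with a timeout, so that a hung monitor is reported instead of blocking the test. The following is only a sketch under stated assumptions: the helper names (qmp_connect_unix, qmp_greeting_arrives) are hypothetical and not part of QEMU, and it treats "any data within the timeout" as the monitor being alive, relying on the fact that a responsive QMP server sends its {"QMP": ...} greeting right after the connection is accepted, as seen in the RHEL 8 output.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Connect to a QMP UNIX socket (hypothetical helper).
 * Returns a connected fd, or -1 on any error.
 */
int qmp_connect_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        return -1;                  /* path does not fit in sockaddr_un */
    }
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Wait up to timeout_ms for the monitor to send anything.
 * Returns 1 if data arrived, 0 on timeout (monitor looks hung),
 * and -1 on error or if the peer closed without sending data.
 */
int qmp_greeting_arrives(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[4096];
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        return 0;                   /* nothing within the timeout */
    }
    if (!(pfd.revents & POLLIN)) {
        return -1;
    }
    return read(fd, buf, sizeof(buf)) > 0 ? 1 : -1;
}
```

With the RHEL 7 reproducer, a call such as qmp_greeting_arrives(qmp_connect_unix("/var/lib/libvirt/qemu/domain-11-pc/monitor.sock"), 5000) would be expected to return 0 while the main loop is stuck in iscsi_reconnect; the 5000 ms timeout is an arbitrary choice.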
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723