Bug 1711155
| Summary: | Vm gets stuck when iscsi disk is not accessible | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Han Han <hhan> |
| Component: | qemu-kvm | Assignee: | Ademar Reis <areis> |
| Status: | CLOSED ERRATA | QA Contact: | Xueqiang Wei <xuwei> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | --- | CC: | coli, ddepaula, dyuan, knoel, mrezanin, timao, virt-maint, xuwei, xuzhang |
| Target Milestone: | rc | Flags: | knoel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-11-06 07:15:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
wontfix in RHEL7 due to capacity. Moving to RHEL8-AV (where this seems to be fixed already, will update it later)

(In reply to Han Han from comment #0)
> BTW, all works well on
> libvirt-5.3.0-1.module+el8.1.0+3164+94495c71.x86_64
> qemu-kvm-4.0.0-1.module+el8.1.0+3216+7947b8cc.x86_64
> libiscsi-1.18.0-6.module+el8+2603+0a5231c4.x86_64

Marking the BZ POST because the fixes are in QEMU-4.0. We need QA_ACK+ for this.

Reproduced it on rhel7.7 with below steps:
Versions:
Host:
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7
libiscsi-1.9.0-7.el7.x86_64
1. Prepare an iSCSI server via targetcli
2. Start a VM with an iSCSI disk and set the read/write error policies to ignore:
/usr/libexec/qemu-kvm \
-S \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine q35 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device pcie-root-port,id=pcie_root_port_1,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie_root_port_2,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_yvw268de/monitor-qmpmonitor1-20181017-004217-U4Tik3JV,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_yvw268de/monitor-catch_monitor-20181017-004217-U4Tik3JV,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idaVJ26s \
-chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_yvw268de/serial-serial0-20181017-004217-U4Tik3JV,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20181017-004217-U4Tik3JV,path=/var/tmp/avocado_yvw268de/seabios-20181017-004217-U4Tik3JV,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20181017-004217-U4Tik3JV,iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-5,addr=0x0 \
-device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
-object iothread,id=iothread0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-6,addr=0x0,iothread=iothread0 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device scsi-hd,drive=my,id=image1,bootindex=0 \
-device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
-device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \
-blockdev driver=qcow2,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/data.qcow2,node-name=drive2,file.driver=file \
-device scsi-hd,drive=drive2,id=data-disk1,bus=scsi1.0 \
-device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
-device virtio-net-pci,mac=9a:82:83:84:85:86,id=idWBc2X6,vectors=4,netdev=idX17Mug,bus=pcie.0-root-port-7,addr=0x0 \
-netdev tap,id=idX17Mug,vhost=on \
-m 8G \
-smp 12,maxcpus=12,cores=6,threads=1,sockets=2 \
-cpu 'Opteron_G5',+kvm_pv_unhalt \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot order=cdn,once=d,menu=off,strict=off \
-enable-kvm \
-monitor stdio \
-qmp tcp:0:4444,server,nowait \
-blockdev driver=raw,file.driver=iscsi,file.transport=tcp,file.portal=10.66.10.36,file.initiator-name=iqn.1994-05.com.redhat:d399855229c,file.target=iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3,file.lun=0,cache.direct=off,cache.no-flush=on,node-name=drive3 \
-device scsi-block,drive=drive3,id=data-disk2,bus=scsi1.0,werror=ignore,rerror=ignore
3. Disable access to the iSCSI server:
# iptables -A OUTPUT -s 10.73.196.59 -p tcp --dport 3260 -j REJECT
After step 3, the VM hangs, and the QMP monitor no longer responds.
(qemu) qemu-kvm: iSCSI: NOP timeout. Reconnecting...
# telnet localhost 4444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"execute": "qmp_capabilities"}
No response is returned.
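The unresponsive monitor can also be detected without an interactive telnet session by probing it with a timeout. The following is a minimal Python sketch, assuming the monitor listens on TCP port 4444 as in the command line above. The function name `qmp_probe` and the 5-second default timeout are illustrative choices, not part of QEMU.

```python
import json
import socket


def qmp_probe(host="127.0.0.1", port=4444, timeout=5.0):
    """Return True if the QMP monitor completes capability negotiation in time.

    Hypothetical helper: host, port and timeout defaults are assumptions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
                # The QMP server speaks first and sends a greeting object.
                greeting = json.loads(stream.readline())
                if "QMP" not in greeting:
                    return False
                stream.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
                stream.flush()
                # Skip anything that is neither a success nor an error reply.
                while True:
                    reply = json.loads(stream.readline())
                    if "return" in reply:
                        return True
                    if "error" in reply:
                        return False
    except (OSError, ValueError):
        # OSError covers timeouts and refused connections; ValueError covers
        # an empty or malformed line (json.JSONDecodeError).
        return False
```

In the hung state shown above, the TCP connection is accepted but no greeting arrives, so a call such as `qmp_probe(timeout=5.0)` would be expected to return `False` once the timeout expires.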
Retested on RHEL 8.1.0 and did not hit this issue, so setting the status to VERIFIED.
Versions:
Host:
kernel-4.18.0-85.el8.x86_64
qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
libiscsi-1.18.0-6.module+el8.1.0+3258+4c45705b.x86_64
Guest:
kernel-4.18.0-85.el8.x86_64
After step 3, the guest does not hang and the QMP monitor still responds.
(qemu) info block
my: /home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 (qcow2)
Attached to: image1
Cache mode: writeback
drive2: /home/kvm_autotest_root/images/data.qcow2 (qcow2)
Attached to: data-disk1
Cache mode: writeback, ignore flushes
drive3: json:{"driver": "raw", "file": {"lun": 0, "portal": "10.66.10.36", "initiator-name": "iqn.1994-05.com.redhat:d399855229c", "driver": "iscsi", "transport": "tcp", "target": "iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3"}} (raw)
Attached to: data-disk2
Cache mode: writeback, ignore flushes
(qemu) info status
VM status: running
(qemu) qemu-kvm: iSCSI: NOP timeout. Reconnecting...
(qemu) info block
my: /home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 (qcow2)
Attached to: image1
Cache mode: writeback
drive2: /home/kvm_autotest_root/images/data.qcow2 (qcow2)
Attached to: data-disk1
Cache mode: writeback, ignore flushes
drive3: json:{"driver": "raw", "file": {"lun": 0, "portal": "10.66.10.36", "initiator-name": "iqn.1994-05.com.redhat:d399855229c", "driver": "iscsi", "transport": "tcp", "target": "iqn.2003-01.org.linux-iscsi.dhcp-10-36.x8664:sn.6131cd5db7bb-3"}} (raw)
Attached to: data-disk2
Cache mode: writeback, ignore flushes
(qemu) info status
VM status: running
# telnet localhost 4444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 4}, "package": "qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
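The greeting line in the session above carries the QEMU version and package string, so a verification script can confirm which build it is talking to before checking for the hang. A small Python sketch follows; `describe_greeting` is a hypothetical helper name, and it reads only the fields present in the greeting shown above.

```python
import json


def describe_greeting(line):
    """Extract version, package string and capabilities from a QMP greeting line."""
    qmp = json.loads(line)["QMP"]
    qemu = qmp["version"]["qemu"]
    return {
        # Tuple form makes version comparisons straightforward.
        "version": (qemu["major"], qemu["minor"], qemu["micro"]),
        "package": qmp["version"]["package"],
        "capabilities": qmp["capabilities"],
    }


def has_qemu_4_fixes(line):
    """True if the greeting reports QEMU 4.0 or later, where this bug was not hit."""
    return describe_greeting(line)["version"] >= (4, 0, 0)
```

The comparison against `(4, 0, 0)` reflects the statement in this report that the fixes are in QEMU-4.0; it does not check the exact package release.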
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3723
Description of problem:
As subject

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-28.el7.x86_64
libiscsi-1.9.0-7.el7.x86_64
libvirt-4.5.0-17.virtcov.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an iSCSI server via targetcli
2. Start a VM with an iSCSI disk as the 2nd disk, and set the read/write error policies to ignore
Disk xml:
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2" />
  <source file="/var/lib/libvirt/images/pc.qcow2" />
  <backingStore />
  <target dev="sda" bus="scsi" />
  <alias name="scsi0-0-0-0" />
  <address type="drive" controller="0" bus="0" target="0" unit="0" />
</disk>
-- NODE --
<disk type="network" device="disk">
  <driver name="qemu" type="raw" error_policy="ignore" rerror_policy="ignore" />
  <source protocol="iscsi" name="iqn.2019-04.org.linux-iscsi-noauth/0">
    <host name="localhost4" port="3260" />
  </source>
  <target dev="sdb" bus="scsi" />
  <alias name="scsi0-0-0-1" />
  <address type="drive" controller="0" bus="0" target="0" unit="1" />
</disk>
Qemu cmdline:
/usr/libexec/qemu-kvm \
-name guest=pc,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-pc/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu host \
-m 1024 \
-realtime mlock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid bd0c6823-3472-40a5-a17c-7e550889bce2 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=26,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot menu=on,strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-ehci1,id=usb1,bus=pci.0,addr=0x3.0x7 \
-device qemu-xhci,id=usb2,bus=pci.0,addr=0x9 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x8 \
-device ahci,id=sata0,bus=pci.0,addr=0xe \
-device ahci,id=sata1,bus=pci.0,addr=0xc \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-drive file=/var/lib/libvirt/images/pc.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-drive file.driver=iscsi,file.portal=localhost4:3260,file.target=iqn.2019-04.org.linux-iscsi-noauth,file.lun=0,file.transport=tcp,format=raw,if=none,id=drive-scsi0-0-0-1,werror=ignore,rerror=ignore \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 \
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f8:14:ca,bus=pci.0,addr=0xa \
-netdev tap,fd=30,id=hostnet1,vhost=on,vhostfd=31 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:86:2f:66,bus=pci.0,addr=0xb \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=32,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
3. Destroy the iSCSI service. The VM then hangs although its status is still running. Moreover, there is no response from QMP:
3.1 Stop the libvirtd service
3.2 Try to connect to the QMP monitor
# nc -U /var/lib/libvirt/qemu/domain-11-pc/monitor.sock
{ "execute": "qmp_capabilities" }
No response is returned.

Actual results:
As above

Expected results:
The VM doesn't get stuck.
Additional info:
From the backtrace, the main process of qemu-kvm hangs in iscsi_reconnect:
(gdb) bt
#0 0x00007fa74010e91e in iscsi_reconnect (old_iscsi=old_iscsi@entry=0x559a20244000) at lib/connect.c:300
#1 0x00007fa74011a93c in iscsi_service (iscsi=<optimized out>) at lib/socket.c:684
#2 0x00007fa74011a93c in iscsi_service (iscsi=iscsi@entry=0x559a20244000, revents=revents@entry=1) at lib/socket.c:774
#3 0x0000559a1e6d8bcf in iscsi_process_read (arg=0x559a200210e0) at block/iscsi.c:373
#4 0x0000559a1e744f68 in aio_dispatch_handlers (ctx=ctx@entry=0x559a1ff737c0) at util/aio-posix.c:410
#5 0x0000559a1e7457f8 in aio_dispatch (ctx=0x559a1ff737c0) at util/aio-posix.c:441
#6 0x0000559a1e74261e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#7 0x00007fa741930049 in g_main_context_dispatch (context=0x559a1ff8cc60) at gmain.c:3175
#8 0x00007fa741930049 in g_main_context_dispatch (context=context@entry=0x559a1ff8cc60) at gmain.c:3828
#9 0x0000559a1e744ae7 in main_loop_wait () at util/main-loop.c:215
#10 0x0000559a1e744ae7 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
#11 0x0000559a1e744ae7 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
#12 0x0000559a1e3e5107 in main () at vl.c:1964
#13 0x0000559a1e3e5107 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4782

In the function iscsi_reconnect, libiscsi tries to connect to the iSCSI server again and again:
245 try_again:
246
247     iscsi = iscsi_create_context(old_iscsi->initiator_name);
248     if (!iscsi) {
249         ISCSI_LOG(old_iscsi, 2, "failed to create new context for reconnection");
250         return -1;
......
296     ISCSI_LOG(old_iscsi, 1, "reconnect try %d failed, waiting %d seconds", retry, backoff);
297     iscsi_destroy_context(iscsi);
298     sleep(backoff);
299     retry++;
300     goto try_again;

It seems that the libiscsi reconnect function should work asynchronously, or qemu shouldn't invoke that function in the main process...

BTW, all works well on
libvirt-5.3.0-1.module+el8.1.0+3164+94495c71.x86_64
qemu-kvm-4.0.0-1.module+el8.1.0+3216+7947b8cc.x86_64
libiscsi-1.18.0-6.module+el8+2603+0a5231c4.x86_64
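
The effect described above, where a blocking sleep inside an event-loop handler starves everything else served by that loop, can be illustrated with a small analogy. This is a Python asyncio sketch, not QEMU or libiscsi code: `reconnect`, `monitor` and `count_monitor_ticks` are hypothetical names, every reconnect attempt is assumed to fail, and the 0.1 s backoff is arbitrary.

```python
import asyncio
import time


async def reconnect(blocking, retries=3, backoff=0.1):
    """Pretend each reconnect attempt fails, then wait before the next try."""
    for _ in range(retries):
        if blocking:
            time.sleep(backoff)           # blocks the whole event loop
        else:
            await asyncio.sleep(backoff)  # yields so other handlers can run


async def monitor(ticks, interval=0.02):
    """Stand-in for the monitor: records a tick each time the loop runs it."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def _run_scenario(blocking):
    ticks = []
    task = asyncio.create_task(monitor(ticks))
    await asyncio.sleep(0)  # let the monitor record its first tick
    await reconnect(blocking)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(ticks)


def count_monitor_ticks(blocking):
    """Return how often the monitor stand-in ran while reconnects were retried."""
    return asyncio.run(_run_scenario(blocking))
```

With `count_monitor_ticks(True)` the monitor stand-in is starved for the whole retry period, which mirrors the unanswered `qmp_capabilities` request; with `count_monitor_ticks(False)` it keeps running between retries.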