Bug 1435660
Summary: Windows 2016 guest gets stuck while creating a volume on a shared vioscsi disk

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Version | 7.3 |
| Status | CLOSED NOTABUG |
| Severity | urgent |
| Priority | high |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Jiri Belka <jbelka> |
| Assignee | Paolo Bonzini <pbonzini> |
| QA Contact | Xueqiang Wei <xuwei> |
| CC | aliang, chayang, coli, jbelka, juzhang, knoel, lijin, michen, mtessun, pbonzini, phou, virt-maint, wyu, xiagao, xuwei |
| Type | Bug |
| Last Closed | 2018-08-24 13:15:33 UTC |
| Attachments | screenshot (attachment 1266059) |
Description (Jiri Belka, 2017-03-24 13:23:02 UTC)
Versions:

```
kernel-3.10.0-514.6.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
libiscsi-1.9.0-7.el7.x86_64
virtio-win-1.9.0-3.el7.noarch
```

```
# cat /var/log/libvirt/qemu/vm2-w2k16-X.log
2017-03-24 09:57:55.865+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.5 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-02-10-03:02:29, x86-020.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7_3.6), hostname: dell-r210ii-03.rhev.lab.eng.brq.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=vm2-w2k16-X,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-16-vm2-w2k16-X/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu SandyBridge,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -m size=4194304k,slots=16,maxmem=16777216k -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=4096 -uuid a2ff2496-d8c9-41f6-8337-35c8e3689a5e -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.3-0.10.el7,serial=4C4C4544-0058-3410-8058-C3C04F38354A,uuid=a2ff2496-d8c9-41f6-8337-35c8e3689a5e' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-16-vm2-w2k16-X/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-03-24T09:57:55,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot menu=on,splash-time=10000,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/mnt/data.rhev.lab.eng.brq.redhat.com:_data_iso_shared/0c78b4d6-ba00-4d3e-9f9f-65c7d5899d71/images/11111111-1111-1111-1111-111111111111/en_windows_server_2016_x64_dvd_9327751.iso,format=raw,if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive file=/rhev/data-center/mnt/data.rhev.lab.eng.brq.redhat.com:_data_iso_shared/0c78b4d6-ba00-4d3e-9f9f-65c7d5899d71/images/11111111-1111-1111-1111-111111111111/virtio-win-1.9.0-3.el7-amd64.vfd,format=raw,if=none,id=drive-fdc0-0-0,readonly=on -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/rhev/data-center/00000001-0001-0001-0001-00000000017c/d93e6782-f200-4e5e-9713-2a53ceca3c49/images/c649791a-ac9e-45e0-89d7-16c882ad2f2e/c91cc2ec-cdaf-4f58-8abf-211d37aeaace,format=raw,if=none,id=drive-virtio-disk0,serial=c649791a-ac9e-45e0-89d7-16c882ad2f2e,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:51,bus=pci.0,addr=0x3,bootindex=3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/a2ff2496-d8c9-41f6-8337-35c8e3689a5e.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/a2ff2496-d8c9-41f6-8337-35c8e3689a5e.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice tls-port=5900,addr=10.34.63.222,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2017-03-24T09:57:56.032400Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2017-03-24T09:57:56.032524Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 39.291000 ms, bitrate 6622281 bps (6.315499 Mbps) LOW BANDWIDTH
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer:
red_channel_client_disconnect: rcc=0x7fa34bcb8000 (channel=0x7fa349cccc00 type=2 id=0)
red_channel_client_disconnect: rcc=0x7fa34d269000 (channel=0x7fa349d86000 type=3 id=0)
red_channel_client_disconnect: rcc=0x7fa34d26e000 (channel=0x7fa349d71080 type=4 id=0)
red_channel_client_disconnect: rcc=0x7fa34c924000 (channel=0x7fa349d7e000 type=1 id=0)
main_channel_client_on_disconnect: rcc=0x7fa34c924000
red_client_destroy: destroy client 0x7fa349c45700 with #channels=4
red_dispatcher_disconnect_cursor_peer:
red_dispatcher_disconnect_display_peer:
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 23.343000 ms, bitrate 4249112 bps (4.052269 Mbps) LOW BANDWIDTH
red_dispatcher_set_cursor_peer:
inputs_connect: inputs channel client create
main_channel_handle_parsed: agent start
2017-03-24T11:24:42.359200Z qemu-kvm: virtio-serial-bus: Unexpected port id 736681576 for device virtio-serial0.0
2017-03-24T11:24:45.382932Z qemu-kvm: virtio-serial-bus: Unexpected port id 1185792 for device virtio-serial0.0
2017-03-24T11:25:06.623340Z qemu-kvm: virtio-serial-bus: Unexpected port id 3528894800 for device virtio-serial0.0
2017-03-24T11:25:06.624055Z qemu-kvm: virtio-serial-bus: Unexpected port id 739323064 for device virtio-serial0.0
2017-03-24T11:25:14.019747Z qemu-kvm: virtio-serial-bus: Unexpected port id 3528894800 for device virtio-serial0.0
2017-03-24T11:25:57.144946Z qemu-kvm: virtio-serial-bus: Unexpected port id 3491010928 for device virtio-serial0.0
2017-03-24T11:27:26.876178Z qemu-kvm: virtio-serial-bus: Unexpected port id 0 for device virtio-serial0.0
2017-03-24T11:27:29.913275Z qemu-kvm: virtio-serial-bus: Unexpected port id 3552624224 for device virtio-serial0.0
2017-03-24T11:27:32.784350Z qemu-kvm: virtio-serial-bus: Unexpected port id 739049208 for device virtio-serial0.0
2017-03-24T11:27:34.025448Z qemu-kvm: virtio-serial-bus: Guest failure in adding device virtio-serial0.0
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
red_channel_client_disconnect: rcc=0x7fa34bdca000 (channel=0x7fa349cccc00 type=2 id=0)
red_channel_client_disconnect: rcc=0x7fa34aacd000 (channel=0x7fa349d71080 type=4 id=0)
red_channel_client_disconnect: rcc=0x7fa34aabe000 (channel=0x7fa349d86000 type=3 id=0)
red_channel_client_disconnect: rcc=0x7fa34ab1a000 (channel=0x7fa349d7e000 type=1 id=0)
main_channel_client_on_disconnect: rcc=0x7fa34ab1a000
red_client_destroy: destroy client 0x7fa34dcbbb80 with #channels=4
red_dispatcher_disconnect_cursor_peer:
red_dispatcher_disconnect_display_peer:
2017-03-24T13:17:49.736414Z qemu-kvm: terminating on signal 15 from pid 1702
2017-03-24 13:18:14.944+0000: shutting down
```

It has happened again. Steps in W2k16:

- In Server Manager, ensure you see both VMs (I added them to the domain and logged in to the first VM as domain admin, then added the second VM into Server Manager).
- I attached the iSCSI direct-lun via 'attach-device' in virsh:

```
<disk type='network' device='lun' sgio='unfiltered' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
  <source protocol='iscsi' name='iqn.1992-05.com.emc:ckm001201002300000-5-vnxe/5'>
    <host name='10.34.63.200' port='3260'/>
  </source>
  <backingStore/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
  <alias name='scsi0-0-0-0'/>
</disk>
```

- Bring the disk online, so it is up on both nodes in Server Manager.
- On the first VM, do New Volume on the disk (it gets stuck at "Formatting Volume").

Created attachment 1266059 [details]
screenshot
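For anyone retracing the attach step, a direct-lun definition like the one above is typically hot-plugged with plain virsh, roughly as follows (a sketch: the domain name is taken from the log above, and shared-lun.xml is just a placeholder file name):

```
# Save the <disk> element quoted above into a file, then hot-plug it into
# the running domain; --live applies the change to the running guest only.
virsh attach-device vm2-w2k16-X shared-lun.xml --live
```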
Comment 6 (Peixiu Hou)

I tried to reproduce this bug with a LIO iSCSI target, but I cannot reproduce it. My steps are as follows:

1. Boot 2 VMs at the qemu level with the qemu initiator:

1)

```
/usr/libexec/qemu-kvm -name guest=peixiu_vm4,debug-threads=on -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid b31d5b11-5fd6-44c6-a3ad-365cb2e146eb -rtc base=2017-03-28T03:16:50,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/a41ee8e2-4e1c-4549-865d-3c9cf1bcc727/c6f00513-2812-4fbd-ab28-b6aba6079280/images/b07818ad-815c-4f5f-a13a-9e8392fca2db/17ce52a2-97a8-44c6-8115-31bfbc50cb87,format=raw,if=none,id=drive-virtio-disk0,serial=b07818ad-815c-4f5f-a13a-9e8392fca2db,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/rhev/data-center/a41ee8e2-4e1c-4549-865d-3c9cf1bcc727/c6f00513-2812-4fbd-ab28-b6aba6079280/images/b6130e3b-8175-4fad-b513-0d2433d8af88/3af3c62c-00a5-40ef-93cc-735452fc822a,format=raw,if=none,id=drive-virtio-disk1,serial=b6130e3b-8175-4fad-b513-0d2433d8af88,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:54,bus=pci.0,addr=0x3 -vnc :2 -monitor stdio -drive file=iscsi://10.66.4.129/iqn.2016-06.local.server:sas/0,format=raw,if=none,id=drive-scsi0-0-0-1,cache=none,werror=stop,rerror=stop,aio=native -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
```

2)

```
/usr/libexec/qemu-kvm -name guest=peixiu_vm2,debug-threads=on -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid d58a72a1-6f72-4a8f-8578-c80dd9cf04dc -rtc base=2017-03-28T02:22:38,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=iscsi://10.66.4.129/iqn.2016-06.local.server:sas/0,format=raw,if=none,id=drive-scsi0-0-0-1,cache=none,werror=stop,rerror=stop,aio=native -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:52,bus=pci.0,addr=0x3 -vnc :3 -monitor stdio -drive file=/rhev/data-center/a41ee8e2-4e1c-4549-865d-3c9cf1bcc727/c6f00513-2812-4fbd-ab28-b6aba6079280/images/e09ee7e5-1d0a-48d7-bd07-e7087cfe3635/f16e06b7-e533-4a2c-8e27-b7918137782b,format=raw,if=none,id=drive-virtio-disk1,serial=b6130e3b-8175-4fad-b513-0d2433d8af88,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
```

2. Install the vioscsi driver.

3. In the VM's Disk Management, bring the disk online and create a new volume.

Used versions:

```
kernel-3.10.0-514.6.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
virtio-win-1.9.0-3.el7.noarch
```

Additional info: tried on RHEL 7.4 with a LIO iSCSI target as well; it is also normal there.

Best Regards~
Peixiu Hou

Comment (Jiri Belka)

(In reply to Peixiu Hou from comment #6)
> I tried to reproduce this bug with a LIO iSCSI target, but I cannot
> reproduce it. My steps are as follows:
>
> 1. Boot 2 VMs at the qemu level with the qemu initiator:

```
> 1) /usr/libexec/qemu-kvm -name guest=peixiu_vm4,debug-threads=on -machine
> pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Nehalem -m
                                        ^^^^^^^^^^^^^^
```

That is a different qemu configuration vs. mine:

```
-cpu SandyBridge,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff
...
-smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.3-0.10.el7,serial=4C4C4544-0058-3410-8058-C3C04F38354A,uuid=a2ff2496-d8c9-41f6-8337-35c8e3689a5e'
```

My env was RHV 4.1; yours seems to be "plain" qemu from EL7. I also used the EMC VNXe.

I tried to get a crash dump from Windows, but it seems the qemu process is stuck as well. I tried 'inject-nmi' via virsh; I saw a blue screen in Windows with a notification about the NMI, but after that qemu stayed stuck. I used en_windows_server_2016_x64_dvd_9327751.iso for the installation.

```
virsh # qemu-monitor-command 23 --hmp 'info block'
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockIoTune)

virsh # inject-nmi 23
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockIoTune)
```

It is just this:

- reset the disk (it will have no MBR/GPT - undefined)
- make the disk online on both VMs
- New Volume in Server Manager -> File and Storage Services -> Disks
- when the VM is frozen, send 'inject-nmi' via virsh
- the blue screen shows '0% complete' for ages, and the VM changes to '?' in the Admin Portal...

```
2017-04-11 17:16:56,026+0200 WARN (jsonrpc/6) [virt.vm] (vmId='a2ff2496-d8c9-41f6-8337-35c8e3689a5e') monitor became unresponsive (command timeout, age=61.1299999999) (vm:5013)
```

and the status in virsh is 'paused'.

I also tried the following, without success:

- 'qemu-monitor-command $domain --hmp 'dump-guest-memory /var/log/core/vmdump'' in virsh

```
# grep "doesn't" /var/log/messages
Apr 11 16:01:13 localhost journal: host doesn't support hyperv 'relaxed' feature
Apr 11 16:01:13 localhost journal: host doesn't support hyperv 'vapic' feature
Apr 11 16:04:36 localhost journal: host doesn't support hyperv 'relaxed' feature
Apr 11 16:04:36 localhost journal: host doesn't support hyperv 'vapic' feature
```

```
<cpu>
  <arch>x86_64</arch>
  <model>SandyBridge</model>
  <vendor>Intel</vendor>
  <topology sockets='1' cores='4' threads='2'/>
  <feature name='vme'/>
  <feature name='ds'/>
  <feature name='acpi'/>
  <feature name='ss'/>
  <feature name='ht'/>
  <feature name='tm'/>
  <feature name='pbe'/>
  <feature name='dtes64'/>
  <feature name='monitor'/>
  <feature name='ds_cpl'/>
  <feature name='vmx'/>
  <feature name='smx'/>
  <feature name='est'/>
  <feature name='tm2'/>
  <feature name='xtpr'/>
  <feature name='pdcm'/>
  <feature name='pcid'/>
  <feature name='osxsave'/>
  <feature name='arat'/>
  <feature name='xsaveopt'/>
  <feature name='invtsc'/>
  <pages unit='KiB' size='4'/>
  <pages unit='KiB' size='2048'/>
</cpu>
```

Any other info I could provide?

Comment 16 (Paolo Bonzini)

Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is it in D state?
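A minimal way to capture such a backtrace non-interactively, assuming gdb is installed on the host and the process is not in D state (the pidof lookup below is illustrative; with several guests running, use the actual PID of the hung qemu-kvm process):

```
# Attach to the hung QEMU, dump backtraces of all threads, then detach.
# This only works if the process is not stuck in uninterruptible sleep.
gdb -p "$(pidof qemu-kvm)" -batch -ex 'thread apply all bt full'
```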
Comment 17 (Ademar Reis)

(In reply to Paolo Bonzini from comment #16)
> Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is it
> in D state?

ping

Comment (Jiri Belka)

(In reply to Paolo Bonzini from comment #16)
> Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is it
> in D state?

What is 'D' state, please? Last time I tried to get more info I found the VM was paused by some component (systemd or vdsm?)... I'll try to get more info soon.

Comment

"D" state means that "ps" shows a "D" for the process state, such as this:

```
root     29965  0.0  0.0  42632  3240 pts/2   D+   09:53   0:00 /sbin/mount.nfs
                                              ^^
```

When this happens you cannot use "gdb" to show the process's stack backtrace. If the process is not in "D" state, gdb works.

Comment

(In reply to Ademar Reis from comment #17)
> (In reply to Paolo Bonzini from comment #16)
> > Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is
> > it in D state?
>
> ping

Restoring needinfo, as Paolo is still waiting for a backtrace.

Comment 23 (Paolo Bonzini)

The VM is trying to reconnect to the disk:

```
Thread 1 (Thread 0x7f3f7c8c6c80 (LWP 5687)):
#0  0x00007f3f7230d66d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0x00007f3f7230d504 in __sleep (seconds=0, seconds@entry=5) at ../sysdeps/unix/sysv/linux/sleep.c:137
        ts = {tv_sec = 2, tv_nsec = 554646223}
        set = {__val = {65536, 0 <repeats 15 times>}}
        oset = {__val = {268444224, 139910641029281, 139910888208128, 139910641043794, 4294967296, 139910845870080, 139910845870080, 139910690283872, 0, 139910641029281, 139910690283872, 139910641076894, 14429289160618278914, 0, 0, 68719476740}}
        result = <optimized out>
#2  0x00007f3f7c0f0ad2 in iscsi_reconnect (old_iscsi=old_iscsi@entry=0x7f3f81aec000) at lib/connect.c:242
        iscsi = <optimized out>
        retry = 0
```

If the bug happens as soon as the second VM is started, or very soon after, it could be a configuration issue in the iSCSI target. Is it possible to have access to the storage system?

Comment (Jiri Belka)

(In reply to Paolo Bonzini from comment #23)
> The VM is trying to reconnect to the disk:
> [...]
> If the bug happens as soon as the second VM is started, or very soon after,
> it could be a configuration issue in the iSCSI target. Is it possible to
> have access to the storage system?

I don't have that env any more (after one year), and when I tried a similar flow on another env I did not face this issue again.
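A footnote on the target-configuration theory from comment #23: on a LIO target like the one used in comment #6 (not the EMC array from the original report), the sharing-relevant setup can at least be inspected with targetcli; the TPG name tpg1 below is an assumption, and both guests' initiator IQNs would need access to the same LUN:

```
# Show portal groups, LUNs and ACLs of the target used in comment #6;
# tpg1 is assumed, adjust to the actual target portal group.
targetcli ls /iscsi/iqn.2016-06.local.server:sas/tpg1
```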