Bug 1435660
Summary: Windows 2016 guest gets stuck while creating a volume on a shared vioscsi disk

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Version | 7.3 |
| Status | CLOSED NOTABUG |
| Severity | urgent |
| Priority | high |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Jiri Belka <jbelka> |
| Assignee | Paolo Bonzini <pbonzini> |
| QA Contact | Xueqiang Wei <xuwei> |
| CC | aliang, chayang, coli, jbelka, juzhang, knoel, lijin, michen, mtessun, pbonzini, phou, virt-maint, wyu, xiagao, xuwei |
| Type | Bug |
| Last Closed | 2018-08-24 13:15:33 UTC |
| Attachments | screenshot (attachment 1266059) |
Description (Jiri Belka, 2017-03-24 13:23:02 UTC)
Versions:

```
kernel-3.10.0-514.6.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
libiscsi-1.9.0-7.el7.x86_64
virtio-win-1.9.0-3.el7.noarch
```

```
# cat /var/log/libvirt/qemu/vm2-w2k16-X.log
2017-03-24 09:57:55.865+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.5 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-02-10-03:02:29, x86-020.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7_3.6), hostname: dell-r210ii-03.rhev.lab.eng.brq.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=vm2-w2k16-X,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-16-vm2-w2k16-X/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu SandyBridge,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -m size=4194304k,slots=16,maxmem=16777216k -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=4096 -uuid a2ff2496-d8c9-41f6-8337-35c8e3689a5e -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.3-0.10.el7,serial=4C4C4544-0058-3410-8058-C3C04F38354A,uuid=a2ff2496-d8c9-41f6-8337-35c8e3689a5e' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-16-vm2-w2k16-X/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-03-24T09:57:55,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot menu=on,splash-time=10000,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/mnt/data.rhev.lab.eng.brq.redhat.com:_data_iso_shared/0c78b4d6-ba00-4d3e-9f9f-65c7d5899d71/images/11111111-1111-1111-1111-111111111111/en_windows_server_2016_x64_dvd_9327751.iso,format=raw,if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive file=/rhev/data-center/mnt/data.rhev.lab.eng.brq.redhat.com:_data_iso_shared/0c78b4d6-ba00-4d3e-9f9f-65c7d5899d71/images/11111111-1111-1111-1111-111111111111/virtio-win-1.9.0-3.el7-amd64.vfd,format=raw,if=none,id=drive-fdc0-0-0,readonly=on -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/rhev/data-center/00000001-0001-0001-0001-00000000017c/d93e6782-f200-4e5e-9713-2a53ceca3c49/images/c649791a-ac9e-45e0-89d7-16c882ad2f2e/c91cc2ec-cdaf-4f58-8abf-211d37aeaace,format=raw,if=none,id=drive-virtio-disk0,serial=c649791a-ac9e-45e0-89d7-16c882ad2f2e,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:51,bus=pci.0,addr=0x3,bootindex=3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/a2ff2496-d8c9-41f6-8337-35c8e3689a5e.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/a2ff2496-d8c9-41f6-8337-35c8e3689a5e.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice tls-port=5900,addr=10.34.63.222,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2017-03-24T09:57:56.032400Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2017-03-24T09:57:56.032524Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 39.291000 ms, bitrate 6622281 bps (6.315499 Mbps) LOW BANDWIDTH
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer:
red_channel_client_disconnect: rcc=0x7fa34bcb8000 (channel=0x7fa349cccc00 type=2 id=0)
red_channel_client_disconnect: rcc=0x7fa34d269000 (channel=0x7fa349d86000 type=3 id=0)
red_channel_client_disconnect: rcc=0x7fa34d26e000 (channel=0x7fa349d71080 type=4 id=0)
red_channel_client_disconnect: rcc=0x7fa34c924000 (channel=0x7fa349d7e000 type=1 id=0)
main_channel_client_on_disconnect: rcc=0x7fa34c924000
red_client_destroy: destroy client 0x7fa349c45700 with #channels=4
red_dispatcher_disconnect_cursor_peer:
red_dispatcher_disconnect_display_peer:
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 23.343000 ms, bitrate 4249112 bps (4.052269 Mbps) LOW BANDWIDTH
red_dispatcher_set_cursor_peer:
inputs_connect: inputs channel client create
main_channel_handle_parsed: agent start
2017-03-24T11:24:42.359200Z qemu-kvm: virtio-serial-bus: Unexpected port id 736681576 for device virtio-serial0.0
2017-03-24T11:24:45.382932Z qemu-kvm: virtio-serial-bus: Unexpected port id 1185792 for device virtio-serial0.0
2017-03-24T11:25:06.623340Z qemu-kvm: virtio-serial-bus: Unexpected port id 3528894800 for device virtio-serial0.0
2017-03-24T11:25:06.624055Z qemu-kvm: virtio-serial-bus: Unexpected port id 739323064 for device virtio-serial0.0
2017-03-24T11:25:14.019747Z qemu-kvm: virtio-serial-bus: Unexpected port id 3528894800 for device virtio-serial0.0
2017-03-24T11:25:57.144946Z qemu-kvm: virtio-serial-bus: Unexpected port id 3491010928 for device virtio-serial0.0
2017-03-24T11:27:26.876178Z qemu-kvm: virtio-serial-bus: Unexpected port id 0 for device virtio-serial0.0
2017-03-24T11:27:29.913275Z qemu-kvm: virtio-serial-bus: Unexpected port id 3552624224 for device virtio-serial0.0
2017-03-24T11:27:32.784350Z qemu-kvm: virtio-serial-bus: Unexpected port id 739049208 for device virtio-serial0.0
2017-03-24T11:27:34.025448Z qemu-kvm: virtio-serial-bus: Guest failure in adding device virtio-serial0.0
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
main_channel_handle_parsed: agent start
red_channel_client_disconnect: rcc=0x7fa34bdca000 (channel=0x7fa349cccc00 type=2 id=0)
red_channel_client_disconnect: rcc=0x7fa34aacd000 (channel=0x7fa349d71080 type=4 id=0)
red_channel_client_disconnect: rcc=0x7fa34aabe000 (channel=0x7fa349d86000 type=3 id=0)
red_channel_client_disconnect: rcc=0x7fa34ab1a000 (channel=0x7fa349d7e000 type=1 id=0)
main_channel_client_on_disconnect: rcc=0x7fa34ab1a000
red_client_destroy: destroy client 0x7fa34dcbbb80 with #channels=4
red_dispatcher_disconnect_cursor_peer:
red_dispatcher_disconnect_display_peer:
2017-03-24T13:17:49.736414Z qemu-kvm: terminating on signal 15 from pid 1702
2017-03-24 13:18:14.944+0000: shutting down
```

It has happened again. Steps in W2k16:

- In Server Manager, ensure you see both VMs (I added them to the domain and logged in to the first VM as domain admin, then added the second VM into Server Manager).
- I attached the iSCSI direct-lun via 'attach-device' in virsh:

```
<disk type='network' device='lun' sgio='unfiltered' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
  <source protocol='iscsi' name='iqn.1992-05.com.emc:ckm001201002300000-5-vnxe/5'>
    <host name='10.34.63.200' port='3260'/>
  </source>
  <backingStore/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
  <alias name='scsi0-0-0-0'/>
</disk>
```

- Bring the disk online, so it is up on both nodes in Server Manager.
- On the first VM, do New Volume on the disk (it gets stuck at "Formatting Volume").

Created attachment 1266059 [details]
screenshot
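For anyone retracing the attach step, a direct-lun definition like the one above is typically hot-plugged with plain virsh, roughly as follows (a sketch: the domain name is taken from the log above, and shared-lun.xml is just a placeholder file name):

```
# Save the <disk> element quoted above into a file, then hot-plug it into
# the running domain; --live applies the change to the running guest only.
virsh attach-device vm2-w2k16-X shared-lun.xml --live
```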
Comment 6 (Peixiu Hou)

I tried to reproduce this bug with a LIO iSCSI target, but I cannot reproduce it. My steps are as follows:

1. Boot 2 VMs at the qemu level with the qemu initiator:

1)

```
/usr/libexec/qemu-kvm -name guest=peixiu_vm4,debug-threads=on -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid b31d5b11-5fd6-44c6-a3ad-365cb2e146eb -rtc base=2017-03-28T03:16:50,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/a41ee8e2-4e1c-4549-865d-3c9cf1bcc727/c6f00513-2812-4fbd-ab28-b6aba6079280/images/b07818ad-815c-4f5f-a13a-9e8392fca2db/17ce52a2-97a8-44c6-8115-31bfbc50cb87,format=raw,if=none,id=drive-virtio-disk0,serial=b07818ad-815c-4f5f-a13a-9e8392fca2db,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/rhev/data-center/a41ee8e2-4e1c-4549-865d-3c9cf1bcc727/c6f00513-2812-4fbd-ab28-b6aba6079280/images/b6130e3b-8175-4fad-b513-0d2433d8af88/3af3c62c-00a5-40ef-93cc-735452fc822a,format=raw,if=none,id=drive-virtio-disk1,serial=b6130e3b-8175-4fad-b513-0d2433d8af88,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:54,bus=pci.0,addr=0x3 -vnc :2 -monitor stdio -drive file=iscsi://10.66.4.129/iqn.2016-06.local.server:sas/0,format=raw,if=none,id=drive-scsi0-0-0-1,cache=none,werror=stop,rerror=stop,aio=native -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
```

2)

```
/usr/libexec/qemu-kvm -name guest=peixiu_vm2,debug-threads=on -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid d58a72a1-6f72-4a8f-8578-c80dd9cf04dc -rtc base=2017-03-28T02:22:38,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=iscsi://10.66.4.129/iqn.2016-06.local.server:sas/0,format=raw,if=none,id=drive-scsi0-0-0-1,cache=none,werror=stop,rerror=stop,aio=native -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:52,bus=pci.0,addr=0x3 -vnc :3 -monitor stdio -drive file=/rhev/data-center/a41ee8e2-4e1c-4549-865d-3c9cf1bcc727/c6f00513-2812-4fbd-ab28-b6aba6079280/images/e09ee7e5-1d0a-48d7-bd07-e7087cfe3635/f16e06b7-e533-4a2c-8e27-b7918137782b,format=raw,if=none,id=drive-virtio-disk1,serial=b6130e3b-8175-4fad-b513-0d2433d8af88,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
```

2. Install the vioscsi driver.

3. In the VM's Disk Management, bring the disk online and create a new volume.

Used versions:

```
kernel-3.10.0-514.6.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
virtio-win-1.9.0-3.el7.noarch
```

Additional info: tried on RHEL 7.4 with a LIO iSCSI target as well; it is also normal there.

Best Regards~
Peixiu Hou

Comment (Jiri Belka)

(In reply to Peixiu Hou from comment #6)
> I tried to reproduce this bug with a LIO iSCSI target, but I cannot
> reproduce it. My steps are as follows:
>
> 1. Boot 2 VMs at the qemu level with the qemu initiator:

```
> 1) /usr/libexec/qemu-kvm -name guest=peixiu_vm4,debug-threads=on -machine
> pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Nehalem -m
                                        ^^^^^^^^^^^^^^
```

That is a different qemu configuration vs. mine:

```
-cpu SandyBridge,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff
...
-smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.3-0.10.el7,serial=4C4C4544-0058-3410-8058-C3C04F38354A,uuid=a2ff2496-d8c9-41f6-8337-35c8e3689a5e'
```

My env was RHV 4.1; yours seems to be "plain" qemu from EL7. I also used the EMC VNXe.

I tried to get a crash dump from Windows, but it seems the qemu process is stuck as well. I tried 'inject-nmi' via virsh; I saw a blue screen in Windows with a notification about the NMI, but after that qemu stayed stuck. I used en_windows_server_2016_x64_dvd_9327751.iso for the installation.

```
virsh # qemu-monitor-command 23 --hmp 'info block'
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockIoTune)

virsh # inject-nmi 23
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockIoTune)
```

It is just this:

- reset the disk (it will have no MBR/GPT - undefined)
- make the disk online on both VMs
- New Volume in Server Manager -> File and Storage Services -> Disks
- when the VM is frozen, send 'inject-nmi' via virsh
- the blue screen shows '0% complete' for ages, and the VM changes to '?' in the Admin Portal...

```
2017-04-11 17:16:56,026+0200 WARN (jsonrpc/6) [virt.vm] (vmId='a2ff2496-d8c9-41f6-8337-35c8e3689a5e') monitor became unresponsive (command timeout, age=61.1299999999) (vm:5013)
```

and the status in virsh is 'paused'.

I also tried the following, without success:

- 'qemu-monitor-command $domain --hmp 'dump-guest-memory /var/log/core/vmdump'' in virsh

```
# grep "doesn't" /var/log/messages
Apr 11 16:01:13 localhost journal: host doesn't support hyperv 'relaxed' feature
Apr 11 16:01:13 localhost journal: host doesn't support hyperv 'vapic' feature
Apr 11 16:04:36 localhost journal: host doesn't support hyperv 'relaxed' feature
Apr 11 16:04:36 localhost journal: host doesn't support hyperv 'vapic' feature
```

```
<cpu>
  <arch>x86_64</arch>
  <model>SandyBridge</model>
  <vendor>Intel</vendor>
  <topology sockets='1' cores='4' threads='2'/>
  <feature name='vme'/>
  <feature name='ds'/>
  <feature name='acpi'/>
  <feature name='ss'/>
  <feature name='ht'/>
  <feature name='tm'/>
  <feature name='pbe'/>
  <feature name='dtes64'/>
  <feature name='monitor'/>
  <feature name='ds_cpl'/>
  <feature name='vmx'/>
  <feature name='smx'/>
  <feature name='est'/>
  <feature name='tm2'/>
  <feature name='xtpr'/>
  <feature name='pdcm'/>
  <feature name='pcid'/>
  <feature name='osxsave'/>
  <feature name='arat'/>
  <feature name='xsaveopt'/>
  <feature name='invtsc'/>
  <pages unit='KiB' size='4'/>
  <pages unit='KiB' size='2048'/>
</cpu>
```

Any other info I could provide?

Comment 16 (Paolo Bonzini)

Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is it in D state?
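A minimal way to capture such a backtrace non-interactively, assuming gdb is installed on the host and the process is not in D state (the pidof lookup below is illustrative; with several guests running, use the actual PID of the hung qemu-kvm process):

```
# Attach to the hung QEMU, dump backtraces of all threads, then detach.
# This only works if the process is not stuck in uninterruptible sleep.
gdb -p "$(pidof qemu-kvm)" -batch -ex 'thread apply all bt full'
```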
Comment 17 (Ademar Reis)

(In reply to Paolo Bonzini from comment #16)
> Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is it
> in D state?

ping

Comment (Jiri Belka)

(In reply to Paolo Bonzini from comment #16)
> Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is it
> in D state?

What is 'D' state, please? Last time I tried to get more info I found the VM was paused by some component (systemd or vdsm?)... I'll try to get more info soon.

Comment

"D" state means that "ps" shows a "D" for the process state, such as this:

```
root     29965  0.0  0.0  42632  3240 pts/2   D+   09:53   0:00 /sbin/mount.nfs
                                              ^^
```

When this happens you cannot use "gdb" to show the process's stack backtrace. If the process is not in "D" state, gdb works.

Comment

(In reply to Ademar Reis from comment #17)
> (In reply to Paolo Bonzini from comment #16)
> > Jiri, since QEMU is hung, can you grab a stack backtrace with gdb? Or is
> > it in D state?
>
> ping

Restoring needinfo, as Paolo is still waiting for a backtrace.

Comment 23 (Paolo Bonzini)

The VM is trying to reconnect to the disk:

```
Thread 1 (Thread 0x7f3f7c8c6c80 (LWP 5687)):
#0  0x00007f3f7230d66d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0x00007f3f7230d504 in __sleep (seconds=0, seconds@entry=5) at ../sysdeps/unix/sysv/linux/sleep.c:137
        ts = {tv_sec = 2, tv_nsec = 554646223}
        set = {__val = {65536, 0 <repeats 15 times>}}
        oset = {__val = {268444224, 139910641029281, 139910888208128, 139910641043794, 4294967296, 139910845870080, 139910845870080, 139910690283872, 0, 139910641029281, 139910690283872, 139910641076894, 14429289160618278914, 0, 0, 68719476740}}
        result = <optimized out>
#2  0x00007f3f7c0f0ad2 in iscsi_reconnect (old_iscsi=old_iscsi@entry=0x7f3f81aec000) at lib/connect.c:242
        iscsi = <optimized out>
        retry = 0
```

If the bug happens as soon as the second VM is started, or very soon after, it could be a configuration issue in the iSCSI target. Is it possible to have access to the storage system?

Comment (Jiri Belka)

(In reply to Paolo Bonzini from comment #23)
> The VM is trying to reconnect to the disk:
> [...]
> If the bug happens as soon as the second VM is started, or very soon after,
> it could be a configuration issue in the iSCSI target. Is it possible to
> have access to the storage system?

I don't have that env any more (after one year), and when I tried a similar flow on another env I did not face this issue again.
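A footnote on the target-configuration theory from comment #23: on a LIO target like the one used in comment #6 (not the EMC array from the original report), the sharing-relevant setup can at least be inspected with targetcli; the TPG name tpg1 below is an assumption, and both guests' initiator IQNs would need access to the same LUN:

```
# Show portal groups, LUNs and ACLs of the target used in comment #6;
# tpg1 is assumed, adjust to the actual target portal group.
targetcli ls /iscsi/iqn.2016-06.local.server:sas/tpg1
```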