Bug 1426456 - Unable to live migrate virtual machine: Domain not found: no domain with matching name 'vm'
Summary: Unable to live migrate virtual machine: Domain not found: no domain with matc...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 26
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jiri Denemark
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: upgrade
Depends On:
Blocks:
 
Reported: 2017-02-24 00:07 UTC by Douglas Schilling Landgraf
Modified: 2019-04-28 11:07 UTC
CC List: 26 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-04 19:41:42 UTC
Type: Bug
Embargoed:


Attachments
logs-destination-server (2.11 MB, application/x-gzip)
2017-02-24 00:22 UTC, Douglas Schilling Landgraf
source logs (3.83 MB, application/x-gzip)
2017-02-24 00:24 UTC, Douglas Schilling Landgraf
logs.03012017-source.tar.gz (4.95 MB, application/x-gzip)
2017-03-01 20:27 UTC, Douglas Schilling Landgraf
logs.03012017-dest-server.tar.gz (3.61 MB, application/x-gzip)
2017-03-01 20:29 UTC, Douglas Schilling Landgraf
coredump-qemu.tar.gz (4.99 MB, application/x-gzip)
2017-03-02 19:57 UTC, Douglas Schilling Landgraf
rhevh6-6.8-20160707.3.xml (1.30 KB, application/x-gzip)
2017-03-02 20:06 UTC, Douglas Schilling Landgraf
testng50G.xml - destination server (1.33 KB, application/x-gzip)
2017-03-02 20:09 UTC, Douglas Schilling Landgraf
f25-coredump-files.tar.gz (5.05 MB, application/x-gzip)
2017-03-02 21:32 UTC, Douglas Schilling Landgraf

Description Douglas Schilling Landgraf 2017-02-24 00:07:16 UTC
Description of problem:

Unable to live migrate a virtual machine from an EL6 cluster to an EL7 cluster.

Version-Release number of selected component (if applicable):

RHEVM: 3.6.10.2-0.2.el6
RHEV-H: rhev-hypervisor6-6.8-20160707.3.iso
RHVH: redhat-virtualization-host-3.6-20170216.0.x86_64.liveimg.squashfs

Steps to Reproduce:

1. Install rhev-hypervisor6-6.8-20160707.3.iso
   
2. In RHEVM: 
   - Set the Default data center to 3.5 compatibility mode
   - As rhev-hypervisor6-6.8-20160707.3 has a rhevm network interface,
     we need to create a rhevm logical network; select:
          -> Datacenter 
             -> Logical Networks 
                -> Create
                   -> Use rhevm as the name for the network interface

3. Create a cluster with compatibility level 3.5, for example: cluster35
     -> Cluster 
        -> New  (Keep in mind to select the Management Network as *rhevm*) 

4. Register and Approve rhev-hypervisor6-6.8-20160707.3.iso in RHEV-M, into
   cluster35

5. Attach Data/ISO storage (NFS in this use case)

6. Create a virtual machine; in this example I created a virtual machine based on the ISO rhev-hypervisor7-7.3-20170118.0.iso

7. Create a new cluster with 3.5 compatibility for EL7, for example cluster35_el7
     -> Cluster 
        -> New  (Keep in mind to select the Management Network as *rhevm*) 

8. Register and Approve redhat-virtualization-host-3.6-20170216.0.x86_64.liveimg.squashfs in RHEV-M, into cluster35_el7

9. Enable InClusterUpgrade option in RHEVM:
    # engine-config -s CheckMixedRhelVersions=false --cver=3.5
    # service ovirt-engine restart

     -> Click the Clusters tab.
        -> Select the cluster35_el7 and click Edit.
        -> Click the Scheduling Policy tab.
        -> From the Select Policy drop-down list, select InClusterUpgrade. 

10. Try to migrate the virtual machine:
     -> In the RHEVM Virtual Machine tab
        -> Migrate 
           -> Advanced Parameters
              -> Select cluster35_el7 -> OK

     The result of migration:
         Migration failed due to Error: Fatal error during migration 
         (VM: vm, Source: 192.168.122.53).

Comment 2 Douglas Schilling Landgraf 2017-02-24 00:21:04 UTC
In the destination server (redhat-virtualization-host-3.6-20170216.0), I have enabled libvirt debug:

# vi /etc/libvirt/libvirtd.conf
log_level = 1
log_filters="3:remote 4:event 3:json 3:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

- Restarted libvirtd
- After libvirtd, restarted vdsmd
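For reference, restarting the daemons so the new log settings take effect can be done roughly like this (a minimal sketch, assuming systemd on the EL7-based destination host; the commands are generic and not taken from the attached logs):

  # Restart libvirtd with the new debug settings, then vdsmd so it reconnects
  systemctl restart libvirtd
  systemctl restart vdsmd
  # Follow the debug log during the next migration attempt
  tail -f /var/log/libvirt/libvirtd.log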

Comment 3 Douglas Schilling Landgraf 2017-02-24 00:22:05 UTC
Created attachment 1257079 [details]
logs-destination-server

Comment 4 Douglas Schilling Landgraf 2017-02-24 00:24:27 UTC
Created attachment 1257081 [details]
source logs

Comment 5 Douglas Schilling Landgraf 2017-02-24 00:32:59 UTC
Just a note: if the migration UI shows:

Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:
The host xxxxx did not satisfy internal filter InClusterUpgrade because its OS version is too old, found RHEL.

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1425174#c11 for more details on the temporary workaround.

Comment 6 Milan Zamazal 2017-02-24 08:36:52 UTC
I can see the following errors in destination libvirtd.log:

  2017-02-23 10:26:36.089+0000: 19683: error : qemuMonitorIO:695 : internal error: End of file from monitor
  2017-02-23 10:26:36.089+0000: 19683: debug : qemuDomainLogContextRead:3838 : Context read 0x7f6124016ec0 manager=(nil) inode=0 pos=111471
  2017-02-23 10:26:36.090+0000: 19683: error : qemuProcessReportLogError:1808 : internal error: qemu unexpectedly closed the monitor
  ...
  2017-02-23 10:26:36.302+0000: 20317: debug : processMonitorEOFEvent:4535 : Monitor connection to 'vm' closed without SHUTDOWN event; assuming the domain crashed

So it looks like QEMU crashed on the destination shortly after the VM was started there.

Comment 7 Dan Kenigsberg 2017-02-25 18:42:07 UTC
qemu itself seems more like it is exiting rather than crashing. DDAG, can you take a look?

Douglas, can you provide kernel, qemu, and libvirtd versions?

2017-02-23 10:26:35.617+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.4 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-01-05-04:19:14, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7_3.3), hostname: localhost.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=vm,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-vm/master-key.aes -machine rhel6.5.0,accel=kvm,usb=off -cpu SandyBridge -m 1024 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 1c77e5e2-e27d-42c1-9f50-48d0ba36b315 -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6.8-20160707.3.el6ev,serial=E9DA3028-F4BE-454B-96FC-928CEC3E728C,uuid=1c77e5e2-e27d-42c1-9f50-48d0ba36b315' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-vm/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-02-23T10:26:35,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive if=none,id=drive-ide0-1-0,readonly=on,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/00000001-0001-0001-0001-00000000028d/f812d665-a96a-4fad-a0dc-8ead839622e1/images/ed442b86-ba28-4529-ada6-078efe047b1e/132f48f9-a52a-4f13-994a-92c713988de4,format=raw,if=none,id=drive-virtio-disk0,serial=ed442b86-ba28-4529-ada6-078efe047b1e,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/1c77e5e2-e27d-42c1-9f50-48d0ba36b315.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/1c77e5e2-e27d-42c1-9f50-48d0ba36b315.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=192.168.122.2,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
Domain id=1 is tainted: hook-script
2017-02-23 10:26:35.620+0000: 20301: debug : virFileClose:102 : Closed fd 28
2017-02-23 10:26:35.620+0000: 20301: debug : virFileClose:102 : Closed fd 34
2017-02-23 10:26:35.622+0000: 20301: debug : virFileClose:102 : Closed fd 3
2017-02-23 10:26:35.623+0000: 20302: debug : virExec:692 : Run hook 0x7f614a426560 0x7f615696b570
2017-02-23 10:26:35.623+0000: 20302: debug : qemuProcessHook:2648 : Obtaining domain lock
2017-02-23 10:26:35.623+0000: 20302: debug : virSecuritySELinuxSetSocketLabel:2267 : Setting VM vm socket context system_u:system_r:svirt_t:s0:c354,c722
2017-02-23 10:26:35.624+0000: 20302: debug : virDomainLockProcessStart:179 : plugin=0x7f6168861db0 dom=0x7f6124014130 paused=1 fd=0x7f615696b0a0
2017-02-23 10:26:35.624+0000: 20302: debug : virDomainLockManagerNew:134 : plugin=0x7f6168861db0 dom=0x7f6124014130 withResources=1
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerPluginGetDriver:281 : plugin=0x7f6168861db0
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerNew:305 : driver=0x7f612241f000 type=0 nparams=5 params=0x7f615696af50 flags=1
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerLogParams:98 :   key=uuid type=uuid value=1c77e5e2-e27d-42c1-9f50-48d0ba36b315
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerLogParams:91 :   key=name type=string value=vm
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerLogParams:79 :   key=id type=uint value=1
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerLogParams:79 :   key=pid type=uint value=20302
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerLogParams:94 :   key=uri type=cstring value=qemu:///system
2017-02-23 10:26:35.624+0000: 20302: debug : virDomainLockManagerNew:146 : Adding leases
2017-02-23 10:26:35.624+0000: 20302: debug : virDomainLockManagerNew:151 : Adding disks
2017-02-23 10:26:35.624+0000: 20302: debug : virDomainLockManagerAddImage:90 : Add disk /rhev/data-center/00000001-0001-0001-0001-00000000028d/f812d665-a96a-4fad-a0dc-8ead839622e1/images/ed442b86-ba28-4529-ada6-078efe047b1e/132f48f9-a52a-4f13-994a-92c713988de4
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerAddResource:332 : lock=0x7f6124017e20 type=0 name=/rhev/data-center/00000001-0001-0001-0001-00000000028d/f812d665-a96a-4fad-a0dc-8ead839622e1/images/ed442b86-ba28-4529-ada6-078efe047b1e/132f48f9-a52a-4f13-994a-92c713988de4 nparams=0 params=(nil) flags=0
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerAcquire:350 : lock=0x7f6124017e20 state='<null>' flags=3 action=0 fd=0x7f615696b0a0
2017-02-23 10:26:35.624+0000: 20302: debug : virLockManagerSanlockAcquire:933 : Register sanlock 3
2017-02-23 10:26:35.627+0000: 20302: debug : virLockManagerSanlockAcquire:1027 : Acquire completed fd=3
2017-02-23 10:26:35.627+0000: 20302: debug : virLockManagerFree:387 : lock=0x7f6124017e20
2017-02-23 10:26:35.627+0000: 20302: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7f61688c8920
2017-02-23 10:26:35.627+0000: 20302: debug : qemuProcessHook:2689 : Hook complete ret=0
2017-02-23 10:26:35.627+0000: 20302: debug : virExec:694 : Done hook 0
2017-02-23 10:26:35.627+0000: 20302: debug : virExec:701 : Setting child security label to system_u:system_r:svirt_t:s0:c354,c722
2017-02-23 10:26:35.627+0000: 20302: debug : virExec:731 : Setting child uid:gid to 107:107 with caps 0
2017-02-23 10:26:35.628+0000: 20302: debug : virCommandHandshakeChild:433 : Notifying parent for handshake start on 31
2017-02-23 10:26:35.628+0000: 20302: debug : virCommandHandshakeChild:441 : Waiting on parent for handshake complete on 32
2017-02-23 10:26:35.680+0000: 20302: debug : virFileClose:102 : Closed fd 31
2017-02-23 10:26:35.680+0000: 20302: debug : virFileClose:102 : Closed fd 32
2017-02-23 10:26:35.680+0000: 20302: debug : virCommandHandshakeChild:461 : Handshake with parent is done
2017-02-23 10:26:36.302+0000: shutting down

Comment 8 Dr. David Alan Gilbert 2017-02-27 09:42:35 UTC
There's not that much interesting in the logs from the QEMU side.
On the destination side we see a 'shutting down' at 10:26:36.302 - but there's no hint of there being any other error.
On the source side the last entry (is it the same time???) has a 'MIGRATE_CANCEL' - so that suggests something at a higher level decided to cancel the migration.

Comment 9 Milan Zamazal 2017-02-27 10:30:26 UTC
The shutting down event is probably a result of the libvirt error at 2017-02-23 10:26:36.090+0000, see my comment 6 above. 'destroy' gets called afterwards; see the destination libvirtd.log.
I couldn't see any higher-level action that could cause that, but it's difficult to find out what belongs to what and where in the vast amount of logs. Douglas, could you please capture a single sample migration failure in the logs and check that the clocks are properly synchronized on the hosts? And could you please enable libvirt debug logging on the source host as well?
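A hedged sketch of those checks (assuming SysV init on the EL6 source host; the commands are generic and not taken from the logs):

  # Compare UTC time on both hosts to rule out clock skew
  date -u
  # After copying the same debug settings into /etc/libvirt/libvirtd.conf
  # on the source, restart the daemons
  service libvirtd restart
  service vdsmd restart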

Comment 10 Dan Kenigsberg 2017-02-27 13:55:20 UTC
Israel, can you assist in reproducing this? Please first try el6->el7 on rhev-h.

Comment 11 Israel Pinto 2017-02-28 14:28:59 UTC
(In reply to Dan Kenigsberg from comment #10)
> Israel, can you assist in reproducing this? Please first try el6->el7 on
> rhel-h.

I checked it with the 3.6 engine (3.6.10.2-0.2.el6):
https://10.35.161.31/ovirt-engine/
Create 3.5 data center:
With 2 clusters:
- cluster 3.5 with one host:
 OS Version:RHEV Hypervisor - 6.8 - 20160707.3.el6ev
 Kernel Version:2.6.32 - 642.1.1.el6.x86_64
 KVM Version:0.12.1.2 - 2.491.el6_8.2
 LIBVIRT Version:libvirt-0.10.2-60.el6
 VDSM Version:vdsm-4.16.38-1.el6ev
- cluster 3.6 with one host:
 OS Version: RHEV Hypervisor - 7.3 - 20170118.0.el7ev
 Kernel Version: 3.10.0 - 514.6.1.el7.x86_64
 KVM Version:2.6.0 - 28.el7_3.3
 LIBVIRT Version:libvirt-2.0.0-10.el7_3.4
 VDSM Version:vdsm-4.17.37-1.el7ev
Created a VM in the 3.5 cluster with a 7.3 OS and migrated it.
Migration passed.

Comment 12 Jiri Belka 2017-02-28 14:46:22 UTC
(In reply to Israel Pinto from comment #11)
> (In reply to Dan Kenigsberg from comment #10)
> > Israel, can you assist in reproducing this? Please first try el6->el7 on
> > rhel-h.
> 
> I check it with 3.6 engine 3.6.10.2-0.2.el6:
> https://10.35.161.31/ovirt-engine/
> Create 3.5 data center:
> With 2 clusters:
> - cluster 3.5 with one host:
>  OS Version:RHEV Hypervisor - 6.8 - 20160707.3.el6ev
>  Kernel Version:2.6.32 - 642.1.1.el6.x86_64
>  KVM Version:0.12.1.2 - 2.491.el6_8.2
>  LIBVIRT Version:libvirt-0.10.2-60.el6
>  VDSM Version:vdsm-4.16.38-1.el6ev
> - cluster 3.6 with one host:
>  OS Version: RHEV Hypervisor - 7.3 - 20170118.0.el7ev
>  Kernel Version: 3.10.0 - 514.6.1.el7.x86_64
>  KVM Version:2.6.0 - 28.el7_3.3
>  LIBVIRT Version:libvirt-2.0.0-10.el7_3.4
>  VDSM Version:vdsm-4.17.37-1.el7ev
> Create VM in 3.5 with 7.3 os and migrate it.
> Migrate Pass.

Original description mentions 2 clusters, both with 3.5 compat level. A typo in your testing?

Comment 13 Jiri Belka 2017-02-28 14:48:43 UTC
> 9. Enable InClusterUpgrade option in RHEVM:
>     # engine-config -s CheckMixedRhelVersions=false --cver=3.5
>     # service ovirt-engine restart
> 
>      -> Click the Clusters tab.
>         -> Select the cluster35_el7 and click Edit.
>         -> Click the Scheduling Policy tab.
>         -> From the Select Policy drop-down list, select InClusterUpgrade. 

I'm not sure if this is needed. The 'InClusterUpgrade' policy is per cluster, but IIUC you were migrating between two clusters, so you were not putting the EL7 host in the same cluster as the EL6 host.

I don't say this is not a valid scenario, but 'InClusterUpgrade' was introduced just to make it possible to keep "one" cluster.

Also, RHEL 6.7? Really?

Comment 14 Israel Pinto 2017-02-28 14:56:03 UTC
1. No error; this is an upgrade to 3.6 and the el7.3 host should be in the 3.6 cluster.
2. What do you mean by RHEL 6.7?

Comment 15 Israel Pinto 2017-02-28 15:29:41 UTC
(In reply to Jiri Belka from comment #13)
> > 9. Enable InClusterUpgrade option in RHEVM:
> >     # engine-config -s CheckMixedRhelVersions=false --cver=3.5
> >     # service ovirt-engine restart
> > 
> >      -> Click the Clusters tab.
> >         -> Select the cluster35_el7 and click Edit.
> >         -> Click the Scheduling Policy tab.
> >         -> From the Select Policy drop-down list, select InClusterUpgrade. 
> 
> I'm not sure if this is needed. The 'InClusterUpgrade' policy is per cluster
> but IIC you were migrating between two clusters, so you were not putting EL7
> host in same cluster as EL6 host.
> 
> I don't say this is not valid scenario but 'InClusterUpgrade' was introduce
> just for possibility to keep "one" cluster.
> 
> Also, RHEL 6.7? Really?

Rechecked this with 2 hosts in a 3.5 cluster,
with:
1. the 'InClusterUpgrade' scheduling policy
2. the InClusterUpgrade option enabled in RHEVM:
 # engine-config -s CheckMixedRhelVersions=false --cver=3.5
 # service ovirt-engine restart
Ran a VM on the rhev-h 6.8 host and migrated it to rhev-h 7.3;
the migration passed.
Engine OS is: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Comment 16 Douglas Schilling Landgraf 2017-03-01 20:24:19 UTC
I have simplified the environment:

1. One cluster (cluster3.5) with the InClusterUpgrade option enabled in RHEVM

2. Not related, but upgraded the RHEL of the RHEVM machine to 6.9 to match Israel's setup

3. setenforce 0 on both hosts (didn't help)

4. Made sure the FQDNs of the hosts are reachable.

5. Made sure the clocks of the hosts are synced.

Source packages (Red Hat Enterprise Virtualization Hypervisor release 6.8 (20160707.3.el6ev)):
==================
qemu-img-rhev-0.12.1.2-2.491.el6_8.2.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.491.el6_8.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.491.el6_8.2.x86_64
gpxe-roms-qemu-0.9.7-6.15.el6.noarch

qemu-kvm-rhev-tools-0.12.1.2-2.491.el6_8.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.491.el6_8.2.x86_64
dracut-kernel-004-409.el6_8.2.noarch

kernel-firmware-2.6.32-642.1.1.el6.noarch
kernel-2.6.32-642.1.1.el6.x86_64

vdsm-4.16.38-1.el6ev.x86_64
vdsm-python-zombiereaper-4.16.38-1.el6ev.noarch
vdsm-jsonrpc-4.16.38-1.el6ev.noarch
vdsm-python-4.16.38-1.el6ev.noarch
vdsm-cli-4.16.38-1.el6ev.noarch
vdsm-hook-vhostmd-4.16.38-1.el6ev.noarch
vdsm-hook-ethtool-options-4.16.38-1.el6ev.noarch
ovirt-node-plugin-vdsm-0.2.0-26.el6ev.noarch
vdsm-yajsonrpc-4.16.38-1.el6ev.noarch
vdsm-reg-4.16.38-1.el6ev.noarch
vdsm-xmlrpc-4.16.38-1.el6ev.noarch

libvirt-0.10.2-60.el6.x86_64
libvirt-cim-0.6.1-12.el6.x86_64
libvirt-client-0.10.2-60.el6.x86_64
libvirt-python-0.10.2-60.el6.x86_64
libvirt-lock-sanlock-0.10.2-60.el6.x86_64


Dest packages (Red Hat Virtualization Host 3.6 (el7.3)):
=====================
qemu-kvm-tools-rhev-2.6.0-28.el7_3.3.x86_64
qemu-kvm-common-rhev-2.6.0-28.el7_3.3.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.4.x86_64
qemu-img-rhev-2.6.0-28.el7_3.3.x86_64

kernel-3.10.0-514.6.1.el7.x86_64
kernel-tools-3.10.0-514.6.1.el7.x86_64
kernel-tools-libs-3.10.0-514.6.1.el7.x86_64

vdsm-cli-4.17.37-1.el7ev.noarch
vdsm-hook-ethtool-options-4.17.37-1.el7ev.noarch
vdsm-xmlrpc-4.17.37-1.el7ev.noarch
vdsm-jsonrpc-4.17.37-1.el7ev.noarch
vdsm-infra-4.17.37-1.el7ev.noarch
vdsm-4.17.37-1.el7ev.noarch
vdsm-hook-openstacknet-4.17.37-1.el7ev.noarch
vdsm-hook-vhostmd-4.17.37-1.el7ev.noarch
vdsm-yajsonrpc-4.17.37-1.el7ev.noarch
vdsm-python-4.17.37-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.37-1.el7ev.noarch
vdsm-hook-fcoe-4.17.37-1.el7ev.noarch

libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7_3.4.x86_64
libvirt-client-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-kvm-2.0.0-10.el7_3.4.x86_64
libvirt-daemon-2.0.0-10.el7_3.4.x86_64
libvirt-lock-sanlock-2.0.0-10.el7_3.4.x86_64


More logs attached.

Comment 17 Douglas Schilling Landgraf 2017-03-01 20:27:52 UTC
Created attachment 1258858 [details]
logs.03012017-source.tar.gz

Comment 18 Douglas Schilling Landgraf 2017-03-01 20:29:57 UTC
Created attachment 1258859 [details]
logs.03012017-dest-server.tar.gz

Comment 19 Israel Pinto 2017-03-02 08:53:04 UTC
(In reply to Douglas Schilling Landgraf from comment #16)
> I have simplified the environment:
> 
> 1. One cluster (cluster3.5) with Enable InClusterUpgrade option in RHEVM
>    
> 2. Not related, but, upgraded the RHEL of RHEVM to 6.9 to match Israel's
> 
> 3. setenforce 0 in both hosts (didn't help)
> 
> 4. make sure the fqdn of hosts are reachable.
> 
> 5. make sure clock of hosts are synced
> 
> Source packages (Red Hat Enterprise Virtualization Hypervisor release 6.8
> (20160707.3.el6ev):
> ==================
> qemu-img-rhev-0.12.1.2-2.491.el6_8.2.x86_64
> qemu-kvm-rhev-tools-0.12.1.2-2.491.el6_8.2.x86_64
> qemu-kvm-rhev-0.12.1.2-2.491.el6_8.2.x86_64
> gpxe-roms-qemu-0.9.7-6.15.el6.noarch
> 
> qemu-kvm-rhev-tools-0.12.1.2-2.491.el6_8.2.x86_64
> qemu-kvm-rhev-0.12.1.2-2.491.el6_8.2.x86_64
> dracut-kernel-004-409.el6_8.2.noarch
> 
> kernel-firmware-2.6.32-642.1.1.el6.noarch
> kernel-2.6.32-642.1.1.el6.x86_64
> 
> vdsm-4.16.38-1.el6ev.x86_64
> vdsm-python-zombiereaper-4.16.38-1.el6ev.noarch
> vdsm-jsonrpc-4.16.38-1.el6ev.noarch
> vdsm-python-4.16.38-1.el6ev.noarch
> vdsm-cli-4.16.38-1.el6ev.noarch
> vdsm-hook-vhostmd-4.16.38-1.el6ev.noarch
> vdsm-hook-ethtool-options-4.16.38-1.el6ev.noarch
> ovirt-node-plugin-vdsm-0.2.0-26.el6ev.noarch
> vdsm-yajsonrpc-4.16.38-1.el6ev.noarch
> vdsm-reg-4.16.38-1.el6ev.noarch
> vdsm-xmlrpc-4.16.38-1.el6ev.noarch
> 
> libvirt-0.10.2-60.el6.x86_64
> libvirt-cim-0.6.1-12.el6.x86_64
> libvirt-client-0.10.2-60.el6.x86_64
> libvirt-python-0.10.2-60.el6.x86_64
> libvirt-lock-sanlock-0.10.2-60.el6.x86_64
> 
> 
> Dest packages (Red Hat Virtualization Host 3.6 (el7.3)):
> =====================
> qemu-kvm-tools-rhev-2.6.0-28.el7_3.3.x86_64
> qemu-kvm-common-rhev-2.6.0-28.el7_3.3.x86_64
> qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
> ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
> libvirt-daemon-driver-qemu-2.0.0-10.el7_3.4.x86_64
> qemu-img-rhev-2.6.0-28.el7_3.3.x86_64
> 
> kernel-3.10.0-514.6.1.el7.x86_64
> kernel-tools-3.10.0-514.6.1.el7.x86_64
> kernel-tools-libs-3.10.0-514.6.1.el7.x86_64
> 
> vdsm-cli-4.17.37-1.el7ev.noarch
> vdsm-hook-ethtool-options-4.17.37-1.el7ev.noarch
> vdsm-xmlrpc-4.17.37-1.el7ev.noarch
> vdsm-jsonrpc-4.17.37-1.el7ev.noarch
> vdsm-infra-4.17.37-1.el7ev.noarch
> vdsm-4.17.37-1.el7ev.noarch
> vdsm-hook-openstacknet-4.17.37-1.el7ev.noarch
> vdsm-hook-vhostmd-4.17.37-1.el7ev.noarch
> vdsm-yajsonrpc-4.17.37-1.el7ev.noarch
> vdsm-python-4.17.37-1.el7ev.noarch
> vdsm-hook-vmfex-dev-4.17.37-1.el7ev.noarch
> vdsm-hook-fcoe-4.17.37-1.el7ev.noarch
> 
> libvirt-python-2.0.0-2.el7.x86_64
> libvirt-daemon-config-nwfilter-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-driver-nodedev-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-driver-storage-2.0.0-10.el7_3.4.x86_64
> libvirt-client-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-driver-nwfilter-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-driver-interface-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-driver-secret-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-driver-network-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-driver-qemu-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-kvm-2.0.0-10.el7_3.4.x86_64
> libvirt-daemon-2.0.0-10.el7_3.4.x86_64
> libvirt-lock-sanlock-2.0.0-10.el7_3.4.x86_64
> 
> 
> More logs attached.

Can you check it again with 2 clusters, since that is the preferred option? See: https://bugzilla.redhat.com/show_bug.cgi?id=1154631

Comment 20 Dr. David Alan Gilbert 2017-03-02 09:15:05 UTC
I don't see anything obvious in the log from QEMU; can you check dmesg and/or abrt to see if something captured a core or OOM?

Dave
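One way to check that (a sketch; abrt may or may not be installed on RHVH, and /var/log/cores is the path mentioned in comment 21 rather than something taken from these logs):

  # Look for qemu traps, segfaults or OOM kills
  dmesg | grep -iE 'qemu|oom|trap|segfault'
  # List any captured core dumps
  ls -l /var/log/cores/ 2>/dev/null
  abrt-cli list 2>/dev/null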

Comment 21 Dr. David Alan Gilbert 2017-03-02 19:56:57 UTC
Ah the dmesg shows:
[ 2166.775317] traps: qemu-kvm[22350] trap invalid opcode ip:7f3d9feaa790 sp:7f3da50dfc00 error:0 in qemu-kvm[7f3d9fa02000+635000]

so qemu had crashed, and there's a bucket load of cores in /var/log/cores

Observation: this is a nested setup;
the laptop is an i7-4510U,
the L1 is running as a Westmere,
the L2 is trying to be a SandyBridge.

That's probably a bad combo; SandyBridge is newer than Westmere.

Comment 22 Douglas Schilling Landgraf 2017-03-02 19:57:55 UTC
Created attachment 1259317 [details]
coredump-qemu.tar.gz

Comment 23 Douglas Schilling Landgraf 2017-03-02 20:06:30 UTC
Created attachment 1259319 [details]
rhevh6-6.8-20160707.3.xml

Comment 24 Dr. David Alan Gilbert 2017-03-02 20:08:41 UTC
L1:
[root@rhvh qemu]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Westmere E56xx/L56xx/X56xx (Nehalem-C)
stepping        : 1
microcode       : 0x1
cpu MHz         : 2593.992
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes avx f16c rdrand hypervisor lahf_lm abm arat tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid

Comment 25 Douglas Schilling Landgraf 2017-03-02 20:09:04 UTC
Created attachment 1259321 [details]
testng50G.xml  - destination server

Comment 26 Douglas Schilling Landgraf 2017-03-02 20:13:30 UTC
(In reply to Dr. David Alan Gilbert from comment #21)
> Ah the dmesg shows:
> [ 2166.775317] traps: qemu-kvm[22350] trap invalid opcode ip:7f3d9feaa790
> sp:7f3da50dfc00 error:0 in qemu-kvm[7f3d9fa02000+635000]
> 
> so qemu had crashed, and there's a bucket load of cores in /var/log/cores
> 
> observation, this is a nested setup;
> the laptop is i7-4510U
> The L1 is running as a Westmere
> The L2 is trying to be a sandybridge
> 
> That's probably a bad combo; sandybridge is newer than Westmere.

Just to share: I tried Westmere as well; no migration was possible.

Comment 27 Douglas Schilling Landgraf 2017-03-02 20:15:44 UTC
Host version of libvirt and virt-manager:

libvirt-daemon-config-network-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-network-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-xen-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-storage-1.3.3.2-1.fc24.x86_64
libvirt-python3-1.3.3-3.fc24.x86_64
libvirt-python-1.3.3-3.fc24.x86_64
libvirt-daemon-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-libxl-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-lxc-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-vbox-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-nwfilter-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-secret-1.3.3.2-1.fc24.x86_64
libvirt-client-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-interface-1.3.3.2-1.fc24.x86_64
libvirt-1.3.3.2-1.fc24.x86_64
libvirt-daemon-kvm-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-uml-1.3.3.2-1.fc24.x86_64
libvirt-glib-0.2.3-2.fc24.x86_64
libvirt-daemon-driver-nodedev-1.3.3.2-1.fc24.x86_64
libvirt-daemon-driver-qemu-1.3.3.2-1.fc24.x86_64
libvirt-daemon-config-nwfilter-1.3.3.2-1.fc24.x86_64

virt-manager-1.4.0-5.fc24.noarch
virt-manager-common-1.4.0-5.fc24.noarch

Comment 28 Dr. David Alan Gilbert 2017-03-02 20:16:48 UTC
The libvirt XML for the L1 has:

  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>

the host is actually:
L0:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 69
model name      : Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
stepping        : 1
microcode       : 0x1f
cpu MHz         : 2399.890
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
bugs            :
bogomips        : 5188.20
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

and the -cpu for the L1 is:
-cpu Westmere,+invpcid,+erms,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+abm,+rdtscp,+pdpe1gb,+rdrand,+f16c,+avx,+osxsave,+xsave,+tsc-deadline,+movbe,+pcid,+pdcm,+xtpr,+fma,+tm2,+est,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme

That's a crazy franken-CPU - Westmeres didn't have avx2.

(gdb) where
#0  0x00007f265a43e790 in buffer_find_nonzero_offset_avx2 ()
#1  0x00007f265a1cf235 in ram_handle_compressed ()
#2  0x00007f265a1cf518 in ram_load ()
#3  0x00007f265a1d0be6 in qemu_loadvm_state_main ()
#4  0x00007f265a1d2953 in qemu_loadvm_state ()
#5  0x00007f265a357a3b in process_incoming_migration_co ()
#6  0x00007f265a45028a in coroutine_trampoline ()
#7  0x00007f264f907cf0 in ?? () from /lib64/libc.so.6
#8  0x00007ffdcecdfac0 in ?? ()
#9  0x0000000000000000 in ?? ()


so I think the qemu ifunc code has freaked out at a Westmere with avx2; that shouldn't happen, but I have some sympathy.  Need to check with libvirt why it's given those flags.
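For reference, that backtrace can be pulled out of one of the captured cores roughly like this (a sketch; the exact core file name will differ):

  # On the destination host, against the matching qemu binary
  gdb -batch -ex 'bt' -ex 'x/i $pc' /usr/libexec/qemu-kvm /var/log/cores/core.<pid>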

Comment 29 Dr. David Alan Gilbert 2017-03-02 20:25:20 UTC
Hi Jiri,
  Any idea why libvirt is creating such a weird -cpu there?
A Westmere with avx2 is definitely a weird one; I suspect it's because the libvirt is too old for the host (Haswell), or maybe because of a Haswell/TSX issue?

Dave

Comment 30 Douglas Schilling Landgraf 2017-03-02 21:31:07 UTC
Upgraded the L0 (host) to F25. L1 and L2 are Westmere and I still see
the same issue.

# rpm -qa | grep -i libvirt
libvirt-daemon-driver-nwfilter-3.0.0-2.fc25.x86_64
libvirt-daemon-config-nwfilter-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-uml-3.0.0-2.fc25.x86_64
libvirt-glib-1.0.0-1.fc25.x86_64
libvirt-daemon-kvm-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-secret-3.0.0-2.fc25.x86_64
libvirt-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-qemu-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-libxl-3.0.0-2.fc25.x86_64
libvirt-python-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-xen-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-nodedev-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-network-3.0.0-2.fc25.x86_64
libvirt-client-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-interface-3.0.0-2.fc25.x86_64
libvirt-python3-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-storage-3.0.0-2.fc25.x86_64
libvirt-daemon-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-lxc-3.0.0-2.fc25.x86_64
libvirt-libs-3.0.0-2.fc25.x86_64
libvirt-daemon-config-network-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-vbox-3.0.0-2.fc25.x86_64


# rpm -qa | grep -i virt-manager
virt-manager-1.4.0-6.fc25.noarch
virt-manager-common-1.4.0-6.fc25.noarch

Comment 31 Douglas Schilling Landgraf 2017-03-02 21:32:45 UTC
Created attachment 1259341 [details]
f25-coredump-files.tar.gz

Comment 32 Jiri Denemark 2017-03-02 22:12:33 UTC
(In reply to Dr. David Alan Gilbert from comment #28)
> The libvirt XML for the L1 has:
> 
>   <cpu mode='host-model'>

This is the reason for such a strange -cpu command line. The host-model mode started to be better with libvirt-2.3.0, but libvirt-3.2.0 and qemu-2.9.0 are needed to make host-model really usable. In general I'd suggest using host-passthrough for the L1 domains.

However, it seems the same strange CPU model is chosen even by libvirt 3.0.0, which would suggest we did not correctly detect the host CPU. Could anyone with access to the host run http://libvirt.org/git/?p=libvirt.git;a=blob_plain;f=tests/cputestdata/cpu-gather.sh;hb=HEAD script (after installing the cpuid tool) and attach the result here?
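Roughly (a sketch, run on the L0 host; the URL is the one given above and cpuid is the Fedora package name):

  # Install the cpuid tool, fetch the script and run it, capturing the output
  dnf install -y cpuid
  curl -o cpu-gather.sh 'http://libvirt.org/git/?p=libvirt.git;a=blob_plain;f=tests/cputestdata/cpu-gather.sh;hb=HEAD'
  sh cpu-gather.sh > cpu-gather.out 2>&1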

Comment 33 Michal Skrivanek 2017-03-03 08:15:59 UTC
(In reply to Dr. David Alan Gilbert from comment #21)
> observation, this is a nested setup;

Douglas, please always note whether it is a nested setup or not. It is _not_ supported in production.
I'll keep the bug open in oVirt to see what we can do with dependencies and/or the CPU mode.

Comment 34 Dr. David Alan Gilbert 2017-03-03 09:32:01 UTC
Can you try the test Jiri asked for in comment 32?

Comment 35 Dr. David Alan Gilbert 2017-03-03 10:07:39 UTC
The core when run under the f25 host looks the same; failure in buffer_find_nonzero_offset_avx2 (disassembly shows vpxor  %xmm1,%xmm1,%xmm1 )

Comment 36 Dr. David Alan Gilbert 2017-03-03 10:24:03 UTC
The cause of the actual exception is qemu trying to use avx2 acceleration during the migration; the test of whether to use avx2 was improved in qemu 2.8.0 (by upstream 5e33a872). So you should get that fix sometime in the future - although if desperate you could ask for a Z-stream - but I don't think it would hit the problem on any real host (or with a less odd -cpu flag set).

Comment 37 Douglas Schilling Landgraf 2017-03-03 14:29:51 UTC
$ sh tests_cputestdata_cpu-gather.sh 
model name	: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
CPU:
   0x00000000 0x00: eax=0x0000000d ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x00040651 ebx=0x01100800 ecx=0x7fdafbbf edx=0xbfebfbff
   0x00000002 0x00: eax=0x76036301 ebx=0x00f0b5ff ecx=0x00000000 edx=0x00c10000
   0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000004 0x00: eax=0x1c004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x01: eax=0x1c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x02: eax=0x1c004143 ebx=0x01c0003f ecx=0x000001ff edx=0x00000000
   0x00000004 0x03: eax=0x1c03c163 ebx=0x03c0003f ecx=0x00000fff edx=0x00000006
   0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
   0x00000006 0x00: eax=0x00000077 ebx=0x00000002 ecx=0x00000009 edx=0x00000000
   0x00000007 0x00: eax=0x00000000 ebx=0x000027ab ecx=0x00000000 edx=0x00000000
   0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000a 0x00: eax=0x07300403 ebx=0x00000000 ecx=0x00000000 edx=0x00000603
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000001
   0x0000000b 0x01: eax=0x00000004 ebx=0x00000004 ecx=0x00000201 edx=0x00000001
   0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340 edx=0x00000000
   0x0000000d 0x01: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
   0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000021 edx=0x2c100800
   0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x726f4320 edx=0x4d542865
   0x80000003 0x00: eax=0x37692029 ebx=0x3135342d ecx=0x43205530 edx=0x40205550
   0x80000004 0x00: eax=0x302e3220 ebx=0x7a484730 ecx=0x00000000 edx=0x00000000
   0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x01006040 edx=0x00000000
   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
   0x80000008 0x00: eax=0x00003027 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80860000 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340 edx=0x00000000
   0xc0000000 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340 edx=0x00000000

{"QMP": {"version": {"qemu": {"micro": 0, "minor": 8, "major": 2}, "package": "(qemu-2.8.0-2.fc25)"}, "capabilities": []}}
{"return": {}}
{"return": [{"cpuid-register": "EDX", "cpuid-input-ecx": 0, "cpuid-input-eax": 13, "features": 0}, {"cpuid-register": "EAX", "cpuid-input-ecx": 0, "cpuid-input-eax": 13, "features": 7}, {"cpuid-register": "EAX", "cpuid-input-eax": 6, "features": 4}, {"cpuid-register": "EAX", "cpuid-input-ecx": 1, "cpuid-input-eax": 13, "features": 1}, {"cpuid-register": "EDX", "cpuid-input-eax": 2147483658, "features": 0}, {"cpuid-register": "EDX", "cpuid-input-eax": 1073741827, "features": 0}, {"cpuid-register": "EBX", "cpuid-input-eax": 1073741827, "features": 0}, {"cpuid-register": "EAX", "cpuid-input-eax": 1073741827, "features": 0}, {"cpuid-register": "EAX", "cpuid-input-eax": 1073741825, "features": 16777467}, {"cpuid-register": "EDX", "cpuid-input-eax": 3221225473, "features": 0}, {"cpuid-register": "EDX", "cpuid-input-eax": 2147483655, "features": 0}, {"cpuid-register": "ECX", "cpuid-input-eax": 2147483649, "features": 33}, {"cpuid-register": "EDX", "cpuid-input-eax": 2147483649, "features": 739248128}, {"cpuid-register": "EDX", "cpuid-input-ecx": 0, "cpuid-input-eax": 7, "features": 0}, {"cpuid-register": "ECX", "cpuid-input-ecx": 0, "cpuid-input-eax": 7, "features": 0}, {"cpuid-register": "EBX", "cpuid-input-ecx": 0, "cpuid-input-eax": 7, "features": 1963}, {"cpuid-register": "ECX", "cpuid-input-eax": 1, "features": 4160369187}, {"cpuid-register": "EDX", "cpuid-input-eax": 1, "features": 260832255}], "id": "feature-words"}
{"return": 6, "id": "family"}
{"return": 69, "id": "model"}
{"return": 1, "id": "stepping"}
{"return": "Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz", "id": "model-id"}
{"return": {}}
{"timestamp": {"seconds": 1488551325, "microseconds": 706720}, "event": "SHUTDOWN"}

Comment 38 Jiri Denemark 2017-03-03 16:03:17 UTC
What version of libvirt is running on the host again? There are several versions mentioned in this BZ and I just want to make it clear.

And what does "virsh domcapabilities --virttype kvm" report in /domainCapabilities/cpu/mode[@name='host-model']? You can use the following command to check it:

    virsh domcapabilities --virttype kvm | xmllint -xpath "/domainCapabilities/cpu/mode[@name='host-model']" -

Comment 39 Douglas Schilling Landgraf 2017-03-03 17:53:04 UTC
(In reply to Jiri Denemark from comment #38)
> What version of libvirt is running on the host again? There are several
> versions mentioned in this BZ and I just want to make it clear.

$ cat /etc/redhat-release 
Fedora release 25 (Twenty Five)

$ rpm -qa | grep -i libvirt
libvirt-daemon-driver-nwfilter-3.0.0-2.fc25.x86_64
libvirt-daemon-config-nwfilter-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-uml-3.0.0-2.fc25.x86_64
libvirt-glib-1.0.0-1.fc25.x86_64
libvirt-daemon-kvm-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-secret-3.0.0-2.fc25.x86_64
libvirt-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-qemu-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-libxl-3.0.0-2.fc25.x86_64
libvirt-python-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-xen-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-nodedev-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-network-3.0.0-2.fc25.x86_64
libvirt-client-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-interface-3.0.0-2.fc25.x86_64
libvirt-python3-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-storage-3.0.0-2.fc25.x86_64
libvirt-daemon-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-lxc-3.0.0-2.fc25.x86_64
libvirt-libs-3.0.0-2.fc25.x86_64
libvirt-daemon-config-network-3.0.0-2.fc25.x86_64
libvirt-daemon-driver-vbox-3.0.0-2.fc25.x86_64


> 
> And what does "virsh domcapabilities --virttype kvm" report in
> /domainCapabilities/cpu/mode[@name='host-model']? You can use the following
> command to check it:
> 
>     virsh domcapabilities --virttype kvm | xmllint -xpath
> "/domainCapabilities/cpu/mode[@name='host-model']" -


<mode name="host-model" supported="yes">
      <model fallback="allow">Westmere</model>
      <vendor>Intel</vendor>
      <feature policy="require" name="vme"/>
      <feature policy="require" name="ds"/>
      <feature policy="require" name="acpi"/>
      <feature policy="require" name="ss"/>
      <feature policy="require" name="ht"/>
      <feature policy="require" name="tm"/>
      <feature policy="require" name="pbe"/>
      <feature policy="require" name="pclmuldq"/>
      <feature policy="require" name="dtes64"/>
      <feature policy="require" name="monitor"/>
      <feature policy="require" name="ds_cpl"/>
      <feature policy="require" name="vmx"/>
      <feature policy="require" name="est"/>
      <feature policy="require" name="tm2"/>
      <feature policy="require" name="fma"/>
      <feature policy="require" name="xtpr"/>
      <feature policy="require" name="pdcm"/>
      <feature policy="require" name="pcid"/>
      <feature policy="require" name="movbe"/>
      <feature policy="require" name="tsc-deadline"/>
      <feature policy="require" name="xsave"/>
      <feature policy="require" name="osxsave"/>
      <feature policy="require" name="avx"/>
      <feature policy="require" name="f16c"/>
      <feature policy="require" name="rdrand"/>
      <feature policy="require" name="arat"/>
      <feature policy="require" name="fsgsbase"/>
      <feature policy="require" name="tsc_adjust"/>
      <feature policy="require" name="bmi1"/>
      <feature policy="require" name="avx2"/>
      <feature policy="require" name="smep"/>
      <feature policy="require" name="bmi2"/>
      <feature policy="require" name="erms"/>
      <feature policy="require" name="invpcid"/>
      <feature policy="require" name="xsaveopt"/>
      <feature policy="require" name="pdpe1gb"/>
      <feature policy="require" name="rdtscp"/>
      <feature policy="require" name="abm"/>
      <feature policy="require" name="invtsc"/>
    </mode>

Comment 40 Jiri Denemark 2017-03-03 20:43:49 UTC
Oh, this is very interesting. With libvirt 3.1.0 the host CPU is detected as

  <model>Haswell-noTSX</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='f16c'/>
  <feature policy='require' name='rdrand'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='xsaveopt'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='abm'/>

I made changes to the CPU code for 3.1.0, but I don't remember any change which would cause this. The CPU was supposed to be detected as Haswell since about 2.3.0. I'll look at this more on Monday.

Comment 41 Jiri Denemark 2017-03-03 20:50:49 UTC
I pasted a wrong XML, it should be (still Haswell, though):

  <model>Haswell-noTSX</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='osxsave'/>
  <feature policy='require' name='f16c'/>
  <feature policy='require' name='rdrand'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='xsaveopt'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='abm'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='disable' name='x2apic'/>

The XML from comment #40 would require libvirt 3.2.0 and QEMU 2.9.0.

Comment 42 Jiri Denemark 2017-03-06 10:05:02 UTC
Oops. The code which reports a host CPU model in domain capabilities XML is in libvirt since 2.3.0, but I failed to actually make use of it and provide a more accurate CPU model there (host capabilities cannot disable features and thus a different model has to be used there). I'll work on it for 3.2.0 so that cpu-model is better even if QEMU is not new enough.

Comment 43 Dr. David Alan Gilbert 2017-03-06 10:12:22 UTC
Hi Douglas,
  My understanding is that you can work around this by going to virt-manager for your top-level VM, clicking on the CPUs tab, and changing Model: to host-passthrough (you'll need to type that into the Model: box - it's not in the menu).

Dave
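The same workaround can also be applied from the command line on the L0 host (a sketch; 'rhvh-el7' is a stand-in for the actual L1 domain name):

  # Shut the L1 guest down, then switch its CPU mode to host-passthrough
  virsh shutdown rhvh-el7
  virsh edit rhvh-el7
  #   ...in the editor, replace the <cpu mode='host-model'> element with:
  #   <cpu mode='host-passthrough'/>
  virsh start rhvh-el7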

Comment 44 Jiri Denemark 2017-03-08 13:49:20 UTC
BTW, I sent patches which should fix this issue to libvirt-devel list: https://www.redhat.com/archives/libvir-list/2017-March/msg00322.html

Comment 45 Douglas Schilling Landgraf 2017-03-31 17:32:37 UTC
Thanks guys, updating the bug report as we have a patch.

Comment 46 Cole Robinson 2017-05-04 19:41:42 UTC
I don't think this stuff is feasible to backport, so I'm just closing this against F26, since these patches are already there as part of libvirt 3.2.0.

