Bug 1463957

Summary: [UPDATES] Target CPU mode custom does not match source host-model
Product: Red Hat Enterprise Linux 7
Reporter: Yurii Prokulevych <yprokule>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Luyao Huang <lhuang>
Severity: high
Docs Contact:
Priority: high
Version: 7.4
CC: berrange, chhu, dasmith, dyuan, eglynn, ehabkost, fjin, jdenemar, jishao, jreznik, jsuchane, kchamart, knoel, lhuang, lyarwood, mcornea, mtessun, rbalakri, rbryant, salmy, sasha, sbauza, sferdjao, sgordon, srevivo, vromanso, xuzhang, yalzhang, yprokule
Target Milestone: rc
Keywords: ZStream
Target Release: 7.4
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-3.7.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1470582
Environment:
Last Closed: 2018-04-10 10:50:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1470582

Description Yurii Prokulevych 2017-06-22 07:00:47 UTC
Description of problem:
-----------------------
After a minor update of RHOS-11 to RHEL-7.4, live migration of an instance failed.

nova live-migration acae897c-5a0c-4226-9378-f1574ce61c5e compute-1.localdomain
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 10.0.0.101 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 10.0.0.101 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/novaclient/client.py:278: UserWarning: The 'tenant_id' argument is deprecated in Ocata and its use may result in errors in future releases. As 'project_id' is provided, the 'tenant_id' argument will be ignored.
  warnings.warn(msg)
ERROR (ClientException): Unknown Error (HTTP 504)

The following traceback is present in the journal:
--------------------------------------------------
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: Traceback (most recent call last):
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: timer()
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: cb(*args, **kw)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 168, in _do_send
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: waiter.switch(result)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: result = function(*args, **kwargs)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/nova/utils.py", line 1087, in context_wrapper
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: return func(*args, **kwargs)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6163, in _live_migration_operation
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: instance=instance)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: self.force_reraise()
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: six.reraise(self.type_, self.value, self.tb)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6156, in _live_migration_operation
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: bandwidth=CONF.libvirt.live_migration_bandwidth)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 623, in migrate
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: destination, params=params, flags=flags)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: result = proxy_call(self._autowrap, f, *args, **kwargs)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: rv = execute(f, *args, **kwargs)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: six.reraise(c, e, tb)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: rv = meth(*args, **kwargs)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1679, in migrateToURI3
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: libvirtError: unsupported configuration: Target CPU mode custom does not match source host-model


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-nova-novncproxy-15.0.3-3.el7ost.noarch
puppet-nova-10.4.0-5.el7ost.noarch
openstack-nova-console-15.0.3-3.el7ost.noarch
python-novaclient-7.1.0-1.el7ost.noarch
openstack-nova-common-15.0.3-3.el7ost.noarch
openstack-nova-conductor-15.0.3-3.el7ost.noarch
openstack-nova-placement-api-15.0.3-3.el7ost.noarch
openstack-nova-migration-15.0.3-3.el7ost.noarch
openstack-nova-compute-15.0.3-3.el7ost.noarch
openstack-nova-scheduler-15.0.3-3.el7ost.noarch
python-nova-15.0.3-3.el7ost.noarch
openstack-nova-cert-15.0.3-3.el7ost.noarch
openstack-nova-api-15.0.3-3.el7ost.noarch

libvirt-daemon-driver-storage-disk-3.2.0-10.el7.x86_64
libvirt-daemon-driver-qemu-3.2.0-10.el7.x86_64
libvirt-daemon-driver-interface-3.2.0-10.el7.x86_64
libvirt-daemon-driver-lxc-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-3.2.0-10.el7.x86_64
libvirt-daemon-3.2.0-10.el7.x86_64
libvirt-daemon-config-nwfilter-3.2.0-10.el7.x86_64
libvirt-client-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-core-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-10.el7.x86_64
libvirt-daemon-driver-secret-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-10.el7.x86_64
libvirt-daemon-config-network-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-10.el7.x86_64
libvirt-libs-3.2.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-10.el7.x86_64
libvirt-3.2.0-10.el7.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-10.el7.x86_64
libvirt-daemon-kvm-3.2.0-10.el7.x86_64
libvirt-python-3.2.0-3.el7.x86_64
libvirt-daemon-driver-network-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-10.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-10.el7.x86_64

Steps to Reproduce:
-------------------
1. Deploy RHOS-11 GA
2. Set up the latest repos on uc/oc
3. Update uc/oc
4. Set up the 7.4-testing repos
5. Update uc/oc
6. Try to live-migrate an instance

Actual results:
----------------
Instance gets stuck in MIGRATING status

Additional info:
----------------
Virtual setup: 3 controllers + 3 compute + 3 ceph
All compute nodes except the one hosting the VM were rebooted.

Comment 2 Lee Yarwood 2017-06-23 13:16:12 UTC
(In reply to Yurii Prokulevych from comment #0)
> Next traceback present in journal:
> ----------------------------------
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: Traceback (most
> recent call last):
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in
> fire_timers
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: timer()
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in
> __call__
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: cb(*args, **kw)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/event.py", line 168, in _do_send
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]:
> waiter.switch(result)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: result =
> function(*args, **kwargs)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/nova/utils.py", line 1087, in
> context_wrapper
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: return
> func(*args, **kwargs)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6163,
> in _live_migration_operation
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]:
> instance=instance)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in
> __exit__
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]:
> self.force_reraise()
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in
> force_reraise
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]:
> six.reraise(self.type_, self.value, self.tb)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6156,
> in _live_migration_operation
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]:
> bandwidth=CONF.libvirt.live_migration_bandwidth)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 623, in
> migrate
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: destination,
> params=params, flags=flags)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: result =
> proxy_call(self._autowrap, f, *args, **kwargs)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: rv = execute(f,
> *args, **kwargs)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: six.reraise(c,
> e, tb)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: rv = meth(*args,
> **kwargs)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: File
> "/usr/lib64/python2.7/site-packages/libvirt.py", line 1679, in migrateToURI3
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: if ret == -1:
> raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
> Jun 22 06:38:26 compute-2.localdomain nova-compute[149945]: libvirtError:
> unsupported configuration: Target CPU mode custom does not match source
> host-model

So the provided logs don't contain any reference to this event :

# tail -n1 sosreport-live-migration-compute-*/var/log/nova/nova-compute.log
==> sosreport-live-migration-compute-0-20170621165400/var/log/nova/nova-compute.log <==
2017-06-21 16:54:01.240 2711 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: fed3b73e83e14f3cb40fe089d7682fbc __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:346

==> sosreport-live-migration-compute-1-20170621170316/var/log/nova/nova-compute.log <==
2017-06-21 17:03:16.195 15136 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: cafc40d129a948a3a5d1922d78fc3176 __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:346

==> sosreport-live-migration-compute-2-20170621171233/var/log/nova/nova-compute.log <==
2017-06-21 17:12:34.043 149945 DEBUG nova.compute.manager [req-dac7d3f5-5a0b-48a3-80c7-0de13d44f154 - - - - -] [instance: acae897c-5a0c-4226-9378-f1574ce61c5e] Updated the network info_cache for instance _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:5913

Can we get fresh nova-compute.log files (src and dest), libvirtd.log _and_ the output of `virsh dumpxml $instance_uuid` from the source host attached to this bug please?

[ Notes ]

There are, however, some examples in the logs with a different trace from the one above :

# grep 'Live Migration failure: unsupported configuration: Target CPU mode custom does not match source host-model' sosreport-live-migration-compute-*/var/log/nova/nova-compute.log
sosreport-live-migration-compute-2-20170621171233/var/log/nova/nova-compute.log:2017-06-21 14:53:35.522 147615 ERROR nova.virt.libvirt.driver [req-cc95a5bb-d9cf-4ce7-9195-6d8d9e98b310 4299dcf3c81848178901150112515839 d4d4c8b412b2465d9fbc09aa64df1a21 - - -] [instance: acae897c-5a0c-4226-9378-f1574ce61c5e] Live Migration failure: unsupported configuration: Target CPU mode custom does not match source host-model
sosreport-live-migration-compute-2-20170621171233/var/log/nova/nova-compute.log:2017-06-21 15:24:22.558 147615 ERROR nova.virt.libvirt.driver [req-ce28916c-eb6e-437f-a95f-05680dc942dc 4299dcf3c81848178901150112515839 d4d4c8b412b2465d9fbc09aa64df1a21 - - -] [instance: acae897c-5a0c-4226-9378-f1574ce61c5e] Live Migration failure: unsupported configuration: Target CPU mode custom does not match source host-model
sosreport-live-migration-compute-2-20170621171233/var/log/nova/nova-compute.log:2017-06-21 15:43:09.665 149945 ERROR nova.virt.libvirt.driver [req-23e1b27f-3a0e-4bbb-bb09-e535b20df29e 4299dcf3c81848178901150112515839 d4d4c8b412b2465d9fbc09aa64df1a21 - - -] [instance: acae897c-5a0c-4226-9378-f1574ce61c5e] Live Migration failure: unsupported configuration: Target CPU mode custom does not match source host-model

These suggest the dest domain is trying to use the custom cpu mode while the src domain used host-model, as configured within Nova :

# grep ^cpu_mode sosreport-live-migration-compute-*/etc/nova/nova.conf 
sosreport-live-migration-compute-0-20170621165400/etc/nova/nova.conf:cpu_mode=host-model
sosreport-live-migration-compute-1-20170621170316/etc/nova/nova.conf:cpu_mode=host-model
sosreport-live-migration-compute-2-20170621171233/etc/nova/nova.conf:cpu_mode=host-model
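
For anyone who wants to check the same setting programmatically, here is a minimal Python sketch of the grep above. It assumes that cpu_mode lives in the [libvirt] section of nova.conf (the grep output does not show the section), and read_cpu_mode is a hypothetical helper name, not part of Nova.

```python
import configparser


def read_cpu_mode(conf_text, section="libvirt"):
    """Return the cpu_mode value from nova.conf text, or None if it is unset.

    Assumption: cpu_mode is set in the [libvirt] section.
    """
    # interpolation=None keeps literal '%' characters from raising errors;
    # strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.get(section, "cpu_mode", fallback=None)


# Hypothetical, heavily trimmed nova.conf content for illustration only.
sample = """
[DEFAULT]
debug=True

[libvirt]
virt_type=kvm
cpu_mode=host-model
"""

print(read_cpu_mode(sample))
```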

The following _compare_cpu check on the dest appears wrong, not listing a mode and thus defaulting to custom :

sosreport-live-migration-compute-1-20170621170316/var/log/nova/nova-compute.log

2017-06-21 14:53:28.735 2367 INFO nova.virt.libvirt.driver [req-cc95a5bb-d9cf-4ce7-9195-6d8d9e98b310 4299dcf3c81848178901150112515839 d4d4c8b412b2465d9fbc09aa64df1a21 - - -] Instance launched has CPU info: {"vendor": "Intel", "model": "Broadwell", "arch": "x86_64", "features": ["pge", "avx", "xsaveopt", "clflush", "sep", "rtm", "tsc_adjust", "tsc-deadline", "invpcid", "tsc", "fsgsbase", "xsave", "smap", "vmx", "erms", "hle", "cmov", "smep", "fpu", "pat", "arat", "lm", "msr", "adx", "3dnowprefetch", "nx", "fxsr", "syscall", "sse4.1", "pae", "sse4.2", "pclmuldq", "pcid", "fma", "vme", "mmx", "osxsave", "cx8", "mce", "de", "aes", "mca", "pse", "lahf_lm", "abm", "rdseed", "popcnt", "pdpe1gb", "apic", "sse", "f16c", "pni", "rdtscp", "avx2", "sse2", "ss", "hypervisor", "bmi1", "bmi2", "ssse3", "cx16", "pse36", "mtrr", "movbe", "rdrand", "x2apic"], "topology": {"cores": 1, "cells": 1, "threads": 1, "sockets": 2}}
2017-06-21 14:53:28.736 2367 DEBUG nova.virt.libvirt.driver [req-cc95a5bb-d9cf-4ce7-9195-6d8d9e98b310 4299dcf3c81848178901150112515839 d4d4c8b412b2465d9fbc09aa64df1a21 - - -] [instance: acae897c-5a0c-4226-9378-f1574ce61c5e] cpu compare xml: <cpu>
  <arch>x86_64</arch>
  <model>Broadwell</model>
  <vendor>Intel</vendor>
  <topology sockets="2" cores="1" threads="1"/>
  <feature name="3dnowprefetch"/>
  <feature name="abm"/>
  <feature name="adx"/>
  <feature name="aes"/>
  <feature name="apic"/>
  <feature name="arat"/>
  <feature name="avx"/>
  <feature name="avx2"/>
  <feature name="bmi1"/>
  <feature name="bmi2"/>
  <feature name="clflush"/>
  <feature name="cmov"/>
  <feature name="cx16"/>
  <feature name="cx8"/>
  <feature name="de"/>
  <feature name="erms"/>
  <feature name="f16c"/>
  <feature name="fma"/>
  <feature name="fpu"/>
  <feature name="fsgsbase"/>
  <feature name="fxsr"/>
  <feature name="hle"/>
  <feature name="hypervisor"/>
  <feature name="invpcid"/>
  <feature name="lahf_lm"/>
  <feature name="lm"/>
  <feature name="mca"/>
  <feature name="mce"/>
  <feature name="mmx"/>
  <feature name="movbe"/>
  <feature name="msr"/>
  <feature name="mtrr"/>
  <feature name="nx"/>
  <feature name="osxsave"/>
  <feature name="pae"/>
  <feature name="pat"/>
  <feature name="pcid"/>
  <feature name="pclmuldq"/>
  <feature name="pdpe1gb"/>
  <feature name="pge"/>
  <feature name="pni"/>
  <feature name="popcnt"/>
  <feature name="pse"/>
  <feature name="pse36"/>
  <feature name="rdrand"/>
  <feature name="rdseed"/>
  <feature name="rdtscp"/>
  <feature name="rtm"/>
  <feature name="sep"/>
  <feature name="smap"/>
  <feature name="smep"/>
  <feature name="ss"/>
  <feature name="sse"/>
  <feature name="sse2"/>
  <feature name="sse4.1"/>
  <feature name="sse4.2"/>
  <feature name="ssse3"/>
  <feature name="syscall"/>
  <feature name="tsc"/>
  <feature name="tsc-deadline"/>
  <feature name="tsc_adjust"/>
  <feature name="vme"/>
  <feature name="vmx"/>
  <feature name="x2apic"/>
  <feature name="xsave"/>
  <feature name="xsaveopt"/>
</cpu>
 _compare_cpu /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:5920
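
To make the relationship between the two log entries above easier to see, here is a rough Python sketch that turns a "CPU info" JSON blob into a <cpu> element of the same shape. This is only an illustration and not Nova's actual code; cpu_info_to_xml is a hypothetical name and the sample JSON is a shortened stand-in for the logged one. Note that nothing in it sets a mode attribute on <cpu>.

```python
import json
import xml.etree.ElementTree as ET


def cpu_info_to_xml(cpu_info_json):
    """Build a <cpu> element shaped like the 'cpu compare xml' in the log."""
    info = json.loads(cpu_info_json)
    cpu = ET.Element("cpu")  # no mode attribute is set anywhere below
    ET.SubElement(cpu, "arch").text = info["arch"]
    ET.SubElement(cpu, "model").text = info["model"]
    ET.SubElement(cpu, "vendor").text = info["vendor"]
    topo = info["topology"]
    ET.SubElement(
        cpu,
        "topology",
        sockets=str(topo["sockets"]),
        cores=str(topo["cores"]),
        threads=str(topo["threads"]),
    )
    # The logged XML lists the features in sorted order.
    for name in sorted(info["features"]):
        ET.SubElement(cpu, "feature", name=name)
    return cpu


# Shortened sample; the real log entry carries the full feature list.
sample_info = json.dumps({
    "vendor": "Intel",
    "model": "Broadwell",
    "arch": "x86_64",
    "features": ["pge", "avx", "xsaveopt"],
    "topology": {"cores": 1, "cells": 1, "threads": 1, "sockets": 2},
})

cpu = cpu_info_to_xml(sample_info)
print(ET.tostring(cpu, encoding="unicode"))
```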

Comment 3 Lee Yarwood 2017-06-23 13:52:02 UTC
(In reply to Lee Yarwood from comment #2)
> Can we get fresh nova-compute.log files (src and dest), libvirtd.log _and_
> the output of `virsh dumpxml $instance_uuid` from the source host attached
> to this bug please?

Can we also get the `virsh dumpxml $instance_uuid --migratable` output from the source? There have been previous bugs in libvirt where certain items were dropped from the resulting XML.

Comment 4 Lee Yarwood 2017-06-23 14:11:50 UTC
Quickly trying to reproduce this in an upgraded OSP 11 7.3 to 7.4 allinone env I see that all instances are being spun up with mode=custom even if cpu_mode=host-model is listed in the XML generated by Nova :

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.4 Beta (Maipo)

# grep ^cpu_mode /etc/nova/nova.conf
cpu_mode=host-model

# nova boot --image cirros --flavor 1 test
[..]
# grep \<cpu\ mode /var/log/nova/nova-compute.log 
  <cpu mode="host-model" match="exact">
# virsh dumpxml 5ec8a8d4-f87c-4989-82b5-aabc2d8e5aa6 | grep \<cpu\ mode
  <cpu mode='custom' match='exact' check='full'>
# virsh dumpxml 5ec8a8d4-f87c-4989-82b5-aabc2d8e5aa6 --migratable | grep \<cpu\ mode
  <cpu mode='custom' match='exact' check='partial'>
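
The same comparison can be scripted instead of grepped. Below is a minimal Python sketch that pulls the <cpu> attributes out of a domain XML document; cpu_attrs is a hypothetical helper, and the two sample documents are trimmed to just the <cpu> lines shown above.

```python
import xml.etree.ElementTree as ET


def cpu_attrs(domain_xml):
    """Return the attributes of the <cpu> element as a dict (empty if absent)."""
    root = ET.fromstring(domain_xml)
    cpu = root if root.tag == "cpu" else root.find("cpu")
    return dict(cpu.attrib) if cpu is not None else {}


# Trimmed stand-ins for the XML generated by Nova and the XML reported by libvirt.
nova_xml = '<domain><cpu mode="host-model" match="exact"/></domain>'
live_xml = "<domain><cpu mode='custom' match='exact' check='full'/></domain>"

print(cpu_attrs(nova_xml)["mode"], "->", cpu_attrs(live_xml)["mode"])
```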

Comment 5 Lee Yarwood 2017-06-26 12:14:31 UTC
Still waiting on logs from the reporter but I believe we are good to move this over to RHEL and libvirt as Nova doesn't appear to be at fault here...

Comment 6 Yurii Prokulevych 2017-06-26 12:21:18 UTC
Lee, the system got re-provisioned so I don't have the logs handy. I'll update the bz as soon as I get this reproduced.

Comment 7 Jiri Denemark 2017-06-26 12:38:23 UTC
(In reply to Lee Yarwood from comment #4)
> Quickly trying to reproduce this in an upgraded OSP 11 7.3 to 7.4 allinone
> env I see that all instances are being spun up with mode=custom even if
> cpu_mode=host-model is listed in the XML generated by Nova :

That's correct, libvirt translates host-model CPUs into custom ones to make sure the CPU does not change after migration. Let me guess, is Nova passing a custom XML to the migration API? If so, is it using a (modified) XML it got from libvirt by calling virDomainGetXMLDesc with VIR_DOMAIN_XML_MIGRATABLE flag?
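
As a simplified illustration of the kind of check that produces the reported message, the Python sketch below compares the CPU mode of a source definition with that of a target XML. It is not libvirt's implementation; the names are hypothetical, and treating a missing mode attribute as custom is an assumption taken from comment 2.

```python
import xml.etree.ElementTree as ET


class UnsupportedConfiguration(Exception):
    """Stand-in for libvirt's 'unsupported configuration' error."""


def cpu_mode(domain_xml):
    """Return the mode of the <cpu> element, or None if there is no <cpu>."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    if cpu is None:
        return None
    # Assumption (see comment 2): an unspecified mode is treated as custom.
    return cpu.get("mode", "custom")


def check_cpu_mode(source_xml, target_xml):
    """Raise if the target CPU mode differs from the source CPU mode."""
    src, dst = cpu_mode(source_xml), cpu_mode(target_xml)
    if src != dst:
        raise UnsupportedConfiguration(
            "Target CPU mode %s does not match source %s" % (dst, src)
        )


source = "<domain><cpu mode='host-model'/></domain>"
target = "<domain><cpu mode='custom' match='exact'/></domain>"

try:
    check_cpu_mode(source, target)
except UnsupportedConfiguration as exc:
    print("unsupported configuration: %s" % exc)
```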

Comment 8 Lee Yarwood 2017-06-26 13:11:25 UTC
(In reply to Jiri Denemark from comment #7)
> (In reply to Lee Yarwood from comment #4)
> > Quickly trying to reproduce this in an upgraded OSP 11 7.3 to 7.4 allinone
> > env I see that all instances are being spun up with mode=custom even if
> > cpu_mode=host-model is listed in the XML generated by Nova :
> 
> That's correct, libvirt translates host-model CPUs into custom ones to make
> sure the CPU does not change after migration. Let me guess, is Nova passing
> a custom XML to the migration API? If so, is it using a (modified) XML it
> got from libvirt by calling virDomainGetXMLDesc with
> VIR_DOMAIN_XML_MIGRATABLE flag?

Yes we are providing the destination domain's XML during the migration, using the XML fetched from virDomainGetXMLDesc on the source with the VIR_DOMAIN_XML_MIGRATABLE flag set. This still returns the custom mode as seen in c#4. If that's also expected I guess Nova needs to update this XML to use host-model right?
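
For reference, the flow described above looks roughly like the following with the libvirt-python bindings. This is a sketch and not Nova's code: migrate_with_dest_xml is a hypothetical helper, the libvirt module is passed in as an argument so the function can be exercised without a hypervisor, and the flags are just the live and peer-to-peer ones. Nova additionally rewrites host-specific parts of the XML, which is omitted here.

```python
def migrate_with_dest_xml(libvirt, dom, dest_uri):
    """Live-migrate dom to dest_uri, supplying the destination XML explicitly.

    libvirt:  the imported libvirt-python module
    dom:      a libvirt.virDomain for the running guest on the source
    dest_uri: destination connection URI, e.g. qemu+tcp://target/system
    """
    # Fetch the migratable form of the domain XML from the source.
    xml = dom.XMLDesc(libvirt.VIR_DOMAIN_XML_MIGRATABLE)

    # Pass it to the destination unmodified as the destination XML.
    params = {libvirt.VIR_MIGRATE_PARAM_DEST_XML: xml}
    flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
    return dom.migrateToURI3(dest_uri, params, flags)


# Usage on a real source host (not executed here):
#   import libvirt
#   conn = libvirt.open("qemu:///system")
#   dom = conn.lookupByName("r7-mig")
#   migrate_with_dest_xml(libvirt, dom, "qemu+tcp://target/system")
```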

Comment 9 Jiri Denemark 2017-06-26 14:09:26 UTC
No, Nova is not supposed to change the XML except for the parts which are visible only to a host (such as disk paths). I'm not really sure where the domain XML with host-model is coming from. Is Nova on the target using the pre-migration hook to supply its own XML? Is this using a mixture of old and new libvirt or are all hosts on 7.4?

Anyway, could anyone who can reproduce this enable debug logs for libvirtd on all hosts and attach them from both the source and the destination host?

Comment 10 Lee Yarwood 2017-06-26 14:19:13 UTC
(In reply to Jiri Denemark from comment #9)
> No, Nova is not supposed to change the XML except for the parts which are
> visible only to a host (such as disk paths). I'm not really sure where the
> domain XML with host-model is coming from. Is Nova on the target using the
> pre-migration hook to supply its own XML? Is this using a mixture of old and
> new libvirt or are all hosts on 7.4?

The XML in c#4 with mode='host-model' set is from the source host, generated by Nova and used to launch the original domain. Yurii would need to confirm the state of both hosts before the migration started.
 
> Anyway, could anyone who can reproduce this enable debug logs for libvirtd
> on all hosts and attach them from both the source and the destination host?

ACK, if Yurii is struggling to find another env I can look into this.

Comment 11 Luyao Huang 2017-06-27 06:19:07 UTC
(In reply to Jiri Denemark from comment #9)
> No, Nova is not supposed to change the XML except for the parts which are
> visible only to a host (such as disk paths). I'm not really sure where the
> domain XML with host-model is coming from. Is Nova on the target using the
> pre-migration hook to supply its own XML? Is this using a mixture of old and
> new libvirt or are all hosts on 7.4?
> 
> Anyway, could anyone who can reproduce this enable debug logs for libvirtd
> on all hosts and attach them from both the source and the destination host?

I have reproduced this problem with libvirt:

1. start a guest on libvirt-2.0.0-10 (rhel7.3) with host-model:

  <cpu mode='host-model'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

# virsh start r7-mig
Domain r7-mig started


2. update libvirt to libvirt-3.2.0-14.el7 (rhel 7.4):

3. recheck running guest xml:

live:
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

migratable:
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

4. prepare migratable xml:

# virsh dumpxml r7-mig --migratable > /tmp/mig.xml

5. migrate to target host:

# virsh migrate r7-mig qemu+tcp://target/system --live --p2p --xml /tmp/mig.xml
error: unsupported configuration: Target CPU mode custom does not match source host-model

Comment 12 Luyao Huang 2017-06-27 06:27:18 UTC
There is also another problem related to this one:


1. start a guest on libvirt-2.0.0-10 (rhel7.3) with host-model:

  <cpu mode='host-model'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

# virsh start r7-mig
Domain r7-mig started


2. update libvirt to libvirt-3.2.0-14.el7 (rhel 7.4):

3. do managedsave and restart:

# virsh managedsave r7-mig

Domain r7-mig state saved by libvirt

# virsh start r7-mig
error: Failed to start domain r7-mig
error: operation failed: job: unexpectedly failed

4. check guest log:
...
-cpu Opteron_G5,vme=on,ht=on,monitor=on,osxsave=on,bmi1=on,mmxext=on,fxsr_opt=on,cmp_legacy=on,extapic=on,cr8legacy=on,osvw=on,ibs=on,skinit=on,wdt=on,lwp=on,tce=on,nodeid_msr=on,topoext=on,perfctr_core=on,perfctr_nb=on,invtsc=on...

warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.extapic [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.ibs [bit 10]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.skinit [bit 12]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.wdt [bit 13]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.lwp [bit 15]

2017-06-27T06:22:39.015800Z qemu-kvm: State blocked by non-migratable device 'cpu'
2017-06-27T06:22:39.015879Z qemu-kvm: load of migration failed: Invalid argument
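
When comparing guest logs across hosts, it can help to reduce the qemu warnings above to a list of feature names. A small Python sketch follows; parse_unsupported_features is a hypothetical helper and the regular expression only covers the warning format shown in this log.

```python
import re

# Matches lines such as:
#   warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
WARNING_RE = re.compile(
    r"warning: host doesn't support requested feature: "
    r"CPUID\.([0-9A-Fa-f]+)H:(E[A-D]X)\.(\S+) \[bit (\d+)\]"
)


def parse_unsupported_features(log_text):
    """Return (leaf, register, feature, bit) tuples for each matching warning."""
    found = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            leaf, register, feature, bit = match.groups()
            found.append((leaf, register, feature, int(bit)))
    return found


sample_log = """\
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.extapic [bit 3]
2017-06-27T06:22:39.015800Z qemu-kvm: State blocked by non-migratable device 'cpu'
"""

for entry in parse_unsupported_features(sample_log):
    print(entry)
```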

Comment 13 Jiri Denemark 2017-06-27 09:51:12 UTC
(In reply to Luyao Huang from comment #11)
> # virsh migrate r7-mig qemu+tcp://target/system --live --p2p --xml
> /tmp/mig.xml
> error: unsupported configuration: Target CPU mode custom does not match
> source host-model

Oh, thanks Luyao. I'm quite confused as this would work just fine without the --xml option.

Comment 16 Luyao Huang 2017-06-28 01:36:48 UTC
(In reply to Jiri Denemark from comment #13)
> (In reply to Luyao Huang from comment #11)
> > # virsh migrate r7-mig qemu+tcp://target/system --live --p2p --xml
> > /tmp/mig.xml
> > error: unsupported configuration: Target CPU mode custom does not match
> > source host-model
> 
> Oh, thanks Luyao. I'm quite confused as this would work just fine without
> the --xml option.

I tested without --xml and migration still fails, just like in comment 12:

1. start a guest on libvirt-2.0.0-10 (rhel7.3) with host-model:

  <cpu mode='host-model'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

# virsh start r7-mig
Domain r7-mig started

2. check qemu command line:
...
-cpu Opteron_G5,+vme,+ht,+monitor,+osxsave,+bmi1,+mmxext,+fxsr_opt,+cmp_legacy,+extapic,+cr8legacy,+osvw,+ibs,+skinit,+wdt,+lwp,+tce,+nodeid_msr,+topoext,+perfctr_core,+perfctr_nb
...

3. update libvirt to libvirt-3.2.0-14.el7 (rhel 7.4):

4. migrate to 7.4 host without --xml:

# virsh migrate r7-mig qemu+tcp://target/system --live --p2p
error: internal error: qemu unexpectedly closed the monitor: 2017-06-28T01:25:00.796084Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/2 (label charserial0)
2017-06-28T01:25:00.796532Z qemu-kvm: -chardev pty,id=charredir0: char device redirected to /dev/pts/3 (label charredir0)
2017-06-28T01:25:00.804956Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 6 7 8 9
2017-06-28T01:25:00.804983Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.extapic [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.ibs [bit 10]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.skinit [bit 12]
warning: host doesn't support requested feature: CPU


5. check the target host guest log:

...
warning: host doesn't support requested feature: CPUID.80000001H:ECX.extapic [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.ibs [bit 10]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.skinit [bit 12]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.wdt [bit 13]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.lwp [bit 15]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.tce [bit 17]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.nodeid-msr [bit 19]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.topoext [bit 22]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.perfctr-core [bit 23]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.perfctr-nb [bit 24]
2017-06-28T01:25:00.982589Z qemu-kvm: State blocked by non-migratable device 'cpu'
2017-06-28T01:25:00.982703Z qemu-kvm: load of migration failed: Invalid argument

6. check the target qemu command line:

-cpu Opteron_G5,vme=on,ht=on,monitor=on,osxsave=on,bmi1=on,mmxext=on,fxsr_opt=on,cmp_legacy=on,extapic=on,cr8legacy=on,osvw=on,ibs=on,skinit=on,wdt=on,lwp=on,tce=on,nodeid_msr=on,topoext=on,perfctr_core=on,perfctr_nb=on,invtsc=on

You can see that libvirt adds invtsc=on to the QEMU command line.
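
The difference between the two -cpu arguments can also be found mechanically. A minimal sketch, assuming only the two syntaxes seen above (+feature and feature=on/off); the helper name is hypothetical and the feature lists are abbreviated:

```python
def parse_cpu_arg(cpu_arg):
    """Split a QEMU -cpu value into (model, {feature: enabled})."""
    model, *flags = cpu_arg.split(",")
    features = {}
    for flag in flags:
        if flag.startswith("+"):
            features[flag[1:]] = True
        elif flag.startswith("-"):
            features[flag[1:]] = False
        elif "=" in flag:
            name, value = flag.split("=", 1)
            features[name] = (value == "on")
    return model, features

# Abbreviated versions of the source (step 2) and target (step 6) arguments.
source = "Opteron_G5,+vme,+ht,+perfctr_nb"
target = "Opteron_G5,vme=on,ht=on,perfctr_nb=on,invtsc=on"

_, src = parse_cpu_arg(source)
_, dst = parse_cpu_arg(target)

# Features enabled on the target that were not enabled on the source.
added = sorted(name for name, on in dst.items() if on and not src.get(name, False))
print(added)
```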



Source host caps:

virsh # capabilities 
...
    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G5</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='4' threads='1'/>
      <feature name='vme'/>
      <feature name='ht'/>
      <feature name='monitor'/>
      <feature name='osxsave'/>
      <feature name='bmi1'/>
      <feature name='mmxext'/>
      <feature name='fxsr_opt'/>
      <feature name='cmp_legacy'/>
      <feature name='extapic'/>
      <feature name='cr8legacy'/>
      <feature name='osvw'/>
      <feature name='ibs'/>
      <feature name='skinit'/>
      <feature name='wdt'/>
      <feature name='lwp'/>
      <feature name='tce'/>
      <feature name='nodeid_msr'/>
      <feature name='topoext'/>
      <feature name='perfctr_core'/>
      <feature name='perfctr_nb'/>
      <feature name='invtsc'/>
...

And target host caps:

virsh # capabilities 

    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G5</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='16' threads='1'/>
      <feature name='vme'/>
      <feature name='ht'/>
      <feature name='monitor'/>
      <feature name='osxsave'/>
      <feature name='bmi1'/>
      <feature name='mmxext'/>
      <feature name='fxsr_opt'/>
      <feature name='cmp_legacy'/>
      <feature name='extapic'/>
      <feature name='cr8legacy'/>
      <feature name='osvw'/>
      <feature name='ibs'/>
      <feature name='skinit'/>
      <feature name='wdt'/>
      <feature name='lwp'/>
      <feature name='tce'/>
      <feature name='nodeid_msr'/>
      <feature name='topoext'/>
      <feature name='perfctr_core'/>
      <feature name='perfctr_nb'/>
      <feature name='invtsc'/>
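
For reference, the two capabilities listings can be compared with a few lines of Python. This is a sketch using abbreviated copies of the XML above; the function name is my own. As far as the listings are shown here, the model and the feature names are the same and only the topology differs.

```python
import xml.etree.ElementTree as ET

def host_cpu_summary(cpu_xml):
    """Return (model, set of feature names) from a <cpu> capabilities element."""
    cpu = ET.fromstring(cpu_xml)
    return cpu.findtext("model"), {f.get("name") for f in cpu.findall("feature")}

# Abbreviated copies of the source and target host <cpu> elements.
source_caps = """
<cpu>
  <arch>x86_64</arch>
  <model>Opteron_G5</model>
  <topology sockets='1' cores='4' threads='1'/>
  <feature name='vme'/>
  <feature name='invtsc'/>
</cpu>
"""
target_caps = """
<cpu>
  <arch>x86_64</arch>
  <model>Opteron_G5</model>
  <topology sockets='1' cores='16' threads='1'/>
  <feature name='vme'/>
  <feature name='invtsc'/>
</cpu>
"""

src_model, src_features = host_cpu_summary(source_caps)
dst_model, dst_features = host_cpu_summary(target_caps)
print("same model:", src_model == dst_model)
print("only on source:", sorted(src_features - dst_features))
print("only on target:", sorted(dst_features - src_features))
```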

Comment 17 Jiri Denemark 2017-06-28 11:13:14 UTC
(In reply to Luyao Huang from comment #16)
> (In reply to Jiri Denemark from comment #13)
> > (In reply to Luyao Huang from comment #11)
> > Oh, thanks Luyao. I'm quite confused as this would work just fine without
> > the --xml option.
> 
> I have tested without --xml, migration still will get failure, just like
> comment 12:

Yes, I was talking about the issue in comment 11. Comment 12 is a separate issue. Could you please file a new bz for it? And don't forget to attach debug logs from both the source and the target libvirtd and full QEMU logs.

Comment 18 Jiri Denemark 2017-06-28 14:25:24 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg01263.html

Comment 19 Luyao Huang 2017-06-29 01:44:11 UTC
(In reply to Jiri Denemark from comment #18)
> Patch sent upstream for review:
> https://www.redhat.com/archives/libvir-list/2017-June/msg01263.html

I tested this patch on libvirt-3.2.0-14.el7.x86_64 with save and migration, and can no longer hit the error:

error: unsupported configuration: Target CPU mode custom does not match source host-model

Comment 20 Luyao Huang 2017-06-29 03:23:06 UTC
(In reply to Jiri Denemark from comment #17)
> (In reply to Luyao Huang from comment #16)
> > (In reply to Jiri Denemark from comment #13)
> > > (In reply to Luyao Huang from comment #11)
> > > Oh, thanks Luyao. I'm quite confused as this would work just fine without
> > > the --xml option.
> > 
> > I have tested without --xml, migration still will get failure, just like
> > comment 12:
> 
> Yes, I was talking about the issue in comment 11. Comment 12 is a separate
> issue. Could you please file a new bz for it? And don't forget to attach
> debug logs from both the source and the target libvirtd and full QEMU logs.

Ok. I retested the issue mentioned in comments 16 and 12 with a build that applies your new patch, and found that the issue still exists, so I filed a new bug 1466099. Thanks a lot for your reply.

Comment 24 Jiri Denemark 2017-07-12 07:59:18 UTC
So after some investigation and patch writing, it turned out that this BZ is actually very similar to bug 1466099 even though I originally thought they were different. The old libvirt didn't change a host-model CPU into a custom one when starting a domain. Thus when new libvirt is installed and the daemon reconnects to the existing domain (started by the old libvirt), it will see a rather unexpected host-model CPU in a running domain, which breaks several things.
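
A quick way to spot an affected domain is to look at the CPU mode in its active XML (for example the output of virsh dumpxml on the running domain). The following is only an illustrative sketch with a hypothetical helper and minimal sample XML; it is not what libvirt does internally.

```python
import xml.etree.ElementTree as ET

def cpu_mode(domain_xml):
    """Return the mode attribute of the domain's <cpu> element, or None."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    return None if cpu is None else cpu.get("mode")

# Minimal stand-ins for the active XML of a running domain.
started_by_old_libvirt = """
<domain type='kvm'>
  <name>r7-mig</name>
  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>
</domain>
"""
started_by_new_libvirt = """
<domain type='kvm'>
  <name>vm1</name>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Skylake-Client-IBRS</model>
  </cpu>
</domain>
"""

for xml_text in (started_by_old_libvirt, started_by_new_libvirt):
    mode = cpu_mode(xml_text)
    # A running domain that still reports host-model is the unexpected case.
    print(mode, "-> unexpected in a running domain" if mode == "host-model" else "-> ok")
```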

Comment 25 Jiri Denemark 2017-07-12 08:00:01 UTC
*** Bug 1466099 has been marked as a duplicate of this bug. ***

Comment 26 Jiri Denemark 2017-07-12 12:58:03 UTC
Patches fixing the root cause of this issue were sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-July/msg00397.html

Comment 27 Jiri Denemark 2017-07-13 07:58:12 UTC
Fixed upstream by:

commit ee68bb391efb684341edb6286a1278631167f08c
Refs: v3.5.0-78-gee68bb391
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 27 15:06:10 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Don't update CPU when checking ABI stability

    When checking ABI stability between two domain definitions, we first
    make migratable copies of them. However, we also asked for the guest CPU
    to be updated, even though the updated CPU is supposed to be already
    included in the original definitions. Moreover, if we do this on the
    destination host during migration, we're potentially updating the
    definition according to an incompatible host CPU.

    While updating the CPU when checking ABI stability doesn't make any
    sense, it actually just worked because updating the CPU doesn't do
    anything for custom CPUs (only host-model CPUs are affected) and we
    updated both definitions in the same way.

    Less than a year ago commit v2.3.0-rc1~42 stopped updating the CPU in
    the definition we got internally and only the user supplied definition
    was updated. However, the same commit started updating host-model CPUs
    to custom CPUs which are not affected by the request to update the CPU.
    So it still seemed to work right, unless a user upgraded libvirt 2.2.0
    to a newer version while there were some domains with host-model CPUs
    running on the host. Such domains couldn't be migrated with a user
    supplied XML since libvirt would complain:

        Target CPU mode custom does not match source host-model

    The fix is pretty straightforward, we just need to stop updating the CPU
    when checking ABI stability.

    https://bugzilla.redhat.com/show_bug.cgi?id=1463957

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

commit 83e081b8ab32dd990b4e4ccc7bf8a1a416fc51c2
Refs: v3.5.0-79-g83e081b8a
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 19 13:18:52 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    cpu_x86: Properly disable unknown CPU features

    CPU features unknown to a hypervisor will not be present in dataDisabled
    even though the features won't naturally be enabled.
    Thus any features we asked for which are not in dataEnabled should be
    considered disabled.

    Signed-off-by: Jiri Denemark <jdenemar>

commit 40d246a22b46f1691d09cbce5904c79d712d8c16
Refs: v3.5.0-80-g40d246a22
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 11 13:18:45 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Add qemuProcessFetchGuestCPU

    Separated from qemuProcessUpdateLiveGuestCPU. Its purpose is to fetch
    guest CPU data from a running QEMU process. The data can later be used
    to verify and update the active guest CPU definition.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

commit 5cac2fe108f957b2629a29bea1747fdb3c8d7aa3
Refs: v3.5.0-81-g5cac2fe10
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 11 13:26:12 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Add qemuProcessVerifyCPU

    Separated from qemuProcessUpdateLiveGuestCPU. The function makes sure
    a guest CPU provides all features required by a domain definition.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

commit e6ed55e4e9c9ec21ca87573b69225d2fafc54272
Refs: v3.5.0-82-ge6ed55e4e
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 11 13:30:09 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Rename qemuProcessUpdateLiveGuestCPU

    In addition to updating a guest CPU definition the function verifies
    that all required features are provided to the guest. Let's make it
    obvious by calling it qemuProcessUpdateAndVerifyCPU.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

commit eef9f83b691e0713e4fc480b497b85517aba6ca4
Refs: v3.5.0-83-geef9f83b6
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 11 13:51:17 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Add qemuProcessUpdateLiveGuestCPU

    Separated from qemuProcessUpdateAndVerifyCPU to handle updating of an
    active guest CPU definition according to live data from QEMU.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

commit ee4180bef124cbc08a702689dda6fd95b21b1387
Refs: v3.5.0-84-gee4180bef
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 11 15:15:01 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Export virQEMUCapsGuestIsNative

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

commit aad362f93b4451e2f3c98923e5e44c4fe6d26d75
Refs: v3.5.0-85-gaad362f93
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 11 15:53:58 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Move qemuProcessReconnect to the end of qemu_process.c

    qemuProcessReconnect will need to call additional functions which were
    originally defined further in qemu_process.c.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>

commit 7cf22b4879e819dee42e0a058f7ed149dc9d639a
Refs: [master], [fixes], {origin/master}, {origin/HEAD}, v3.5.0-86-g7cf22b487
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 11 14:16:40 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jul 13 09:53:15 2017 +0200

    qemu: Update host-model CPUs on reconnect

    When libvirt starts a new QEMU domain, it replaces host-model CPUs with
    the appropriate custom CPU definition. However, when reconnecting to a
    domain started by older libvirt (< 2.3), the domain would still have a
    host-model CPU in its active definition.

    https://bugzilla.redhat.com/show_bug.cgi?id=1463957

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>
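
The failure described in the first commit message (ee68bb391) can be modelled in a few lines. This is a toy model, not libvirt code: all function names are made up, and it assumes the user-supplied XML also says host-model. Only the user-supplied definition gets the "update CPU" treatment, so a domain whose active definition is still host-model no longer matches it.

```python
def update_cpu(mode):
    """Toy stand-in for the CPU update step: host-model becomes custom."""
    return "custom" if mode == "host-model" else mode

def check_cpu_mode(source_mode, target_mode):
    """Raise if the CPU modes differ, mimicking the reported error text."""
    if source_mode != target_mode:
        raise ValueError(
            f"Target CPU mode {target_mode} does not match source {source_mode}"
        )

def abi_check(active_mode, user_mode, update_user_xml):
    """Compare the active definition with a user-supplied one."""
    if update_user_xml:
        user_mode = update_cpu(user_mode)
    check_cpu_mode(active_mode, user_mode)

# Domain started by old libvirt: its active definition is still host-model.
try:
    abi_check("host-model", "host-model", update_user_xml=True)
except ValueError as err:
    print("with CPU update:", err)

# Without the update step both definitions are compared as they are.
abi_check("host-model", "host-model", update_user_xml=False)
print("without CPU update: modes match")
```

A domain started by new libvirt already has a custom CPU in its active definition, which is why the check passed there even with the update step.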

Comment 30 Luyao Huang 2018-01-24 06:18:22 UTC
Verified this bug with libvirt-3.9.0-9.el7.x86_64:

The same steps as in bug 1470582 comment 5: pass

And the issue in comment 16 and comment 12:

1. check that the host supports invtsc:

virsh # capabilities 
...
      <arch>x86_64</arch>
      <model>Skylake-Client-IBRS</model>
      <vendor>Intel</vendor>
      <microcode version='33554492'/>
      <topology sockets='1' cores='26' threads='2'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='tsc_adjust'/>
      <feature name='cmt'/>
      <feature name='avx512f'/>
      <feature name='clflushopt'/>
      <feature name='avx512cd'/>
      <feature name='stibp'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='pdpe1gb'/>
      <feature name='invtsc'/>

2. start a guest which uses host-model:

# virsh dumpxml vm1
...
  <cpu mode='host-model'>
    <model fallback='forbid'/>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
...

3. check qemu cmdline:

-cpu Skylake-Client-IBRS,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+dca,+osxsave,+tsc_adjust,+avx512f,+clflushopt,+avx512cd,+stibp,+pdpe1gb

4. update host to 7.5

5. recheck guest xml:

# virsh dumpxml vm1
...
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='allow'>Skylake-Client-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='avx512f'/>
    <feature policy='require' name='clflushopt'/>
    <feature policy='require' name='avx512cd'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='disable' name='ds'/>
    <feature policy='disable' name='acpi'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='tm'/>
    <feature policy='disable' name='pbe'/>
    <feature policy='disable' name='dtes64'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='ds_cpl'/>
    <feature policy='disable' name='vmx'/>
    <feature policy='disable' name='smx'/>
    <feature policy='disable' name='est'/>
    <feature policy='disable' name='tm2'/>
    <feature policy='disable' name='xtpr'/>
    <feature policy='disable' name='pdcm'/>
    <feature policy='disable' name='dca'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='disable' name='pku'/>
    <feature policy='disable' name='ospke'/>
    <feature policy='disable' name='stibp'/>
    <feature policy='disable' name='avx512dq'/>
    <feature policy='disable' name='clwb'/>
    <feature policy='disable' name='avx512bw'/>
    <feature policy='disable' name='avx512vl'/>
  </cpu>
 
6. save guest and restore:

# virsh managedsave vm1

Domain vm1 state saved by libvirt

# virsh start vm1
Domain vm1 started

7. recheck guest xml:

# virsh dumpxml vm1
...
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Skylake-Client-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='avx512f'/>
    <feature policy='require' name='clflushopt'/>
    <feature policy='require' name='avx512cd'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='disable' name='ds'/>
    <feature policy='disable' name='acpi'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='tm'/>
    <feature policy='disable' name='pbe'/>
    <feature policy='disable' name='dtes64'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='ds_cpl'/>
    <feature policy='disable' name='vmx'/>
    <feature policy='disable' name='smx'/>
    <feature policy='disable' name='est'/>
    <feature policy='disable' name='tm2'/>
    <feature policy='disable' name='xtpr'/>
    <feature policy='disable' name='pdcm'/>
    <feature policy='disable' name='dca'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='disable' name='pku'/>
    <feature policy='disable' name='ospke'/>
    <feature policy='disable' name='stibp'/>
    <feature policy='disable' name='avx512dq'/>
    <feature policy='disable' name='clwb'/>
    <feature policy='disable' name='avx512bw'/>
    <feature policy='disable' name='avx512vl'/>
...

8. recheck qemu cmdline:

-cpu Skylake-Client-IBRS,ss=on,hypervisor=on,tsc_adjust=on,avx512f=on,clflushopt=on,avx512cd=on,pdpe1gb=on,ds=off,acpi=off,ht=off,tm=off,pbe=off,dtes64=off,monitor=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,pku=off,ospke=off,stibp=off,avx512dq=off,clwb=off,avx512bw=off,avx512vl=off
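
The XML from step 7 and the command line from step 8 can be cross-checked with a short script. This is a simplified sketch with a hypothetical helper: it handles only the two policies that appear above (require and disable) and uses an abbreviated feature list.

```python
import xml.etree.ElementTree as ET

# Only the policies seen in this bug are mapped (a simplification).
POLICY_TO_STATE = {"require": "on", "disable": "off"}

def cpu_arg_from_xml(cpu_xml):
    """Build a -cpu style string from a custom-mode <cpu> element."""
    cpu = ET.fromstring(cpu_xml)
    parts = [cpu.findtext("model")]
    for feature in cpu.findall("feature"):
        state = POLICY_TO_STATE.get(feature.get("policy"))
        if state is not None:
            parts.append(f"{feature.get('name')}={state}")
    return ",".join(parts)

# Abbreviated copy of the <cpu> element from step 7.
cpu_xml = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Skylake-Client-IBRS</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='ds'/>
  <feature policy='disable' name='avx512vl'/>
</cpu>
"""

print(cpu_arg_from_xml(cpu_xml))
```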

Comment 34 errata-xmlrpc 2018-04-10 10:50:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704