Description of problem:
Client is in the middle of an FFU (13.0.16 -> 16.1.6) and is trying to do a live migration, but sometimes it fails with:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
    timer()
  File "/usr/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
    cb(*args, **kw)
  File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 175, in _do_send
    waiter.switch(result)
  File "/usr/lib/python3.6/site-packages/eventlet/greenthread.py", line 221, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/nova/utils.py", line 675, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 8844, in _live_migration_operation
    LOG.error("Live Migration failure: %s", e, instance=instance)
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 8837, in _live_migration_operation
    bandwidth=CONF.libvirt.live_migration_bandwidth)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 689, in migrate
    destination, params=params, flags=flags)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1943, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirt.libvirtError: unsupported configuration: Target network card model e1000e does not match source virtio

After talking with engineering, it could be caused by properties in the image. Here is part of the properties:

os_distro='centos6.9', os_version='6.9'

What can be done in this situation?

Version-Release number of selected component (if applicable):
OSP16.1.6

How reproducible:
Random. Sometimes it works for instances with the same image.

Steps to Reproduce:
1. Try a live migration.

Actual results:
Live migration fails.

Expected results:
Live migration succeeds.

Additional info:
We have logs and sosreport.
This is caused by libosinfo. VMs that were created with the OSP 13 version of libosinfo may not get the same device model from libosinfo in OSP 16. From the image properties (os_distro='centos6.9') we can see that they are in fact using libosinfo. We generally advise against using this feature and plan to remove it in the future.

libosinfo assumes that it will only be used on first boot and that you will persist the XML for the lifetime of the VM. As a result, they make breaking changes between releases and alter the device models selected over time.

Why this happens is pretty simple. Nova currently reuses the same function to generate the network interface XML during first boot and during live migration. During live migration we regenerate the network interface element to allow live migration between different network backends. Live migration between different network frontends, however, is not supported. The error above occurs because the frontend changed when we regenerated the XML, which normally will not happen as we do not expect the libosinfo version to change.

In this case libosinfo in 13/RHEL 7 returns virtio for the network VIF model, but in 16/RHEL 8 it returns e1000e. e1000e is incorrect in several respects. First, it is the PCIe version of the device, but the pc-i440fx machine type we use by default only has PCI support, not PCIe. Second, while e1000e does often work better for older operating systems or Windows, where the virtio driver may not be present, it is generally slower than virtio, and virtio should be preferred.

The fix for this is somewhat complex to implement but simple to explain. We will need to read the current device model from the interface XML element and pass it to the function that generates the new element, so that the frontend will not change. This is complex to implement because we do that via a semi-indirect callback method. This will take time and we are unlikely to backport it to 16.1. We will likely backport it only to 16.2.2+, which will not be released until Q1 next year.
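To illustrate the idea behind the fix, here is a minimal sketch in plain Python. It is not Nova code: `current_vif_models`, `build_interface_xml`, `regenerate_for_migration`, the sample domain XML, the MAC address and the bridge names are all hypothetical and exist only for this example. It assumes the usual libvirt layout where each `<interface>` carries a `<mac address=.../>` and a `<model type=.../>` child, and shows the model read from the source guest taking precedence over whatever libosinfo would now suggest.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML of the running source guest.
SOURCE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <mac address='fa:16:3e:00:00:01'/>
      <model type='virtio'/>
      <source bridge='br-int'/>
    </interface>
  </devices>
</domain>
"""


def current_vif_models(domain_xml):
    """Return {mac: model} for every interface in the guest XML."""
    models = {}
    root = ET.fromstring(domain_xml)
    for iface in root.findall("./devices/interface"):
        mac = iface.find("mac")
        model = iface.find("model")
        if mac is not None and model is not None:
            models[mac.get("address").lower()] = model.get("type")
    return models


def build_interface_xml(mac, bridge, default_model, model_override=None):
    """Hypothetical stand-in for the shared interface XML generator.

    default_model plays the role of the libosinfo-derived model;
    model_override, when given, wins so the frontend stays unchanged.
    """
    iface = ET.Element("interface", type="bridge")
    ET.SubElement(iface, "mac", address=mac)
    ET.SubElement(iface, "model", type=model_override or default_model)
    ET.SubElement(iface, "source", bridge=bridge)
    return ET.tostring(iface, encoding="unicode")


def regenerate_for_migration(domain_xml, mac, new_bridge, libosinfo_model):
    """Regenerate the interface for the destination, keeping the frontend."""
    existing = current_vif_models(domain_xml).get(mac.lower())
    return build_interface_xml(
        mac, new_bridge, libosinfo_model, model_override=existing
    )


if __name__ == "__main__":
    # The backend (bridge) may change; the frontend model must not.
    print(regenerate_for_migration(
        SOURCE_DOMAIN_XML, "fa:16:3e:00:00:01", "br-new", "e1000e"))
```

With this shape, an interface that already exists on the source keeps its virtio model even if the destination's libosinfo would pick e1000e, while an interface with no recorded model falls back to the default.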
In the interim, the workaround is to cold migrate. Technically, cold migration is tech preview in the hybrid state, but it can be used to work around this type of issue.

Another, unsupported, alternative would be to hard reboot the VM and then live migrate it. This is currently untested and not supported in the hybrid state of the 13 -> 16 FFU. It is planned to be supported for 16.2 -> 17.1.

I should also note that now that 16.2 has been released, the officially supported upgrade path is 13 -> 16.2, not 13 -> 16.1.

Going forward, I would recommend that the customer discontinue use of libosinfo by removing the version number from os_distro. We will likely deprecate this entirely and remove support for it in OSP 18.
Testing: functional tests, because it's hard to get different versions of libosinfo on source and dest.