Bug 1389284

Summary: Failed cold migration with SR-IOV
Product: Red Hat OpenStack
Reporter: Benjamin Schmaus <bschmaus>
Component: openstack-nova
Assignee: Artom Lifshitz <alifshit>
Status: CLOSED UPSTREAM
QA Contact: Prasanth Anbalagan <panbalag>
Severity: high
Priority: unspecified
Version: 9.0 (Mitaka)
CC: alifshit, assaf.eylath, awaugama, berrange, brault, dasmith, eglynn, fbaudin, gkeegan, kchamart, myllynen, sbandyop, sbauza, sclewis, sferdjao, sgordon, srevivo, tvvcox, vromanso
Target Milestone: ga
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-nova-14.0.0-0.20160907211856.14d816e.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2016-11-14 19:06:32 UTC
Type: Bug

Description Benjamin Schmaus 2016-10-27 11:22:54 UTC
Description of problem:

Cold migration of an instance that has an SR-IOV interface fails because nova on the destination compute node tries to use the PCI device/address that was allocated on the source compute node. This fails because that PCI device does not exist on the destination compute node.

See the error "libvirtError: Device 0000:83:10.6 not found: could not access /sys/bus/pci/devices/0000:83:10.6/config: No such file or directory" in the attached log.

Nova should allocate a new PCI device based on the hardware configuration of the compute node to which the instance is being migrated, and that PCI device should be used to create the instance XML.
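A minimal sketch of that expected behaviour, under a simplified model: the `PciDevice` record and the `claim_on_destination()` / `rewrite_hostdev_address()` helpers are hypothetical and are not nova's API. It claims a free device on the destination whose vendor/product IDs and physical network match the source device, then rewrites the PCI address in a libvirt `<interface type='hostdev'>` element so the domain XML no longer points at the source host's address (0000:83:10.6). The IDs and addresses in the usage part are example values only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PciDevice:
    """Hypothetical, simplified PCI device record (not nova's PciDevice object)."""
    address: str            # "domain:bus:slot.function", e.g. "0000:83:10.6"
    vendor_id: str
    product_id: str
    physical_network: str
    allocated: bool = False


def claim_on_destination(source_dev: PciDevice,
                         dest_pool: List[PciDevice]) -> Optional[PciDevice]:
    """Claim a free destination device matching the source device's spec."""
    for dev in dest_pool:
        if (not dev.allocated
                and dev.vendor_id == source_dev.vendor_id
                and dev.product_id == source_dev.product_id
                and dev.physical_network == source_dev.physical_network):
            dev.allocated = True
            return dev
    return None  # nothing suitable is free on the destination


def rewrite_hostdev_address(interface_xml: str, new_address: str) -> str:
    """Point a libvirt <interface type='hostdev'> at a new PCI address."""
    domain, bus, slot_func = new_address.split(":")
    slot, function = slot_func.split(".")
    root = ET.fromstring(interface_xml)
    addr = root.find("./source/address")
    addr.set("domain", "0x" + domain)
    addr.set("bus", "0x" + bus)
    addr.set("slot", "0x" + slot)
    addr.set("function", "0x" + function)
    return ET.tostring(root, encoding="unicode")


# Usage with example values: the source VF is the one from the libvirt error.
source_vf = PciDevice("0000:83:10.6", "8086", "10ed", "physnet1", allocated=True)
dest_pool = [
    PciDevice("0000:05:10.0", "8086", "10ed", "physnet2"),  # wrong physnet
    PciDevice("0000:05:10.2", "8086", "10ed", "physnet1"),  # suitable match
]
new_vf = claim_on_destination(source_vf, dest_pool)
old_xml = ("<interface type='hostdev' managed='yes'><source>"
           "<address type='pci' domain='0x0000' bus='0x83' slot='0x10' function='0x6'/>"
           "</source></interface>")
new_xml = rewrite_hostdev_address(old_xml, new_vf.address)
print(new_xml)
```

The point of the sketch is ordering: the claim has to happen against the destination's device pool before the XML is generated, rather than reusing the address recorded on the source.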


Version-Release number of selected component (if applicable):
OSP9 - Fix requested for OSP9

How reproducible:
100%

Comment 8 Benjamin Schmaus 2016-11-09 20:24:21 UTC
With the hotfix applied, the customer gets "message": "'MigrationContext' object has no attribute 'old_pci_devices'" when trying to migrate an instance with an SR-IOV port.

Awaiting logs; will put them into collab for engineering review.

Comment 9 Artom Lifshitz 2016-11-10 01:09:03 UTC
I'm pretty sure I've identified the cause of the message they're seeing, and I've filed bz 1393561 against rhos9 to track it. To repeat what I said in that bz: rhos9 is missing [1], the patch that introduced version 1.1 of MigrationContext, which added the old_pci_devices and new_pci_devices fields to the object. It's a big patch (240 lines spread over a dozen files), so whether it can be backported remains to be determined. In the meantime, we'll need to revert the fix we made for this bz (1389284) from rhos9, because without [1] it breaks migrations.

[1] https://review.openstack.org/#/c/307124/
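The message in comment 8 is Python's standard AttributeError text. A simplified sketch of why it appears, assuming a plain Python class as a stand-in for the pre-1.1 MigrationContext (nova's real object is a versioned object with more fields; the class body and the `source_pci_devices()` helper here are hypothetical): code written against version 1.1 reads `old_pci_devices`, which the older object simply does not have.

```python
class MigrationContext:
    """Hypothetical stand-in for MigrationContext 1.0 (simplified fields)."""

    VERSION = "1.0"

    def __init__(self, instance_uuid, migration_id):
        self.instance_uuid = instance_uuid
        self.migration_id = migration_id
        # No old_pci_devices / new_pci_devices here: per comment 9,
        # those fields were only added in version 1.1.


def source_pci_devices(ctx):
    """Hypothetical helper written against MigrationContext 1.1."""
    return ctx.old_pci_devices


ctx = MigrationContext("example-instance-uuid", 42)
error_message = None
try:
    source_pci_devices(ctx)
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

This is why the fix cannot be shipped on its own: the attribute access only works once the 1.1 object definition from [1] is present.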