Bug 1389284 - Failed cold migration with SR-IOV
Summary: Failed cold migration with SR-IOV
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ga
Target Release: 10.0 (Newton)
Assignee: Artom Lifshitz
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-27 11:22 UTC by Benjamin Schmaus
Modified: 2020-01-17 16:05 UTC (History)

Fixed In Version: openstack-nova-14.0.0-0.20160907211856.14d816e.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-14 19:06:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1512880 0 None None None 2016-10-27 11:22:54 UTC

Description Benjamin Schmaus 2016-10-27 11:22:54 UTC
Description of problem:

Cold migration of an instance that has an SR-IOV interface fails because nova on the destination compute tries to use the PCI device/address that was allocated on the source compute. This fails because that PCI device is not present on the destination compute.

See the error "libvirtError: Device 0000:83:10.6 not found: could not access /sys/bus/pci/devices/0000:83:10.6/config: No such file or directory" in the log in the attachment.

Nova should allocate a new PCI device based on the hardware configuration of the compute to which the instance is being migrated, and this PCI device should be used to create the instance XML.
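
A minimal sketch of the expected behaviour, for illustration only. This is not nova code: `HostPciPool` and `pci_address_for_migration` are hypothetical names, and the destination address used in the usage note is made up. The point is only that the address written into the instance XML comes from the destination host's free devices, never from the source allocation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostPciPool:
    """Hypothetical stand-in for the free SR-IOV devices on one compute host."""
    host: str
    free_addresses: List[str] = field(default_factory=list)

    def claim(self) -> str:
        """Take one free PCI address from this host, or fail if none is left."""
        if not self.free_addresses:
            raise RuntimeError(f"no free PCI device on {self.host}")
        return self.free_addresses.pop(0)


def pci_address_for_migration(source_address: str, dest_pool: HostPciPool) -> str:
    """Return the PCI address to use in the instance XML on the destination.

    The source address is deliberately not reused: it only exists on the
    source compute, so a fresh device is claimed from the destination pool.
    """
    return dest_pool.claim()
```

With a destination pool holding, say, 0000:05:10.1, migrating an instance that used 0000:83:10.6 on the source would produce XML referencing 0000:05:10.1 rather than the missing 0000:83:10.6.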


Version-Release number of selected component (if applicable):
OSP9 - Fix requested for OSP9

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 Benjamin Schmaus 2016-11-09 20:24:21 UTC
Customer gets "message": "'MigrationContext' object has no attribute 'old_pci_devices'" when trying to migrate an instance with an SR-IOV port with the hotfix applied.

Awaiting logs; will put them into collab for engineering review.

Comment 9 Artom Lifshitz 2016-11-10 01:09:03 UTC
I'm pretty sure I've identified the cause of the message they're seeing. I've filed bz 1393561 against rhos9 to track it. To repeat what I've said in that bz, we're missing [1] in rhos9 - that's the patch that introduced version 1.1 of MigrationContext, which added the old_pci_devices and new_pci_devices fields to the object. It's a big patch - 240 lines spread over a dozen files, so its backportability remains to be determined. In the meantime, we'll need to revert the fix that we did for this bz 1389284 out of rhos9 because without [1], it breaks migrations.

[1] https://review.openstack.org/#/c/307124/
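
For illustration, a simplified model of why the hotfix fails without [1]. This is not the real nova object: the class below only mimics a version 1.0 MigrationContext that has no PCI fields, and `pci_devices_to_free` is a hypothetical helper standing in for code that assumes the version 1.1 fields exist.

```python
class MigrationContext:
    """Simplified stand-in for the 1.0 object: it carries no PCI fields."""
    VERSION = "1.0"

    def __init__(self, instance_uuid):
        self.instance_uuid = instance_uuid


def pci_devices_to_free(ctx):
    """Hypothetical helper that assumes the 1.1 field is present."""
    return ctx.old_pci_devices


def describe_failure(ctx):
    """Return the error text produced when the 1.1 field is missing."""
    try:
        pci_devices_to_free(ctx)
    except AttributeError as exc:
        return str(exc)
    return None
```

Calling `describe_failure(MigrationContext("some-uuid"))` yields the same text the customer reported, which is consistent with the fix running against an object that predates version 1.1.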

