Description of problem:

In an OpenStack environment with SR-IOV configured and instance HA enabled for the compute nodes, create an SR-IOV VM with a direct-type vNIC. When the host node of the VM is shut down, the VM is rebuilt on another compute node, but the rebuild fails because a VF with the same PCI address is already occupied on the destination compute node.

Version-Release number of selected component (if applicable): RHEL OSP 7

How reproducible: Every time for the customer.

Steps to Reproduce:
1. Spawn an instance using a VF whose pci_slot id matches one already used by an instance running on the destination compute node.
2. Shut down the compute node. Since instance HA is configured, nova tries to rebuild the instance on the destination compute node using the same pci_slot id, which is already in use on that node.
3. The instance ends up in the error state with the message below.
~~~
| fault | {"message": "Requested operation is not valid: PCI device 0000:04:07.3 is in use by driver QEMU , domain instance-0000019d", "code": 500, "details": "  File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 357, in decorated_function |
~~~

Actual results:
The instance does not spawn on the destination compute node when a VF with the same pci_slot id is already in use on that node.

Expected results:
The instance should spawn successfully. The nova fitting logic should choose another VF when the VF the instance tries to claim is already in use.

Additional info:
This looks like the same issue seen with CPU-pinned instances in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1319385
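To illustrate the expected fitting behaviour, here is a minimal Python sketch: prefer the previously assigned VF address, but fall back to any other free VF on the destination host instead of failing. The names `VfPool` and `claim_free_vf` are hypothetical and do not correspond to nova's actual PCI tracker API.

~~~
# Hypothetical sketch of the expected "fitting" logic: if the VF address
# recorded in the instance's old PCI claim is busy on the destination,
# claim any other free VF rather than erroring out.
class VfPool(object):
    def __init__(self, addresses):
        # Map PCI address -> domain currently using it (None = free).
        self.vfs = dict((addr, None) for addr in addresses)

    def claim_free_vf(self, preferred=None, domain="instance"):
        # Prefer the previously assigned address when it is still free...
        if preferred and self.vfs.get(preferred) is None:
            self.vfs[preferred] = domain
            return preferred
        # ...otherwise pick any other free VF instead of failing outright.
        for addr, owner in self.vfs.items():
            if owner is None:
                self.vfs[addr] = domain
                return addr
        raise RuntimeError("no free VF available on destination host")

pool = VfPool(["0000:04:07.3", "0000:04:07.4", "0000:04:07.5"])
pool.vfs["0000:04:07.3"] = "instance-0000019d"        # busy on destination
print(pool.claim_free_vf(preferred="0000:04:07.3"))   # -> 0000:04:07.4
~~~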
*** This bug has been marked as a duplicate of bug 1222414 ***
This issue is related to live-migrating instances with SR-IOV and should not be considered the same as live-migrating with CPU pinning. Both can work independently, and even if the fix handles both situations (which I have some doubts about), QA will have to test them independently.
(In reply to Sahid Ferdjaoui from comment #13)
> This issue is related to live-migrating instances with SR-IOV and should
> not be considered the same as live-migrating with CPU pinning. Both can
> work independently, and even if the fix handles both situations (which I
> have some doubts about), QA will have to test them independently.

OK, thanks for the context. I've updated the title accordingly.
Requirements:

- Support live migration with passthrough of a full PCI NIC.
- Support live migration with passthrough of a PF.
- Support live migration with passthrough of a VF.
- In all cases, performance of networking during the general VM lifecycle should not be impacted. Performance degradation during live migration is acceptable.
(In reply to Stephen Gordon from comment #15)
> Requirements:
>
> - Support live migration with passthrough of a full PCI NIC.
> - Support live migration with passthrough of a PF.
> - Support live migration with passthrough of a VF.
> - In all cases, performance of networking during the general VM lifecycle
>   should not be impacted. Performance degradation during live migration
>   is acceptable.

Achieving this from a technical POV would require a multi-NIC setup in the guest with bonding/teaming, i.e. every guest would need to have two NICs, one SR-IOV based and one emulated, both connected to the same host network. At migration time the SR-IOV device would have to be hot-removed, and a new one added afterwards; a sketch of that sequence follows below.

In other words, as well as impacting guest network performance, this mandates a special guest setup and guest cooperation with hot-unplug at the start of migration. The implication is that if the guest OS has crashed, is in the early boot-up phase, or is otherwise non-responsive, live migration still won't be possible, as the guest won't respond to the initial hot-unplug request. Not a showstopper though - largely a documentation / expectation-setting problem.
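A minimal sketch of that unplug / migrate / replug sequence using the libvirt Python bindings, assuming the guest bonds the VF with the emulated NIC so traffic fails over once the VF is unplugged. The host URIs, domain name, and PCI address are placeholders, and error handling is omitted:

~~~
# Sketch of the migration sequence described above, using libvirt-python.
import libvirt

# XML for the SR-IOV VF attached to the guest (placeholder PCI address;
# the address would differ on the destination host).
VF_XML = """
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x07' function='0x3'/>
  </source>
</interface>
"""

src = libvirt.open('qemu+ssh://source-host/system')
dom = src.lookupByName('instance-0000019d')

# 1. Hot-unplug the VF. The guest must cooperate with the unplug, which
#    is why a crashed or non-responsive guest blocks this scheme.
dom.detachDeviceFlags(VF_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

# 2. Live-migrate; only the emulated NIC remains, so migration can proceed.
dst = libvirt.open('qemu+ssh://dest-host/system')
new_dom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

# 3. Hot-plug a free VF on the destination host.
new_dom.attachDeviceFlags(VF_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

src.close()
dst.close()
~~~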
*** Bug 1631723 has been marked as a duplicate of this bug. ***
Note this feature is being targeted for OSP 16 and will not be backportable.
All functional code related to this RFE has merged upstream in master. There is one minor follow-up patch still pending to address some code style nits, https://review.opendev.org/#/c/659101/, and a docs-only patch that needs to be written, but this RFE is now feature-complete upstream and testing can start.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0283