Bug 896716
Summary: | [RFE] support migration with PCI passthrough network devices | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Laine Stump <laine>
Component: | libvirt | Assignee: | Laine Stump <laine>
Status: | CLOSED CANTFIX | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 7.0 | CC: | berrange, cwei, dayleparker, dyuan, jishao, jsuchane, juzhang, kyulee, lnovich, mzhan, rbalakri, sherold, trichard, xuzhang, ydu, zpeng
Target Milestone: | rc | Keywords: | FutureFeature
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Enhancement
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2015-05-26 15:09:43 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1205796 | |
Description
Laine Stump
2013-01-17 20:32:07 UTC
If the same model of NIC is installed in the same PCI slot of the source and destination hosts and the NIC supports SR-IOV, is it technically possible to migrate a VM with a PCI passthrough network device between hosts? What other factors can you think of that could interfere with this feature working? Thanks in advance, rogan

The description of this BZ states the obstacles fairly clearly, although the very first sentence could use more emphasis: it is technically impossible to migrate a guest that has any PCI passthrough device attached, since the hardware itself contains state that qemu cannot know and therefore cannot migrate.

The method used by Solarflare was to automate the process of detaching the passed-through device before migration was started, and then attaching a new device once migration was complete and the guest had started up on the destination. This obviously requires cooperation from the guest, since it must be able to deal with having the device temporarily removed. They solved this by bonding the passed-through device together with an emulated network device connected to the same physical network; during migration, performance would be degraded, but at least everything would still work.

Since, from the guest's point of view, it is not the same device on the source host and the destination host, it turns out that the actual hardware on the two hosts does not have to be identical - as long as the MAC address is set the same, the guest OS supports hotplug/unplug, and the guest can deal with two different drivers using the same MAC address at different times. In the end it isn't simple (i.e. it will take significant development time to get right), and it won't work with "just any" guest. That's why it still hasn't been done (at least not in a general way).

There has been a lot of discussion on this topic in two threads initiated with patches from Chen Fan <chen.fan.fnst.com>:

https://www.redhat.com/archives/libvir-list/2015-April/msg00803.html
(that thread continued into the following month, which may require searching for the subject in the May archives)

and

https://www.redhat.com/archives/libvir-list/2015-May/msg00384.html

The short form of all this, from libvirt's point of view, is that libvirt is not in the correct position to automatically do the detach and re-attach of the devices based on config options. I had previously been a proponent of that design, but the two mind-changing points for me were:

1) the case where a guest has been migrated to the destination host and its CPUs restarted, but the auto-reattach fails. In this case libvirt cannot report success up to the management layer, because the guest is not in the same pre-migration state (modulo the move to a new host), but it also can't report failure to migrate, because that would indicate that the guest is still running on the original host. There are of course several ways this could be handled, but it's impossible for libvirt to pick a single recovery scheme that would work for everyone. There are several other items that would need configuring as well, for example the maximum amount of time to wait for the device detach to complete. By the time libvirt provides configuration options for all of these, and the management layer (e.g. OpenStack or oVirt) sets up the configuration, it would be just as simple for management to itself issue the libvirt commands to detach the device on the source and reattach it on the destination after migration has completed.
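To make that last point concrete, here is a minimal sketch (not anything libvirt automates today) of the sequence a management layer could drive itself with stock virsh commands. The host name dst.example.com, the guest name guest1, the PCI addresses, and the MAC address are all hypothetical:

```
# Sketch only: how a management layer could sequence the steps itself.

# XML for the passthrough VF. The guest sees a *different* device on
# each host, so only the MAC needs to match, not the PCI address.
cat > vf.xml <<'EOF'
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:6d:90:02'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
  </source>
</interface>
EOF

# 1) Hot-unplug the VF from the running guest (the guest must tolerate
#    this; a bonded emulated NIC keeps traffic flowing, degraded).
virsh detach-device guest1 vf.xml --live

# 2) Migrate with no passthrough devices attached.
virsh migrate --live guest1 qemu+ssh://dst.example.com/system

# 3) Re-attach an equivalent VF on the destination. A real management
#    layer would pick a free VF on the destination and rewrite the
#    <address> first, and would need its own recovery policy in case
#    this step fails (the problem described in point 1 above).
cat > vf-dst.xml <<'EOF'
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:6d:90:02'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x02' function='0x3'/>
  </source>
</interface>
EOF
virsh -c qemu+ssh://dst.example.com/system attach-device guest1 vf-dst.xml --live
```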
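And for completeness, going back to the Solarflare-style setup described above: the guest-side piece is an ordinary active-backup bond enslaving the VF and the emulated NIC. A rough sketch, assuming hypothetical guest interface names ens4 (the VF) and ens5 (the emulated virtio NIC):

```
# Inside the guest (sketch; interface names are hypothetical).
# active-backup bonding keeps traffic flowing over the emulated NIC
# (degraded) while the VF is hot-unplugged for migration.
ip link add bond0 type bond mode active-backup miimon 100
ip link set ens4 down
ip link set ens4 master bond0        # the passed-through VF
ip link set ens5 down
ip link set ens5 master bond0        # the emulated virtio NIC
echo ens4 > /sys/class/net/bond0/bonding/primary   # prefer the VF
ip link set bond0 up
```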
2) The only way that libvirt could usefully manage auto-detach and re-attach of devices is by utilizing its network device pools (a network with <forward mode='hostdev'>; a sketch of such a pool definition is at the end of this comment) - this is needed so that libvirt has a method of finding an equivalent device on the destination host, since it's impractical/unrealistic to expect that a device at the exact same PCI address as the one on the source host will also be available on the destination. However, OpenStack *never* uses libvirt's network device pools; it has its own method of determining which device to use, outside the scope of libvirt. This means that even if libvirt did implement some sort of auto-detach/reattach, it would be unusable by OpenStack.

I'm closing the BZ. Feel free to re-open it, or to open a new BZ requesting a modified form of participation by libvirt in this functionality.
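For reference, this is roughly what the device pool mentioned in (2) looks like. A sketch using libvirt's documented <forward mode='hostdev'> syntax; the network name hostdev-pool and the PF device name enp4s0f0 are hypothetical:

```
# Define a pool of all VFs belonging to one physical function; when a
# guest interface references the network, libvirt hands out any free
# VF from the pool, so no fixed PCI address appears in the guest config.
cat > hostdev-pool.xml <<'EOF'
<network>
  <name>hostdev-pool</name>
  <forward mode='hostdev' managed='yes'>
    <!-- enumerate all VFs of this physical function into the pool -->
    <pf dev='enp4s0f0'/>
  </forward>
</network>
EOF
virsh net-define hostdev-pool.xml
virsh net-start hostdev-pool

# A guest interface then references the pool instead of a fixed PCI
# address, which is what would let libvirt find an equivalent device
# on a destination host that defines the same network:
cat > pool-nic.xml <<'EOF'
<interface type='network'>
  <mac address='52:54:00:6d:90:02'/>
  <source network='hostdev-pool'/>
</interface>
EOF
virsh attach-device guest1 pool-nic.xml --live --config
```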