Red Hat Bugzilla – Bug 1447874
Migration failed from rhel7.2.z->rhel7.4 with "-M rhel7.0.0" and "-device nec-usb-xhci"
Last modified: 2017-08-02 00:38:29 EDT
Description of problem:
Migration failed from rhel7.2.z->rhel7.4 with "-M rhel7.0.0" and "-device nec-usb-xhci"

Version-Release number of selected component (if applicable):
rhel7.2.z host:
  kernel-3.10.0-327.53.1.el7.x86_64
  qemu-kvm-rhev-2.3.0-31.el7_2.25.x86_64
rhel7.4 host:
  kernel-3.10.0-663.el7.x86_64
  qemu-kvm-rhev-2.9.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the VM with "-M rhel7.0.0" and "-device nec-usb-xhci" on the source rhel7.2.z host:
   # /usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.0.0 -cpu Opteron_G3,check -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0xd -monitor stdio
2. Boot the VM on the destination rhel7.4 host:
   # /usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.0.0 -cpu Opteron_G3,check -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0xd -monitor stdio -incoming tcp:0:5800
3. Do the migration.

Actual results:
Migration fails and qemu-kvm quits with:
  qemu-kvm: get_pci_config_device: Bad config data: i=0x71 read: a0 device: 0 cmask: ff wmask: 0 w1cmask:0
  qemu-kvm: Failed to load PCIDevice:config
  qemu-kvm: Failed to load xhci:parent_obj
  qemu-kvm: error while loading state for instance 0x0 of device '0000:00:0d.0/xhci'
  qemu-kvm: load of migration failed: Invalid argument

Expected results:
Migration finishes normally.

Additional info:
Tested qemu-kvm-rhev-2.8.0-5.el7 and hit the same issue.
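The first error line reports a per-byte mismatch in PCI config space. The following is a minimal Python model of that kind of check, not QEMU's actual code. It assumes a byte is rejected when the incoming and local values differ in bits that are selected by cmask and not excluded by wmask or w1cmask. The function name and the 256-byte arrays are illustrative only.

```python
def first_bad_config_byte(incoming, local, cmask, wmask, w1cmask):
    """Return the first offset whose checked bits differ, or None.

    Assumed rule: a bit is compared only if it is set in cmask and
    clear in both wmask (guest-writable) and w1cmask (write-1-to-clear).
    """
    for i, (inc, loc) in enumerate(zip(incoming, local)):
        checked = cmask[i] & ~wmask[i] & ~w1cmask[i] & 0xFF
        if (inc ^ loc) & checked:
            return i
    return None


# Values taken from the error message above: at offset 0x71 the
# migration stream carries 0xa0 while the destination device has 0x00.
SIZE = 256
incoming = bytearray(SIZE)
local = bytearray(SIZE)
cmask = bytearray([0xFF] * SIZE)
wmask = bytearray(SIZE)
w1cmask = bytearray(SIZE)
incoming[0x71] = 0xA0

bad = first_bad_config_byte(incoming, local, cmask, wmask, w1cmask)
print(hex(bad))

# If the byte were fully guest-writable it would not be compared.
wmask[0x71] = 0xFF
ignored = first_bad_config_byte(incoming, local, cmask, wmask, w1cmask)
```

With cmask ff and wmask 0, as in the log, every bit of the byte at 0x71 is compared, so any difference there is fatal under this model.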
This seems to be a disagreement about MSI state on the xhci controller.

On 7.3.z world:
info qtree:
  dev: nec-usb-xhci, id "xhci"
    msi = true
    msix = true
lspci -vvv:
  00:0d.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
>>>>>>>>Capabilities: [70] MSI: Enable- Count=1/16 Maskable- 64bit+
>>>>>>>>        Address: 0000000000000000  Data: 0000

On 7.4 world:
info qtree:
  dev: nec-usb-xhci, id "xhci"
    msi = "auto"
    msix = "auto"
lspci -vvv:
  Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
          Vector table: BAR=0 offset=00003000
          PBA: BAR=0 offset=00003800
  Capabilities: [a0] Express (v2) Endpoint, MSI 00

So the MSI entry has gone AWOL.
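To compare the two hosts without reading the output by eye, the capability offsets can be pulled out of the `lspci -vvv` text. This is a small sketch: the regular expression and the function name are my own, and the sample string is only the 7.4 excerpt quoted above, not a complete lspci dump.

```python
import re

# Matches lines such as "Capabilities: [70] MSI: Enable- ..." and
# captures the hex offset plus the name up to the first ':' or ','.
CAP_RE = re.compile(r"Capabilities: \[([0-9a-fA-F]+)\]\s+([^:,]+)")


def capability_order(lspci_text):
    """Return [(offset, name), ...] in the order lspci printed them."""
    return [(int(m.group(1), 16), m.group(2).strip())
            for m in CAP_RE.finditer(lspci_text)]


sample_74 = """\
Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
        Vector table: BAR=0 offset=00003000
        PBA: BAR=0 offset=00003800
Capabilities: [a0] Express (v2) Endpoint, MSI 00
"""

caps = capability_order(sample_74)
for offset, name in caps:
    print(f"{offset:#04x} {name}")
```

Running the same function on the full output from both hosts gives two lists that can be compared directly.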
It's a little more subtle; the MSI entry hasn't gone AWOL; its order has changed:

7.4 rpm, 7.0.0 type: 90,a0,70
    (90 is MSI-X, 70 is MSI, a0 is PCIe)
7.4 rpm, 7.1.0 type: 90,70
7.4 rpm, 7.2.0 type: 90,70
7.4 rpm, 7.3.0 type: 90,70
7.4 rpm, 7.4.0 type: 90,70

7.3 rpm, 7.0.0 type: 90,70,a0
7.3 rpm, 7.1.0 type: 90,70

For 7.1.0 onwards it doesn't matter: because the PCIe block is disabled, the lists match anyway. For the 7.0.0 type, however, the capability list in configuration space is in a different order, and it's this difference that the destination is objecting to.

It's possible we shouldn't complain about list order, but it's hairy to mask.
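Why a reordering alone trips the config check can be shown with a toy config space. The sketch below assumes the comma-separated lists above are in linked-list walk order, and uses the standard PCI layout: the capabilities pointer at 0x34, and for each capability the ID in byte 0 and the next pointer in byte 1. The helper names are hypothetical, and only the list bytes are modeled.

```python
CAP_PTR = 0x34  # standard PCI capabilities pointer offset
CAP_IDS = {"MSI": 0x05, "PCIe": 0x10, "MSI-X": 0x11}  # standard capability IDs


def build_config(chain):
    """Build a 256-byte config space holding only the capability list.

    chain is [(offset, name), ...] in walk order.
    """
    cfg = bytearray(256)
    cfg[CAP_PTR] = chain[0][0]
    next_offsets = [off for off, _ in chain[1:]] + [0]
    for (off, name), nxt in zip(chain, next_offsets):
        cfg[off] = CAP_IDS[name]   # capability ID
        cfg[off + 1] = nxt         # next pointer, 0 ends the list
    return cfg


def walk_caps(cfg):
    """Follow the next pointers and return the offsets visited."""
    order, off = [], cfg[CAP_PTR]
    while off and off not in order:  # guard against loops
        order.append(off)
        off = cfg[off + 1]
    return order


# 7.3 rpm, 7.0.0 type: 90,70,a0  versus  7.4 rpm, 7.0.0 type: 90,a0,70
src = build_config([(0x90, "MSI-X"), (0x70, "MSI"), (0xA0, "PCIe")])
dst = build_config([(0x90, "MSI-X"), (0xA0, "PCIe"), (0x70, "MSI")])

diff = [i for i in range(256) if src[i] != dst[i]]
for i in diff:
    print(f"offset {i:#04x}: source {src[i]:02x} destination {dst[i]:02x}")
```

In this model the capabilities sit at the same offsets on both sides and only the next pointers differ. The lowest differing offset is 0x71, where the source holds a0 and the destination holds 0, which is consistent with the "i=0x71 read: a0 device: 0" line in the original report.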
Looks like the breakage upstream happened between 2.6.2 and 2.8.0:

2.8.0 upstream with -M 1.5: 90,a0,70
2.7.0 upstream with -M 1.5: 90,a0,70
2.6.2 upstream with -M 1.5: 90,70,a0

Yes, and migrate of 2.6.2->2.7.0 with -M 1.5 fails; earliest fail is -M pc-i440fx-2.0.

Looks like it was 1108b2f8a939fb5778d384149e2f1b99062a72da that broke it:
    pci: Convert msi_init() to Error and fix callers to check it

We probably can't fix upstream 2.6->2.9 without fixing 2.7->2.9.
Fix included in qemu-kvm-rhev-2.9.0-4.el7
Tested 7.3.z->7.4 with a usb-storage device under the XHCI controller; the test steps and command line are as in comment #12.

7.3.z host:
  kernel-3.10.0-514.18.1.el7.x86_64
  qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
7.4 host:
  kernel-3.10.0-664.el7.x86_64
  qemu-kvm-rhev-2.9.0-5.el7.x86_64

Tested the following matrix:

Host          Machine Type    Guest        Result
---------------------------------------------------
7.3.z->7.4    rhel7.3.0       rhel7.4      pass
7.3.z->7.4    rhel7.0.0       rhel7.4      pass
7.3.z->7.4    rhel7.3.0       win2016      pass
7.3.z->7.4    rhel7.0.0       win2016      pass
7.3.z->7.4    rhel7.3.0       win8.1-32    pass
7.3.z->7.4    rhel7.0.0       win8.1-32    pass
Based on comment #16, setting this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392