Description of problem:
Migration failed from rhel7.2.z->rhel7.4 with "-M rhel7.0.0" and "-device nec-usb-xhci"
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot the VM with "-M rhel7.0.0" and "-device nec-usb-xhci" on the source rhel7.2.z host:
# /usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.0.0 -cpu Opteron_G3,check -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0xd -monitor stdio
2. Boot the VM on the destination rhel7.4 host:
# /usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.0.0 -cpu Opteron_G3,check -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0xd -monitor stdio -incoming tcp:0:5800
3. Do the migration.
Actual results:
Migration fails and qemu-kvm quits with:
qemu-kvm: get_pci_config_device: Bad config data: i=0x71 read: a0 device: 0 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load xhci:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:0d.0/xhci'
qemu-kvm: load of migration failed: Invalid argument
Expected results:
Migration finishes normally.
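For anyone triaging a similar failure, the "Bad config data" line can be decoded mechanically. The sketch below parses the fields from the message quoted above and applies the comparison rule that the field names suggest: bits selected by cmask must match between the incoming stream and the destination device, unless the guest can write them (wmask) or clear them (w1cmask). That rule is an assumption about how the loader evaluates the masks, and the helper names are made up for illustration.

```python
import re

# Error line copied from the report above.
LINE = ("qemu-kvm: get_pci_config_device: Bad config data: i=0x71 read: a0 "
        "device: 0 cmask: ff wmask: 0 w1cmask:0")


def parse_bad_config(line):
    """Extract the hex fields of a 'Bad config data' message as integers."""
    pattern = (
        r"i=0x(?P<i>[0-9a-f]+) read: (?P<read>[0-9a-f]+) "
        r"device: (?P<device>[0-9a-f]+) cmask: (?P<cmask>[0-9a-f]+) "
        r"wmask: (?P<wmask>[0-9a-f]+) w1cmask:\s*(?P<w1cmask>[0-9a-f]+)"
    )
    m = re.search(pattern, line)
    if m is None:
        raise ValueError("not a 'Bad config data' line")
    return {k: int(v, 16) for k, v in m.groupdict().items()}


def mismatch_bits(f):
    """Bits that differ and are neither guest-writable nor write-1-to-clear.

    Assumed rule, inferred from the field names in the message.
    """
    return ((f["read"] ^ f["device"])
            & f["cmask"] & ~f["wmask"] & ~f["w1cmask"] & 0xFF)


fields = parse_bad_config(LINE)
bits = mismatch_bits(fields)
print("offset 0x%02x: stream=0x%02x device=0x%02x mismatching bits=0x%02x"
      % (fields["i"], fields["read"], fields["device"], bits))
```

A non-zero result means the byte at that config-space offset is treated as fixed device layout, so the two hosts must agree on it for the migration to load.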
Tested with qemu-kvm-rhev-2.8.0-5.el7 and hit the same issue.
This seems to be a disagreement about MSI state on the xhci controller.
On 7.3.z world:
dev: nec-usb-xhci, id "xhci"
msi = true
msix = true
00:0d.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
>>>>>>>>Capabilities:  MSI: Enable- Count=1/16 Maskable- 64bit+
>>>>>>>> Address: 0000000000000000 Data: 0000
On 7.4 world:
dev: nec-usb-xhci, id "xhci"
msi = "auto"
msix = "auto"
Capabilities:  MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=0 offset=00003000
PBA: BAR=0 offset=00003800
Capabilities: [a0] Express (v2) Endpoint, MSI 00
so the MSI entry has gone AWOL
It's a little more subtle; the MSI entry hasn't gone AWOL; its order has changed.
Capability list order by rpm and machine type (90 is MSI-X, 70 is MSI, a0 is PCIe):
7.4 rpm, 7.0.0 type: 90,a0,70
7.4 rpm, 7.1.0 type: 90,70
7.4 rpm, 7.2.0 type: 90,70
7.4 rpm, 7.3.0 type: 90,70
7.4 rpm, 7.4.0 type: 90,70
7.3 rpm, 7.0.0 type: 90,70,a0
7.3 rpm, 7.1.0 type: 90,70
For 7.1.0 onwards it doesn't matter - because the PCIe block is disabled they match anyway.
For the 7.0.0 type however the configuration space list is in a different order and it's this difference that it's objecting to.
It's possible we shouldn't complain about list order, but it's hairy to mask.
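To make the ordering problem concrete, here is a toy model of the two capability chains for the 7.0.0 machine type. It only models the capability pointer at 0x34 and the ID and next-pointer bytes of each capability, using the standard PCI capability IDs (MSI 0x05, PCIe 0x10, MSI-X 0x11) and the offsets listed above. Real capabilities carry more bytes and the real loader checks more than this, so treat it as an illustration rather than a reproduction of the QEMU code; the function names are hypothetical.

```python
CAP_PTR = 0x34                                       # capabilities pointer
CAP_IDS = {"MSI": 0x05, "PCIe": 0x10, "MSI-X": 0x11}  # standard capability IDs
OFFSETS = {"MSI-X": 0x90, "MSI": 0x70, "PCIe": 0xA0}  # offsets from this report


def build_config(order):
    """Build a 256-byte config space whose capabilities are linked in `order`."""
    cfg = bytearray(256)
    offs = [OFFSETS[name] for name in order]
    cfg[CAP_PTR] = offs[0]
    for name, off, nxt in zip(order, offs, offs[1:] + [0]):
        cfg[off] = CAP_IDS[name]   # byte 0: capability ID
        cfg[off + 1] = nxt         # byte 1: offset of next capability, 0 ends the list
    return cfg


def walk(cfg):
    """Follow the next pointers and return the offsets in list order."""
    chain = []
    off = cfg[CAP_PTR]
    while off:
        chain.append(off)
        off = cfg[off + 1]
    return chain


src = build_config(["MSI-X", "MSI", "PCIe"])   # 7.3 rpm, 7.0.0 type: 90,70,a0
dst = build_config(["MSI-X", "PCIe", "MSI"])   # 7.4 rpm, 7.0.0 type: 90,a0,70

diff = [i for i in range(256) if src[i] != dst[i]]
same_set = sorted(walk(src)) == sorted(walk(dst))

print("source chain:     ", [hex(o) for o in walk(src)])
print("destination chain:", [hex(o) for o in walk(dst)])
print("differing offsets:", [hex(i) for i in diff])
print("same capabilities ignoring order:", same_set)
```

In this model the lowest differing byte is the MSI next pointer at 0x71, holding a0 on the source and 0 on the destination, which lines up with the "i=0x71 read: a0 device: 0" message. The capability IDs themselves are identical on both sides; only the link bytes differ, which is why an order-insensitive comparison would pass but a byte-wise one does not.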
Looks like the breakage upstream happened between 2.6.2 and 2.8.0
2.8.0 upstream with -M 1.5 90,a0,70
2.7.0 upstream with -M 1.5 90,a0,70
2.6.2 upstream with -M 1.5 90,70,a0
Yes, and migration of 2.6.2->2.7.0 with -M 1.5 fails; the earliest failing machine type is -M pc-i440fx-2.0.
Looks like it was commit 1108b2f8a939fb5778d384149e2f1b99062a72da that broke it:
    pci: Convert msi_init() to Error and fix callers to check it
We can't fix upstream 2.6->2.9 without fixing 2.7->2.9 probably
Fix included in qemu-kvm-rhev-2.9.0-4.el7
Tested 7.3.z->7.4 with a usb-storage device under the XHCI controller; the test steps and command line are as in comment #12.
Test the following matrix:
Host          Machine Type   Guest        Result
7.3.z->7.4    rhel7.3.0      rhel7.4      pass
7.3.z->7.4    rhel7.0.0      rhel7.4      pass
7.3.z->7.4    rhel7.3.0      win2016      pass
7.3.z->7.4    rhel7.0.0      win2016      pass
7.3.z->7.4    rhel7.3.0      win8.1-32    pass
7.3.z->7.4    rhel7.0.0      win8.1-32    pass
Based on comment #16, set this bug to be verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.