Bug 1447874

Summary: Migration failed from rhel7.2.z->rhel7.4 with "-M rhel7.0.0" and "-device nec-usb-xhci"
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: huiqingding <huding>
Assignee: Dr. David Alan Gilbert <dgilbert>
QA Contact: huiqingding <huding>
CC: chayang, dgilbert, dzheng, hhuang, huding, juzhang, knoel, michen, mrezanin, virt-maint, xfu
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.9.0-4.el7
Last Closed: 2017-08-02 04:38:29 UTC
Type: Bug

Description huiqingding 2017-05-04 06:16:58 UTC
Description of problem:
Migration failed from rhel7.2.z->rhel7.4 with "-M rhel7.0.0" and "-device nec-usb-xhci"

Version-Release number of selected component (if applicable):
rhel7.2.z host:
kernel-3.10.0-327.53.1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.25.x86_64

rhel7.4 host:
kernel-3.10.0-663.el7.x86_64
qemu-kvm-rhev-2.9.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the VM with "-M rhel7.0.0" and "-device nec-usb-xhci" on the source rhel7.2.z host:
# /usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.0.0 -cpu Opteron_G3,check -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0xd -monitor stdio
2. Boot the VM on the destination rhel7.4 host with the same options plus -incoming:
# /usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.0.0 -cpu Opteron_G3,check -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0xd -monitor stdio -incoming tcp:0:5800
3. Start the migration from the source monitor, e.g. migrate -d tcp:<destination-host>:5800

Actual results:
Migration fails and the destination qemu-kvm exits with:
qemu-kvm: get_pci_config_device: Bad config data: i=0x71 read: a0 device: 0 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load xhci:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:0d.0/xhci'
qemu-kvm: load of migration failed: Invalid argument


Expected results:
Migration completes normally.

Additional info:
Tested with qemu-kvm-rhev-2.8.0-5.el7 and hit the same issue.

Comment 8 Dr. David Alan Gilbert 2017-05-05 11:49:35 UTC
This seems to be a disagreement about MSI state on the xhci controller.

On the 7.3.z side:
info qtree:
      dev: nec-usb-xhci, id "xhci"
        msi = true
        msix = true

lspci -vvv:

00:0d.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])

>>>>>>>>Capabilities: [70] MSI: Enable- Count=1/16 Maskable- 64bit+
>>>>>>>>        Address: 0000000000000000  Data: 0000

On the 7.4 side:
info qtree:
      dev: nec-usb-xhci, id "xhci"
        msi = "auto"
        msix = "auto"

lspci -vvv:
        Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
                Vector table: BAR=0 offset=00003000
                PBA: BAR=0 offset=00003800
        Capabilities: [a0] Express (v2) Endpoint, MSI 00

So the MSI entry has gone AWOL.

Comment 9 Dr. David Alan Gilbert 2017-05-05 15:56:30 UTC
It's a little more subtle: the MSI entry hasn't gone AWOL; its position in the capability list has changed:

Capability list order (90 = MSI-X, 70 = MSI, a0 = PCIe):

7.4 rpm, 7.0.0 type: 90,a0,70
7.4 rpm, 7.1.0 type: 90,70
7.4 rpm, 7.2.0 type: 90,70
7.4 rpm, 7.3.0 type: 90,70
7.4 rpm, 7.4.0 type: 90,70

7.3 rpm, 7.0.0 type: 90,70,a0
7.3 rpm, 7.1.0 type: 90,70

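For machine types 7.1.0 onwards it doesn't matter: the PCIe capability is disabled, so the lists match anyway.
For the 7.0.0 machine type, however, the capability list in configuration space is in a different order, and it is this difference that the destination objects to. Offset 0x71 is the next pointer of the MSI capability at 0x70: it is a0 in the incoming stream (MSI is followed by PCIe) but 0 on the destination (MSI is last), which is exactly the "Bad config data: i=0x71 read: a0 device: 0" error in the description.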
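
A minimal Python sketch of why a different order changes byte 0x71. It assumes, as QEMU's pci_add_capability() does, that each new capability is linked at the head of the list, and it models only the ID and next-pointer bytes of each capability. The init orders and helper names below are hypothetical, chosen only so that the resulting walks match the 7.3 and 7.4 orders listed above.

```python
# Toy model of the PCI capability linked list in config space.
# Assumption: each new capability is prepended (next = old head), as QEMU's
# pci_add_capability() does. Only the ID and next-pointer bytes are modelled.

PCI_CAPABILITY_LIST = 0x34  # standard pointer to the first capability

CAP_MSI = 0x05
CAP_PCIE = 0x10
CAP_MSIX = 0x11


def build_config(caps_in_init_order):
    """Return a 256-byte config space with the given (offset, cap_id) list."""
    config = bytearray(256)
    for offset, cap_id in caps_in_init_order:
        config[offset] = cap_id                           # capability ID byte
        config[offset + 1] = config[PCI_CAPABILITY_LIST]  # next = old head
        config[PCI_CAPABILITY_LIST] = offset              # new cap becomes head
    return config


def walk(config):
    """Follow the list from 0x34 (as lspci does); return offsets in order."""
    order = []
    ptr = config[PCI_CAPABILITY_LIST]
    while ptr:
        order.append(ptr)
        ptr = config[ptr + 1]
    return order


def diff(a, b):
    """Offsets where two config spaces differ, lowest first."""
    return [i for i in range(len(a)) if a[i] != b[i]]


# Hypothetical init orders that reproduce the walks from comment 9.
old = build_config([(0xA0, CAP_PCIE), (0x70, CAP_MSI), (0x90, CAP_MSIX)])
new = build_config([(0x70, CAP_MSI), (0xA0, CAP_PCIE), (0x90, CAP_MSIX)])

print([hex(o) for o in walk(old)])  # ['0x90', '0x70', '0xa0']
print([hex(o) for o in walk(new)])  # ['0x90', '0xa0', '0x70']

first = diff(old, new)[0]
print(hex(first), hex(old[first]), hex(new[first]))  # 0x71 0xa0 0x0
```

The same set of capabilities at the same offsets yields different next-pointer bytes (0x71, 0x91, 0xa1) purely because of init order; a byte-by-byte scan hits 0x71 first.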

It's possible we shouldn't complain about list order, but it's hairy to mask.
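
A simplified Python model of the masked comparison behind the "Bad config data" message, to show what "masking" would mean here. It is modelled loosely on QEMU's get_pci_config_device() and reuses the cmask/wmask/w1cmask names from the error line; the function name, the all-0xff cmask and the "relaxed" variant are hypothetical illustrations, not the fix that was shipped.

```python
# Simplified model of the destination-side config-space check (not QEMU code).
# A byte counts as a mismatch only if it is checked (cmask) and is neither
# guest-writable (wmask) nor write-1-to-clear (w1cmask).

SIZE = 256  # conventional PCI config space


def first_bad_byte(incoming, device, cmask, wmask, w1cmask):
    """Return the first offset whose masked bytes differ, or None."""
    for i in range(SIZE):
        if (incoming[i] ^ device[i]) & cmask[i] & ~wmask[i] & ~w1cmask[i]:
            return i
    return None


# Values taken from the error in the description:
#   i=0x71 read: a0 device: 0 cmask: ff wmask: 0 w1cmask:0
incoming = bytearray(SIZE)
incoming[0x71] = 0xA0             # source: MSI next pointer -> 0xa0
device = bytearray(SIZE)          # destination: MSI is last, next = 0
cmask = bytearray([0xFF] * SIZE)  # simplification: every byte checked
wmask = bytearray(SIZE)           # nothing guest-writable
w1cmask = bytearray(SIZE)         # nothing write-1-to-clear

bad = first_bad_byte(incoming, device, cmask, wmask, w1cmask)
print(hex(bad))  # 0x71

# Hypothetical "mask it" variant: exempt the pointer byte from the check.
relaxed = bytearray(cmask)
relaxed[0x71] = 0
print(first_bad_byte(incoming, device, relaxed, wmask, w1cmask))  # None
```

Exempting one byte makes this toy check pass, but every capability pointer whose value depends on init order (here also 0x91 and 0xa1) would need the same treatment per device and machine type, which is one reading of why masking is described as hairy.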

Comment 10 Dr. David Alan Gilbert 2017-05-05 17:39:19 UTC
It looks like the upstream breakage happened between 2.6.2 and 2.7.0:

2.8.0 upstream with -M 1.5   90,a0,70
2.7.0 upstream with -M 1.5   90,a0,70
2.6.2 upstream with -M 1.5   90,70,a0
  Confirmed: migration from 2.6.2 to 2.7.0 with -M 1.5 fails; the earliest fail is -M pc-i440fx-2.0.

It looks like commit 1108b2f8a939fb5778d384149e2f1b99062a72da broke it:
   pci: Convert msi_init() to Error and fix callers to check it

We probably can't fix upstream 2.6->2.9 without fixing 2.7->2.9.

Comment 14 Miroslav Rezanina 2017-05-12 11:51:21 UTC
Fix included in qemu-kvm-rhev-2.9.0-4.el7

Comment 16 huiqingding 2017-05-23 10:35:56 UTC
Tested 7.3.z->7.4 migration with a usb-storage device attached to the XHCI controller; the test steps and command line are as in comment #12.
7.3.z host:
kernel-3.10.0-514.18.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
7.4 host:
kernel-3.10.0-664.el7.x86_64
qemu-kvm-rhev-2.9.0-5.el7.x86_64

Tested the following matrix:
Host        Machine Type       Guest        Result
---------------------------------------------------
7.3.z->7.4  rhel7.3.0         rhel7.4       pass
7.3.z->7.4  rhel7.0.0         rhel7.4       pass
7.3.z->7.4  rhel7.3.0         win2016       pass
7.3.z->7.4  rhel7.0.0         win2016       pass
7.3.z->7.4  rhel7.3.0         win8.1-32     pass
7.3.z->7.4  rhel7.0.0         win8.1-32     pass

Comment 17 huiqingding 2017-05-23 10:36:44 UTC
Based on comment #16, setting this bug to VERIFIED.

Comment 19 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392