Bug 1352860

Summary: Migration fails from RHEL7.2.z host to RHEL7.3 host with "-M pc-i440fx-rhel7.0.0 -device nec-usb-xhci"
Product: Red Hat Enterprise Linux 7
Reporter: huiqingding <huding>
Component: qemu-kvm-rhev
Assignee: Michael S. Tsirkin <mst>
Status: CLOSED ERRATA
QA Contact: huiqingding <huding>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.3
CC: chayang, dgilbert, huding, juzhang, knoel, kraxel, mrezanin, mst, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.6.0-19.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-07 21:21:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1289197

Description huiqingding 2016-07-05 09:49:04 UTC
Description of problem:
Migration fails from RHEL7.2.z host to RHEL7.3 host with "-M pc-i440fx-rhel7.0.0 -device nec-usb-xhci"

Version-Release number of selected component (if applicable):

Host RHEL7.2.z:
kernel-3.10.0-327.28.2.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.16.x86_64

Host RHEL7.3:
kernel-3.10.0-445.el7.x86_64
qemu-kvm-rhev-2.6.0-11.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the VM on the source host (RHEL7.2.z):
# /usr/libexec/qemu-kvm  -cpu SandyBridge -M pc-i440fx-rhel7.0.0 -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0x4 -monitor stdio
2. Boot the VM on the destination host (RHEL7.3):
# /usr/libexec/qemu-kvm  -cpu SandyBridge -M pc-i440fx-rhel7.0.0 -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0x4 -monitor stdio -incoming tcp:0:5800
3. Migrate from the source host to the destination host:
(qemu) migrate -d tcp:{dst_ip}:5800
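
The two command lines in the steps above differ only in the trailing -incoming option. A minimal sketch that builds both from one place, so the machine type and device arguments cannot drift apart between source and destination; the helper name qemu_argv is hypothetical, and only flags already shown in the steps are used:

```python
import shlex

QEMU = "/usr/libexec/qemu-kvm"


def qemu_argv(incoming_port=None):
    """Build the qemu-kvm argument list from the reproduction steps.

    incoming_port=None gives the source command line; a port number
    appends the -incoming listener used on the destination.
    """
    argv = [
        QEMU,
        "-cpu", "SandyBridge",
        "-M", "pc-i440fx-rhel7.0.0",
        "-device", "nec-usb-xhci,id=xhci,bus=pci.0,addr=0x4",
        "-monitor", "stdio",
    ]
    if incoming_port is not None:
        argv += ["-incoming", f"tcp:0:{incoming_port}"]
    return argv


src = qemu_argv()
dst = qemu_argv(incoming_port=5800)
print(shlex.join(src))
print(shlex.join(dst))
```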

Actual results:
qemu-kvm on the destination host quits with the error:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0xb3 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:04.0/xhci'
qemu-kvm: load of migration failed: Invalid argument
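
The fields in the first error line (read, device, cmask, wmask, w1cmask) suggest a per-byte comparison of the incoming PCI config space against the destination device's own. A minimal sketch of such a check, assuming a byte is rejected when it differs in a bit that is compared (cmask) and that the guest cannot modify (neither wmask nor w1cmask); the function name is hypothetical and this models the message, it is not a copy of the QEMU source:

```python
def config_byte_mismatch(read, device, cmask, wmask, w1cmask):
    """Return True if an incoming config-space byte would be rejected.

    read    -- byte received in the migration stream
    device  -- byte the destination device model currently holds
    cmask   -- bits that are compared at load time
    wmask   -- guest-writable bits (allowed to differ)
    w1cmask -- write-1-to-clear bits (allowed to differ)
    """
    return bool((read ^ device) & cmask & ~wmask & ~w1cmask & 0xFF)


# Values from the error message:
# i=0xb3 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0
print(config_byte_mismatch(0x00, 0x20, 0xFF, 0x00, 0x00))
```

Under this model the byte at 0xb3 is read-only and fully compared, so any difference between the two QEMU versions in that byte aborts the load.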


Expected results:
Migration completes successfully.

Additional info:

Comment 2 Gerd Hoffmann 2016-07-05 15:11:33 UTC
Can you attach "lspci -vvs4" output for both RHEL-7.2.z and RHEL-7.3 qemu-kvm please?

Comment 3 huiqingding 2016-07-06 07:26:42 UTC
(In reply to Gerd Hoffmann from comment #2)
> Can you attach "lspci -vvs4" output for both RHEL-7.2.z and RHEL-7.3
> qemu-kvm please?

For RHEL-7.2.z, inside guest:
# lspci -vvs4 -s 00:0d.0
00:0d.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
	Subsystem: Red Hat, Inc QEMU Virtual Machine
	Physical Slot: 13
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at fc158000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00003800
	Capabilities: [70] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Kernel driver in use: xhci_hcd

For RHEL-7.3, inside guest:
# lspci -vvs4 -s 00:0d.0
00:0d.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
	Subsystem: Red Hat, Inc QEMU Virtual Machine
	Physical Slot: 13
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at fc158000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00003800
	Capabilities: [70] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Kernel driver in use: xhci_hcd
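
The two dumps above are nearly identical, so the difference is easy to miss by eye. A small sketch that compares the +/- flags of two lspci lines and reports the ones that changed; the helper name flipped_flags and the regular expression are illustrative assumptions, not part of lspci:

```python
import re

# A flag is a name immediately followed by '+' or '-', then whitespace,
# a comma, or end of line (so "De-emphasis" and "-6dB" are not matched).
FLAG = re.compile(r"\b([A-Za-z0-9]+)([+-])(?=[\s,]|$)")


def flipped_flags(old_line, new_line):
    """Return {flag: (old_state, new_state)} for flags that differ."""
    old = dict(FLAG.findall(old_line))
    new = dict(FLAG.findall(new_line))
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


lnksta_72z = "LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-"
lnksta_73 = "LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-"
print(flipped_flags(lnksta_72z, lnksta_73))
```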

Comment 4 Gerd Hoffmann 2016-07-06 08:18:15 UTC
> For RHEL-7.2.z, inside guest:
> # lspci -vvs4 -s 00:0d.0

> 		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt-
> ABWMgmt-

> For RHEL-7.3, inside guest:
> # lspci -vvs4 -s 00:0d.0

> 		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt-
> ABWMgmt-

DLActive flipped. This was introduced by commit b2101eae63ea57b571cee4a9075a4287d24ba4a4:

    pcie: Set the "link active" in the link status register
    
    Some firmwares can test that and assume the device hasn't come
    up if that bit isn't set
    
    Signed-off-by: Benjamin Herrenschmidt <benh.org>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
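
The flipped DLActive flag lines up with the byte reported in the error message. A sketch of the arithmetic, assuming the PCI Express capability sits at 0xa0 as shown in the lspci output and using the standard register layout (Link Status at offset 0x12 within the capability, Data Link Layer Link Active at bit 13); the constant names follow the Linux pci_regs.h convention and the helper name is hypothetical:

```python
PCIE_CAP_OFFSET = 0xA0         # "Capabilities: [a0] Express" in lspci
PCI_EXP_LNKSTA = 0x12          # Link Status offset inside the capability
PCI_EXP_LNKSTA_DLLLA = 0x2000  # Data Link Layer Link Active (bit 13)


def lnksta_bits(config_offset, byte_value):
    """Map one config-space byte to its bits in the 16-bit Link Status.

    Config space is little-endian, so the byte at the register's base
    offset is the low half and the next byte is the high half.
    Returns None if the offset is outside the Link Status register.
    """
    rel = config_offset - (PCIE_CAP_OFFSET + PCI_EXP_LNKSTA)
    if rel not in (0, 1):
        return None
    return byte_value << (8 * rel)


# From the error message: i=0xb3, device: 20
bits = lnksta_bits(0xB3, 0x20)
print(hex(bits), bool(bits & PCI_EXP_LNKSTA_DLLLA))
```

Offset 0xb3 is the high byte of Link Status (0xa0 + 0x12 = 0xb2), and 0x20 in that byte is bit 13, which matches "read: 0 device: 20" for a source without DLActive and a destination with it set.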

Comment 5 Gerd Hoffmann 2016-07-06 08:19:52 UTC
mst: see comment 4.

Comment 6 Michael S. Tsirkin 2016-07-19 22:35:00 UTC
Upstream patch:
http://article.gmane.org/gmane.comp.emulators.qemu/428294

Comment 7 Miroslav Rezanina 2016-08-05 10:56:53 UTC
Fix included in qemu-kvm-rhev-2.6.0-19.el7

Comment 9 huiqingding 2016-08-25 08:53:17 UTC
Reproduced this bug with the following versions:
RHEL7.2.z host:
kernel-3.10.0-327.37.1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
RHEL7.3 host:
kernel-3.10.0-493.el7.x86_64
qemu-kvm-rhev-2.6.0-18.el7.x86_64

Steps to Reproduce:
1. Boot the VM on the source host (RHEL7.2.z):
# /usr/libexec/qemu-kvm  -cpu SandyBridge -M pc-i440fx-rhel7.0.0 -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0x4 -monitor stdio
2. Boot the VM on the destination host (RHEL7.3):
# /usr/libexec/qemu-kvm  -cpu SandyBridge -M pc-i440fx-rhel7.0.0 -device nec-usb-xhci,id=xhci,bus=pci.0,addr=0x4 -monitor stdio -incoming tcp:0:5800
3. Migrate from the source host to the destination host:
(qemu) migrate -d tcp:{dst_ip}:5800

Actual results:
qemu-kvm on the destination host quits with the error:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0xb3 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:04.0/xhci'
qemu-kvm: load of migration failed: Invalid argument

Verified this bug with the following versions:
RHEL7.2.z host:
kernel-3.10.0-327.37.1.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
RHEL7.3 host:
kernel-3.10.0-493.el7.x86_64
qemu-kvm-rhev-2.6.0-21.el7.x86_64

Repeating the steps above, the migration completes successfully.

Comment 10 huiqingding 2016-09-08 10:25:41 UTC
Based on comment #9, setting this bug to VERIFIED.

Comment 12 errata-xmlrpc 2016-11-07 21:21:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html