Bug 1912846 - qemu-kvm: Failed to load xhci:parent_obj during migration
Summary: qemu-kvm: Failed to load xhci:parent_obj during migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Dr. David Alan Gilbert
QA Contact: jingzhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-05 12:47 UTC by Li Xiaohui
Modified: 2021-05-25 06:47 UTC (History)

Fixed In Version: qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:46:31 UTC
Type: ---
Target Upstream Version:
Embargoed:


Attachments:
lspci-info (5.68 KB, application/gzip), 2021-01-05 13:05 UTC, Li Xiaohui

Description Li Xiaohui 2021-01-05 12:47:52 UTC
Description of problem:
Migrating a guest backed by hugepages from an 8.3 host to an 8.4 host fails with the error: qemu-kvm: Failed to load xhci:parent_obj.


Version-Release number of selected component (if applicable):
src host: qemu-img-5.1.0-16.module+el8.3.1+8958+410ab178.x86_64
dst host: qemu-img-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Boot the guest on the src host with the following command line:
/usr/libexec/qemu-kvm  \
-name "mouse-vm" \
-sandbox off \
-machine q35 \
-device vmcoreinfo \
-cpu IvyBridge-IBRS \
-nodefaults  \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device nec-usb-xhci,id=usb1,bus=root0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/glusterfs/rhel840-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 2560 \
-mem-path /dev/hugepages \
-mem-prealloc \
-overcommit mem-lock=off \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm  \
-qmp tcp:0:3333,server,nowait \
-qmp tcp:0:9999,server,nowait \
-qmp tcp:0:9888,server,nowait \
-serial tcp:0:4444,server,nowait \
-monitor stdio
2. Boot the guest on the dst host with the following command line:
/usr/libexec/qemu-kvm  \
-name "mouse-vm" \
-sandbox off \
-machine q35,memory-backend=pc.ram \
-device vmcoreinfo \
-cpu IvyBridge-IBRS \
-nodefaults  \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device nec-usb-xhci,id=usb1,bus=root0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/glusterfs/rhel840-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 2560 \
-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=2684354560 \
-overcommit mem-lock=off \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm  \
-qmp tcp:0:3333,server,nowait \
-qmp tcp:0:9999,server,nowait \
-qmp tcp:0:9888,server,nowait \
-serial tcp:0:4444,server,nowait \
-monitor stdio \
-incoming defer
3. Migrate the guest from the src host to the dst host.
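Step 3 does not list the monitor commands that were used. Because the dst guest is started with -incoming defer, a migration of this kind is typically driven with QMP migrate-incoming on the destination followed by migrate on the source. The sketch below only builds those QMP messages; the host name dst.example.com and port 4000 are placeholders and are not taken from this report.

```python
import json


def qmp_migration_commands(dst_host, port):
    """Build the QMP messages for a deferred-incoming migration.

    Returns (dst_cmds, src_cmds): the commands to send to the destination
    monitor first, then the commands to send to the source monitor.
    dst_host and port are hypothetical values chosen by the caller.
    """
    dst_cmds = [
        {"execute": "qmp_capabilities"},
        # Tell the paused destination where to listen for the stream.
        {"execute": "migrate-incoming", "arguments": {"uri": f"tcp:0:{port}"}},
    ]
    src_cmds = [
        {"execute": "qmp_capabilities"},
        # Start the outgoing migration towards the destination.
        {"execute": "migrate", "arguments": {"uri": f"tcp:{dst_host}:{port}"}},
        # Poll the status afterwards.
        {"execute": "query-migrate"},
    ]
    return dst_cmds, src_cmds


if __name__ == "__main__":
    dst_cmds, src_cmds = qmp_migration_commands("dst.example.com", 4000)
    for side, cmds in (("dst", dst_cmds), ("src", src_cmds)):
        for cmd in cmds:
            print(side, json.dumps(cmd))
```

Each printed line can be sent to one of the QMP sockets opened by the command lines above (for example the -qmp tcp:0:3333 monitor).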


Actual results:
Migration failed on dst host with error:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x71 read: a0 device: 0 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load xhci:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:02.0:00.0/xhci'
qemu-kvm: load of migration failed: Invalid argument
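For readers unfamiliar with the first message: "read" is the byte from the incoming migration stream and "device" is the byte in the destination's own config space. The sketch below is a simplified model of that comparison, based on my reading of QEMU's get_pci_config_device and not a copy of it; the function name and the 256-byte example buffers are illustrative assumptions.

```python
def first_config_mismatch(incoming, local, cmask, wmask, w1cmask):
    """Return the first offset where a guest-immutable config bit differs.

    A differing bit only counts if it is covered by cmask (bits that must
    match) and is not guest-writable (wmask) or write-1-to-clear (w1cmask).
    Returns None if the two config spaces are compatible.
    """
    for i in range(len(local)):
        differing = incoming[i] ^ local[i]
        if differing & cmask[i] & ~wmask[i] & ~w1cmask[i] & 0xFF:
            return i
    return None


if __name__ == "__main__":
    size = 256
    local = bytearray(size)      # destination device's config space
    incoming = bytearray(size)   # config space received from the source
    cmask = bytearray(size)
    wmask = bytearray(size)
    w1cmask = bytearray(size)

    # Model only the byte reported in this bug: offset 0x71 is the
    # "next capability" pointer of the capability located at 0x70.
    incoming[0x71] = 0xA0
    cmask[0x71] = 0xFF

    idx = first_config_mismatch(incoming, local, cmask, wmask, w1cmask)
    print(f"i={idx:#x} read: {incoming[idx]:x} device: {local[idx]:x}")
```

With these example values the sketch reports offset 0x71 with read a0 and device 0, the same triple as in the log above.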


Expected results:
Migration succeeds.


Additional info:
Hit this bug while reproducing Bug 1912201 - qemu-kvm: Unknown ramblock "pc.ram" during migration

Comment 1 Li Xiaohui 2021-01-05 13:04:33 UTC
Attaching the "lspci -vvv" output collected after booting the guest on the src and dst hosts with the same qemu commands and environment as in Comment 0.

Comment 2 Li Xiaohui 2021-01-05 13:05:19 UTC
Created attachment 1744581 [details]
lspci-info

Comment 3 Dr. David Alan Gilbert 2021-01-06 11:44:11 UTC
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x71 read: a0 device: 0 cmask: ff wmask: 0 w1cmask:0

Looking at the diff of the lspci output, we have:

source:


01:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
	Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00003800
	Capabilities: [70] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us

....
Dest:

	Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00003800
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W

so the MSI is the problem.

I thought that had been fixed already upstream in 172bc8520db1cb98d09b367360068a675fbc9413

Comment 4 Dr. David Alan Gilbert 2021-01-06 11:52:41 UTC
Oh, actually, this is more subtle: the problem is that the destination *does* still have the MSI capability, but it has an ordering problem:

01:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
	Subsystem: Red Hat, Inc. QEMU Virtual Machine
	Physical Slot: 0
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 22
	Region 0: Memory at fe800000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00003800
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 4
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [70] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Kernel driver in use: xhci_hcd
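To illustrate why the ordering matters: PCI capabilities form a linked list in config space, so the same three capabilities at the same offsets yield different "next" pointer bytes when they are linked in a different order. The sketch below assumes each new capability is linked at the head of the list (which is how I understand QEMU's pci_add_capability to behave); the initialisation orders chosen for src and dst are inferred from the lspci output and are not taken from the QEMU source.

```python
PCI_CAPABILITY_LIST = 0x34  # config offset holding the head of the list
CAP_MSI, CAP_EXP, CAP_MSIX = 0x05, 0x10, 0x11  # standard capability IDs


def add_capability(config, cap_id, offset):
    """Link a capability at the head of the list (assumed behaviour)."""
    config[offset] = cap_id
    config[offset + 1] = config[PCI_CAPABILITY_LIST]  # next pointer
    config[PCI_CAPABILITY_LIST] = offset


def build(order):
    """Build a 256-byte config space, adding capabilities in `order`."""
    config = bytearray(256)
    for cap_id, offset in order:
        add_capability(config, cap_id, offset)
    return config


def walk(config):
    """Return capability offsets in list order, as lspci prints them."""
    chain, off = [], config[PCI_CAPABILITY_LIST]
    while off:
        chain.append(off)
        off = config[off + 1]
    return chain


# Hypothetical init orders that reproduce the two lspci listings.
src = build([(CAP_EXP, 0xA0), (CAP_MSI, 0x70), (CAP_MSIX, 0x90)])
dst = build([(CAP_MSI, 0x70), (CAP_EXP, 0xA0), (CAP_MSIX, 0x90)])

if __name__ == "__main__":
    print("src chain:", [hex(o) for o in walk(src)])
    print("dst chain:", [hex(o) for o in walk(dst)])
    for i in range(256):
        if src[i] != dst[i]:
            print(f"offset {i:#x}: src={src[i]:x} dst={dst[i]:x}")
```

Under these assumptions the lowest differing offset is 0x71, where the source holds a0 (MSI points to Express) and the destination holds 0 (MSI is last in the list), which is consistent with the "i=0x71 read: a0 device: 0" message.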

Comment 5 Dr. David Alan Gilbert 2021-01-06 19:51:20 UTC
Ah! This is a resurrection of bz 1447874 that I fixed back in May 2017; I love it when old bugs
come back to bite.

Comment 13 errata-xmlrpc 2021-05-25 06:46:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

