Bug 1435086 - Migration is failed from host RHEL7.3.z to host RHEL7.4 with "-machine pseries-rhel7.3.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: All
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: xianwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-23 05:26 UTC by xianwang
Modified: 2017-08-02 03:39 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-02 03:39:56 UTC


Attachments
libvirtd log on target host (891.77 KB, text/plain)
2017-05-15 09:55 UTC, Dan Zheng


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 20:04:36 UTC

Description xianwang 2017-03-23 05:26:52 UTC
Description of problem:
Migrating a guest from a RHEL7.3.z host to a RHEL7.4 host fails when the qemu command line includes "-machine pseries-rhel7.3.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1".

Version-Release number of selected component (if applicable):
(src)
RHEL-7.3-updates-20170222.0 Server ppc64le
kernel-3.10.0-514.17.1.el7
qemu-kvm-rhev-2.6.0-28.el7_3.7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
(dst)
RHEL-7.4-20170317.n.0 Server ppc64le
kernel-3.10.0-623.el7
qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

How reproducible:
3/3

Steps to Reproduce:
1. Boot a guest on the RHEL7.3.z host (src host) with this qemu command line:
/usr/libexec/qemu-kvm -machine pseries-rhel7.3.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio

2. Boot a guest on the RHEL7.4 host (dst host) with the same qemu command line as on the src host, appending "-incoming" as follows:
/usr/libexec/qemu-kvm -machine pseries-rhel7.3.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio -incoming tcp:0:5801

3. Check the PCI info on the src host and the dst host:
src host:
(qemu) info pci
  Bus  0, device   0, function 0:
    VGA controller: PCI device 1234:1111
      BAR0: 32 bit prefetchable memory at 0x81000000 [0x81ffffff].
      BAR2: 32 bit memory at 0xc0100000 [0xc0100fff].
      BAR6: 32 bit memory at 0xc0110000 [0xc011ffff].
      id ""
  Bus  0, device   1, function 0:
    USB controller: PCI device 1033:0194
      IRQ 0.
      BAR0: 64 bit memory at 0x200004000 [0x200007fff].
      id ""
  Bus  0, device   3, function 0:
    PCI bridge: PCI device 1b36:0001
      IRQ 0.
      BUS 0.
      secondary bus 1.
      subordinate bus 1.
      IO range [0x1000, 0x1fff]
      memory range [0xc0000000, 0xc00fffff]
      prefetchable memory range [0x180000000, 0x1800fffff]
      BAR0: 64 bit memory at 0x200000000 [0x2000000ff].
      id "pci_bridge"
dst host:
(qemu) info pci
  Bus  0, device   0, function 0:
    VGA controller: PCI device 1234:1111
      BAR0: 32 bit prefetchable memory at 0xffffffffffffffff [0x00fffffe].
      BAR2: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id ""
  Bus  0, device   1, function 0:
    USB controller: PCI device 1033:0194
      IRQ 0.
      BAR0: 64 bit memory at 0xffffffffffffffff [0x00003ffe].
      id ""
  Bus  0, device   3, function 0:
    PCI bridge: PCI device 1b36:0001
      BUS 0.
      secondary bus 1.
      subordinate bus 1.
      IO range [0x0000, 0x0fff]
      memory range [0x00000000, 0x000fffff]
      prefetchable memory range [0x00000000, 0x000fffff]
      id "pci_bridge"
4. Migrate from src to dst:
src:
(qemu) migrate -d tcp:10.16.69.77:5801
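
Steps 1 and 2 use the same command line; the destination only adds "-incoming". A minimal Python sketch of that relationship follows. The helper names (source_command, destination_command, extra_args) are hypothetical and exist only for illustration; the option values are copied from the steps above.

```python
import shlex

# Options shared by both hosts, copied from steps 1 and 2 of this report.
COMMON_ARGS = [
    "/usr/libexec/qemu-kvm",
    "-machine", "pseries-rhel7.3.0",
    "-device", "pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1",
    "-monitor", "stdio",
]


def source_command():
    """Command line for the source (RHEL7.3.z) host."""
    return list(COMMON_ARGS)


def destination_command(incoming="tcp:0:5801"):
    """Same command line plus -incoming, for the destination (RHEL7.4) host."""
    return COMMON_ARGS + ["-incoming", incoming]


def extra_args(src, dst):
    """Arguments the destination adds on top of the source, or None if the
    destination does not start with the full source command line."""
    if dst[:len(src)] != src:
        return None
    return dst[len(src):]


if __name__ == "__main__":
    print(shlex.join(source_command()))
    print(shlex.join(destination_command()))
```

Keeping the shared options in one list makes it harder for the two hosts to drift apart, which matters because the destination must present the same devices as the source.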

Actual results:
After step 4, check the migration status:
src:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: completed
total time: 156 milliseconds
downtime: 39 milliseconds
setup: 4 milliseconds
transferred ram: 6000 kbytes
throughput: 501.42 mbps
remaining ram: 0 kbytes
total ram: 540736 kbytes
duplicate: 134093 pages
skipped: 0 pages
normal: 1203 pages
normal bytes: 4812 kbytes
dirty sync count: 3

dst:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x34 read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pci_bridge:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:03.0/pci_bridge'
qemu-kvm: load of migration failed: Invalid argument
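
The "Bad config data" line can be decoded mechanically. The sketch below parses its fields and applies a bit test modelled on my understanding of the comparison QEMU makes when loading PCI config space; treat that exact expression as an assumption rather than a quote of the QEMU source. The offset names are standard PCI configuration header offsets; the function names are hypothetical.

```python
import re

# Field layout of the "Bad config data" line; all values are hexadecimal.
BAD_CONFIG_RE = re.compile(
    r"Bad config data: i=0x(?P<i>[0-9a-f]+) read: (?P<read>[0-9a-f]+) "
    r"device: (?P<device>[0-9a-f]+) cmask: (?P<cmask>[0-9a-f]+) "
    r"wmask: (?P<wmask>[0-9a-f]+) w1cmask:\s*(?P<w1cmask>[0-9a-f]+)"
)

# A few standard offsets in the PCI configuration header.
KNOWN_OFFSETS = {
    0x10: "BAR0",
    0x34: "Capabilities Pointer",
    0x3D: "Interrupt Pin",
}


def parse_bad_config(line):
    """Return the fields of a 'Bad config data' line as ints, or None."""
    m = BAD_CONFIG_RE.search(line)
    if m is None:
        return None
    return {name: int(value, 16) for name, value in m.groupdict().items()}


def conflicting_bits(fields):
    """Bits that differ between the incoming byte ('read') and the local
    device byte, restricted to bits that are checked (cmask) and are not
    guest-writable (wmask, w1cmask). Assumed form of the check."""
    diff = fields["read"] ^ fields["device"]
    return diff & fields["cmask"] & ~fields["wmask"] & ~fields["w1cmask"] & 0xFF


if __name__ == "__main__":
    line = ("qemu-kvm: get_pci_config_device: Bad config data: i=0x34 "
            "read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0")
    fields = parse_bad_config(line)
    name = KNOWN_OFFSETS.get(fields["i"], "unknown")
    print(f"offset 0x{fields['i']:02x} ({name}): "
          f"conflicting bits 0x{conflicting_bits(fields):02x}")
```

For the error above, offset 0x34 is the Capabilities Pointer: the source sent 0x4c while the destination device has 0x40, so the two sides disagree about where the PCI capability list starts.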

Expected results:
Migration succeeds and the guest works well.

Additional info:

Comment 2 xianwang 2017-03-23 06:06:44 UTC
I have done some related tests for this bug:
a) Local migration on RHEL7.4 with the same test scenario as in this bug works well.
b) This bug seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1352860, but the two bugs are different: nec-usb-xhci migrates fine on ppc from RHEL7.3.z to RHEL7.4. The test steps are as follows:

host version:
(src)
RHEL-7.3-updates-20170222.0 Server ppc64le
kernel-3.10.0-514.17.1.el7
qemu-kvm-rhev-2.6.0-28.el7_3.7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
(dst)
RHEL-7.4-20170317.n.0 Server ppc64le
kernel-3.10.0-623.el7
qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

steps:
(1)src host(RHEL7.3.z)
/usr/libexec/qemu-kvm -machine pseries-rhel7.3.0 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=03 -monitor stdio
(2)dst host(RHEL7.4)
/usr/libexec/qemu-kvm -machine pseries-rhel7.3.0 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=03 -monitor stdio -incoming tcp:0:5801
(3)do migration from src to dst
(qemu) migrate -d tcp:10.16.69.77:5801
(4)check the status of migration
src host
(qemu) info migrate
Migration status: completed
dst host
(qemu) info status 
VM status: running

So I am not sure whether pci-bridge migrates correctly on x86 from RHEL7.3.z to RHEL7.4. huiqing, could you help confirm whether this bug exists on x86?

Comment 3 xianwang 2017-03-23 06:13:56 UTC
I) For RHEL7.3.z, in the guest:
# lspci -vvs3
00:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 42
	Region 0: Memory at 10320000000 (64-bit, non-prefetchable) [size=256]
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00001000-00001fff
	Memory behind bridge: c0100000-c01fffff
	Prefetchable memory behind bridge: 0000000280000000-00000002800fffff
	Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [4c] MSI: Enable+ Count=1/1 Maskable+ 64bit+
		Address: 0000040000000000  Data: 101d
		Masking: 00000000  Pending: 00000000
	Capabilities: [48] Slot ID: 0 slots, First+, chassis 01
	Capabilities: [40] Hot-plug capable
	Kernel driver in use: shpchp
	Kernel modules: shpchp

II) For RHEL7.4, in the guest:
# lspci -vvs3
00:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00001000-00001fff
	Memory behind bridge: c0100000-c01fffff
	Prefetchable memory behind bridge: 0000000281000000-00000002810fffff
	Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Slot ID: 0 slots, First+, chassis 01
	Kernel modules: shpchp
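
The two lspci listings differ mainly in their capability lists, which lines up with the "read: 4c device: 40" values in the migration error. A minimal Python sketch for extracting and comparing the capabilities follows. It assumes lspci prints capabilities in linked-list order, so that the first one listed is the one the Capabilities Pointer refers to; the function names are hypothetical, and the sample text is copied from the listings above.

```python
import re

# Matches e.g. "Capabilities: [4c] MSI: Enable+ ..." and captures the
# hex offset plus the capability name up to the first ':' or ','.
CAP_RE = re.compile(r"Capabilities: \[([0-9a-f]+)\] ([^,:\n]+)")

RHEL73_GUEST = """\
    Capabilities: [4c] MSI: Enable+ Count=1/1 Maskable+ 64bit+
    Capabilities: [48] Slot ID: 0 slots, First+, chassis 01
    Capabilities: [40] Hot-plug capable
"""

RHEL74_GUEST = """\
    Capabilities: [40] Slot ID: 0 slots, First+, chassis 01
"""


def capabilities(lspci_text):
    """Map capability offset -> capability name from `lspci -vv` output."""
    return {int(off, 16): name.strip()
            for off, name in CAP_RE.findall(lspci_text)}


def head_offset(lspci_text):
    """Offset of the first capability listed, or None if there is none."""
    m = CAP_RE.search(lspci_text)
    return int(m.group(1), 16) if m else None


def only_in_first(first, second):
    """(offset, name) pairs present in `first` but not in `second`."""
    return sorted(set(first.items()) - set(second.items()))


if __name__ == "__main__":
    src = capabilities(RHEL73_GUEST)
    dst = capabilities(RHEL74_GUEST)
    print("list head on 7.3.z:", hex(head_offset(RHEL73_GUEST)))
    print("list head on 7.4:  ", hex(head_offset(RHEL74_GUEST)))
    for off, name in only_in_first(src, dst):
        print(f"only on 7.3.z: [{off:02x}] {name}")
```

On this data the 7.3.z bridge exposes MSI at 0x4c, Slot ID at 0x48, and the hot-plug capability at 0x40, while the 7.4 bridge exposes only Slot ID at 0x40.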

Comment 4 Laurent Vivier 2017-03-23 12:03:16 UTC
It seems the machine type is missing "SPAPR_COMPAT_2_8" part:

    {                                                           \
        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,                 \
        .property = "pcie-extended-configuration-space",        \
        .value    = "off",                                      \
    },

I'm checking if it can fix this.

Comment 5 Laurent Vivier 2017-03-23 13:28:21 UTC
(In reply to Laurent Vivier from comment #4)
> It seems the machine type is missing "SPAPR_COMPAT_2_8" part:
> 
>     {                                                           \
>         .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,                 \
>         .property = "pcie-extended-configuration-space",        \
>         .value    = "off",                                      \
>     },
> 
> I'm checking if it can fix this.


Adding this compat property doesn't fix the problem.

But a migration between qemu-kvm-rhev-2.6.0 and the previous 2.8.0 rebase works fine.

Comment 6 Laurent Vivier 2017-03-23 13:53:53 UTC
Migration of
  pseries-2.6 from origin/v2.6.0 or origin/v2.8.0 to origin/v2.9.0-rc1, and
  pseries-2.8 from origin/v2.8.0 to origin/v2.9.0-rc1
works fine.

So the problem is purely downstream.

Comment 10 huiqingding 2017-03-24 02:50:15 UTC
> So, I am not sure whether pci-bridge is well for x86 to do migration from
> RHEL7.3.z to RHEL7.4, huiqing, could you help confirm whether this bug
> exists for x86?

I tested x86 migration from RHEL7.3.z to RHEL7.4 and also hit this issue.
rhel7.3.z source host:
qemu-kvm-rhev-2.6.0-28.el7_3.8.x86_64

rhel7.4 destination host:
qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.x86_64

The command line:
/usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.3.0 -net none -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio

After migration, the RHEL7.4 qemu-kvm quits with this error:
qemu-kvm: get_pci_config_device: Bad config data: i=0x34 read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pci_bridge:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0/pci_bridge'
qemu-kvm: load of migration failed: Invalid argument

Comment 12 huiqingding 2017-05-10 10:42:35 UTC
Test on x86 hosts:
7.3.z host:
kernel-3.10.0-514.18.1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
7.4 host:
kernel-3.10.0-664.el7.x86_64
qemu-kvm-rhev-2.9.0-3.el7.x86_64

Migrate from 7.3 to 7.4 using the following command line:
/usr/libexec/qemu-kvm -machine pc-i440fx-rhel7.3.0 -net none -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio

The migration finishes normally.

Comment 13 Dan Zheng 2017-05-15 09:52:09 UTC
I happened to hit a similar problem in a libvirt test.


Source host: [7.3]
qemu-kvm-rhev-2.6.0-28.el7_3.9.ppc64le
libvirt-2.0.0-10.virtcov.el7_3.9.ppc64le
kernel-3.10.0-514.21.1.el7.ppc64le
RHEL-7.3-20161019.0


Target host:[7.4]
libvirt-3.2.0-4.el7.ppc64le
qemu-kvm-rhev-2.9.0-4.el7.ppc64le
kernel-3.10.0-663.el7.ppc64le
RHEL-7.4-20170504.0


#  virsh  migrate avocado-vt-vm1 --live --verbose  qemu+ssh://10.16.67.243:22/system
Migration: [100 %]error: internal error: qemu unexpectedly closed the monitor: 2017-05-15T09:33:16.979459Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/2 (label charserial0)
2017-05-15T09:33:19.817889Z qemu-kvm: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0
2017-05-15T09:33:19.817916Z qemu-kvm: Failed to load PCIDevice:config
2017-05-15T09:33:19.817923Z qemu-kvm: Failed to load virtio-net:virtio
2017-05-15T09:33:19.817932Z qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-net'
2017-05-15T09:33:19.818229Z qemu-kvm: load of migration failed: Invalid argument

The attachment is the libvirtd.log from the target host.

Is it the same root cause?

Comment 14 Dan Zheng 2017-05-15 09:55:51 UTC
Created attachment 1278909 [details]
libvirtd log on target host

Comment 15 Dr. David Alan Gilbert 2017-05-15 09:58:13 UTC
Hi Dan,
  I believe that's https://bugzilla.redhat.com/show_bug.cgi?id=1449346

Dave

Comment 16 xianwang 2017-05-25 10:55:15 UTC
This bug passes verification on qemu-kvm-rhev-2.9.0-1.el7.
Verified versions:
Host:
(src)
RHEL-7.3-20161019.0 Server ppc64le
3.10.0-514.17.1.el7.ppc64le
qemu-kvm-rhev-2.6.0-28.el7_3.10.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
(dst)
RHEL-7.4-20170519.n.0 Server ppc64le
3.10.0-623.el7.ppc64le
qemu-kvm-rhev-2.9.0-1.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

The steps are the same as in the bug description.
I tested 2 rounds of ping-pong migration, and all results passed.
Migration completed.
in src:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: completed
in dst:
(qemu) info status 
VM status: running

What's more, this bug also passes verification on qemu-kvm-rhev-2.9.0-6.el7.ppc64le, but "-nodefaults" must be added to the qemu command line. The other steps are the same as above; only the qemu-kvm-rhev version changes to qemu-kvm-rhev-2.9.0-6.el7.

So this bug is verified for ppc. huiqing, please help check the status of this bug on x86_64, thanks.

Comment 17 Qunfang Zhang 2017-05-31 10:25:13 UTC
This bug is verified on the x86 side; see comment 12. Setting to VERIFIED.

Comment 19 errata-xmlrpc 2017-08-02 03:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

