Bug 1508271 - Migration fails from host RHEL7.4.z to host RHEL7.5 with "-machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1"
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: xianwang
URL:
Whiteboard:
Keywords: Regression
Depends On:
Blocks:
Reported: 2017-11-01 02:43 UTC by xianwang
Modified: 2018-04-11 00:45 UTC
CC List: 13 users
Clone Of:
Last Closed: 2018-04-11 00:44:15 UTC



External Trackers
Tracker                 ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2018:1104  None      None    None     2018-04-11 00:45 UTC

Description xianwang 2017-11-01 02:43:07 UTC
Description of problem:
Migrating a guest from a RHEL 7.4.z host to a RHEL 7.5 host fails when the QEMU command line includes "-machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1".

Version-Release number of selected component (if applicable):
(src)
RHEL-7.4 Server ppc64le
kernel-3.10.0-693.11.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.10.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
(dst)
RHEL-7.4 Server ppc64le
kernel-3.10.0-760.el7.ppc64le
qemu-kvm-rhev-2.10.0-3.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

How reproducible:
3/3

Steps to Reproduce:
1. Boot a guest on the RHEL 7.4.z host (src host) with the QEMU command line:
/usr/libexec/qemu-kvm -machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio

2. Start QEMU on the RHEL 7.5 host (dst host) with the same command line as the src host, appending "-incoming":
/usr/libexec/qemu-kvm -machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio -incoming tcp:0:5801

3. Migrate from src to dst:
src:
(qemu) migrate -d tcp:10.16.42.48:5801
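Because the destination must be started with exactly the source's machine type and device list, it can help to derive its command line mechanically instead of retyping it. A minimal Python sketch, assuming Python 3.9+; the helper names `dst_command` and `migrate_hmp` are hypothetical and not part of QEMU:

```python
import shlex

# Source command line from step 1 above.
SRC_CMD = (
    "/usr/libexec/qemu-kvm -machine pseries-rhel7.4.0 "
    "-device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 "
    "-monitor stdio"
)


def dst_command(src_cmd: str, port: int) -> list[str]:
    """Destination argv: the source argv unchanged, plus -incoming (step 2)."""
    return shlex.split(src_cmd) + ["-incoming", f"tcp:0:{port}"]


def migrate_hmp(dst_host: str, port: int) -> str:
    """HMP command to type at the source monitor (step 3)."""
    return f"migrate -d tcp:{dst_host}:{port}"


print(shlex.join(dst_command(SRC_CMD, 5801)))
print(migrate_hmp("10.16.42.48", 5801))
```

Run as-is, the first printed line is the step 2 command line, and the second is the step 3 monitor command.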

Actual results:
After step 3, check the migration status:
src:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: completed
total time: 165 milliseconds
downtime: 44 milliseconds
setup: 10 milliseconds
transferred ram: 5964 kbytes
throughput: 472.23 mbps
remaining ram: 0 kbytes
total ram: 540736 kbytes
duplicate: 134102 pages
skipped: 0 pages
normal: 1194 pages
normal bytes: 4776 kbytes
dirty sync count: 3

dst:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x34 read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pci_bridge:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:03.0/pci_bridge'
qemu-kvm: load of migration failed: Invalid argument
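The first error line comes from the destination comparing each incoming config-space byte with its own copy of the device. The Python sketch below re-implements that comparison for illustration only; it is modeled on the check in QEMU's `get_pci_config_device()` (as we understand it, a byte is rejected when `(incoming ^ local) & cmask & ~wmask & ~w1cmask` is non-zero) and is not QEMU's code. Offset 0x34 is the Capabilities Pointer, and the values are the ones from the log above:

```python
# Offset of the Capabilities Pointer in the standard PCI config header.
PCI_CAPABILITY_LIST = 0x34


def bad_config_bytes(incoming, local, cmask, wmask, w1cmask):
    """Return offsets whose incoming value conflicts with the local device.

    Only bits that are compared (cmask) and are neither guest-writable (wmask)
    nor write-1-to-clear (w1cmask) have to match.
    """
    bad = []
    for i, (src_byte, dst_byte) in enumerate(zip(incoming, local)):
        checked = cmask[i] & ~wmask[i] & ~w1cmask[i]
        if (src_byte ^ dst_byte) & checked:
            bad.append(i)
    return bad


SIZE = 256
incoming = bytearray(SIZE)  # config space as sent by the 2.9 source
local = bytearray(SIZE)     # config space of the newly created 2.10 device
cmask = bytearray(SIZE)
wmask = bytearray(SIZE)
w1cmask = bytearray(SIZE)

# Values from the log: "i=0x34 read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0"
incoming[PCI_CAPABILITY_LIST] = 0x4C
local[PCI_CAPABILITY_LIST] = 0x40
cmask[PCI_CAPABILITY_LIST] = 0xFF

print([hex(i) for i in bad_config_bytes(incoming, local, cmask, wmask, w1cmask)])
# -> ['0x34']
```

In other words, the source bridge's capability list starts at 0x4c while the destination bridge's list starts at 0x40, and since that byte is fully compared and not writable, the load is refused.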


Expected results:
Migration succeeds and the guest works normally.

Additional info:
Before migration, info qtree on the src host shows:
(qemu) info qtree 
bus: main-system-bus
  type System
  dev: spapr-pci-host-bridge, id ""
    index = 0 (0x0)
    buid = 576460752840294400 (0x800000020000000)
    liobn = 2147483648 (0x80000000)
    liobn64 = 2147483649 (0x80000001)
    mem_win_addr = 35186519572480 (0x200080000000)
    mem_win_size = 2147483648 (0x80000000)
    mem64_win_addr = 36283883716608 (0x210000000000)
    mem64_win_size = 1099511627776 (0x10000000000)
    mem64_win_pciaddr = 36283883716608 (0x210000000000)
    io_win_addr = 35184372088832 (0x200000000000)
    io_win_size = 65536 (0x10000)
    dynamic-reconfiguration = true
    dma_win_addr = 0 (0x0)
    dma_win_size = 1073741824 (0x40000000)
    dma64_win_addr = 576460752303423488 (0x800000000000000)
    ddw = true
    pgsz = 69632 (0x11000)
    numa_node = 4294967295 (0xffffffff)
    pre-2.8-migration = false
    pcie-extended-configuration-space = true
    bus: pci.0
      type PCI
      dev: pci-bridge, id "pci_bridge"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 03.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
        bar 0: mem at 0xc0200000 [0xc02000ff]
        bus: pci_bridge
          type PCI
      dev: nec-usb-xhci, id ""
        msi = "auto"
        msix = "auto"
        superspeed-ports-first = true
        force-pcie-endcap = false
        intrs = 16 (0x10)
        slots = 64 (0x40)
        streams = true
        p2 = 4 (0x4)
        p3 = 4 (0x4)
        addr = 01.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class USB controller, addr 00:01.0, pci id 1033:0194 (sub 1af4:1100)
        bar 0: mem at 0xc0020000 [0xc0023fff]
        bus: usb-bus.0
          type usb-bus
          dev: usb-mouse, id ""
            usb_version = 2 (0x2)
            port = ""
            serial = ""
            full-path = true
            msos-desc = true
            addr 0.2, port 2, speed 480, name QEMU USB Mouse, attached
          dev: usb-kbd, id ""
            usb_version = 2 (0x2)
            display = ""
            port = ""
            serial = ""
            full-path = true
            msos-desc = true
            addr 0.1, port 1, speed 480, name QEMU USB Keyboard, attached
      dev: VGA, id ""
        vgamem_mb = 16 (0x10)
        mmio = true
        qemu-extended-regs = true
        addr = 00.0
        romfile = "vgabios-stdvga.bin"
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class VGA controller, addr 00:00.0, pci id 1234:1111 (sub 1af4:1100)
        bar 0: mem at 0x80000000 [0x80ffffff]
        bar 2: mem at 0xc0000000 [0xc0000fff]
        bar 6: mem at 0xc0010000 [0xc001ffff]
  dev: spapr-vio-bridge, id ""
    bus: spapr-vio
      type spapr-vio-bus
      dev: spapr-vscsi, id "v-scsi@71000003"
        reg = 1895825411 (0x71000003)
        irq = 4105 (0x1009)
        bus: v-scsi@71000003.0
          type SCSI
      dev: spapr-vlan, id "l-lan@71000002"
        reg = 1895825410 (0x71000002)
        mac = "52:54:00:12:34:56"
        vlan = 0
        netdev = "hub0port0"
        use-rx-buffer-pools = true
        irq = 4104 (0x1008)
      dev: spapr-nvram, id "nvram@71000001"
        reg = 1895825409 (0x71000001)
        drive = ""
        irq = 4099 (0x1003)
      dev: spapr-vty, id "vty@71000000"
        reg = 1895825408 (0x71000000)
        chardev = "serial0"
        irq = 4098 (0x1002)
  dev: spapr-rtc, id ""

Comment 2 xianwang 2017-11-01 02:46:19 UTC
This bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1435086.

Comment 4 Qunfang Zhang 2017-11-01 02:48:47 UTC
Xianxian, since huding, who is the x86 stable ABI feature owner, is travelling, please verify the x86 behaviour and adjust the Hardware field. Thanks.

Comment 5 xianwang 2017-11-01 03:26:19 UTC
(In reply to Qunfang Zhang from comment #4)
> Xianxian, since huding, who is the x86 stable ABI feature owner, is travelling,
> please verify the x86 behaviour and adjust the Hardware field. Thanks.

This bug also exists on x86_64; the test results are below:

(src)
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.10.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch
(dst)
kernel-3.10.0-765.el7.x86_64
qemu-kvm-rhev-2.10.0-3.el7.x86_64
seabios-1.10.2-5.el7.x86_64

Test steps and results are the same as in the bug report.

Comment 6 Dr. David Alan Gilbert 2017-11-03 10:26:19 UTC
It looks like we've lost some capabilities on the bridge;
on x86 using:
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.4.0,accel=kvm -device pci-bridge,id=b1,bus=pci.0,addr=08,chassis_nr=1 /home/vms/f23-serial.qcow2 -nographic

On 2.9.0, lspci -vvn shows:

00:08.0 0604: 1b36:0001 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 0
        Region 0: Memory at fea71000 (64-bit, non-prefetchable) [size=256]
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: fe800000-fe9fffff
        Prefetchable memory behind bridge: 00000000fe000000-00000000fe1fffff
        Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [4c] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [48] Slot ID: 0 slots, First+, chassis 01
        Capabilities: [40] Hot-plug capable
        Kernel modules: shpchp

and on 2.10.0-4 it shows:
00:08.0 0604: 1b36:0001 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Slot ID: 0 slots, First+, chassis 01
        Kernel modules: shpchp

so we've lost the Hot-plug capable and MSI capability chunks, and also the Region 0 BAR and the I/O and memory windows behind the bridge.
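The capability list that lspci prints is a linked list whose head pointer is byte 0x34; each entry holds its capability ID at offset 0 and the next pointer at offset 1. The Python sketch below rebuilds the two chains from the lspci output above inside a synthetic, otherwise zeroed 256-byte config space (an assumption for illustration; a real config space has many more fields) and walks them. The IDs are the standard ones: 0x04 Slot ID, 0x05 MSI, and 0x0C SHPC, which lspci labels "Hot-plug capable". The helper names are hypothetical:

```python
# Standard capability IDs (as in Linux include/uapi/linux/pci_regs.h).
CAP_NAMES = {0x04: "Slot ID", 0x05: "MSI", 0x0C: "SHPC (Hot-plug capable)"}
PCI_CAPABILITY_LIST = 0x34  # Capabilities Pointer in the config header


def walk_caps(config: bytes) -> list:
    """Follow the list: byte 0 of each entry is the ID, byte 1 the next pointer."""
    caps = []
    seen = set()
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC  # bottom two bits are reserved
    while ptr and ptr not in seen:  # 'seen' guards against a looping list
        seen.add(ptr)
        cap_id, next_ptr = config[ptr], config[ptr + 1]
        caps.append((ptr, CAP_NAMES.get(cap_id, hex(cap_id))))
        ptr = next_ptr & 0xFC
    return caps


def synthetic_config(chain) -> bytes:
    """Zeroed 256-byte config space holding only the given capability chain.

    chain is [(offset, cap_id), ...] in list order, head first.
    """
    cfg = bytearray(256)
    cfg[PCI_CAPABILITY_LIST] = chain[0][0] if chain else 0
    for i, (offset, cap_id) in enumerate(chain):
        cfg[offset] = cap_id
        cfg[offset + 1] = chain[i + 1][0] if i + 1 < len(chain) else 0
    return bytes(cfg)


# Chains reconstructed from the lspci output above.
old = synthetic_config([(0x4C, 0x05), (0x48, 0x04), (0x40, 0x0C)])  # 2.9.0
new = synthetic_config([(0x40, 0x04)])                              # 2.10.0-4

print(hex(old[PCI_CAPABILITY_LIST]), walk_caps(old))
print(hex(new[PCI_CAPABILITY_LIST]), walk_caps(new))
```

The two head pointers, 0x4c on 2.9.0 and 0x40 on 2.10.0-4, are the same `read: 4c device: 40` mismatch at i=0x34 reported in the migration error.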

Comment 7 Dr. David Alan Gilbert 2017-11-03 10:44:37 UTC
info qtree differences:

7.4.0:
      dev: pci-bridge, id "b1"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 08.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
        bar 0: mem at 0xfea71000 [0xfea710ff]
        bus: b1
          type PCI

7.5.0:
      dev: pci-bridge, id "b1"
        chassis_nr = 1 (0x1)
        msi = "off"
        shpc = false
        addr = 08.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
        bus: b1
          type PCI

Note the shpc and msi differences (and that bar 0 is missing on 7.5.0).
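Comparing `info qtree` output from the source and destination is a quick way to spot this kind of property drift before migrating. A minimal Python sketch that parses the `name = value` lines of one device block and reports properties whose values differ; lines such as `bar 0: ...` or `class ...` are not `name = value` pairs and are ignored. The function names are hypothetical, and the inputs are trimmed excerpts of the blocks above:

```python
# Trimmed excerpts of the pci-bridge "b1" blocks shown above.
QTREE_74 = """\
      dev: pci-bridge, id "b1"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 08.0
        bar 0: mem at 0xfea71000 [0xfea710ff]
"""

QTREE_75 = """\
      dev: pci-bridge, id "b1"
        chassis_nr = 1 (0x1)
        msi = "off"
        shpc = false
        addr = 08.0
"""


def qtree_props(block: str) -> dict:
    """Map property name -> value for every 'name = value' line in the block."""
    props = {}
    for line in block.splitlines():
        name, sep, value = line.strip().partition(" = ")
        if sep:
            props[name] = value
    return props


def diff_props(src: str, dst: str) -> dict:
    """Return {name: (src_value, dst_value)} for properties that differ."""
    a, b = qtree_props(src), qtree_props(dst)
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(diff_props(QTREE_74, QTREE_75))
# -> {'msi': ('"auto"', '"off"'), 'shpc': ('true', 'false')}
```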

Comment 8 Dr. David Alan Gilbert 2017-11-03 13:06:13 UTC
posted downstream fix:

machine compat: pci_bridge/shpc always enable

That's tested on x86 only.

Xianwang: Can you please test
  https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14456858
on x86 and Power, with a bunch of machine types.

Comment 9 xianwang 2017-11-06 06:54:15 UTC
(In reply to Dr. David Alan Gilbert from comment #8)
> posted downstream fix:
> 
> machine compat: pci_bridge/shpc always enable
> 
> That's tested on x86 only.
> 
> Xianwang: Can you please test
>   https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14456858
> on x86 and Power, with a bunch of machine types.

Hi Dave,
I have tried the above build, and the test passes on both x86_64 and Power (P8).

Versions are as follows:
P8:
(src)
RHEL-7.4 Server ppc64le
kernel-3.10.0-693.11.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.10.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
(dst)
RHEL-7.4 Server ppc64le
kernel-3.10.0-760.el7.ppc64le
qemu-kvm-rhev-2.10.0-4.el7.bz1508271a.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

src:
      dev: pci-bridge, id "pci_bridge"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 03.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
        bar 0: mem at 0xc0200000 [0xc02000ff]
        bus: pci_bridge
          type PCI


dst:
      dev: pci-bridge, id "pci_bridge"
        chassis_nr = 2 (0x2)
        msi = "auto"
        shpc = true
        addr = 03.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 0000:0000)
        bar 0: mem at 0xffffffffffffffff [0xfe]
        bus: pci_bridge
          type PCI



x86_64:
(src)
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.10.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch
(dst)
kernel-3.10.0-765.el7.x86_64
qemu-kvm-rhev-2.10.0-4.el7.bz1508271a.x86_64
seabios-1.10.2-5.el7.x86_64

src:
      dev: pci-bridge, id "pci_bridge"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 13.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:13.0, pci id 1b36:0001 (sub 0000:0000)
        bar 0: mem at 0xfea71000 [0xfea710ff]
        bus: pci_bridge
          type PCI
dst:
      dev: pci-bridge, id "pci_bridge"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 13.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:13.0, pci id 1b36:0001 (sub 0000:0000)
        bar 0: mem at 0xffffffffffffffff [0xfe]
        bus: pci_bridge
          type PCI


Steps are the same as in the bug report.

Comment 10 David Gibson 2017-11-08 06:02:12 UTC
Dave,

Ugh, so this change fixes this bug, but isn't really right for Power.

shpc enabled on bridges is wrong for pseries, and always has been.  pseries has its own hotplug mechanism and the standard one will never work.  At present the pseries one doesn't work under bridges either - we need some way to force the host bridge to act as hotplug handler rather than the immediate bridge.  We've got an existing bug for that (bz 1436549), I just haven't had time to look at it in any detail.

For pseries though, simply disabling shpc gives a better failure mode than having it enabled but not working (so I was glad it changed in 2.9, I hadn't realised it had gone back again).  So we need to find a better way of handling both upstream and down.

Comment 11 Miroslav Rezanina 2017-11-08 17:59:07 UTC
Fix included in qemu-kvm-rhev-2.10.0-5.el7

Comment 13 Dr. David Alan Gilbert 2017-11-09 21:01:53 UTC
(In reply to David Gibson from comment #10)
> Dave,
> 
> Ugh, so this change fixes this bug, but isn't really right for Power.
> 
> shpc enabled on bridges is wrong for pseries, and always has been.  pseries
> has its own hotplug mechanism and the standard one will never work.  At
> present the pseries one doesn't work under bridges either - we need some way
> to force the host bridge to act as hotplug handler rather than the immediate
> bridge.  We've got an existing bug for that (bz 1436549), I just haven't had
> time to look at it in any detail.
> 
> For pseries though, simply disabling shpc gives a better failure mode than
> having it enabled but not working (so I was glad it changed in 2.9, I hadn't
> realised it had gone back again).  So we need to find a better way of
> handling both upstream and down.

I don't understand the hotplug side well enough to suggest the right fix here; I was just following the upstream fix. But turning it on or off on old machine types breaks migration, so if you don't want it on Power, I suggest turning it off on new machine types. Check with Marcel, who did 2fa3566.
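The rule suggested in comment 13 is the usual machine-type compatibility pattern: a device's guest-visible defaults may change only for new machine types, while older machine types pin the values the older binary used, because the incoming state is checked against them. A Python sketch of that resolution order (as we understand QEMU: explicit `-device` option over machine-type compat entry over the binary's current default). The table contents are hypothetical, loosely based on the values seen in this bug, and are not the actual RHEL compat settings:

```python
# Hypothetical current pci-bridge defaults of the newer binary
# (cf. the 7.5.0 qtree in comment 7, before the fix).
CURRENT_DEFAULTS = {"shpc": "off", "msi": "off"}

# Hypothetical compat entries: old machine types keep the values the 7.4
# binary used (shpc on, msi auto), so their guests stay migratable.
COMPAT = {
    "pc-i440fx-rhel7.4.0": {"shpc": "on", "msi": "auto"},
    "pseries-rhel7.4.0": {"shpc": "on", "msi": "auto"},
}


def effective_bridge_props(machine, device_opts=None):
    """Resolve pci-bridge properties: defaults < machine compat < -device options."""
    props = dict(CURRENT_DEFAULTS)
    props.update(COMPAT.get(machine, {}))  # pinned values for old machine types
    props.update(device_opts or {})        # explicit -device options win
    return props


print(effective_bridge_props("pseries-rhel7.4.0"))                  # old type: pinned
print(effective_bridge_props("pseries-rhel7.5.0"))                  # new type: defaults
print(effective_bridge_props("pseries-rhel7.5.0", {"shpc": "on"}))  # explicit override
```

Under this scheme, disabling SHPC for pseries only changes the entry used by new machine types, and guests started with `pseries-rhel7.4.0` see the same bridge layout on both hosts.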

Comment 14 xianwang 2017-11-30 07:17:17 UTC
This bug is verified as fixed on qemu-kvm-rhev-2.10.0-9.el7.ppc64le.
Versions are as follows:
P8:
(src)
RHEL-7.4 Server ppc64le
kernel-3.10.0-693.13.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.12.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
(dst)
RHEL-7.5 Server ppc64le
kernel-3.10.0-768.el7.ppc64le
qemu-kvm-rhev-2.10.0-9.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Steps are the same as in the bug report.
on src host:
(qemu) info qtree 
      dev: pci-bridge, id "pci_bridge"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 03.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
        bar 0: mem at 0xc0200000 [0xc02000ff]
        bus: pci_bridge
          type PCI
on dst host:
(qemu) info qtree
      dev: pci-bridge, id "pci_bridge"
        chassis_nr = 1 (0x1)
        msi = "auto"
        shpc = true
        addr = 03.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        x-pcie-lnksta-dllla = true
        x-pcie-extcap-init = true
        class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 0000:0000)
        bar 0: mem at 0xffffffffffffffff [0xfe]
        bus: pci_bridge
          type PCI


Result:
Migration completed and the VM is running on the dst host.
on src:
(qemu) info migrate
Migration status: completed
on dst:
(qemu) info status 
VM status: running

Migration in the reverse direction (dst->src) also passes, so this bug is fixed.

Comment 17 errata-xmlrpc 2018-04-11 00:44:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

