Bug 1508271
| Summary: | Migration fails from a RHEL7.4.z host to a RHEL7.5 host with "-machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1" | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | xianwang <xianwang> |
| Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> |
| Status: | CLOSED ERRATA | QA Contact: | xianwang <xianwang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.5 | CC: | chayang, dgibson, dgilbert, juzhang, knoel, lvivier, michen, mrezanin, peterx, quintela, qzhang, virt-maint, xianwang |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-rhev-2.10.0-5.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-11 00:44:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1435086.
Xianxian, since huding, who owns the x86 stable ABI feature, is travelling, please verify the x86 behaviour and adjust the Hardware field, thanks.
(In reply to Qunfang Zhang from comment #4)
This bug also exists on x86_64; the test result is as below:
(src)
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.10.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch
(dst)
kernel-3.10.0-765.el7.x86_64
qemu-kvm-rhev-2.10.0-3.el7.x86_64
seabios-1.10.2-5.el7.x86_64
Test steps and result are the same as in the bug report.
It looks like we've lost some capabilities on the bridge;
on x86 using:
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.4.0,accel=kvm -device pci-bridge,id=b1,bus=pci.0,addr=08,chassis_nr=1 /home/vms/f23-serial.qcow2 -nographic
On 2.9.0, lspci -vvn shows:
00:08.0 0604: 1b36:0001 (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 0
Region 0: Memory at fea71000 (64-bit, non-prefetchable) [size=256]
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: fe800000-fe9fffff
Prefetchable memory behind bridge: 00000000fe000000-00000000fe1fffff
Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [4c] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [48] Slot ID: 0 slots, First+, chassis 01
Capabilities: [40] Hot-plug capable
Kernel modules: shpchp
and on 2.10.0-4 we're seeing:
00:08.0 0604: 1b36:0001 (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Slot ID: 0 slots, First+, chassis 01
Kernel modules: shpchp
So we've lost the Hot-plug capable and MSI capability chunks, and also the I/O and memory windows behind the bridge (Region 0 is gone too).
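To make the capability loss concrete, here is a minimal Python sketch that walks a PCI capability list starting from the pointer at config offset 0x34. The byte layouts are reconstructed by hand from the two lspci listings above, using the PCI capability IDs 0x05 (MSI), 0x04 (Slot ID) and 0x0C (PCI Hot-Plug, i.e. SHPC). They are not a real config-space dump, and the helper names are hypothetical.

```python
# PCI capability IDs (subset; values from the PCI Local Bus spec).
CAP_NAMES = {0x04: "Slot ID", 0x05: "MSI", 0x0C: "Hot-plug (SHPC)"}
CAP_PTR = 0x34  # config-space offset holding the capability-list head


def make_config(caps):
    """Build a 256-byte config space whose capability list is `caps`.

    caps: list of (offset, cap_id) in list order; each entry's "next"
    byte points at the following entry and the last one at 0x00.
    """
    cfg = bytearray(256)
    cfg[CAP_PTR] = caps[0][0] if caps else 0
    for i, (off, cap_id) in enumerate(caps):
        cfg[off] = cap_id
        cfg[off + 1] = caps[i + 1][0] if i + 1 < len(caps) else 0
    return cfg


def walk_caps(cfg):
    """Follow the list from 0x34, returning [(offset, name), ...]."""
    found, seen = [], set()
    ptr = cfg[CAP_PTR] & 0xFC  # low two bits of the pointer are reserved
    while ptr and ptr not in seen:
        seen.add(ptr)
        found.append((ptr, CAP_NAMES.get(cfg[ptr], hex(cfg[ptr]))))
        ptr = cfg[ptr + 1] & 0xFC
    return found


# Layouts reconstructed by hand from the lspci output above.
old = make_config([(0x4C, 0x05), (0x48, 0x04), (0x40, 0x0C)])  # 2.9.0
new = make_config([(0x40, 0x04)])                              # 2.10.0-4

for label, cfg in (("2.9.0", old), ("2.10.0-4", new)):
    listing = ", ".join(f"[{off:x}] {name}" for off, name in walk_caps(cfg))
    print(f"{label}: head=0x{cfg[CAP_PTR]:02x} {listing}")
# 2.9.0: head=0x4c [4c] MSI, [48] Slot ID, [40] Hot-plug (SHPC)
# 2.10.0-4: head=0x40 [40] Slot ID
```

The list head at 0x34 changes from 0x4c to 0x40, which is the same config byte the destination rejects in the migration error log of the bug description.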
info qtree differences:
7.4.0:
dev: pci-bridge, id "b1"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 08.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
bar 0: mem at 0xfea71000 [0xfea710ff]
bus: b1
type PCI
7.5.0:
dev: pci-bridge, id "b1"
chassis_nr = 1 (0x1)
msi = "off"
shpc = false
addr = 08.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
bus: b1
type PCI
Note the shpc and msi differences.
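Spotting such differences by eye is error-prone, so a small diff of the `key = value` lines of two `info qtree` dumps helps. The sketch below is a hypothetical helper, not a QEMU tool; it ignores the class, bar, bus and type lines and uses the two pci-bridge dumps quoted above.

```python
# Hypothetical helper: compares the "key = value" lines of two qtree dumps.
QTREE_74 = '''
dev: pci-bridge, id "b1"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 08.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
bar 0: mem at 0xfea71000 [0xfea710ff]
'''

QTREE_75 = '''
dev: pci-bridge, id "b1"
chassis_nr = 1 (0x1)
msi = "off"
shpc = false
addr = 08.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
'''


def parse_props(dump: str) -> dict:
    """Collect property lines of the form "key = value"."""
    props = {}
    for line in dump.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep:  # skip lines such as "class ..." or "bar 0: ..."
            props[key] = value
    return props


def diff_props(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


for key, (before, after) in diff_props(parse_props(QTREE_74),
                                       parse_props(QTREE_75)).items():
    print(f"{key}: {before} -> {after}")
# msi: "auto" -> "off"
# shpc: true -> false
```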
Posted downstream fix: machine compat: pci_bridge/shpc always enable
That's tested on x86 only.
Xianwang: Can you please test https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14456858 on x86 and Power on a bunch of machine types.
(In reply to Dr. David Alan Gilbert from comment #8)
Hi Dave, I have tried the above build and the test passes on both x86_64 and powerpc (P8).
Versions are as follows:
P8:
(src)
RHEL-7.4 Server ppc64le
kernel-3.10.0-693.11.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.10.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
(dst)
RHEL-7.4 Server ppc64le
kernel-3.10.0-760.el7.ppc64le
qemu-kvm-rhev-2.10.0-4.el7.bz1508271a.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
src:
dev: pci-bridge, id "pci_bridge"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 03.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
bar 0: mem at 0xc0200000 [0xc02000ff]
bus: pci_bridge
type PCI
dst:
dev: pci-bridge, id "pci_bridge"
chassis_nr = 2 (0x2)
msi = "auto"
shpc = true
addr = 03.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 0000:0000)
bar 0: mem at 0xffffffffffffffff [0xfe]
bus: pci_bridge
type PCI
x86_64:
(src)
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.10.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch
(dst)
kernel-3.10.0-765.el7.x86_64
qemu-kvm-rhev-2.10.0-4.el7.bz1508271a.x86_64
seabios-1.10.2-5.el7.x86_64
src:
dev: pci-bridge, id "pci_bridge"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 13.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:13.0, pci id 1b36:0001 (sub 0000:0000)
bar 0: mem at 0xfea71000 [0xfea710ff]
bus: pci_bridge
type PCI
dst:
dev: pci-bridge, id "pci_bridge"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 13.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:13.0, pci id 1b36:0001 (sub 0000:0000)
bar 0: mem at 0xffffffffffffffff [0xfe]
bus: pci_bridge
type PCI
Steps are the same as in the bug report.
Dave,
Ugh, so this change fixes this bug, but isn't really right for Power.
shpc enabled on bridges is wrong for pseries, and always has been. pseries has its own hotplug mechanism and the standard one will never work. At present the pseries one doesn't work under bridges either; we need some way to force the host bridge to act as the hotplug handler rather than the immediate bridge. We've got an existing bug for that (bz 1436549), I just haven't had time to look at it in any detail.
For pseries, though, simply disabling shpc gives a better failure mode than having it enabled but not working (so I was glad it changed in 2.9; I hadn't realised it had gone back again). So we need to find a better way of handling this both upstream and downstream.
Fix included in qemu-kvm-rhev-2.10.0-5.el7
(In reply to David Gibson from comment #10)
I don't understand the hotplug stuff well enough to suggest the right fix here; I was just following the upstream fix. But turning it on/off on old machine types breaks migration, so if you don't want it on Power, I suggest turning it off on new machine types; check with Marcel, who did 2fa3566.
This bug is verified as passing on qemu-kvm-rhev-2.10.0-9.el7.ppc64le.
Versions are as follows:
P8:
(src)
RHEL-7.4 Server ppc64le
3.10.0-693.13.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.12.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
(dst)
RHEL-7.5 Server ppc64le
3.10.0-768.el7.ppc64le
qemu-kvm-rhev-2.10.0-9.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
Steps are the same as in the bug report.
on src host:
(qemu) info qtree
dev: pci-bridge, id "pci_bridge"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 03.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
bar 0: mem at 0xc0200000 [0xc02000ff]
bus: pci_bridge
type PCI
on dst host:
(qemu) info qtree
dev: pci-bridge, id "pci_bridge"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 03.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 0000:0000)
bar 0: mem at 0xffffffffffffffff [0xfe]
bus: pci_bridge
type PCI
Result:
Migration completed and the VM is running on the dst host.
on src:
(qemu) info migrate
Migration status: completed
on dst:
(qemu) info status
VM status: running
Migration in the reverse direction (dst->src) also passes, so this bug is fixed.
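The fix works because, for the same machine type, the bridge's effective properties on the destination again match the source. The Python sketch below models that resolution order (device default, then per-machine-type override, then -device options). The tables are hypothetical: the shpc values mirror the qtree dumps in this bug, the downstream fix is modelled simply as shpc forced on, and the "suggested" entry is one reading of the suggestion to turn shpc off only for new machine types. None of it is copied from QEMU's actual compat code.

```python
from typing import Dict, Tuple

Props = Dict[str, str]
# (device defaults, per-machine-type overrides) for one QEMU build
Build = Tuple[Props, Dict[str, Props]]

# Hypothetical tables: shpc values mirror the qtree dumps in this bug;
# they are NOT copied from QEMU's real compat definitions.
BUILDS: Dict[str, Build] = {
    "src 2.9.0": ({"shpc": "true"}, {}),
    "dst 2.10.0-4 (regression)": ({"shpc": "false"}, {}),
    "dst with downstream fix": ({"shpc": "true"}, {}),
    # Illustrative take on the suggestion: off by default for new machine
    # types, forced back on for the old pseries-rhel7.4.0 type.
    "dst suggested": ({"shpc": "false"}, {"pseries-rhel7.4.0": {"shpc": "true"}}),
}


def effective_props(build: Build, machine: str, user_opts: Props) -> Props:
    """Device defaults, then machine-type overrides, then -device options."""
    defaults, per_machine = build
    props = dict(defaults)
    props.update(per_machine.get(machine, {}))
    props.update(user_opts)
    return props


MACHINE = "pseries-rhel7.4.0"
USER = {"chassis_nr": "1"}  # from -device pci-bridge,...,chassis_nr=1

src = effective_props(BUILDS["src 2.9.0"], MACHINE, USER)
for name, build in BUILDS.items():
    if name.startswith("dst"):
        dst = effective_props(build, MACHINE, USER)
        print(f"{name}: migratable={dst == src}")
# dst 2.10.0-4 (regression): migratable=False
# dst with downstream fix: migratable=True
# dst suggested: migratable=True
```

Under the "suggested" layout, a newer machine type (here the hypothetical name pseries-rhel7.5.0) would get shpc off, which on pseries gives the better failure mode discussed above, without changing what pseries-rhel7.4.0 guests see.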
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1104 |
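The dst error in the bug description ("get_pci_config_device: Bad config data: i=0x34 read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0") is a mismatch at config offset 0x34, the capabilities pointer. As far as we understand QEMU's get_pci_config_device, it rejects an incoming byte when it differs from the local device in bits that are checked (cmask) and are neither guest-writable (wmask) nor write-1-to-clear (w1cmask). The Python sketch below restates that rule; the function names are our own, and the 256-byte images are minimal stand-ins that contain only the values from the log.

```python
def config_byte_mismatch(read: int, device: int, cmask: int,
                         wmask: int, w1cmask: int) -> bool:
    """True if an incoming config byte conflicts with the local device.

    Only bits that are checked (cmask) and are neither guest-writable
    (wmask) nor write-1-to-clear (w1cmask) must match.
    """
    return ((read ^ device) & cmask & ~wmask & ~w1cmask) != 0


def first_bad_offset(incoming, local, cmask, wmask, w1cmask):
    """Return the first offset whose byte fails the check, or None."""
    for i, (r, d) in enumerate(zip(incoming, local)):
        if config_byte_mismatch(r, d, cmask[i], wmask[i], w1cmask[i]):
            return i
    return None


# Minimal images: only the capabilities pointer (0x34) differs, using the
# values from the dst log. Treating every byte as checked and read-only
# is a simplification for this example.
incoming = bytearray(256)
local = bytearray(256)
incoming[0x34] = 0x4C  # from the migration stream (source, 2.9.0)
local[0x34] = 0x40     # on the destination device (2.10.0-3)
cmask = bytes([0xFF] * 256)
wmask = bytes(256)
w1cmask = bytes(256)

bad = first_bad_offset(incoming, local, cmask, wmask, w1cmask)
if bad is not None:
    print(f"Bad config data: i=0x{bad:x} read: {incoming[bad]:x} "
          f"device: {local[bad]:x} cmask: {cmask[bad]:x} "
          f"wmask: {wmask[bad]:x} w1cmask:{w1cmask[bad]:x}")
# Bad config data: i=0x34 read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0
```

Because the pointer byte is read-only to the guest (wmask 0), any change in the capability layout between QEMU builds makes the load fail with "Invalid argument", which is why the bridge properties must not change for an existing machine type.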
Description of problem:
Migrating from RHEL7.4.z to RHEL7.5 with a qemu cli that includes "-machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1" fails.
Version-Release number of selected component (if applicable):
(src)
RHEL-7.4 Server ppc64le
kernel-3.10.0-693.11.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.10.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
(dst)
RHEL-7.4 Server ppc64le
kernel-3.10.0-760.el7.ppc64le
qemu-kvm-rhev-2.10.0-3.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
How reproducible:
3/3
Steps to Reproduce:
1. Boot a guest on the RHEL7.4.z host (src host) with the qemu cli:
/usr/libexec/qemu-kvm -machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio
2. Boot a guest on the RHEL7.5 host (dst host) with the same qemu cli as the src host, appending "-incoming" as follows:
/usr/libexec/qemu-kvm -machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 -monitor stdio -incoming tcp:0:5801
3. Migrate from src to dst:
src: (qemu) migrate -d tcp:10.16.42.48:5801
Actual results:
After step 3, check the status of the migration:
src:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
Migration status: completed
total time: 165 milliseconds
downtime: 44 milliseconds
setup: 10 milliseconds
transferred ram: 5964 kbytes
throughput: 472.23 mbps
remaining ram: 0 kbytes
total ram: 540736 kbytes
duplicate: 134102 pages
skipped: 0 pages
normal: 1194 pages
normal bytes: 4776 kbytes
dirty sync count: 3
dst:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x34 read: 4c device: 40 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pci_bridge:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device 'pci@800000020000000:03.0/pci_bridge'
qemu-kvm: load of migration failed: Invalid argument
Expected results:
Migration succeeds and the guest works well.
Additional info:
Before migration, check the qtree on the src host:
(qemu) info qtree
bus: main-system-bus
type System
dev: spapr-pci-host-bridge, id ""
index = 0 (0x0)
buid = 576460752840294400 (0x800000020000000)
liobn = 2147483648 (0x80000000)
liobn64 = 2147483649 (0x80000001)
mem_win_addr = 35186519572480 (0x200080000000)
mem_win_size = 2147483648 (0x80000000)
mem64_win_addr = 36283883716608 (0x210000000000)
mem64_win_size = 1099511627776 (0x10000000000)
mem64_win_pciaddr = 36283883716608 (0x210000000000)
io_win_addr = 35184372088832 (0x200000000000)
io_win_size = 65536 (0x10000)
dynamic-reconfiguration = true
dma_win_addr = 0 (0x0)
dma_win_size = 1073741824 (0x40000000)
dma64_win_addr = 576460752303423488 (0x800000000000000)
ddw = true
pgsz = 69632 (0x11000)
numa_node = 4294967295 (0xffffffff)
pre-2.8-migration = false
pcie-extended-configuration-space = true
bus: pci.0
type PCI
dev: pci-bridge, id "pci_bridge"
chassis_nr = 1 (0x1)
msi = "auto"
shpc = true
addr = 03.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
bar 0: mem at 0xc0200000 [0xc02000ff]
bus: pci_bridge
type PCI
dev: nec-usb-xhci, id ""
msi = "auto"
msix = "auto"
superspeed-ports-first = true
force-pcie-endcap = false
intrs = 16 (0x10)
slots = 64 (0x40)
streams = true
p2 = 4 (0x4)
p3 = 4 (0x4)
addr = 01.0
romfile = ""
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class USB controller, addr 00:01.0, pci id 1033:0194 (sub 1af4:1100)
bar 0: mem at 0xc0020000 [0xc0023fff]
bus: usb-bus.0
type usb-bus
dev: usb-mouse, id ""
usb_version = 2 (0x2)
port = ""
serial = ""
full-path = true
msos-desc = true
addr 0.2, port 2, speed 480, name QEMU USB Mouse, attached
dev: usb-kbd, id ""
usb_version = 2 (0x2)
display = ""
port = ""
serial = ""
full-path = true
msos-desc = true
addr 0.1, port 1, speed 480, name QEMU USB Keyboard, attached
dev: VGA, id ""
vgamem_mb = 16 (0x10)
mmio = true
qemu-extended-regs = true
addr = 00.0
romfile = "vgabios-stdvga.bin"
rombar = 1 (0x1)
multifunction = false
command_serr_enable = true
x-pcie-lnksta-dllla = true
x-pcie-extcap-init = true
class VGA controller, addr 00:00.0, pci id 1234:1111 (sub 1af4:1100)
bar 0: mem at 0x80000000 [0x80ffffff]
bar 2: mem at 0xc0000000 [0xc0000fff]
bar 6: mem at 0xc0010000 [0xc001ffff]
dev: spapr-vio-bridge, id ""
bus: spapr-vio
type spapr-vio-bus
dev: spapr-vscsi, id "v-scsi@71000003"
reg = 1895825411 (0x71000003)
irq = 4105 (0x1009)
bus: v-scsi
type SCSI
dev: spapr-vlan, id "l-lan@71000002"
reg = 1895825410 (0x71000002)
mac = "52:54:00:12:34:56"
vlan = 0
netdev = "hub0port0"
use-rx-buffer-pools = true
irq = 4104 (0x1008)
dev: spapr-nvram, id "nvram@71000001"
reg = 1895825409 (0x71000001)
drive = ""
irq = 4099 (0x1003)
dev: spapr-vty, id "vty@71000000"
reg = 1895825408 (0x71000000)
chardev = "serial0"
irq = 4098 (0x1002)
dev: spapr-rtc, id ""