Bug 1508271
Summary: | Migration is failed from host RHEL7.4.z to host RHEL7.5 with "-machine pseries-rhel7.4.0 -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1" | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | xianwang <xianwang> |
Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> |
Status: | CLOSED ERRATA | QA Contact: | xianwang <xianwang> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 7.5 | CC: | chayang, dgibson, dgilbert, juzhang, knoel, lvivier, michen, mrezanin, peterx, quintela, qzhang, virt-maint, xianwang |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-rhev-2.10.0-5.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-04-11 00:44:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
xianwang
2017-11-01 02:43:07 UTC
This bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1435086,

Xianxian, since huding who is x86 stable abi feature owner is on travel, pls verify x86 behaviour and adjust Hardware field, thanks.

(In reply to Qunfang Zhang from comment #4)
> Xianxian, since huding who is x86 stable abi feature owner is on travel, pls
> verify x86 behaviour and adjust Hardware field, thanks.

This bug also exists on x86_64. The test result is as below:

(src)
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.10.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch

(dst)
kernel-3.10.0-765.el7.x86_64
qemu-kvm-rhev-2.10.0-3.el7.x86_64
seabios-1.10.2-5.el7.x86_64

Test steps and results are the same as in the bug report.

It looks like we've lost some capabilities on the bridge; on x86 using:

    /usr/libexec/qemu-kvm -M pc-i440fx-rhel7.4.0,accel=kvm -device pci-bridge,id=b1,bus=pci.0,addr=08,chassis_nr=1 /home/vms/f23-serial.qcow2 -nographic

on 2.9.0, lspci -vvn shows:

    00:08.0 0604: 1b36:0001 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 0
        Region 0: Memory at fea71000 (64-bit, non-prefetchable) [size=256]
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: fe800000-fe9fffff
        Prefetchable memory behind bridge: 00000000fe000000-00000000fe1fffff
        Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
            PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [4c] MSI: Enable- Count=1/1 Maskable+ 64bit+
            Address: 0000000000000000 Data: 0000
            Masking: 00000000 Pending: 00000000
        Capabilities: [48] Slot ID: 0 slots, First+, chassis 01
        Capabilities: [40] Hot-plug capable
        Kernel modules: shpchp

and on 2.10.0-4 we're seeing:

    00:08.0 0604: 1b36:0001 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
            PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Slot ID: 0 slots, First+, chassis 01
        Kernel modules: shpchp

so we've lost the "Hot-plug capable" and "MSI:" capability chunks, and also lost the I/O and memory behind bridge info.

qtree differences:

7.4.0:

    dev: pci-bridge, id "b1"
      chassis_nr = 1 (0x1)
      msi = "auto"
      shpc = true
      addr = 08.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
      bar 0: mem at 0xfea71000 [0xfea710ff]
      bus: b1
        type PCI

7.5.0:

    dev: pci-bridge, id "b1"
      chassis_nr = 1 (0x1)
      msi = "off"
      shpc = false
      addr = 08.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
      bus: b1
        type PCI

Note the shpc and msi differences.

posted downstream fix:

machine compat: pci_bridge/shpc always enable

That's tested on x86 only.

Xianwang: Can you please test
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14456858
on x86 and power on a bunch of machine types.

(In reply to Dr. David Alan Gilbert from comment #8)
> posted downstream fix:
>
> machine compat: pci_bridge/shpc always enable
>
> That's tested on x86 only.
>
> Xianwang: Can you please test
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14456858
> on x86 and power on a bunch of machine types.
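The qtree comparison earlier in this thread (7.4.0 versus 7.5.0) can also be done mechanically. The following is only a rough sketch, not a tool used in this bug: it assumes the device section of `info qtree` has been saved as text, and the helper names `parse_qtree_props` and `diff_props` are made up for illustration. The two dumps are copied from the comment above.

```python
def parse_qtree_props(text):
    """Collect 'name = value' properties from an `info qtree` device dump."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Lines such as 'class PCI bridge, ...' or 'bar 0: ...' carry no ' = '.
        if " = " in line:
            name, value = line.split(" = ", 1)
            props[name] = value
    return props


def diff_props(old, new):
    """Return {name: (old_value, new_value)} for properties that differ."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Dumps copied from the thread (pc-i440fx-rhel7.4.0 machine type on both builds).
QTREE_7_4 = """\
dev: pci-bridge, id "b1"
  chassis_nr = 1 (0x1)
  msi = "auto"
  shpc = true
  addr = 08.0
  romfile = ""
  rombar = 1 (0x1)
  multifunction = false
  command_serr_enable = true
  x-pcie-lnksta-dllla = true
  x-pcie-extcap-init = true
  class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
  bar 0: mem at 0xfea71000 [0xfea710ff]
"""

QTREE_7_5 = """\
dev: pci-bridge, id "b1"
  chassis_nr = 1 (0x1)
  msi = "off"
  shpc = false
  addr = 08.0
  romfile = ""
  rombar = 1 (0x1)
  multifunction = false
  command_serr_enable = true
  x-pcie-lnksta-dllla = true
  x-pcie-extcap-init = true
  class PCI bridge, addr 00:08.0, pci id 1b36:0001 (sub 0000:0000)
"""

differences = diff_props(parse_qtree_props(QTREE_7_4), parse_qtree_props(QTREE_7_5))
for name, (old, new) in differences.items():
    print(f"{name}: {old} -> {new}")
# prints:
# msi: "auto" -> "off"
# shpc: true -> false
```

Only `name = value` lines are compared, so the missing `bar 0` line on the newer build is not reported by this sketch; it would need separate handling.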
Hi, Dave,
I have tried the above version and the test passes on both x86_64 and powerpc (P8). Versions are as follows:

P8:

(src)
RHEL-7.4 Server ppc64le
kernel-3.10.0-693.11.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.10.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

(dst)
RHEL-7.4 Server ppc64le
kernel-3.10.0-760.el7.ppc64le
qemu-kvm-rhev-2.10.0-4.el7.bz1508271a.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

src:

    dev: pci-bridge, id "pci_bridge"
      chassis_nr = 1 (0x1)
      msi = "auto"
      shpc = true
      addr = 03.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
      bar 0: mem at 0xc0200000 [0xc02000ff]
      bus: pci_bridge
        type PCI

dst:

    dev: pci-bridge, id "pci_bridge"
      chassis_nr = 2 (0x2)
      msi = "auto"
      shpc = true
      addr = 03.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 0000:0000)
      bar 0: mem at 0xffffffffffffffff [0xfe]
      bus: pci_bridge
        type PCI

x86_64:

(src)
kernel-3.10.0-693.11.1.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.10.x86_64
seabios-bin-1.10.2-3.el7_4.1.noarch

(dst)
kernel-3.10.0-765.el7.x86_64
qemu-kvm-rhev-2.10.0-4.el7.bz1508271a.x86_64
seabios-1.10.2-5.el7.x86_64

src:

    dev: pci-bridge, id "pci_bridge"
      chassis_nr = 1 (0x1)
      msi = "auto"
      shpc = true
      addr = 13.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:13.0, pci id 1b36:0001 (sub 0000:0000)
      bar 0: mem at 0xfea71000 [0xfea710ff]
      bus: pci_bridge
        type PCI

dst:

    dev: pci-bridge, id "pci_bridge"
      chassis_nr = 1 (0x1)
      msi = "auto"
      shpc = true
      addr = 13.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:13.0, pci id 1b36:0001 (sub 0000:0000)
      bar 0: mem at 0xffffffffffffffff [0xfe]
      bus: pci_bridge
        type PCI

Steps are the same as in the bug report.

Dave,

Ugh, so this change fixes this bug, but isn't really right for Power.

shpc enabled on bridges is wrong for pseries, and always has been. pseries has its own hotplug mechanism and the standard one will never work. At present the pseries one doesn't work under bridges either - we need some way to force the host bridge to act as hotplug handler rather than the immediate bridge. We've got an existing bug for that (bz 1436549), I just haven't had time to look at it in any detail.

For pseries though, simply disabling shpc gives a better failure mode than having it enabled but not working (so I was glad it changed in 2.9, I hadn't realised it had gone back again). So we need to find a better way of handling both upstream and down.

Fix included in qemu-kvm-rhev-2.10.0-5.el7

(In reply to David Gibson from comment #10)
> Dave,
>
> Ugh, so this change fixes this bug, but isn't really right for Power.
>
> shpc enabled on bridges is wrong for pseries, and always has been. pseries
> has its own hotplug mechanism and the standard one will never work. At
> present the pseries one doesn't work under bridges either - we need some way
> to force the host bridge to act as hotplug handler rather than the immediate
> bridge. We've got an existing bug for that (bz 1436549), I just haven't had
> time to look at it in any detail.
>
> For pseries though, simply disabling shpc gives a better failure mode than
> having it enabled but not working (so I was glad it changed in 2.9, I hadn't
> realised it had gone back again). So we need to find a better way of
> handling both upstream and down.
I don't understand the hotplug stuff well enough to suggest the right fix here - I was just following the upstream fix; but turning it on/off on old machine types breaks migration, so I suggest that if you don't want it on Power then turn it off on new machine types; check with Marcel who did 2fa3566.

This bug is verified as passing on qemu-kvm-rhev-2.10.0-9.el7.ppc64le. Versions are as follows:

P8:

(src)
RHEL-7.4 Server ppc64le
3.10.0-693.13.1.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.12.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch

(dst)
RHEL-7.5 Server ppc64le
3.10.0-768.el7.ppc64le
qemu-kvm-rhev-2.10.0-9.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Steps are the same as in the bug report.

on src host:

    (qemu) info qtree
    dev: pci-bridge, id "pci_bridge"
      chassis_nr = 1 (0x1)
      msi = "auto"
      shpc = true
      addr = 03.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 2100:0000)
      bar 0: mem at 0xc0200000 [0xc02000ff]
      bus: pci_bridge
        type PCI

on dst host:

    (qemu) info qtree
    dev: pci-bridge, id "pci_bridge"
      chassis_nr = 1 (0x1)
      msi = "auto"
      shpc = true
      addr = 03.0
      romfile = ""
      rombar = 1 (0x1)
      multifunction = false
      command_serr_enable = true
      x-pcie-lnksta-dllla = true
      x-pcie-extcap-init = true
      class PCI bridge, addr 00:03.0, pci id 1b36:0001 (sub 0000:0000)
      bar 0: mem at 0xffffffffffffffff [0xfe]
      bus: pci_bridge
        type PCI

Result: migration completed and the VM is running on the dst host.

on src:

    (qemu) info migrate
    Migration status: completed

on dst:

    (qemu) info status
    VM status: running

The migration "dst->src" also passes, so this bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:1104
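The point made earlier in this thread, that a device default may only change on new machine types because changing it on an existing one breaks migration, can be illustrated with a toy model. This is not QEMU code and does not reflect the actual downstream patch: the table names, the `pseries-new` machine type and the default values are all assumptions chosen for illustration; only `pseries-rhel7.4.0`, `pci-bridge` and `shpc` come from the report.

```python
# Toy model, not QEMU code: every table and value here is illustrative.

# Hypothetical default that a future machine type might adopt.
NEW_DEFAULTS = {"pci-bridge": {"shpc": False}}

# Per-machine-type overrides that pin the guest-visible behaviour of
# existing machine types ("pseries-new" is a made-up name).
COMPAT_PROPS = {
    "pseries-rhel7.4.0": {"pci-bridge": {"shpc": True}},
    "pseries-new": {},
}


def effective_props(machine_type, device):
    """Device defaults overlaid with the machine type's compat overrides."""
    props = dict(NEW_DEFAULTS.get(device, {}))
    props.update(COMPAT_PROPS[machine_type].get(device, {}))
    return props


def migration_compatible(src_props, dst_props):
    """In this model, migration needs identical guest-visible properties."""
    return src_props == dst_props


# Source host: the 7.4.0 machine type with shpc enabled, as in the qtree output.
src = {"shpc": True}

# The destination keeps the pin for the old machine type, so the device matches.
assert migration_compatible(src, effective_props("pseries-rhel7.4.0", "pci-bridge"))

# Without a pin (here, the made-up new type) the device differs from the source.
assert not migration_compatible(src, effective_props("pseries-new", "pci-bridge"))
```

In this model an old machine type always resolves to the same property values regardless of what the current defaults are, which is the property that the failed 7.4.z to 7.5 migration depended on.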