Bug 1366953
Summary: | virtio balloon can not work with pci-bridge | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | mazhang <mazhang> |
Component: | SLOF | Assignee: | Thomas Huth <thuth> |
Status: | CLOSED ERRATA | QA Contact: | xianwang <xianwang> |
Severity: | medium | Priority: | medium |
Version: | 7.3 | CC: | dgibson, dzheng, hannsj_uhl, knoel, michal.skrivanek, michen, mrezanin, qzhang, thuth, virt-maint, yduan, yhong |
Target Milestone: | rc | Target Release: | 7.4 |
Hardware: | ppc64le | OS: | Linux |
Fixed In Version: | SLOF-20161019 | Type: | Bug |
Last Closed: | 2017-08-01 22:33:27 UTC | | |
Bug Depends On: | 1392055 | Bug Blocks: | 1401400 |
Description
mazhang
2016-08-15 06:34:45 UTC
Created attachment 1190796: Guest dmesg
Observation: It seems like it is not failing with a different PCI slot address. This is working for me:

    sudo /usr/libexec/qemu-kvm -enable-kvm -nographic -vga none -hda /path/to/image.qcow2 -m 8192 -smp 2 -device pci-bridge,bus=pci.0,id=bridge1,chassis_nr=1,addr=0x6 -device virtio-balloon-pci,id=balloon0,bus=bridge1,addr=2

But as soon as I change the "addr=2" to a higher value, it fails. Also not sure if it is related, but for the working cases, "lspci -vv | grep Interrupt" shows "pin A routed to IRQ 18", while in the non-working cases, I get "pin A routed to IRQ 0" instead.

I think the IRQ thing is almost certainly related to this. Even with the various virtual irq number mappings, I'm pretty sure IRQ 0 is not a plausible value for a PCI irq. My guess would be something is wrong with our irq swizzling across bridges.

I'm pretty sure this is due to an error in the interrupt-map property in the device tree node for the PCI bridge. In other words, a SLOF bug, probably.

    # hexdump -C interrupt-map
    00000000  00 00 18 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
    00000010  1e 52 29 60 00 00 30 00  00 00 00 00 00 00 00 00  |.R)`..0.........|
    00000020  00 00 00 00 00 00 18 00  00 00 00 00 00 00 00 00  |................|
    00000030  00 00 00 02 1e 52 29 60  00 00 30 00 00 00 00 00  |.....R)`..0.....|
    00000040  00 00 00 00 00 00 00 01  00 00 18 00 00 00 00 00  |................|
    00000050  00 00 00 00 00 00 00 03  1e 52 29 60 00 00 30 00  |.........R)`..0.|
    00000060  00 00 00 00 00 00 00 00  00 00 00 02 00 00 18 00  |................|
    00000070  00 00 00 00 00 00 00 00  00 00 00 04 1e 52 29 60  |.............R)`|
    00000080  00 00 30 00 00 00 00 00  00 00 00 00 00 00 00 03  |..0.............|
    00000090

The normal PCI interrupt specifier should be either 1, 2, 3 or 4, mapping to pins A, B, C and D. These are generally swizzled on bridges so that, depending on which slot the device is in, those pins are rotated to different pins on the bridge itself.
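The dump above can be decoded mechanically. The following is a minimal Python sketch, not SLOF or QEMU code. It assumes each interrupt-map entry consists of nine big-endian 32-bit cells (three child address cells, one child interrupt cell, one parent phandle cell, three parent address cells, one parent interrupt cell) and that the device number sits in bits 11-15 of the first address cell. That layout is an assumption that happens to fit the 144-byte dump; `MapEntry` and `parse_interrupt_map` are hypothetical names introduced for illustration.

```python
import struct
from typing import List, NamedTuple

# Bytes copied from the "hexdump -C interrupt-map" output in this report.
HEXDUMP_BYTES = """
    00 00 18 00 00 00 00 00 00 00 00 00 00 00 00 01
    1e 52 29 60 00 00 30 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 18 00 00 00 00 00 00 00 00 00
    00 00 00 02 1e 52 29 60 00 00 30 00 00 00 00 00
    00 00 00 00 00 00 00 01 00 00 18 00 00 00 00 00
    00 00 00 00 00 00 00 03 1e 52 29 60 00 00 30 00
    00 00 00 00 00 00 00 00 00 00 00 02 00 00 18 00
    00 00 00 00 00 00 00 00 00 00 00 04 1e 52 29 60
    00 00 30 00 00 00 00 00 00 00 00 00 00 00 00 03
"""
INTERRUPT_MAP = bytes.fromhex(" ".join(HEXDUMP_BYTES.split()))

# Assumed layout: 3 + 1 + 1 + 3 + 1 cells of 4 bytes each.
CELLS_PER_ENTRY = 9
ENTRY_SIZE = CELLS_PER_ENTRY * 4


class MapEntry(NamedTuple):
    child_slot: int      # device number of the child on the secondary bus
    child_pin: int       # 1..4 for INTA..INTD
    parent_phandle: int  # phandle of the interrupt parent node
    parent_slot: int     # device number in the parent unit address
    parent_pin: int      # interrupt specifier passed to the parent


def slot_of(phys_hi: int) -> int:
    """Extract the device number (bits 11-15) from the first address cell."""
    return (phys_hi >> 11) & 0x1F


def parse_interrupt_map(blob: bytes) -> List[MapEntry]:
    """Split the property into entries under the assumed 9-cell layout."""
    if len(blob) % ENTRY_SIZE != 0:
        raise ValueError("length is not a multiple of the assumed entry size")
    entries = []
    for offset in range(0, len(blob), ENTRY_SIZE):
        cells = struct.unpack_from(">9I", blob, offset)
        entries.append(
            MapEntry(
                child_slot=slot_of(cells[0]),
                child_pin=cells[3],
                parent_phandle=cells[4],
                parent_slot=slot_of(cells[5]),
                parent_pin=cells[8],
            )
        )
    return entries


if __name__ == "__main__":
    for e in parse_interrupt_map(INTERRUPT_MAP):
        print(
            f"slot {e.child_slot} pin {e.child_pin} -> "
            f"phandle {e.parent_phandle:#x} slot {e.parent_slot} pin {e.parent_pin}"
        )
```

Under these assumptions the child device number decodes to 3 and the parent unit address to device 6, which matches the bridge at addr=0x6 in the command line above.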
However, this particular interrupt map appears to be mapping: 1->0, 2->1, 3->2, 4->3. Pin A, the interrupt actually used by the balloon, is mapped to the invalid irq specifier '0'.

Thomas, can you look into this?

Notes for interpreting the dump above:
* 00 00 18 00 00 00 00 00 00 00 00 00 is the config address (1st reg entry) of the balloon device
* 1e 52 29 60 is the phandle of the PCI host bridge device node (i.e. the parent of this bridge).

I think you're right, this looks very suspicious. There is likely a bug in the pci bridge setup code of SLOF that prepares the interrupt-map property.

I can make it work when I apply the following patch:

    diff --git a/board-qemu/slof/pci-interrupts.fs b/board-qemu/slof/pci-interrupts.fs
    --- a/board-qemu/slof/pci-interrupts.fs
    +++ b/board-qemu/slof/pci-interrupts.fs
    @@ -1,6 +1,7 @@
     : pci-gen-irq-map-one ( prop-addr prop-len slot pin -- prop-addr prop-len )
             2dup + 4 mod                ( prop-addr prop-len slot pin parentpin )
    +        dup 0= IF drop 4 THEN
             >r >r                       ( prop-addr prop-len slot R: swizzledpin pin )
             \ Child slot#

I am not sure yet whether this is the 100% correct solution, so I need to do some more tests with it before I can send a patch...

Moving this BZ to 7.4 since it is IMHO not a blocker bug (you can put the balloon device at a different location on the bus to make it work again, i.e. there is a work-around).

Michal,

This bug will affect most guest PCI devices (emulated or VFIO) behind a PCI to PCI bridge, which I think will be automatically created if you have enough PCI devices attached.

Under RHEL, is there anything the user can do to influence the order in which PCI devices are added to the guest, and therefore whether they'll be behind a P2P bridge or not? This is to assess whether this bug is urgent enough to push into 7.3 or not.

(Note that I'm about 95% confident that bug 1370026 is a dupe of this one).
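The arithmetic in the draft patch quoted above can be sketched in Python to show why only some slots break. This mirrors only the two Forth lines visible in the diff (`2dup + 4 mod` and the added `dup 0= IF drop 4 THEN`); the function names are hypothetical, and since the patch author was not yet sure the draft was the final solution, the code accepted upstream may differ.

```python
def parent_pin_unpatched(slot: int, pin: int) -> int:
    """Mirror of "2dup + 4 mod": can yield 0, which is not a valid pin."""
    return (slot + pin) % 4


def parent_pin_patched(slot: int, pin: int) -> int:
    """Mirror of the draft patch: a result of 0 is replaced by 4."""
    parent = (slot + pin) % 4
    return 4 if parent == 0 else parent


if __name__ == "__main__":
    # Slot 2 is the working case from the report, slot 3 matches the dump.
    for slot in (2, 3):
        for pin in (1, 2, 3, 4):
            before = parent_pin_unpatched(slot, pin)
            after = parent_pin_patched(slot, pin)
            print(f"slot {slot} pin {pin}: unpatched -> {before}, patched -> {after}")
```

For slot 3 the unpatched formula produces 0, 1, 2, 3 for pins 1 to 4, which is the 1->0, 2->1, 3->2, 4->3 mapping read out of the dump. For slot 2, pin A maps to 3, a valid specifier, which is consistent with addr=2 working.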
(In reply to David Gibson from comment #9)
> Michal,
>
> This bug will affect most guest PCI devices (emulated or VFIO) behind a PCI
> to PCI bridge, which I think will be automatically created if you have
> enough PCI devices attached.

We don't add any bridge ourselves, nor do we use so many devices, so we should be good here.

I've now sent two patches upstream which should fix this issue:
http://patchwork.ozlabs.org/patch/667393/
http://patchwork.ozlabs.org/patch/667394/

My patches have now been accepted by upstream:
https://github.com/aik/SLOF/commit/aede66f8ca321a1f553d1e3d3779eda9349586bc
https://github.com/aik/SLOF/commit/6dca611ff3faa9fccf916ac238529fa0c5015cc4

Fixed by rebase

The verification steps are as follows:

1. Version:
   Host: 3.10.0-623.el7.ppc64le
   Qemu: qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848
   SLOF: SLOF.noarch 20170303-1.git66d250e.el7

2. Steps to verify: same as in the Description above.

3. Actual results:
   (qemu) info balloon
   balloon: actual=8192
   (qemu) balloon 4096
   (qemu) info balloon
   balloon: actual=4096
   (qemu) balloon 1024
   (qemu) info balloon
   balloon: actual=1024

Virtio balloon works well. This bug is fixed; changing the status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2093