Bug 1458032
Summary: | [Intel 7.5 Bug] KVMGT: Bogus PCI BAR emulation
---|---
Product: | Red Hat Enterprise Linux 7
Reporter: | Alex Williamson <alex.williamson>
Component: | kernel
kernel sub component: | KVM
Assignee: | Paul Lai (Intel) <plai>
QA Contact: | Guo, Zhiyi <zhguo>
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | unspecified
CC: | alex.williamson, changbin.du, chayang, gordon.jin, hang.yuan, jinzhao, juzhang, knoel, michen, plai, salmy, terrence.xu, virt-maint, weinan.z.li, xiaolin.zhang, xiong.y.zhang, zhguo, zhiyuan.lv
Version: | 7.4
Target Milestone: | rc
Target Release: | 7.5
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | kernel-3.10.0-816.el7
Clone Of: |
: | 1533634
Last Closed: | 2018-04-10 20:37:15 UTC
Type: | Bug
Bug Blocks: | 1459973, 1469590
Description
Alex Williamson
2017-06-01 19:50:38 UTC
Moved to RHEL 7.5.

Will be fixed by the two patches below:
https://lists.freedesktop.org/archives/intel-gvt-dev/2017-August/001690.html
https://lists.freedesktop.org/archives/intel-gvt-dev/2017-August/001691.html

(In reply to Changbin Du from comment #3)
> Will be fixed by the two patches below:
> https://lists.freedesktop.org/archives/intel-gvt-dev/2017-August/001690.html
> https://lists.freedesktop.org/archives/intel-gvt-dev/2017-August/001691.html

I don't see that the BAR4 and BAR5 register implementations are fixed by either of these. Is there a third patch for that?

(In reply to Alex Williamson from comment #4)
> I don't see that the BAR4 and BAR5 register implementations are fixed by
> either of these. Is there a third patch for that?

Yes, there is another fix in the works. I found some problems with the PCI BAR read/write emulation. Sorry for missing this one.

(In reply to Changbin Du from comment #5)
> (In reply to Alex Williamson from comment #4)
> > I don't see that the BAR4 and BAR5 register implementations are fixed by
> > either of these. Is there a third patch for that?
>
> Yes, there is another fix in the works. I found some problems with the
> PCI BAR read/write emulation. Sorry for missing this one.

Any status update on the third patch?

Patch in drm-intel:
drm/i915: Add interface to reserve fence registers for vGPU
https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=969b0950a188750bd6ad12693fa3b6e8d63036fb

(In reply to weinanli from comment #7)
> Patch in drm-intel:
> drm/i915: Add interface to reserve fence registers for vGPU
> https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=969b0950a188750bd6ad12693fa3b6e8d63036fb

This appears entirely unrelated. How is it relevant to this bz?
(In reply to Alex Williamson from comment #8)
> (In reply to weinanli from comment #7)
> > Patch in drm-intel:
> > drm/i915: Add interface to reserve fence registers for vGPU
> > https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=969b0950a188750bd6ad12693fa3b6e8d63036fb
>
> This appears entirely unrelated. How is it relevant to this bz?

Sorry, my bad. Please ignore it; that patch is for #1449711.

Patches in drm-intel (https://cgit.freedesktop.org/drm-intel/):

commit 5d5fe176155e6cfa4a53accb90e4010baa5266d0
    drm/i915/kvmgt: Sanitize PCI bar emulation
commit f090a00df9ecdab5d066b099c1797e0070e27a36
    drm/i915/gvt: Add emulation for BAR2 (aperture) with normal file RW approach
commit f1751362d6357a90bc6e53176cec715ff2dbed74
    drm/i915/gvt: Fix incorrect PCI BARs reporting
commit 02d578e5edd980eac3fbed15db4d9e5665f22089
    drm/i915/gvt: Add support for PCIe extended configuration space

Hello,
All the related patches are now upstream. Can anyone help close this issue, or is there a pending procedure? Bugzilla has started complaining to me about 'Outstanding Requests', but we Intel folks do not seem to have permission to close it. Thanks.

The patches have not even been posted for the RHEL kernel, moving back to ASSIGNED. This bug is tracking a RHEL bug, not upstream. In order to progress from ASSIGNED, *backports* of these patches need to be posted via the internal process.

(In reply to Alex Williamson from comment #12)
> The patches have not even been posted for the RHEL kernel, moving back to
> ASSIGNED. This bug is tracking a RHEL bug, not upstream. In order to
> progress from ASSIGNED, *backports* of these patches need to be posted via
> the internal process.

I see. Thanks for your reply.

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Patch(es) available on kernel-3.10.0-816.el7

This issue has been fixed and cannot be reproduced on RHEL 7.5 Alpha (with kernel 3.10.0-799.el7), so closing it.
(In reply to Terrence Xu from comment #19)
> This issue has been fixed and cannot be reproduced on RHEL 7.5 Alpha (with
> kernel 3.10.0-799.el7), so closing it.

This fix will be captured in a RHEL errata when released publicly in 7.5. The RH tools will advance the bug state from here to GA.

Tested against kernel-3.10.0-823.el7.x86_64 (host & guest) and qemu-kvm-rhev-2.10.0-12.el7.x86_64.
vGPU used: i915-GVTg_V5_4

For BAR 4 & BAR 5, inside the guest:

    # lspci -vv -s 00:05.0
    00:05.0 VGA compatible controller: Intel Corporation Iris Pro Graphics 580 (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device 2064
        Physical Slot: 5
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at 140000000 (64-bit, non-prefetchable) [size=16M]
        Region 2: Memory at 180000000 (64-bit, prefetchable) [size=1G]
        Expansion ROM at febf1000 [disabled] [size=2K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
            DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag- RBE+
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes
            DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
            Address: fee01000 Data: 4061
        Capabilities: [d0] Power Management version 2
            Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: i915
        Kernel modules: i915

BAR 4 & BAR 5 are now hidden.
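The lspci check above can be scripted. The following is a minimal sketch, not part of the original test procedure: it assumes the `lspci -vv` text has already been captured from the guest, and the helper name `exposed_regions` is hypothetical. It only reports which `Region N:` entries lspci printed, so it shows that BAR 4 and BAR 5 are not listed; it does not prove the registers are read-only.

```python
import re

# Matches lines such as:
#   Region 0: Memory at 140000000 (64-bit, non-prefetchable) [size=16M]
REGION_RE = re.compile(
    r"Region (\d+): (?:Memory|I/O ports) at (\S+) .*?\[size=([^\]]+)\]"
)


def exposed_regions(lspci_text):
    """Return {region_index: size_string} for every 'Region N:' line."""
    return {int(m.group(1)): m.group(3) for m in REGION_RE.finditer(lspci_text)}


# Region lines copied from the guest output in this comment.
SAMPLE = """\
Region 0: Memory at 140000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 180000000 (64-bit, prefetchable) [size=1G]
Expansion ROM at febf1000 [disabled] [size=2K]
"""

regions = exposed_regions(SAMPLE)
print(regions)
print("BAR 4/5 listed:", 4 in regions or 5 in regions)
```

With the sample above, only regions 0 and 2 are found, which matches the two 64-bit BARs the vGPU is expected to expose.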
For BAR 2, try enabling x-no-mmap=on. QEMU command line:

    /usr/libexec/qemu-kvm -name input-test -m 4G \
        -cpu Broadwell,enforce \
        -smp 2 \
        -device VGA \
        -netdev tap,id=idinWyYp,vhost=on -device e1000,mac=42:ce:a9:d2:4d:d7,id=idlbq7eA,netdev=idinWyYp \
        -uuid 215e11b2-a869-41b5-91cd-6a32a907be7e \
        -device ich9-usb-uhci6 \
        -drive file=/home/V8_1.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-scsi-disk0 \
        -qmp unix:/tmp/input-port,server,nowait \
        -monitor stdio \
        -vnc :0 \
        -device usb-tablet \
        -device vfio-pci,id=kvmgt,x-no-mmap=on,sysfsdev=/sys/bus/mdev/devices/31efe556-0821-460b-b1f8-ccce561b25ca

The host hangs immediately with this log:

    ...
    <3>[ 133.932958] ip_va=ffffa09e4843cfe0:
    <3>[ 133.936528] cccccccc cccccccc
    <3>[ 133.940137] cccccccc cccccccc
    <3>[ 133.943785] cccccccc cccccccc
    <3>[ 133.947528] cccccccc cccccccc
    <3>[ 133.951135]
    <6>[ 142.588262] [drm] GPU HANG: ecode 9:0:0xeada1d47, reason: Hang on rcs0, action: reset
    <6>[ 142.596774] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
    <6>[ 142.606780] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
    <6>[ 142.616602] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
    <6>[ 142.627078] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
    <6>[ 142.636855] [drm] GPU crash dump saved to /sys/class/drm/card0/error
    <5>[ 142.643943] i915 0000:00:02.0: Resetting rcs0 after gpu hang
    <3>[ 143.642590] ip_va=ffffa09e4843d1f8: cccccccc cccccccc cccccccc cccccccc

FailQA per comment 21

We reproduced the GPU hang that Zhiyi reported, but the host did not crash. The GPU hang was exposed only after the old issue was fixed. The old behavior was that the VM could not boot with the "x-no-mmap=on" option, and our bug fix patches have resolved that, so the hang is new behavior.

Talked with Zhiyi; we will keep using this bug, with this title, for tracking.

Created attachment 1371987 [details]
host gpu hang dmesg log
As noted in bug 1533634:

1) BAR 4 & 5 are exposed as scratch registers, as evidenced by the lspci and setpci examples run from within the guest in the bug report.

2) BAR0 and BAR1 are aliases of each other, which is an improper implementation of a 64bit BAR through the vfio API. The proper implementation is that only vfio BAR region 0 should report a size, BAR region 1 should be zero sized.

3) read/write backing of regions is not implemented, as evidenced by the failure of the VM to work properly with the QEMU vfio-pci device option x-no-mmap=on.

Let's handle 3) in bug 1533634.

For verification of this bug, comment 21 partially verifies 1). I would suggest also using setpci from within the guest on the vGPU as shown in the original example.

We can verify 2) using gdb. Install qemu-kvm-rhev-debuginfo, create a vGPU and use gdb as follows:

    # gdb /usr/libexec/qemu-kvm
    ...
    (gdb) b vfio_bars_setup
    Breakpoint 1 at 0x2d316c: file /usr/include/bits/unistd.h, line 99.
    (gdb) run -m 1G -net none -monitor stdio --enable-kvm -serial none -vga none -nographic -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/7611722f-5296-40c0-a248-cb3da38fb7a5 -S
    <replace UUID with that of vGPU on test system>
    Starting program: /usr/libexec/qemu-kvm -m 1G -net none -monitor stdio --enable-kvm -serial none -vga none -nographic -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/7611722f-5296-40c0-a248-cb3da38fb7a5 -S
    ...
    Breakpoint 1, vfio_realize (pdev=0x555557e3a000, errp=0x7fffffffdc80) at /usr/src/debug/qemu-2.10.0/hw/vfio/pci.c:2818
    2818 vfio_bars_setup(vdev);
    ...
    (gdb) p vdev->bars[0]
    $1 = {region = {vbasedev = 0x555557e3a8e0, fd_offset = 0, mem = 0x555556cf71d0, size = 16777216, flags = 3, nr_mmaps = 0, mmaps = 0x0, nr = 0 '\000'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
    (gdb) p vdev->bars[1]
    $2 = {region = {vbasedev = 0x555557e3a8e0, fd_offset = 1099511627776, mem = 0x0, size = 0, flags = 0, nr_mmaps = 0, mmaps = 0x0, nr = 1 '\001'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
    (gdb) p vdev->bars[2]
    $3 = {region = {vbasedev = 0x555557e3a8e0, fd_offset = 2199023255552, mem = 0x555556cf73b0, size = 1073741824, flags = 15, nr_mmaps = 1, mmaps = 0x555556dccc60, nr = 2 '\002'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
    (gdb) p vdev->bars[3]
    $4 = {region = {vbasedev = 0x555557e3a8e0, fd_offset = 3298534883328, mem = 0x0, size = 0, flags = 0, nr_mmaps = 0, mmaps = 0x0, nr = 3 '\003'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
    (gdb) quit

The key information here is the "size = " value. It should be non-zero for indexes 0 & 2 and zero for indexes 1 & 3, as shown above.

(In reply to Alex Williamson from comment #25)
> As noted in bug 1533634:
>
> 1) BAR 4 & 5 are exposed as scratch registers, as evidenced by the lspci and
> setpci examples run from within the guest in the bug report.
>
> 2) BAR0 and BAR1 are aliases of each other, which is an improper implementation
> of a 64bit BAR through the vfio API. The proper implementation is that only vfio
> BAR region 0 should report a size, BAR region 1 should be zero sized.
>
> 3) read/write backing of regions is not implemented, as evidenced by the
> failure of the VM to work properly with the QEMU vfio-pci device option
> x-no-mmap=on.
>
> Let's handle 3) in bug 1533634.
>
> For verification of this bug, comment 21 partially verifies 1). I would
> suggest also using setpci from within the guest on the vGPU as shown in the
> original example.
>
> We can verify 2) using gdb.
> Install qemu-kvm-rhev-debuginfo, create a vGPU and use gdb as follows:
> <...>
> The key information here is the "size = " value. It should be non-zero for
> indexes 0 & 2 and zero for indexes 1 & 3, as shown above.

So am I correct in assuming there are no coding changes expected for this BZ and we only need verification from QE? If yes, please change the status back to ON_QA.

Yes

Tested on a Skylake host with an Iris Pro Graphics 580 GPU.

Packages used:
3.10.0-830.el7.x86_64 (host & guest)
qemu-kvm-rhev-2.10.0-17.el7.x86_64

Device info:

    # lspci -vv -s 00:05.0
    00:05.0 VGA compatible controller: Intel Corporation Iris Pro Graphics 580 (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device 2064
        Physical Slot: 5
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at 140000000 (64-bit, non-prefetchable) [size=16M]
        Region 2: Memory at 180000000 (64-bit, prefetchable) [size=1G]
        Expansion ROM at febf1000 [disabled] [size=2K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
            DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag- RBE+
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes
            DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
            Address: fee00000 Data: 4041
        Capabilities: [d0] Power Management version 2
            Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: i915
        Kernel modules: i915

For 1) in comment 25:

Read the value of BAR 4:

    # setpci -s 5.0 BASE_ADDRESS_4
    00000000

Try to write to it:

    # setpci -s 5.0 BASE_ADDRESS_4=ffffffff

Check whether the write took effect:

    # setpci -s 5.0 BASE_ADDRESS_4
    00000000

This proves BAR 4 is readable but not writable.

Apply the same test to BAR 5:

    # setpci -s 5.0 BASE_ADDRESS_5
    00000000
    # setpci -s 5.0 BASE_ADDRESS_5=ffffffff
    # setpci -s 5.0 BASE_ADDRESS_5
    00000000

BAR 5 is also readable but not writable.

For 2), following the instructions provided by Alex:

    # gdb /usr/libexec/qemu-kvm
    (gdb) b vfio_bars_setup
    Breakpoint 1 at 0x2d319c: file /usr/include/bits/unistd.h, line 99.
    (gdb) run -m 1G -net none -monitor stdio --enable-kvm -serial none -vga none -nographic -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/0042ed6e-cfd8-44bf-b825-ba22f1b1005f -S
    Starting program: /usr/libexec/qemu-kvm -m 1G -net none -monitor stdio --enable-kvm -serial none -vga none -nographic -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/0042ed6e-cfd8-44bf-b825-ba22f1b1005f -S
    Breakpoint 1, vfio_realize (pdev=0x555557e38000, errp=0x7fffffffdd40) at /usr/src/debug/qemu-2.10.0/hw/vfio/pci.c:2818
    2818 vfio_bars_setup(vdev);
    (gdb) p vdev->bars[0]
    $1 = {region = {vbasedev = 0x555557e388e0, fd_offset = 0, mem = 0x555556d911d0, size = 16777216, flags = 3, nr_mmaps = 0, mmaps = 0x0, nr = 0 '\000'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
    (gdb) p vdev->bars[1]
    $2 = {region = {vbasedev = 0x555557e388e0, fd_offset = 1099511627776, mem = 0x0, size = 0, flags = 0, nr_mmaps = 0, mmaps = 0x0, nr = 1 '\001'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
    (gdb) p vdev->bars[2]
    $3 = {region = {vbasedev = 0x555557e388e0, fd_offset = 2199023255552, mem = 0x555556d913b0, size = 1073741824, flags = 15, nr_mmaps = 1, mmaps = 0x555556dc6c60, nr = 2 '\002'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
    (gdb) p vdev->bars[3]
    $4 = {region = {vbasedev = 0x555557e388e0, fd_offset = 3298534883328, mem = 0x0, size = 0, flags = 0, nr_mmaps = 0, mmaps = 0x0, nr = 3 '\003'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}

*size* of bar0 is 16777216
*size* of bar1 is 0
*size* of bar2 is 1073741824
*size* of bar3 is 0

This matches the expected result.

Verified per comment 28

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062
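For anyone repeating the gdb verification from comment 25, the size check can be automated. This is a minimal sketch rather than an official test: the helper names `parse_bars` and `check_64bit_layout` are hypothetical, the sample text is the gdb output from comment 28, and the flag bit values are the `VFIO_REGION_INFO_FLAG_*` definitions from `<linux/vfio.h>`. It assumes the `flags` field printed by gdb carries those kernel flag bits unchanged.

```python
import re

# VFIO region flag bits as defined in <linux/vfio.h>.
VFIO_REGION_INFO_FLAG_READ = 1 << 0
VFIO_REGION_INFO_FLAG_WRITE = 1 << 1
VFIO_REGION_INFO_FLAG_MMAP = 1 << 2
VFIO_REGION_INFO_FLAG_CAPS = 1 << 3

# Pulls size, flags and region number out of each printed VFIOBAR struct.
BAR_RE = re.compile(r"size = (\d+), flags = (\d+),.*?\bnr = (\d+)", re.S)


def parse_bars(gdb_text):
    """Return {region_nr: (size, flags)} from 'p vdev->bars[N]' output."""
    return {int(nr): (int(size), int(flags))
            for size, flags, nr in BAR_RE.findall(gdb_text)}


def check_64bit_layout(bars):
    """Regions 0 and 2 must report a size; regions 1 and 3 must be zero-sized."""
    return (all(bars[i][0] > 0 for i in (0, 2))
            and all(bars[i][0] == 0 for i in (1, 3)))


# gdb output copied from comment 28 (raw string keeps '\000' literal).
SAMPLE = r"""
$1 = {region = {vbasedev = 0x555557e388e0, fd_offset = 0, mem = 0x555556d911d0, size = 16777216, flags = 3, nr_mmaps = 0, mmaps = 0x0, nr = 0 '\000'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
$2 = {region = {vbasedev = 0x555557e388e0, fd_offset = 1099511627776, mem = 0x0, size = 0, flags = 0, nr_mmaps = 0, mmaps = 0x0, nr = 1 '\001'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
$3 = {region = {vbasedev = 0x555557e388e0, fd_offset = 2199023255552, mem = 0x555556d913b0, size = 1073741824, flags = 15, nr_mmaps = 1, mmaps = 0x555556dc6c60, nr = 2 '\002'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
$4 = {region = {vbasedev = 0x555557e388e0, fd_offset = 3298534883328, mem = 0x0, size = 0, flags = 0, nr_mmaps = 0, mmaps = 0x0, nr = 3 '\003'}, ioport = false, mem64 = false, quirks = {lh_first = 0x0}}
"""

bars = parse_bars(SAMPLE)
for nr, (size, flags) in sorted(bars.items()):
    mmap_ok = bool(flags & VFIO_REGION_INFO_FLAG_MMAP)
    print(f"region {nr}: size={size} flags={flags} mmap={mmap_ok}")
print("64-bit BAR layout OK:", check_64bit_layout(bars))
```

Under the stated assumption about `flags`, the value 3 on region 0 would mean read and write without mmap, and 15 on region 2 would mean read, write, mmap and capabilities.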