Bug 1410287
Summary: | [RFE] Support for PCIe devices on PAPR (POWER) guests | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | David Gibson <dgibson> |
Component: | libvirt | Assignee: | Andrea Bolognani <abologna> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.4 | CC: | bugproxy, dzheng, gsun, hannsj_uhl, jdenemar, jsuchane, qzhang, rbalakri |
Target Milestone: | rc | Keywords: | FutureFeature, TestOnly |
Target Release: | 7.4 | ||
Hardware: | ppc64le | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-02 07:44:59 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1410284, 1429760, 1568219 | ||
Bug Blocks: | 1264935, 1359843 |
Description
David Gibson
2017-01-05 01:34:40 UTC
It was agreed that resolving Bug 1280542 is not necessary for moving forward with this, hence dropping the dependency.

I've tested this both with an emulated PCIe Ethernet adapter (e1000e) device and with an assigned host PCIe Ethernet adapter, and in both cases the guest was able to access the extended config space (capabilities >=100). Note that this requires QEMU 2.9 and the use of the pseries-2.9 machine type. In fact, I've also verified that, as expected, guests running on older machine types can't access the extended config space.

Case 1: Cold plug multiple host PCIe devices to the guest.

# lspci -vv|grep PCIe
...
0003:09:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
    Product Name: PCIe2 4-port 1GbE Adapter
0003:09:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
    Product Name: PCIe2 4-port 1GbE Adapter
0003:09:00.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
    Product Name: PCIe2 4-port 1GbE Adapter
0003:09:00.3 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
    Product Name: PCIe2 4-port 1GbE Adapter

Configure the guest with four <hostdev> elements:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
  </source>
</hostdev>
...

The guest starts successfully.
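The four <hostdev> entries in Case 1 differ only in the PCI function number, so they could be generated rather than written by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name hostdev_xml is hypothetical and is not part of libvirt or of the original test procedure.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(domain, bus, slot, function):
    """Build a managed PCI <hostdev> element for one host PCI function.

    hostdev_xml is an illustrative helper, not a libvirt API.
    """
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "pci", "managed": "yes"}
    )
    source = ET.SubElement(hostdev, "source")
    # Format the address fields the same way the bug report writes them.
    ET.SubElement(
        source,
        "address",
        {
            "domain": f"0x{domain:04x}",
            "bus": f"0x{bus:02x}",
            "slot": f"0x{slot:02x}",
            "function": f"0x{function:x}",
        },
    )
    return ET.tostring(hostdev, encoding="unicode")


# One element per function of the 4-port adapter at 0003:09:00.x.
for fn in range(4):
    print(hostdev_xml(0x0003, 0x09, 0x00, fn))
```

Each printed element can be pasted into the <devices> section of the domain XML; ElementTree emits double-quoted attributes, which libvirt accepts the same as single-quoted ones.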
Dumpxml guest

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0003' bus='0x09' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</hostdev>

Also see
<address domain='0x0003' bus='0x09' slot='0x00' function='0x08'/>
<address domain='0x0003' bus='0x09' slot='0x00' function='0x09'/>
<address domain='0x0003' bus='0x09' slot='0x00' function='0x0a'/>

Check within the guest

[root@localhost ~]# lspci
00:01.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
...
00:08.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:09.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
00:0a.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

Reboot the VM and devices can be seen in lspci.
Destroy the VM, the host devices are back to the host. Other three are same as below.

# virsh nodedev-dumpxml pci_0003_09_00_0
<device>
  <name>pci_0003_09_00_0</name>
  <path>/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0</path>
  <parent>pci_0003_02_09_0</parent>
  <driver>
    <name>tg3</name>
  </driver>

Case 2: Hotplug a host PCIe device to the guest.

1. Unbind other devices in same iommu group from the host.
# virsh nodedev-detach pci_0003_09_00_1
Device pci_0003_09_00_1 detached

# virsh nodedev-reset pci_0003_09_00_1
Device pci_0003_09_00_1 reset
Same to pci_0003_09_00_0, pci_0003_09_00_2

2. Hot plug pci_0003_09_00_3 to the guest and lspci can list the device in guest.
# virsh attach-device vm1 device_hostdev.xml
Device attached successfully

3. Can not detach the device as bug 1272300.

Test packages:
libvirt-3.2.0-4.el7.ppc64le
qemu-kvm-rhev-2.9.0-2.el7.ppc64le
kernel-3.10.0-657.el7.ppc64le

Above two cases are passed.

(In reply to Dan Zheng from comment #3)
[...]
> Check within the guest
>
> [root@localhost ~]# lspci
> 00:01.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit
> Ethernet PCIe (rev 01)
> ...
> 00:08.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit
> Ethernet PCIe (rev 01)
> 00:09.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit
> Ethernet PCIe (rev 01)
> 00:0a.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit
> Ethernet PCIe (rev 01)

It's not enough to check whether devices are visible in the guest: for the purpose of this bug, it's critical that the PCIe config space is also exposed. You can make sure it is by looking for PCI capabilities >=100, e.g.

$ sudo lspci -vvs 0003:09:00.0
0003:09:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
    Subsystem: IBM Device 0420
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 508
    NUMA node: 1
    Region 0: Memory at 250100000000 (64-bit, prefetchable) [size=64K]
    Region 2: Memory at 250100010000 (64-bit, prefetchable) [size=64K]
    Region 4: Memory at 250100020000 (64-bit, prefetchable) [size=64K]
    [virtual] Expansion ROM at 3fe281000000 [disabled] [size=512K]
    Capabilities: [48] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    [...]
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
    Capabilities: [13c v1] Device Serial Number 00-00-98-be-94-04-14-04
    Capabilities: [150 v1] Power Budgeting <?>
    Capabilities: [160 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [230 v1] Transaction Processing Hints
        Interrupt vector mode supported
        Steering table in MSI-X table
    Kernel driver in use: tg3
    Kernel modules: tg3

> Case 2: Hotplug a host PCIe device to the guest.
> 1. Unbind other devices in same iommu group from the host.
> # virsh nodedev-detach pci_0003_09_00_1
> Device pci_0003_09_00_1 detached
>
> # virsh nodedev-reset pci_0003_09_00_1
> Device pci_0003_09_00_1 reset

Not sure the reset is really needed, but I don't see how it could hurt either :)

> Same to pci_0003_09_00_0, pci_0003_09_00_2
>
>
> 2. Hot plug pci_0003_09_00_3 to the guest and lspci can list the device in
> guest.
> # virsh attach-device vm1 device_hostdev.xml
> Device attached successfully
>
> 3. Can not detach the device as bug 1272300.

Since all devices in the IOMMU group have been detached from the host, you should be able to detach the device from the guest safely despite bug 1272300.
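Whether every device in the IOMMU group has been accounted for can be checked from sysfs before detaching. The sketch below assumes the usual Linux layout, where /sys/bus/pci/devices/<address>/iommu_group/devices lists the group members; the function name iommu_group_members and its sysfs_root parameter are hypothetical and exist only to keep the example testable.

```python
import os


def iommu_group_members(pci_address, sysfs_root="/sys"):
    """List the PCI addresses that share an IOMMU group with pci_address.

    Assumes the Linux sysfs layout; raises FileNotFoundError if the device
    does not exist or has no IOMMU group.
    """
    group_dir = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_address,
        "iommu_group", "devices",
    )
    return sorted(os.listdir(group_dir))
```

On a real host, iommu_group_members("0003:09:00.3") would be expected to list all functions that have to be detached from the host together; the actual grouping depends on the platform.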
For completeness' sake, it would be useful to make sure the extended config space is not exposed to guests when using pseries-rhel7.3.0 or older.

Thanks Andrea for pointing it out.

Retest Case 2.

1. Detach pci_0003_09_00_0 ~ pci_0003_09_00_2 from the host.

2. Start the guest and attach pci_0003_09_00_3 to the guest. It is OK.

3. Check the configuration space: capabilities >=100 are visible in the guest.

# lspci
00:01.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

# lspci -vvs 00:01.0
00:01.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
    Subsystem: IBM Device 0420
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin B routed to IRQ 17
    Region 0: Memory at 210000020000 (64-bit, prefetchable) [size=64K]
    Region 2: Memory at 210000030000 (64-bit, prefetchable) [size=64K]
    Region 4: Memory at 210000040000 (64-bit, prefetchable) [size=64K]
    [virtual] Expansion ROM at 200081000000 [disabled] [size=512K]
    Capabilities: [48] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Vital Product Data
        Product Name: PCIe2 4-port 1GbE Adapter
        Read-only fields:
            [FN] Unknown: 30 30 45 32 38 37 33
            [EC] Engineering changes: D77470
            [CC] Unknown: 35 37 36 46
            [PN] Part number: 00E2872
            [FC] Unknown: 35 38 39 39
            [SN] Serial number: YL50203CD12T
            [MN] Manufacture ID: 36 43 41 45 38 42 36 41 38 32 30 34
            [RV] Reserved: checksum good, 83 byte(s) reserved
        Read/write fields:
            [YB] System specific: OFMENA\x02\x04\x00\x00\x00\x00\x00\x00\x00\x01\x00\x01\x00\x01\x00\x01\x00\x01\x00\x01\x00\x02\x00\x03\x00\x01\x00\x01\x00\x04\x00\x03\x00\x01\x00\x01\x00\x08
            [RW] Read-write area: 137 byte(s) free
        End
    Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00001000
    Capabilities: [ac] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
        DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop- FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
**************** Extended Configuration Space *************************
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
    Capabilities: [13c v1] Device Serial Number 00-00-6c-ae-8b-6a-82-07
    Capabilities: [150 v1] Power Budgeting <?>
    Capabilities: [160 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Kernel driver in use: tg3
    Kernel modules: tg3

4. Yesterday I ran into some abnormal problems, such as the host not responding, when I detached the PCIe device. I tried attach/detach again several times today, and it works as expected now.

5. Configure the guest with machine type pseries-rhel7.3.0 and start the guest.

6. Repeat the attach and check the capabilities in the guest. Capabilities: [ac] is found, but the guest cannot access the extended configuration space (>=100).

# lspci -vvs 00:01.0
00:01.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
    Subsystem: IBM Device 0420
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin B routed to IRQ 17
    Region 0: Memory at 10120020000 (64-bit, prefetchable) [size=64K]
    Region 2: Memory at 10120030000 (64-bit, prefetchable) [size=64K]
    Region 4: Memory at 10120040000 (64-bit, prefetchable) [size=64K]
    [virtual] Expansion ROM at 100a1000000 [disabled] [size=512K]
    Capabilities: [48] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Vital Product Data
        Product Name: PCIe2 4-port 1GbE Adapter
        Read-only fields:
            [FN] Unknown: 30 30 45 32 38 37 33
            [EC] Engineering changes: D77470
            [CC] Unknown: 35 37 36 46
            [PN] Part number: 00E2872
            [FC] Unknown: 35 38 39 39
            [SN] Serial number: YL50203CD12T
            [MN] Manufacture ID: 36 43 41 45 38 42 36 41 38 32 30 34
            [RV] Reserved: checksum good, 83 byte(s) reserved
        Read/write fields:
            [YB] System specific: OFMENA\x02\x04\x00\x00\x00\x00\x00\x00\x00\x01\x00\x01\x00\x01\x00\x01\x00\x01\x00\x01\x00\x02\x00\x03\x00\x01\x00\x01\x00\x04\x00\x03\x00\x01\x00\x01\x00\x08
            [RW] Read-write area: 137 byte(s) free
        End
    Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00001000
    Capabilities: [ac] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
        DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop- FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Kernel driver in use: tg3
    Kernel modules: tg3

Based on the above tests, I am marking this as verified.
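The comparison between the two machine types comes down to whether any capability in the lspci -vv output sits at offset 0x100 or above. A minimal sketch of automating that check follows; the function name extended_capabilities and the two shortened sample strings are illustrative assumptions, and the parser only relies on the "Capabilities: [offset ...]" line format visible in the output above.

```python
import re

# Matches lines such as "Capabilities: [100 v1] Advanced Error Reporting"
# and "Capabilities: [ac] Express (v2) Endpoint, MSI 00".
CAP_RE = re.compile(r"Capabilities: \[([0-9a-fA-F]+)(?: v\d+)?\]\s*(.*)")


def extended_capabilities(lspci_text):
    """Return (offset, name) pairs for capabilities at offset >= 0x100."""
    found = []
    for line in lspci_text.splitlines():
        match = CAP_RE.search(line)
        if match is None:
            continue
        offset = int(match.group(1), 16)  # lspci prints offsets in hex
        if offset >= 0x100:
            found.append((offset, match.group(2).strip()))
    return found


# Shortened excerpts of the two guest outputs from this report.
PSERIES_29 = """\
    Capabilities: [ac] Express (v2) Endpoint, MSI 00
    Capabilities: [100 v1] Advanced Error Reporting
    Capabilities: [13c v1] Device Serial Number 00-00-6c-ae-8b-6a-82-07
"""
PSERIES_RHEL73 = """\
    Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
    Capabilities: [ac] Express (v2) Endpoint, MSI 00
"""

print(extended_capabilities(PSERIES_29))
print(extended_capabilities(PSERIES_RHEL73))
```

A non-empty result indicates that the guest can see the extended config space; an empty list matches the behaviour reported for pseries-rhel7.3.0.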