Bug 1346688
| Summary: | [Q35] vfio read-only SR-IOV capability confuses OVMF | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | jingzhao <jinzhao> |
| Component: | qemu-kvm-rhev | Assignee: | Alex Williamson <alex.williamson> |
| Status: | CLOSED ERRATA | QA Contact: | jingzhao <jinzhao> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.3 | CC: | alex.williamson, chayang, jinzhao, juzhang, knoel, lersek, marcel, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.6.0-12.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-07 21:17:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
jingzhao
2016-06-15 08:05:32 UTC
Created attachment 1168238 [details]
ovmf log
Jing Zhao, two remarks / questions:

- Your use of OVMF_VARS.fd is not correct. Please refer to bug 1308678 comment 23 bullet (1) for details. (This is independent from the functionality being tested -- it's a general remark, but I think it's worth pointing out.)

- I checked the attached OVMF log file, from comment 1. It says

PciHostBridgeGetRootBridges: 2 extra root buses reported by QEMU
InitRootBridge: populated root bus 0, with room for 7 subordinate bus(es)
InitRootBridge: populated root bus 8, with room for 11 subordinate bus(es)
InitRootBridge: populated root bus 20, with room for 235 subordinate bus(es)

However, this doesn't seem to match the command line given in comment 0 -- on that command line, you do not have any pxb-pcie devices.

At the moment it appears to me that you tried VFIO device assignment in combination with pxb-pcie, and you ran into the bus_nr problem that we've been discussing in bug 1345738. Looking at the OVMF debug log, this is the impression I'm getting. And I think you ended up pasting a different QEMU command line (without pxb-pcie devices) into comment 0.

Can you please clarify? Thanks.

Jing Zhao, another remark: *assuming* that you intend to place (a) the assigned device, and/or (b) the *modern* virtio-blk device, behind a pxb-pcie extra root bridge, please be aware of bug 1323976.

Namely, unlike SeaBIOS, the edk2 PCI infrastructure built into OVMF prefers to allocate 64-bit MMIO BARs of PCI devices outside of the 32-bit address space. This works fine if you place such PCI device directly on the "main" root bridge (bus_nr=0), but it can break if the PCI device with 64-bit MMIO BARs is elsewhere. The possible breakage is due to QEMU's ACPI generator producing incorrect resource descriptors --> see bug 1323976.

This may affect both modern virtio devices, and assigned physical devices. There are two work-arounds:

- Plug these devices directly into "pcie.0".

- Alternatively, pass the following switch to QEMU, disabling the 64-bit MMIO aperture for OVMF:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=0

Thanks.

Sigh, I CC'd Marcel while writing my previous comment, but of course Bugzilla had to apply some automatic changes meanwhile, so my metadata changes went lost. Adding the CC now.

(In reply to jingzhao from comment #0)
> -device vfio-pci,host=0000:03:00.0,id=vfio,bus=downstream1.3

What is this device? Please always identify the device being assigned. Laszlo has additional questions in comment 3 that also needinfo regarding the consistency of the original report.

Created attachment 1168522 [details]
ovmf debug log
Reproduced with 82576 PF:
# lspci -vs 7:00.0
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
Physical Slot: 5
Flags: fast devsel, IRQ 16
Memory at ef420000 (32-bit, non-prefetchable) [disabled] [size=128K]
Memory at ef000000 (32-bit, non-prefetchable) [disabled] [size=4M]
I/O ports at 8020 [disabled] [size=32]
Memory at ef4c4000 (32-bit, non-prefetchable) [disabled] [size=16K]
Expansion ROM at eec00000 [disabled] [size=4M]
OVMF log ends with:
PciHostBridge: SubmitResources for PciRoot(0x0)
I/O: Granularity/SpecificFlag = 0 / 01
Length/Alignment = 0x3000 / 0xFFF
Mem: Granularity/SpecificFlag = 32 / 00
Length/Alignment = 0x1E8200000 / 0xEF49FFFF
PciBus: HostBridge->SubmitResources() - Invalid Parameter
ASSERT_EFI_ERROR (Status = Invalid Parameter)
ASSERT /builddir/build/BUILD/ovmf-988715a/MdeModulePkg/Bus/Pci/PciBusDxe/PciLib.c(561): !EFI_ERROR (Status)
VM boots with SeaBIOS.
OVMF-20160608-1.git988715a.el7.noarch
qemu-kvm-rhev-2.6.0-6.el7.x86_64
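As a quick sanity check of the failing request in the log above, the following Python sketch compares the submitted Mem length against the size of a 32-bit address space. The two constants are copied from the log. The helper name `fits_in_address_space` is made up for illustration, and reading the Granularity value as the address width in bits is an assumption, not something taken from the edk2 sources.

```python
GIB = 1 << 30

# Values copied from the failing SubmitResources() request in the OVMF log.
mem_granularity_bits = 32
mem_length = 0x1E8200000


def fits_in_address_space(length: int, address_bits: int) -> bool:
    """True if `length` bytes could fit at all in a 2**address_bits space."""
    return length <= (1 << address_bits)


print(f"requested: {mem_length / GIB:.2f} GiB")
# Prints False: the request alone exceeds the whole 4 GiB space.
print("fits in 32-bit MMIO:",
      fits_in_address_space(mem_length, mem_granularity_bits))
```

By comparison, the four regular BARs listed by lspci (128K, 4M, 32 bytes of I/O, 16K) add up to a few megabytes, so the oversized total has to come from somewhere other than the PF's own BARs.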
Seems like an OVMF bug, reassigning.

The command works with either a non-SR-IOV capable device (82579LM) or a VF (82576). Is it the SR-IOV enumeration that kills OVMF? Did I get lucky picking an SR-IOV PF on my first try? NB, vfio exposes the SR-IOV capability as read-only, in case that's confusing OVMF, but it would seem unusual for OVMF to blindly attempt to enable SR-IOV.

Yes, if vfio hides the sr-iov capability on the device, OVMF boots.

Thanks, Laszlo. Here is the corrected information for this bug.

1. The NIC that is passed through to the guest:

[root@hp-z800-01 home]# lspci -vvv -s 03:00.0
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
Physical Slot: 1
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 29
Region 0: Memory at e4800000 (32-bit, non-prefetchable) [disabled] [size=128K]
Region 1: Memory at e4000000 (32-bit, non-prefetchable) [disabled] [size=4M]
Region 2: I/O ports at c000 [disabled] [size=32]
Region 3: Memory at e4840000 (32-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable- Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-42-33-84
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
VF offset: 128, stride: 2, Device ID: 10ca
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 00000000e4848000 (64-bit, non-prefetchable)
Region 3: Memory at 00000000e4868000 (64-bit, non-prefetchable)
VF Migration: offset: 00000000, BIR: 0
Kernel driver in use: vfio-pci

2. The test command line:

/usr/libexec/qemu-kvm \
-M q35 \
-cpu Nehalem \
-monitor stdio \
-m 4G \
-vga qxl \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/home/my_varstore.fd,if=pflash,format=raw,unit=1 \
-debugcon file:/home/q35.ovmf.log \
-global isa-debugcon.iobase=0x402 \
-spice port=5932,disable-ticketing \
-smp 4,sockets=4,cores=1,threads=1 \
-device ioh3420,bus=pcie.0,id=root1.0,slot=1 \
-device x3130-upstream,bus=root1.0,id=upstream1.1 \
-device xio3130-downstream,bus=upstream1.1,id=downstream1.1,chassis=2 \
-device virtio-net-pci,bus=downstream1.1,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10 \
-device ioh3420,bus=pcie.0,id=root1.1,slot=2 \
-device x3130-upstream,bus=root1.1,id=upstream1.2 \
-device xio3130-downstream,bus=upstream1.2,id=downstream1.2,chassis=3 \
-device xio3130-downstream,bus=upstream1.2,id=downstream1.3,chassis=4 \
-drive if=none,id=drive0,file=/home/pxb-ovmf.qcow2 \
-device virtio-blk-pci,drive=drive0,scsi=off,bus=downstream1.2,disable-legacy=on,disable-modern=off \
-device ioh3420,bus=pcie.0,id=root1.2,slot=3 \
-device vfio-pci,host=0000:03:00.0,id=vfio,bus=downstream1.3

3. The updated OVMF log from my testing is attached.

Thanks
Jing Zhao

Created attachment 1168549 [details]
updated ovmf log
(In reply to Alex Williamson from comment #9)
> Did I get lucky picking an SR-IOV PF on my first try?

Yes. In both OVMF log files (comment 7, comment 13), I can see messages such as:

> PciBus: Discovered PCI @ [07|00|00]
> ARI: CapOffset = 0x150
> SR-IOV: SupportedPageSize = 0x553; SystemPageSize = 0x1; FirstVFOffset = 0x180; InitialVFs = 0x8; ReservedBusNum = 0x2; CapOffset = 0x160
> BAR[0]: Type = Mem32; Alignment = 0x1FFFF; Length = 0x20000; Offset = 0x10
> BAR[1]: Type = Mem32; Alignment = 0x3FFFFF; Length = 0x400000; Offset = 0x14
> BAR[2]: Type = Io32; Alignment = 0x1F; Length = 0x20; Offset = 0x18
> BAR[3]: Type = Mem32; Alignment = 0x3FFF; Length = 0x4000; Offset = 0x1C
> VFBAR[0]: Type = Mem64; Alignment = 0xEF49FFFF; Length = 0xEF4A0000; Offset = 0x184
> VFBAR[2]: Type = Mem64; Alignment = 0xEF47FFFF; Length = 0xEF480000; Offset = 0x190

(which is the very first time I see VFBARs in this section of the log). Then, when the collected resources are submitted to PciHostBridgeDxe, it blows up:

> PciHostBridge: SubmitResources for PciRoot(0x0)
> I/O: Granularity/SpecificFlag = 0 / 01
> Length/Alignment = 0x3000 / 0xFFF
> Mem: Granularity/SpecificFlag = 32 / 00
> Length/Alignment = 0x1E8200000 / 0xEF49FFFF
> PciBus: HostBridge->SubmitResources() - Invalid Parameter
>
> ASSERT_EFI_ERROR (Status = Invalid Parameter)
> ASSERT /builddir/build/BUILD/ovmf-988715a/MdeModulePkg/Bus/Pci/PciBusDxe/PciLib.c(561): !EFI_ERROR (Status)

The direct reason being that the Length field (which is the sum of MMIO resources for the bridge, 0x1E8200000) is greater than 4GB, but the resource type is 32-bit MMIO (Granularity=32).

It seems that the Length and Alignment fields have special meanings for VF BARs (I skimmed the SR-IOV spec very-very superficially).

... Hm, I think I might even suspect what causes this. I believe it is <https://github.com/tianocore/edk2/commit/05070c1b471b0>. The 64-bit MMIO BAR is degraded to 32-bit if it is a VFBAR and the device has an option ROM. (See the DegradeResource() function in the linked patch.)

I don't understand the reasoning behind this. I'll take the discussion to the upstream list.

Meanwhile, Alex, Jing Zhao, can you please repeat your tests, with the small modification that the ROM BAR for the assigned device be turned off?

-device vfio-pci,...,rombar=0
                     ^^^^^^^^

Thanks!

Laszlo, note that 82576 has very modest MMIO requirements for the MMIO space, this is typically the "works anywhere" SR-IOV device because it requires <= 2MB of MMIO space, which is the minimum bridge granularity. If we're coming up with needing more than 4G, it's probably because the read-only SR-IOV capability is being misinterpreted, ie. no sanity checks on the sizing.

Upstream thread: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13381

FWIW, I'm also open to the idea that QEMU hide the SR-IOV capability from the VM. We have no support for the VM enabling SR-IOV, so there's really no dependency on exposing this capability. Kernel-level vfio exposes the capability read-only to prevent users from creating VFs, but it's still up to the hypervisor whether to further expose such capabilities to the VM.

I'll try to follow whatever you deem best -- I guess first we should hear back from the maintainers of PciBusDxe in edk2, about their goals with VF BARs to begin with.

In edk2 there is a feature flag, defined in "MdeModulePkg/MdeModulePkg.dec":

## Indicates if the Single Root I/O virtualization is supported.<BR><BR>
# TRUE - Single Root I/O virtualization is supported.<BR>
# FALSE - Single Root I/O virtualization is not supported.<BR>
# @Prompt Enable SRIOV support.
gEfiMdeModulePkgTokenSpaceGuid.PcdSrIovSupport|TRUE|BOOLEAN|0x10000044

OVMF inherits the default value (TRUE) without overriding it, for the time being. (It can override it if we want it to.)

I don't know precisely what this feature flag controls.
But, if it controls the "creation of VFs", then I guess we should disable it? Based on your comment 18.

(In reply to Laszlo Ersek from comment #14)
> [...]
> Meanwhile, Alex, Jing Zhao, can you please repeat your tests, with the small
> modification that the ROM BAR for the assigned device be turned off?
>
> -device vfio-pci,...,rombar=0
>                      ^^^^^^^^
>
> Thanks!

I repeated the test with the above change, and it still failed. Please check the attached OVMF log from the run with the rombar parameter.

Created attachment 1169645 [details]
the ovmf log of add rombar parameter
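For readers unfamiliar with BAR sizing, here is a minimal Python sketch of the conventional procedure (write all-ones, read back, mask the low type bits, take the two's complement) and of what it yields when the register is read-only. It is not the edk2 code, and how PciBusDxe actually arrived at the VFBAR values in the logs is not established in this report. The read-only value 0xE4848004 is an assumption, built from the VF "Region 0" address in the lspci output above plus the type bits of a 64-bit non-prefetchable BAR. The function names are hypothetical, and only the low dword of the BAR is modelled.

```python
def bar_size_from_readback(readback: int) -> int:
    """Size of a memory BAR from the value read back after writing all-ones."""
    mask = readback & 0xFFFFFFF0        # drop the four low type/prefetch bits
    if mask == 0:
        return 0                        # BAR not implemented
    return ((~mask) & 0xFFFFFFFF) + 1   # two's complement within 32 bits


def is_plausible_bar_size(size: int) -> bool:
    """A real BAR size is a non-zero power of two."""
    return size != 0 and (size & (size - 1)) == 0


# Writable 128K BAR: the device answers the all-ones write with a size mask.
writable = bar_size_from_readback(0xFFFE0000)
# Read-only BAR: the write is ignored, so the programmed address comes back.
read_only = bar_size_from_readback(0xE4848004)

print(hex(writable), is_plausible_bar_size(writable))    # 0x20000 True
print(hex(read_only), is_plausible_bar_size(read_only))  # 0x1b7b8000 False
```

A power-of-two check of this kind is one example of the sanity checking on sizing mentioned earlier in the thread. Whether guest firmware should be expected to perform it is a separate question.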
Thank you for checking; the symptoms in the log file are identical. So it looks like rombar=0 makes no difference, and I should instrument the edk2 code with debug messages and experiment a little with it.

Upstream thread #2, from a different angle: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13437

(In reply to Laszlo Ersek from comment #24)
> Upstream thread #2, from a different angle:
> http://thread.gmane.org/gmane.comp.bios.edk2.devel/13437

Laszlo, some folks think that allowing the guest to enable sr-iov on an assigned device is not a completely insane thing to do, see http://www.spinics.net/lists/kvm/msg134370.html

The more I think about it, the more I think vfio is asking ovmf to detect something non-standard per the spec. Sure, it might be robust to detect that the VFBARs aren't getting sized correctly, but is that something we can reasonably expect of guest software?

We can have QEMU hide the SR-IOV capability, though this also gets a little ugly because extended capabilities always start at 0x100 in PCI config space and it's not feasible to relocate capabilities, which means we need to support stubbing that first entry to something a guest will traverse, but not recognize. If we hope that the guest follows the spec to the letter, we could use capability ID 0x0, except QEMU-pci uses this internally. ID 0xFFFF also has special meaning for root complex register block based capabilities, which might mean a guest would assume no capabilities at all. That leaves unlikely to be assigned values, like 0xFFFE. It's all generally unappealing, but we do some hiding of capabilities in the kernel too, so I'll look to see whether we have a better algorithm there.

Another possibility is that we virtualize the VFBARs to allow them to be sized, but leave the rest of the capability read-only. I'll need to look through the spec to see if we have any leeway to do this.

Sounds great to me, thanks for looking into this! (And yes, the blurb on the referenced patch set seems reasonable as well.)

Regarding any possible stubbing out for the SR-IOV capability: the CreatePciIoDevice() function in [MdeModulePkg/Bus/Pci/PciBusDxe/PciEnumeratorSupport.c] has a section like this:

> //
> // Initialization for SR-IOV
> //
>
> if (PcdGetBool (PcdSrIovSupport)) {
>   Status = LocatePciExpressCapabilityRegBlock (
>              PciIoDevice,
>              EFI_PCIE_CAPABILITY_ID_SRIOV,
>              &PciIoDevice->SrIovCapabilityOffset,
>              NULL
>              );
>   if (!EFI_ERROR (Status)) {

If you can make that LocatePciExpressCapabilityRegBlock() function call fail, then SR-IOV will not be used, I think. (Similarly to the effect of setting PcdSrIovSupport to FALSE in the OVMF platform description files.)

The LocatePciExpressCapabilityRegBlock() function is at the end of "MdeModulePkg/Bus/Pci/PciBusDxe/PciCommand.c", and it seems to perform a "fairly standard" traversal of the PCI Express config space.

AFAICT, LocatePciExpressCapabilityRegBlock() is called in three places in total, looking for:

- EFI_PCIE_CAPABILITY_ID_ARI (0x0E)
- EFI_PCIE_CAPABILITY_ID_SRIOV (0x10)
- EFI_PCIE_CAPABILITY_ID_MRIOV (0x11)

These macro definitions are in "MdePkg/Include/IndustryStandard/PciExpress21.h". Grepping header files for "EFI_PCIE_CAPABILITY_ID_", I find no other header files with definitions. And, all the definitions I find in this header, are:

> #define EFI_PCIE_CAPABILITY_ID_SRIOV_CONTROL_ARI_HIERARCHY 0x10
> #define EFI_PCIE_CAPABILITY_ID_ARI 0x0E
> #define EFI_PCIE_CAPABILITY_ID_ATS 0x0F
> #define EFI_PCIE_CAPABILITY_ID_SRIOV 0x10
> #define EFI_PCIE_CAPABILITY_ID_MRIOV 0x11
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_CAPABILITIES 0x04
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_CONTROL 0x08
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_STATUS 0x0A
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_INITIALVFS 0x0C
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_TOTALVFS 0x0E
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_NUMVFS 0x10
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_FUNCTION_DEPENDENCY_LINK 0x12
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_FIRSTVF 0x14
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_VFSTRIDE 0x16
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_VFDEVICEID 0x1A
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_SUPPORTED_PAGE_SIZE 0x1C
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_SYSTEM_PAGE_SIZE 0x20
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_BAR0 0x24
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_BAR1 0x28
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_BAR2 0x2C
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_BAR3 0x30
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_BAR4 0x34
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_BAR5 0x38
> #define EFI_PCIE_CAPABILITY_ID_SRIOV_VF_MIGRATION_STATE 0x3C

I think it might be worth a shot to try 0xFFFE or similar.

... But, given how the effect of that would be practically identical to setting PcdSrIovSupport to FALSE in OvmfPkg/*.dsc (for the time being anyway!), I don't see a problem with that either.

I'm happy to try out both, assuming I get access to a machine with a suitable NIC. (Obviously for the stubbing, you would have to provide the QEMU patch :))

Thanks!

... Hm, sorry I didn't see your list posting at <http://thread.gmane.org/gmane.comp.bios.edk2.devel/13437/focus=13439>.

If you prefer to research QEMU / kernel changes for this, I'm happy to follow your lead, and/or assist with it as much as I can. If you'd like to take this BZ even, I won't object, obviously :)

Let's fix this in QEMU; directly exposing a read-only SR-IOV capability to the guest doesn't seem to have much merit or spec compliance.

QEMU patch posted: http://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg05813.html

(I wonder if we should default to hiding all extended capabilities and add them as we go, but I'll start here)

I managed to reproduce this error, in the following environment:
- assigned device (I350-T2V2 (8086:1521) PF):
03:00.0 Ethernet controller:
Intel Corporation I350 Gigabit Network Connection (rev 01)
- domain XML snippet (Q35):
<controller type='pci' index='3' model='pcie-root-port'>
<model name='ioh3420'/>
<target chassis='3' port='0xe8'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0'/>
</controller>
<controller type='pci' index='4' model='pcie-switch-upstream-port'>
<model name='x3130-upstream'/>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</controller>
<controller type='pci' index='5' model='pcie-switch-downstream-port'>
<model name='xio3130-downstream'/>
<target chassis='5' port='0x0'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</controller>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</hostdev>
This generates a QEMU command line like
-device ioh3420,port=0xe8,chassis=3,id=pci.3,bus=pcie.0,addr=0x1d \
-device x3130-upstream,id=pci.4,bus=pci.3,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=5,id=pci.5,bus=pci.4,addr=0x0 \
-device vfio-pci,host=03:00.0,id=hostdev3,bus=pci.5,addr=0x0 \
- QEMU: upstream v2.6.0-1395-g40428fe
- host kernel: 4.4.5-200.fc22.x86_64
I confirm that toggling the ROM BAR on/off makes no difference, so the error
is not due to any 64->32 bit resource degradation like I initially
suspected.
I'll go ahead and test Alex's upstream patch (comment 28).
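The capability-hiding idea discussed in the comments above can be illustrated with a small Python sketch. It is not the QEMU patch and makes no claim about how that patch is implemented. It only models the two cases named in the thread: a capability in the middle of the extended capability list can be unlinked by rewriting the previous next pointer, while the entry at 0x100 cannot be moved and has to be stubbed with an ID the guest will not recognize. The chain layout mirrors the offsets in the 82576 lspci output (AER at 0x100, DSN at 0x140, ARI at 0x150, SR-IOV at 0x160). The stub ID 0xFFFE is the placeholder floated in the thread, and all function names are hypothetical.

```python
import struct

EXT_CAP_START = 0x100      # PCIe extended capabilities always begin here
CAP_ID_AER = 0x0001
CAP_ID_DSN = 0x0003
CAP_ID_ARI = 0x000E
CAP_ID_SRIOV = 0x0010
STUB_CAP_ID = 0xFFFE       # assumption: placeholder ID floated in the thread


def read_header(cfg, offset):
    """Decode an extended capability header: (cap_id, version, next_offset)."""
    (hdr,) = struct.unpack_from("<I", cfg, offset)
    return hdr & 0xFFFF, (hdr >> 16) & 0xF, (hdr >> 20) & 0xFFC


def write_header(cfg, offset, cap_id, version, next_offset):
    """Encode an extended capability header at `offset`."""
    hdr = cap_id | (version << 16) | (next_offset << 20)
    struct.pack_into("<I", cfg, offset, hdr)


def walk(cfg):
    """Return [(offset, cap_id), ...] by following the next pointers."""
    found, offset, seen = [], EXT_CAP_START, set()
    while offset >= EXT_CAP_START and offset not in seen:
        seen.add(offset)
        cap_id, version, next_offset = read_header(cfg, offset)
        if (cap_id, version, next_offset) == (0, 0, 0):
            break  # all-zero header: no (more) capabilities
        found.append((offset, cap_id))
        offset = next_offset
    return found


def hide(cfg, target_id):
    """Hide every capability with `target_id` from a guest walking the list."""
    prev = None
    for offset, cap_id in walk(cfg):
        _, version, next_offset = read_header(cfg, offset)
        if cap_id != target_id:
            prev = offset
        elif prev is None:
            # The head of the list cannot be unlinked: stub its ID instead.
            write_header(cfg, offset, STUB_CAP_ID, version, next_offset)
            prev = offset
        else:
            # Unlink: point the previous header past the hidden capability.
            prev_id, prev_version, _ = read_header(cfg, prev)
            write_header(cfg, prev, prev_id, prev_version, next_offset)


def build(chain):
    """Build a 4 KiB config space holding `chain` = [(offset, cap_id), ...]."""
    cfg = bytearray(4096)
    for i, (offset, cap_id) in enumerate(chain):
        next_offset = chain[i + 1][0] if i + 1 < len(chain) else 0
        write_header(cfg, offset, cap_id, 1, next_offset)
    return cfg


cfg = build([(0x100, CAP_ID_AER), (0x140, CAP_ID_DSN),
             (0x150, CAP_ID_ARI), (0x160, CAP_ID_SRIOV)])
hide(cfg, CAP_ID_SRIOV)
print([(hex(off), hex(cid)) for off, cid in walk(cfg)])
```

After `hide()`, a walker that starts at 0x100 still finds AER, DSN and ARI but never reaches the SR-IOV entry, which is the condition under which a lookup such as LocatePciExpressCapabilityRegBlock() would be expected to fail.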
Yes, the patch works:

> PciBus: Discovered PCI @ [03|00|00]
> ARI: forwarding enabled for PPB[02:00:00]
> ARI: CapOffset = 0x150
> BAR[0]: Type = Mem32; Alignment = 0xFFFFF; Length = 0x100000; Offset = 0x10
> BAR[3]: Type = Mem32; Alignment = 0x3FFF; Length = 0x4000; Offset = 0x1C
>
> [...]
>
> PciBus: Resource Map for Bridge [02|00|00]
> Type = Mem32; Base = 0x99200000; Length = 0x200000; Alignment = 0xFFFFF
> Base = 0x99200000; Length = 0x100000; Alignment = 0xFFFFF; Owner = PCI [03|00|00:10]
> Base = 0x99300000; Length = 0x4000; Alignment = 0x3FFF; Owner = PCI [03|00|00:1C]

The Windows Server 2012 R2 guest OS was also launched. The driver for the NIC was installed automatically. The device looks fine in Device Manager.

I'm still having network connectivity problems though -- I'm debugging them. (Almost certainly an issue with the dnsmasq / iptables setup on my gateway machine.) I'd like to have an all-positive result before replying on the list.

Fix included in qemu-kvm-rhev-2.6.0-12.el7

Verified with the following packages:
kernel-3.10.0-489.el7.x86_64
qemu-img-rhev-2.6.0-19.el7.x86_64
OVMF-20160608-3.git988715a.el7.noarch
The verification steps were as follows:
1. Boot the guest with the following command:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu Nehalem \
-monitor stdio \
-m 4G \
-vga qxl \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/home/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
-debugcon file:/home/q35.ovmf.log \
-global isa-debugcon.iobase=0x402 \
-spice port=5932,disable-ticketing \
-smp 4,sockets=4,cores=1,threads=1 \
-device ioh3420,bus=pcie.0,id=root1.0,slot=1 \
-device x3130-upstream,bus=root1.0,id=upstream1.1 \
-device xio3130-downstream,bus=upstream1.1,id=downstream1.1,chassis=2 \
-device virtio-net-pci,bus=downstream1.1,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10 \
-device ioh3420,bus=pcie.0,id=root1.1,slot=2 \
-device x3130-upstream,bus=root1.1,id=upstream1.2 \
-device xio3130-downstream,bus=upstream1.2,id=downstream1.2,chassis=3 \
-device xio3130-downstream,bus=upstream1.2,id=downstream1.3,chassis=4 \
-drive if=none,id=drive0,file=/home/pxb-ovmf.qcow2 \
-device virtio-blk-pci,drive=drive0,scsi=off,bus=downstream1.2,disable-legacy=on,disable-modern=off \
-device ioh3420,bus=pcie.0,id=root1.2,slot=3 \
-device vfio-pci,host=03:00.0,id=vf-00.0,bus=root1.2
2. The guest boots successfully.
3. Check, inside the guest, the NIC that is passed through from the host:
[root@dhcp-66-145-44 ~]# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:02.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:03.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (rev 02)
02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
03:00.0 Ethernet controller: Red Hat, Inc Virtio network device (rev 01)
04:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (rev 02)
05:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
05:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
06:00.0 SCSI storage controller: Red Hat, Inc Virtio block device (rev 01)
08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
[root@dhcp-66-145-44 ~]# lspci -vvv -t
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
+-01.0 Red Hat, Inc. QXL paravirtual graphic card
+-02.0-[01-03]----00.0-[02-03]----00.0-[03]----00.0 Red Hat, Inc Virtio network device
+-03.0-[04-07]----00.0-[05-07]--+-00.0-[06]----00.0 Red Hat, Inc Virtio block device
| \-01.0-[07]--
+-04.0-[08]----00.0 Intel Corporation 82576 Gigabit Network Connection
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
[root@dhcp-66-145-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 00:1b:21:42:33:84 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0x98400000-9841ffff
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.66.145.44 netmask 255.255.252.0 broadcast 10.66.147.255
inet6 2620:52:0:4292:986a:6bff:fe6c:6d6e prefixlen 64 scopeid 0x0<global>
inet6 fe80::986a:6bff:fe6c:6d6e prefixlen 64 scopeid 0x20<link>
ether 9a:6a:6b:6c:6d:6e txqueuelen 1000 (Ethernet)
RX packets 788 bytes 57561 (56.2 KiB)
RX errors 0 dropped 6 overruns 0 frame 0
TX packets 150 bytes 20636 (20.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Thanks
Jing Zhao
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2673.html