Bug 2105231
| Summary: | [MT2910] XML error: Invalid value for attribute 'speed' in element 'link': '(null)'. | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Yanghang Liu <yanghliu> | |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | |
| libvirt sub component: | CLI & API | QA Contact: | yalzhang <yalzhang> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | medium | CC: | chayang, gveitmic, jdenemar, lmen, mprivozn, virt-maint, xuzhang, yalzhang, ymankad | |
| Version: | 9.1 | Keywords: | Triaged, Upstream, ZStream | |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | libvirt-8.5.0-2.el9 | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| Clones: | 2168116 2170235 | Environment: | ||
| Last Closed: | 2022-11-15 10:04:39 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2170235 | |||
This happens because, when reading the PCI config space of a device (e.g. "/sys/bus/pci/devices/0000:00:1c.0/config"), libvirt finds the field that corresponds to the link speed (which is effectively an enum) and then uses its internal enum -> string conversion. Simply put, our internals do not expect a 32GT/s speed.

Posted onto the list: https://listman.redhat.com/archives/libvir-list/2022-July/232737.html

Merged upstream as:
commit d33c2a9e2f933b31f8e96e9938c237bdffe27f84
Author: Michal Prívozník <mprivozn>
AuthorDate: Fri Jul 8 14:29:32 2022 +0200
Commit: Michal Prívozník <mprivozn>
CommitDate: Tue Jul 12 09:07:45 2022 +0200
vircpi: Add PCIe 5.0 and 6.0 link speeds
The PCIe 5.0 and PCIe 6.0 standards define new link speeds:
32GT/s and 64GT/s, respectively. Update our internal enum to
include these new speeds. Otherwise we format incorrect XML:
<pci-express>
<link validity='cap' port='0' speed='(null)' width='16'/>
<link validity='sta' speed='16' width='16'/>
</pci-express>
Like all "good" specifications, these are also locked behind a
login portal. But we can look at pciutils' source code: [1] and
[2].
1: https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/ls-caps.c?id=caca31a0eea41c7b051705704c1158fddc02fbd2
2: https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/ls-caps.c?id=5bdf63b6b1bc35b59c4b3f47f7ca83ca1868155b
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2105231
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
v8.5.0-78-gd33c2a9e2f
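The explanation and commit above come down to a table lookup that has no entry for the new value. Below is a minimal sketch, assuming a table-driven enum-to-string conversion similar in spirit to libvirt's `VIR_ENUM_IMPL` helpers. The enum, table, and function names are hypothetical, not libvirt's. Calling the lookup with the pre-fix upper bound makes encoding 5 (32GT/s) return NULL, which ends up in the XML as `(null)`. With the two new entries, the same lookup returns "32".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical enum; values follow the PCIe link speed encoding as pciutils
 * decodes it. Before the fix, the list ended after LINK_SPEED_16. */
enum link_speed {
    LINK_SPEED_NA = 0,
    LINK_SPEED_2_5,  /* 1: 2.5GT/s */
    LINK_SPEED_5,    /* 2: 5GT/s */
    LINK_SPEED_8,    /* 3: 8GT/s */
    LINK_SPEED_16,   /* 4: 16GT/s */
    LINK_SPEED_32,   /* 5: 32GT/s (PCIe 5.0, new in the fix) */
    LINK_SPEED_64,   /* 6: 64GT/s (PCIe 6.0, new in the fix) */
    LINK_SPEED_LAST
};

/* One string per enum value, as written to the speed='...' attribute. */
static const char *const speed_strings[] = {
    "", "2.5", "5", "8", "16", "32", "64",
};

/* Fail the build if the table and the enum drift apart. */
static_assert(sizeof(speed_strings) / sizeof(speed_strings[0]) == LINK_SPEED_LAST,
              "speed_strings must have one entry per enum link_speed value");

/* Return the string for 'val', or NULL if 'val' is not below 'last'.
 * Passing last = LINK_SPEED_32 emulates the pre-fix table. */
static const char *link_speed_to_string(unsigned int val, unsigned int last)
{
    if (val >= last)
        return NULL;
    return speed_strings[val];
}

/* Format a 'cap' <link> element. Passing NULL for "%s" is undefined in ISO C
 * (glibc prints "(null)"), so substitute that text explicitly. Returns the
 * snprintf result. */
static int format_cap_link(char *buf, size_t len, unsigned int speed,
                           unsigned int last, unsigned int width)
{
    const char *s = link_speed_to_string(speed, last);

    return snprintf(buf, len,
                    "<link validity='cap' port='0' speed='%s' width='%u'/>",
                    s ? s : "(null)", width);
}
```

The compile-time size check keeps the enum and its string table in sync. It cannot catch a value read from hardware that is simply larger than any enum member. That is why the runtime range check, and the NULL it produces, is still needed.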
All patches pushed after the 8.5.0 upstream release need to be backported to make it into RHEL 9.1.0.

To POST:
https://gitlab.com/redhat/rhel/src/libvirt/-/merge_requests/35
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1293360

Reproduced on libvirt-8.5.0-1.el9.x86_64:
# virsh nodedev-dumpxml pci_0000_17_00_0 | grep speed
<link validity='cap' port='0' speed='(null)' width='16'/>
<link validity='sta' speed='16' width='16'/>
Updated libvirt to libvirt-8.5.0-2.el9.x86_64, restarted all active virt* services, and checked again; the bug is fixed:
# virsh nodedev-dumpxml pci_0000_17_00_0 | grep speed
<link validity='cap' port='0' speed='32' width='16'/>
<link validity='sta' speed='16' width='16'/>
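The two `<link>` elements can also be cross-checked without libvirt by decoding the same registers that lspci reports as LnkCap and LnkSta. The sketch below assumes the caller has already read the first 256 bytes of the device's config space into a buffer, for example from /sys/bus/pci/devices/0000:17:00.0/config. Non-root reads of that file typically return only the first 64 bytes. The code walks the standard capability list to the PCI Express capability (ID 0x10), then extracts the link speed encodings (bits 3:0) and widths (bits 9:4). The struct and function names are hypothetical. Register offsets follow Linux's `pci_regs.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and IDs as defined in Linux's include/uapi/linux/pci_regs.h. */
#define PCI_STATUS          0x06  /* Status register */
#define PCI_STATUS_CAP_LIST 0x10  /* Status bit: capability list present */
#define PCI_CAPABILITY_LIST 0x34  /* Pointer to the first capability */
#define PCI_CAP_ID_EXP      0x10  /* PCI Express capability ID */
#define PCI_EXP_LNKCAP      0x0c  /* Link Capabilities (relative to the cap) */
#define PCI_EXP_LNKSTA      0x12  /* Link Status (relative to the cap) */

/* Hypothetical result type: one per <link validity='cap'|'sta'> element. */
struct link_info {
    unsigned int speed;  /* speed encoding: 1 = 2.5GT/s ... 5 = 32GT/s */
    unsigned int width;  /* lanes, e.g. 16 for x16 */
    unsigned int port;   /* port number (LnkCap only) */
};

/* Config space is little-endian. */
static uint16_t rd16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t rd32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)rd16(cfg, off) | ((uint32_t)rd16(cfg, off + 2) << 16);
}

/* Return the offset of the PCI Express capability, or 0 if not found. */
static size_t find_pcie_cap(const uint8_t *cfg, size_t len)
{
    size_t pos;
    int hops = 48;  /* bound the walk in case the list is malformed or loops */

    assert(cfg != NULL);
    if (len < 0x40 || !(rd16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
        return 0;

    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    while (pos >= 0x40 && pos + 2 <= len && hops-- > 0) {
        if (cfg[pos] == PCI_CAP_ID_EXP)
            return pos;
        pos = cfg[pos + 1] & 0xfc;  /* next-capability pointer */
    }
    return 0;
}

/* Fill 'cap' from LnkCap and 'sta' from LnkSta; return -1 on failure. */
static int decode_link(const uint8_t *cfg, size_t len,
                       struct link_info *cap, struct link_info *sta)
{
    size_t pos = find_pcie_cap(cfg, len);
    uint32_t lnkcap;
    uint16_t lnksta;

    if (pos == 0 || pos + PCI_EXP_LNKSTA + 2 > len)
        return -1;

    lnkcap = rd32(cfg, pos + PCI_EXP_LNKCAP);
    lnksta = rd16(cfg, pos + PCI_EXP_LNKSTA);

    cap->speed = lnkcap & 0xf;           /* Max Link Speed, bits 3:0 */
    cap->width = (lnkcap >> 4) & 0x3f;   /* Max Link Width, bits 9:4 */
    cap->port  = (lnkcap >> 24) & 0xff;  /* Port Number, bits 31:24 */
    sta->speed = lnksta & 0xf;           /* Current Link Speed, bits 3:0 */
    sta->width = (lnksta >> 4) & 0x3f;   /* Negotiated Link Width, bits 9:4 */
    sta->port  = 0;                      /* not reported by LnkSta */
    return 0;
}
```

For the MT2910 in this report, whose PCI Express capability lspci lists at [60], the expected result is a `cap` speed encoding of 5 (32GT/s) and a `sta` encoding of 4 (16GT/s). These match the `LnkCap` and `LnkSta` lines. Encoding 5 is exactly the value that the pre-fix enum had no string for.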
Verified per comment 6.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:8003
Description of problem:
The "virsh nodedev-detach $pci_address" command cannot bind the MT2910's driver to vfio-pci; it fails with an XML error.

Version-Release number of selected component (if applicable):
5.14.0-118.el9.x86_64
libvirt-8.5.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run "virsh nodedev-detach $pci_address" to bind the MT2910's driver to vfio-pci:
# virsh nodedev-detach pci_0000_17_00_0
error: Failed to detach device pci_0000_17_00_0
error: XML error: Invalid value for attribute 'speed' in element 'link': '(null)'.

2. Check the MT2910's driver:
# readlink -f /sys/bus/pci/devices/0000\:17\:00.0/driver
/sys/bus/pci/drivers/mlx5_core

Actual results:
The command throws an XML error, and the MT2910's driver fails to be bound to vfio-pci.

Expected results:
The command prints "Device pci_0000_17_00_0 detached", and the MT2910's driver is bound to vfio-pci successfully.

Additional info:
(1)
# lshw -c network -businfo
Bus info          Device        Class      Description
==========================================================
pci@0000:17:00.0  enp23s0f0np0  network    MT2910 Family [ConnectX-7]
pci@0000:17:00.1  enp23s0f1np1  network    MT2910 Family [ConnectX-7]

# ethtool -i enp23s0f0np0
driver: mlx5_core
version: 5.14.0-118.el9.x86_64
firmware-version: 28.98.2402 (MT_0000000841)
expansion-rom-version:
bus-info: 0000:17:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

(2)
# virsh nodedev-dumpxml pci_0000_17_00_0
<device>
  <name>pci_0000_17_00_0</name>
  <path>/sys/devices/pci0000:16/0000:16:02.0/0000:17:00.0</path>
  <parent>pci_0000_16_02_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <class>0x020000</class>
    <domain>0</domain>
    <bus>23</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1021'>MT2910 Family [ConnectX-7]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <capability type='virt_functions' maxCount='8'/>
    <capability type='vpd'>
      <name>NVIDIA ConnectX-7 Ethernet adapter card, 200 GbE , Dual-port QSFP, PCIe 5.0 x16, Crypto and Secure Boot</name>
      <fields access='readonly'>
        <change_level>A1</change_level>
        <part_number>MCX713106AC-VEAT</part_number>
        <serial_number>MT2215X06621</serial_number>
        <vendor_field index='2'>MCX713106AC-VEAT</vendor_field>
        <vendor_field index='3'>a83d8eb352baec1180001070fda37ad0</vendor_field>
        <vendor_field index='A'>MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX713106A</vendor_field>
        <vendor_field index='0'>PCIeGen5 x16</vendor_field>
        <vendor_field index='U'>MT2215X06621MLNXS0D0F0</vendor_field>
      </fields>
    </capability>
    <iommuGroup number='20'>
      <address domain='0x0000' bus='0x17' slot='0x00' function='0x0'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='(null)' width='16'/>   <--- the 'speed' attribute value is "(null)" here
      <link validity='sta' speed='16' width='16'/>
    </pci-express>
  </capability>
</device>

(3)
# lspci -vvv -s 17:00.0
17:00.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
    Subsystem: Mellanox Technologies Device 0026
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 18
    NUMA node: 0
    IOMMU group: 20
    Region 0: Memory at 9e000000 (64-bit, prefetchable) [size=32M]
    Expansion ROM at 9b800000 [size=1M]
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
        DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 512 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 32GT/s, Width x16, ASPM not supported
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 16GT/s (downgraded), Width x16 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
             10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
             AtomicOpsCtl: ReqEn+
        LnkCap2: Supported Link Speeds: 2.5-32GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
        LnkCtl2: Target Link Speed: 32GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
             EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [48] Vital Product Data
        Product Name: NVIDIA ConnectX-7 Ethernet adapter card, 200 GbE , Dual-port QSFP, PCIe 5.0 x16, Crypto and Secure Boot
        Read-only fields:
            [PN] Part number: MCX713106AC-VEAT
            [EC] Engineering changes: A1
            [V2] Vendor specific: MCX713106AC-VEAT
            [SN] Serial number: MT2215X06621
            [V3] Vendor specific: a83d8eb352baec1180001070fda37ad0
            [VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX713106A
            [V0] Vendor specific: PCIeGen5 x16
            [VU] Vendor specific: MT2215X06621MLNXS0D0F0
            [RV] Reserved: checksum good, 1 byte(s) reserved
        End
    Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Vector table: BAR=0 offset=00002000
        PBA: BAR=0 offset=00003000
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
        AERCap: First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration-, Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
        IOVSta: Migration-
        Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
        VF offset: 2, stride: 1, Device ID: 101e
        Supported Page Size: 000007ff, System Page Size: 00000001
        Region 0: Memory at 00000000a0800000 (64-bit, prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Capabilities: [1c0 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [230 v1] Access Control Services
        ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    Kernel driver in use: mlx5_core
    Kernel modules: mlx5_core

(4) The MT2910's driver can be bound to vfio-pci with the following commands:
# modprobe vfio-pci
# echo 0000:17:00.0 > /sys/bus/pci/devices/0000\:17\:00.0/driver/unbind
# echo "15b3 1021" > /sys/bus/pci/drivers/vfio-pci/new_id
# echo "15b3 1021" > /sys/bus/pci/drivers/vfio-pci/remove_id
# readlink -f /sys/bus/pci/devices/0000\:17\:00.0/driver
/sys/bus/pci/drivers/vfio-pci

(5) The MT2910's driver can also be bound to vfio-pci with the following command:
# dpdk-devbind.py --bind=vfio-pci 0000:17:00.0
# readlink -f /sys/bus/pci/devices/0000\:17\:00.0/driver
/sys/bus/pci/drivers/vfio-pci
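The error in step 1 is reported while parsing, not while formatting. Presumably `nodedev-detach` reads the node device XML back in, and the parser cannot map the literal `(null)` to any known speed. The sketch below shows such a string-to-enum lookup, assuming a table like the one the upstream fix extends. `speed_names`, `link_speed_from_string`, and `parse_speed_attr` are hypothetical names, and the error text only imitates the message reported above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Post-fix strings for the speed='...' attribute; index = speed encoding. */
static const char *const speed_names[] = { "", "2.5", "5", "8", "16", "32", "64" };

/* Reverse lookup: return the index of 'str' in speed_names, or -1. */
static int link_speed_from_string(const char *str)
{
    size_t i;

    assert(str != NULL);
    for (i = 0; i < sizeof(speed_names) / sizeof(speed_names[0]); i++) {
        if (strcmp(speed_names[i], str) == 0)
            return (int)i;
    }
    return -1;
}

/* Parse a speed attribute value. On failure, write an error styled after the
 * reported one into 'err' and return -1. */
static int parse_speed_attr(const char *value, char *err, size_t errlen)
{
    int speed = link_speed_from_string(value);

    if (speed < 0) {
        snprintf(err, errlen,
                 "XML error: Invalid value for attribute 'speed' in element 'link': '%s'.",
                 value);
        return -1;
    }
    return speed;
}
```

In this sketch, "(null)" never matches a table entry, so the parsing side cannot accept XML that was already formatted wrongly. What makes the round trip succeed is the formatter emitting "32" in the first place.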