Bug 2105231 - [MT2910] XML error: Invalid value for attribute 'speed' in element 'link': '(null)'.
Summary: [MT2910] XML error: Invalid value for attribute 'speed' in element 'link': '(...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 2170235
Reported: 2022-07-08 10:10 UTC by Yanghang Liu
Modified: 2023-02-15 23:14 UTC

Fixed In Version: libvirt-8.5.0-2.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2168116 2170235
Environment:
Last Closed: 2022-11-15 10:04:39 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System                             ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker              RHELPLAN-127294  0        None      None    None     2022-07-08 10:12:25 UTC
Red Hat Knowledge Base (Solution)  6996927          0        None      None    None     2023-02-07 04:40:02 UTC
Red Hat Product Errata             RHSA-2022:8003   0        None      None    None     2022-11-15 10:05:06 UTC

Description Yanghang Liu 2022-07-08 10:10:36 UTC
Description of problem:
Running "virsh nodedev-detach $pci_address" fails with an XML error, so the MT2910 is not rebound from its current driver to vfio-pci.



Version-Release number of selected component (if applicable):
5.14.0-118.el9.x86_64
libvirt-8.5.0-1.el9.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Run "virsh nodedev-detach $pci_address" to rebind the MT2910 to vfio-pci:


# virsh nodedev-detach pci_0000_17_00_0
error: Failed to detach device pci_0000_17_00_0
error: XML error: Invalid value for attribute 'speed' in element 'link': '(null)'.

2. Check which driver the MT2910 is bound to:

# readlink -f /sys/bus/pci/devices/0000\:17\:00.0/driver
/sys/bus/pci/drivers/mlx5_core





Actual results:
The command fails with an XML error and the MT2910 is not bound to vfio-pci (it stays on mlx5_core).

Expected results:
The command prints "Device pci_0000_17_00_0 detached" and the MT2910 is bound to vfio-pci successfully.


Additional info:
(1)

# lshw -c network -businfo
Bus info          Device        Class          Description
==========================================================
pci@0000:17:00.0  enp23s0f0np0  network        MT2910 Family [ConnectX-7]
pci@0000:17:00.1  enp23s0f1np1  network        MT2910 Family [ConnectX-7]



# ethtool -i enp23s0f0np0
driver: mlx5_core
version: 5.14.0-118.el9.x86_64
firmware-version: 28.98.2402 (MT_0000000841)
expansion-rom-version: 
bus-info: 0000:17:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

(2)
# virsh nodedev-dumpxml pci_0000_17_00_0
<device>
  <name>pci_0000_17_00_0</name>
  <path>/sys/devices/pci0000:16/0000:16:02.0/0000:17:00.0</path>
  <parent>pci_0000_16_02_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <class>0x020000</class>
    <domain>0</domain>
    <bus>23</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1021'>MT2910 Family [ConnectX-7]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <capability type='virt_functions' maxCount='8'/>
    <capability type='vpd'>
      <name>NVIDIA ConnectX-7 Ethernet adapter card, 200 GbE , Dual-port QSFP, PCIe 5.0 x16, Crypto and Secure Boot</name>
      <fields access='readonly'>
        <change_level>A1</change_level>
        <part_number>MCX713106AC-VEAT</part_number>
        <serial_number>MT2215X06621</serial_number>
        <vendor_field index='2'>MCX713106AC-VEAT</vendor_field>
        <vendor_field index='3'>a83d8eb352baec1180001070fda37ad0</vendor_field>
        <vendor_field index='A'>MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX713106A</vendor_field>
        <vendor_field index='0'>PCIeGen5 x16</vendor_field>
        <vendor_field index='U'>MT2215X06621MLNXS0D0F0</vendor_field>
      </fields>
    </capability>
    <iommuGroup number='20'>
      <address domain='0x0000' bus='0x17' slot='0x00' function='0x0'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='(null)' width='16'/>  <!-- the 'speed' attribute value is '(null)' here -->
      <link validity='sta' speed='16' width='16'/>
    </pci-express>
  </capability>
</device>

(3)
# lspci -vvv -s 17:00.0
17:00.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
	Subsystem: Mellanox Technologies Device 0026
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 18
	NUMA node: 0
	IOMMU group: 20
	Region 0: Memory at 9e000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at 9b800000 [size=1M]
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
		DevCtl:	CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 512 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 32GT/s, Width x16, ASPM not supported
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 16GT/s (downgraded), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
			 10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
			 AtomicOpsCtl: ReqEn+
		LnkCap2: Supported Link Speeds: 2.5-32GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 32GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [48] Vital Product Data
		Product Name: NVIDIA ConnectX-7 Ethernet adapter card, 200 GbE , Dual-port QSFP, PCIe 5.0 x16, Crypto and Secure Boot                                                                               
		Read-only fields:
			[PN] Part number: MCX713106AC-VEAT         
			[EC] Engineering changes: A1
			[V2] Vendor specific: MCX713106AC-VEAT         
			[SN] Serial number: MT2215X06621   
			[V3] Vendor specific: a83d8eb352baec1180001070fda37ad0
			[VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX713106A      
			[V0] Vendor specific: PCIeGen5 x16 
			[VU] Vendor specific: MT2215X06621MLNXS0D0F0 
			[RV] Reserved: checksum good, 1 byte(s) reserved
		End
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
		AERCap:	First Error Pointer: 04, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 1
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
		VF offset: 2, stride: 1, Device ID: 101e
		Supported Page Size: 000007ff, System Page Size: 00000001
		Region 0: Memory at 00000000a0800000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Capabilities: [1c0 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [230 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core


(4) The MT2910 can be bound to vfio-pci manually through sysfs:
# modprobe vfio-pci
# echo 0000:17:00.0 > /sys/bus/pci/devices/0000\:17\:00.0/driver/unbind 
# echo "15b3 1021"  > /sys/bus/pci/drivers/vfio-pci/new_id 
# echo "15b3 1021"  > /sys/bus/pci/drivers/vfio-pci/remove_id  
# readlink -f /sys/bus/pci/devices/0000\:17\:00.0/driver
/sys/bus/pci/drivers/vfio-pci
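
The manual sequence above can be sketched in Python as follows. This is only an illustration of the same sysfs writes, not something libvirt or the reporter used; the function names vfio_rebind_writes and apply_writes and the sysfs_root parameter are hypothetical. It assumes the vfio-pci module is already loaded (the "modprobe vfio-pci" step) and that the caller is root.

```python
from pathlib import Path


def vfio_rebind_writes(bdf, vendor_id, device_id, sysfs_root="/sys"):
    """List the (path, value) sysfs writes from steps (4), in order."""
    pci = Path(sysfs_root) / "bus" / "pci"
    ids = f"{vendor_id} {device_id}"
    return [
        # Detach the device from its current driver (mlx5_core here).
        (pci / "devices" / bdf / "driver" / "unbind", bdf),
        # Teach vfio-pci the vendor/device ID so it claims the unbound device.
        (pci / "drivers" / "vfio-pci" / "new_id", ids),
        # Drop the dynamic ID again; the existing binding is kept.
        (pci / "drivers" / "vfio-pci" / "remove_id", ids),
    ]


def apply_writes(writes):
    """Perform the writes; needs root and a loaded vfio-pci module."""
    for path, value in writes:
        path.write_text(value + "\n")


# Dry run: only print what would be written for the device in this report.
for path, value in vfio_rebind_writes("0000:17:00.0", "15b3", "1021"):
    print(f"{value!r} -> {path.as_posix()}")
```

Keeping the plan separate from apply_writes makes it possible to review the paths before touching sysfs.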


(5) The MT2910 can also be bound to vfio-pci with dpdk-devbind.py:
# dpdk-devbind.py --bind=vfio-pci 0000:17:00.0
# readlink -f /sys/bus/pci/devices/0000\:17\:00.0/driver
/sys/bus/pci/drivers/vfio-pci

Comment 1 Michal Privoznik 2022-07-08 12:21:37 UTC
This happens because, when reading the PCI config space of a device (e.g. "/sys/bus/pci/devices/0000:00:1c.0/config"), libvirt finds the field that holds the link speed (effectively an enum) and converts it to a string with an internal enum -> string conversion. Our internals simply do not expect the 32GT/s speed, so the conversion has no string for it and '(null)' ends up in the XML.
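
To illustrate the failure mode described above, here is a minimal Python model; it is not libvirt's C code. It assumes the standard PCIe layout of the Link Capabilities register (Max Link Speed in bits 3:0 with codes 1 = 2.5GT/s through 6 = 64GT/s, Maximum Link Width in bits 9:4, Port Number in bits 31:24). The names SPEED_STRINGS_OLD, SPEED_STRINGS_NEW, decode_lnkcap, and format_link are hypothetical, and LNKCAP_CX7 is a synthetic value with all other bits zero. The "(null)" fallback imitates what glibc's printf prints for a NULL string.

```python
from __future__ import annotations

# Max Link Speed codes from the PCIe Link Capabilities register (bits 3:0).
SPEED_STRINGS_OLD = {1: "2.5", 2: "5", 3: "8", 4: "16"}      # table without 32/64
SPEED_STRINGS_NEW = {**SPEED_STRINGS_OLD, 5: "32", 6: "64"}  # PCIe 5.0 and 6.0 added


def decode_lnkcap(lnkcap: int) -> tuple[int, int, int]:
    """Split a 32-bit Link Capabilities value into (speed code, width, port)."""
    speed_code = lnkcap & 0xF      # bits 3:0
    width = (lnkcap >> 4) & 0x3F   # bits 9:4
    port = (lnkcap >> 24) & 0xFF   # bits 31:24
    return speed_code, width, port


def format_link(lnkcap: int, table: dict[int, str]) -> str:
    """Format a 'cap' link element in the style of the nodedev XML."""
    speed_code, width, port = decode_lnkcap(lnkcap)
    # A missing entry stands in for a NULL string handed to printf("%s").
    speed = table.get(speed_code, "(null)")
    return f"<link validity='cap' port='{port}' speed='{speed}' width='{width}'/>"


# Synthetic register value: speed code 5 (32GT/s), width x16, port 0.
LNKCAP_CX7 = 0x5 | (16 << 4)

print(format_link(LNKCAP_CX7, SPEED_STRINGS_OLD))
print(format_link(LNKCAP_CX7, SPEED_STRINGS_NEW))
```

With the shorter table the first line carries speed='(null)'; with the extended table it carries speed='32'.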

Comment 2 Michal Privoznik 2022-07-08 12:43:25 UTC
Posted onto the list:

https://listman.redhat.com/archives/libvir-list/2022-July/232737.html

Comment 3 Michal Privoznik 2022-07-12 07:13:06 UTC
Merged upstream as:

commit d33c2a9e2f933b31f8e96e9938c237bdffe27f84
Author:     Michal Prívozník <mprivozn>
AuthorDate: Fri Jul 8 14:29:32 2022 +0200
Commit:     Michal Prívozník <mprivozn>
CommitDate: Tue Jul 12 09:07:45 2022 +0200

    vircpi: Add PCIe 5.0 and 6.0 link speeds
    
    The PCIe 5.0 and PCIe 6.0 standards define new link speeds:
    32GT/s and 64GT/s, respectively. Update our internal enum to
    include these new speeds. Otherwise we format incorrect XML:
    
      <pci-express>
        <link validity='cap' port='0' speed='(null)' width='16'/>
        <link validity='sta' speed='16' width='16'/>
      </pci-express>
    
    Like all "good" specifications, these are also locked behind a
    login portal. But we can look at pciutils' source code: [1] and
    [2].
    
    1: https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/ls-caps.c?id=caca31a0eea41c7b051705704c1158fddc02fbd2
    2: https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/ls-caps.c?id=5bdf63b6b1bc35b59c4b3f47f7ca83ca1868155b
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2105231
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

v8.5.0-78-gd33c2a9e2f

Comment 4 Jiri Denemark 2022-07-14 14:11:26 UTC
All patches pushed after 8.5.0 upstream release need to be backported to make
it into RHEL 9.1.0.

Comment 6 yalzhang@redhat.com 2022-07-20 08:01:09 UTC
Reproduced on libvirt-8.5.0-1.el9.x86_64:
# virsh nodedev-dumpxml pci_0000_17_00_0 | grep speed
      <link validity='cap' port='0' speed='(null)' width='16'/>
      <link validity='sta' speed='16' width='16'/>

After updating libvirt to libvirt-8.5.0-2.el9.x86_64 and restarting all active virt* services, the bug is fixed:
# virsh nodedev-dumpxml pci_0000_17_00_0 | grep speed
      <link validity='cap' port='0' speed='32' width='16'/>
      <link validity='sta' speed='16' width='16'/>
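
The before/after check above could be automated along these lines. This is a rough sketch, not part of any libvirt test suite; the function name invalid_link_speeds and the two sample strings are made up for illustration, and the samples are trimmed to just the elements that matter.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def invalid_link_speeds(nodedev_xml: str) -> list[str]:
    """Return the 'speed' values of <link> elements that are not numeric."""
    root = ET.fromstring(nodedev_xml)
    bad = []
    for link in root.iter("link"):
        speed = link.get("speed")
        if speed is None:
            continue  # nothing to check when the attribute is absent
        try:
            float(speed)
        except ValueError:
            bad.append(speed)
    return bad


# Trimmed-down samples modelled on the nodedev-dumpxml output in this bug.
BEFORE = """<device><capability type='pci'><pci-express>
  <link validity='cap' port='0' speed='(null)' width='16'/>
  <link validity='sta' speed='16' width='16'/>
</pci-express></capability></device>"""
AFTER = BEFORE.replace("(null)", "32")

print(invalid_link_speeds(BEFORE))  # ['(null)']
print(invalid_link_speeds(AFTER))   # []
```

On a real host the XML string would come from the output of "virsh nodedev-dumpxml pci_0000_17_00_0".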

Comment 10 yalzhang@redhat.com 2022-07-27 02:14:16 UTC
verified per comment 6

Comment 12 errata-xmlrpc 2022-11-15 10:04:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003

