Bug 1875265 - [mlx4] IB port cannot be enabled after disable
Summary: [mlx4] IB port cannot be enabled after disable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: rdma-core
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.4
Assignee: Honggang LI
QA Contact: Zhang Yi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-03 07:31 UTC by Zhang Yi
Modified: 2021-05-18 14:44 UTC (History)
CC List: 3 users

Fixed In Version: rdma-core-32.0-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 14:44:44 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Description Zhang Yi 2020-09-03 07:31:58 UTC
Description of problem:


Version-Release number of selected component (if applicable):
infiniband-diags-29.0-3.el8.x86_64
4.18.0-234.el8

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
server: rdma-dev-10.lab.bos.redhat.com

[root@rdma-dev-10 ~]$ ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.42.5000
	Hardware version: 1
	Node GUID: 0xf4521403007be160
	System image GUID: 0xf4521403007be163
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 37
		LMC: 0
		SM lid: 13
		Capability mask: 0x0259486a
		Port GUID: 0xf4521403007be161
		Link layer: InfiniBand
	Port 2:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 2
		LMC: 0
		SM lid: 1
		Capability mask: 0x02594868
		Port GUID: 0xf4521403007be162
		Link layer: InfiniBand

[root@rdma-dev-10 ~]$ ibportstate 37 1 disable
Initial CA/RT PortInfo:
# Port info: Lid 37 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................37
SMLid:...........................13
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............14.0625 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
# MLNX ext Port info: Lid 37 port 1
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x01
LinkSpeedEnabled:................0x01
LinkSpeedActive:.................0x00
Disable may be irreversible

After PortInfo set:
# Port info: Lid 37 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................37
SMLid:...........................13
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................Extended speed
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............14.0625 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0

[root@rdma-dev-10 ~]$ ibportstate 37 1 enable
ibwarn: [17129] _do_madrpc: recv failed: Connection timed out
ibwarn: [17129] mad_rpc: _do_madrpc failed; dport (Lid 37)
ibportstate: iberror: failed: smp query nodeinfo failed

[root@rdma-dev-10 ~]$ ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.42.5000
	Hardware version: 1
	Node GUID: 0xf4521403007be160
	System image GUID: 0xf4521403007be163
	Port 1:
		State: Down
		Physical state: Disabled
		Rate: 10
		Base lid: 37
		LMC: 0
		SM lid: 13
		Capability mask: 0x0259486a
		Port GUID: 0xf4521403007be161
		Link layer: InfiniBand
	Port 2:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 2
		LMC: 0
		SM lid: 1
		Capability mask: 0x02594868
		Port GUID: 0xf4521403007be162
		Link layer: InfiniBand

Comment 1 Zhang Yi 2020-09-03 08:17:12 UTC
I also found that the second port cannot be disabled directly. I have to disable the first port first; only then can the second port be disabled.

[root@rdma-dev-11 ~]$ ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.42.5000
	Hardware version: 1
	Node GUID: 0xf4521403007be0e0
	System image GUID: 0xf4521403007be0e3
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 38
		LMC: 0
		SM lid: 13
		Capability mask: 0x02594868
		Port GUID: 0xf4521403007be0e1
		Link layer: InfiniBand
	Port 2:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 3
		LMC: 0
		SM lid: 1
		Capability mask: 0x02594868
		Port GUID: 0xf4521403007be0e2
		Link layer: InfiniBand
[root@rdma-dev-11 ~]$ ibportstate 3 2 disable

Comment 2 Honggang LI 2020-09-03 09:00:57 UTC
This is a duplicate of bz1500952.

Comment 3 Honggang LI 2020-09-03 09:02:29 UTC
Alaa,
What can we do about these mlx-specific bugs? Thanks.

Comment 4 Alaa Hleihel (NVIDIA Mellanox) 2020-09-03 10:44:02 UTC
I think it's now using the second port.
What if you also specify the port number?

ibportstate 37 1 -P 1 enable

Comment 5 Alaa Hleihel (NVIDIA Mellanox) 2020-09-03 10:46:16 UTC
(In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #4)
> I think it's now using the second port.
> what if you also specify the port number ?
> 
> ibportstate 37 1 -P 1 enable

Actually, also specify the card:

ibportstate 31 1 -C mlx4_0 -P 1 enable
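
A minimal sketch of the suggestion above, assuming the target LID belongs to a port on the local machine, as in the original report (LID 37 on mlx4_0 port 1). Once port 1 is down, a request sent without -C/-P may go out through the other port, which reports a different SM lid and so may not be able to reach LID 37. The helper names find_local_port and port_cmd are hypothetical and are not part of infiniband-diags. They parse ibstat output from stdin and only print the ibportstate command with -C and -P filled in; they do not run it.

```shell
#!/bin/sh
# Hypothetical helper: read `ibstat` output on stdin and print "<ca> <port>"
# for the local port whose "Base lid" equals $1. Prints nothing if no match.
find_local_port() {
    awk -v lid="$1" -v q="'" '
        /^CA /                { ca = $2; gsub(q, "", ca) }        # CA name, quotes stripped
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }   # current port number
        /^[ \t]*Base lid:/ && $3 == lid { print ca, port; exit }
    '
}

# Hypothetical helper: print (not run) an ibportstate command that names the
# local CA and port explicitly. $1 = LID, $2 = op (for example enable).
port_cmd() {
    lid=$1
    op=$2
    found=$(find_local_port "$lid")
    if [ -z "$found" ]; then
        echo "no local port with LID $lid" >&2
        return 1
    fi
    ca=${found% *}      # text before the space: CA name
    port=${found#* }    # text after the space: port number
    echo "ibportstate -C $ca -P $port $lid $port $op"
}
```

Possible usage is `ibstat | port_cmd 37 enable`, which prints a command to review before running it. The options are placed before the destination here, following the "ibportstate [options] <dest> <portnum> [<op>]" usage line.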

Comment 6 Zhang Yi 2020-09-03 14:10:13 UTC
(In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #5)
> (In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #4)
> > I think it's now using the second port.
> > what if you also specify the port number ?
> > 
> > ibportstate 37 1 -P 1 enable
> 
> actually also specify the card:
> 
> ibportstate 31 1 -C mlx4_0 -P 1 enable

Yeah, this method works well.

I think it would be better to update the related Usage/Examples for ibportstate.

[root@rdma-dev-11 ~]$ ibportstate 

Usage: ibportstate [options] <dest dr_path|lid|guid> <portnum> [<op>]

Supported ops: enable, disable, on, off, reset, speed, espeed, fdr10,
	width, query, down, arm, active, vls, mtu, lid, smlid, lmc,
	mkey, mkeylease, mkeyprot


Options:
  --config, -z <config>   use config file, default: /etc/infiniband-diags/ibdiag.conf
  --Ca, -C <ca>           Ca name to use
  --Port, -P <port>       Ca port number to use
  --Direct, -D            use Direct address argument
  --Lid, -L               use LID address argument
  --Guid, -G              use GUID address argument
  --timeout, -t <ms>      timeout in ms
  --sm_port, -s <lid>     SM port lid
  --show_keys, -K         display security keys in output
  --m_key, -y <key>       M_Key to use in request
  --errors, -e            show send and receive errors
  --verbose, -v           increase verbosity level
  --debug, -d             raise debug level
  --help, -h              help message
  --version, -V           show version

Examples:
  ibportstate 3 1 disable			# by lid
  ibportstate -G 0x2C9000100D051 1 enable	# by guid
  ibportstate -D 0 1			# (query) by direct route
  ibportstate 3 1 reset			# by lid
  ibportstate 3 1 speed 1			# by lid
  ibportstate 3 1 width 1			# by lid
  ibportstate -D 0 1 lid 0x1234 arm		# by direct route

Comment 8 Honggang LI 2020-10-10 08:40:23 UTC
(In reply to Zhang Yi from comment #6)
> (In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #5)
> > (In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #4)
> > > I think it's now using the second port.
> > > what if you also specify the port number ?
> > > 
> > > ibportstate 37 1 -P 1 enable
> > 
> > actually also specify the card:
> > 
> > ibportstate 31 1 -C mlx4_0 -P 1 enable
> 
> Yeah, this method works well.
> 
> I think it's better we can update the related Usage/Examples for ibportstate.

https://github.com/linux-rdma/rdma-core/pull/847

Comment 15 Honggang LI 2020-11-25 00:11:35 UTC
PR merged into upstream:

https://github.com/linux-rdma/rdma-core/pull/868

Comment 23 errata-xmlrpc 2021-05-18 14:44:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RDMA stack bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1594

