Bug 1550471 - [RHEL-7.5] sos only collect info from the first InfiniBand HCA port
Summary: [RHEL-7.5] sos only collect info from the first InfiniBand HCA port
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Pavel Moravec
QA Contact: zguo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-01 09:17 UTC by Honggang LI
Modified: 2018-10-30 10:33 UTC
CC List: 12 users

Fixed In Version: sos-3.6-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-30 10:31:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Loop all active IB ports (2.22 KB, patch)
2018-04-03 09:22 UTC, Honggang LI


Links
System                  ID                       Private  Priority  Status  Summary  Last Updated
Github                  sosreport sos pull 1262  0        None      None    None     2018-04-03 13:22:00 UTC
Red Hat Product Errata  RHEA-2018:3144           0        None      None    None     2018-10-30 10:33:21 UTC

Description Honggang LI 2018-03-01 09:17:25 UTC
Description of problem:

https://bugzilla.redhat.com/show_bug.cgi?id=1532638

https://bugzilla.redhat.com/attachment.cgi?id=1400940

While working on this iSER bug reported by NetApp, I found that sos_commands/infiniband/* only collects information from the first HCA port.

No information is collected for the remaining ports of the first HCA, and the second HCA is ignored entirely.

The iSER client has two single-port HCAs, and both are in use. According to the iscsi/scsi command output, the second HCA is connected to a NetApp device just like the first one, but I can't get any useful information about it because nothing was collected for the second HCA.

Because dual-port HCAs are popular and it is normal to have multiple HCAs installed in the same machine, sos needs to collect information for all HCA ports. (Some iWARP cards have four ports.)

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Run sosreport on a machine that has more than one HCA port
2. Check sos_commands/infiniband/*

Actual results:
Info is collected only for the first port.

Expected results:
Info is collected for all HCA ports.

Additional info:

Comment 3 Bryn M. Reeves 2018-03-01 09:22:46 UTC
We don't do anything special at all in sos to interrogate particular HCAs or ports. We just run the vanilla infiniband commands:

            "ibv_devices",
            "ibv_devinfo",
            "ibstat",
            "ibstatus",
            "ibhosts",
            "iblinkinfo",
            "sminfo",
            "perfquery"

It's not clear from the information you have provided which command you are referring to.

If additional options or arguments are required to fetch information for all HCAs and ports then we can add those.

On the other hand, if the above commands are expected to get information for all devices (which seems reasonable since no device was specified) then this is looking more like a bug in the infiniband tools.

Comment 4 Honggang LI 2018-03-01 09:51:06 UTC
(In reply to Bryn M. Reeves from comment #3)

>             "ibv_devices",
>             "ibv_devinfo",
>             "ibstat",
>             "ibstatus",

Those four commands report information about *ALL* ports of all HCAs. However, I would suggest running "ibv_devices -v" in place of "ibv_devices".

>             "ibhosts",
>             "iblinkinfo",
>             "sminfo",
>             "perfquery"


Those four commands only collect information from the first port.

> It's not clear from the information you have provided which command you are
> referring to.
> 
> If additional options or arguments are required to fetch information for all
> HCAs and ports then we can add those.
> 
> On the other hand, if the above commands are expected to get information for
> all devices (which seems reasonable since no device was specified) then this
> is looking more like a bug in the infiniband tools.

Yes, I agree. We need to add some switches to those infiniband tools so that they collect info for all ports.

BTW, I can't find any infiniband-diags commands in the sosreport file. That information is _VERY_ useful for getting topology information about the fabric. Is it possible to add such tools?

Comment 5 Bryn M. Reeves 2018-03-01 10:15:49 UTC
> run "ibv_devices -v" replace "ibv_devices".
> infiniband-diags commands

No problem: these are both trivial changes to the plugin that we can include in 3.6 (due in rhel-7.6).

We'll hold off on other changes for now until the question over whether to do this in the IB tools or sos is decided.

Comment 6 Pavel Moravec 2018-03-01 10:41:26 UTC
(In reply to Bryn M. Reeves from comment #5)
> > run "ibv_devices -v" replace "ibv_devices".
> > infiniband-diags commands
> 
> No problem: these are both trivial changes to the plugin that we can include
> in 3.6 (due in rhel-7.6).
> 
> We'll hold off on other changes for now until the question over whether to
> do this in the IB tools or sos is decided.

Raising relevant needinfo:

Waiting for a specification of what changes are required on the sosreport side, i.e. which particular commands sosreport should newly collect (or which commands it should update, like the "ibv_devices -v").

Comment 7 Pavel Moravec 2018-04-02 12:18:23 UTC
Bouncing the needinfo; a reply would be much welcome if this BZ is to go into 7.6.

Comment 8 Honggang LI 2018-04-03 02:19:57 UTC
Well, I was bitten by this issue again yesterday when NetApp filed another multipath-related bug.

>             "ibhosts",
>             "iblinkinfo",
>             "sminfo",
>             "perfquery"

We have two options to scan all active ports.

Option A: Modify all four of those C programs. Use "umad_get_cas_names"/"umad_get_ca_portguids"/"umad_get_port" to get all active ports, and then iterate over them.

Option B: Create a Python wrapper that gets all active ports, and then runs those commands once for each active port.

iblinkinfo -C <CA> -P <Port NUM>
ibhosts    -C <CA> -P <Port NUM>
sminfo     -C <CA> -P <Port NUM>
perfquery  -C <CA> -P <Port NUM>

It is easy to get the CA list: just list the directory "/sys/class/infiniband".

[root@rdma-master ~]$ ls /sys/class/infiniband
hfi1_0  hfi1_1  i40iw0  i40iw1  mlx4_0  qib0

It is also easy to get the port list: just list the directory "/sys/class/infiniband/<CA>/ports/".

[root@rdma-master ~]$ ls /sys/class/infiniband/*/ports/
/sys/class/infiniband/hfi1_0/ports/:
1

/sys/class/infiniband/hfi1_1/ports/:
1

/sys/class/infiniband/i40iw0/ports/:
1

/sys/class/infiniband/i40iw1/ports/:
1

/sys/class/infiniband/mlx4_0/ports/:
1  2

/sys/class/infiniband/qib0/ports/:
1


Now, read the file /sys/class/infiniband/<CA>/ports/<NUM>/state to get the state of each port.

[root@rdma-master ~]$ tail  /sys/class/infiniband/*/ports/*/state 
==> /sys/class/infiniband/hfi1_0/ports/1/state <==
4: ACTIVE

==> /sys/class/infiniband/hfi1_1/ports/1/state <==
1: DOWN

==> /sys/class/infiniband/i40iw0/ports/1/state <==
1: DOWN

==> /sys/class/infiniband/i40iw1/ports/1/state <==
4: ACTIVE

==> /sys/class/infiniband/mlx4_0/ports/1/state <==
4: ACTIVE

==> /sys/class/infiniband/mlx4_0/ports/2/state <==
4: ACTIVE

==> /sys/class/infiniband/qib0/ports/1/state <==
4: ACTIVE


Is Option B doable?
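The sysfs walk described in this comment can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the attached patch or the sos plugin code: the function name active_ib_ports is hypothetical, and the sketch assumes each state file holds a single line such as "4: ACTIVE" or "1: DOWN", as in the output above.

```python
import os

# sysfs root named in the comment above
IB_SYS_DIR = "/sys/class/infiniband"


def active_ib_ports(sys_dir=IB_SYS_DIR):
    """Return (ca, port) pairs whose sysfs state file reads ACTIVE.

    Hypothetical helper for illustration; names are sorted as strings.
    """
    pairs = []
    if not os.path.isdir(sys_dir):
        return pairs
    for ca in sorted(os.listdir(sys_dir)):
        ports_dir = os.path.join(sys_dir, ca, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            state_file = os.path.join(ports_dir, port, "state")
            try:
                with open(state_file) as f:
                    state = f.read().strip()
            except (IOError, OSError):
                # Unreadable or missing state file: skip this port.
                continue
            # Assumed format: "<number>: <NAME>", e.g. "4: ACTIVE".
            if state.endswith("ACTIVE"):
                pairs.append((ca, port))
    return pairs


if __name__ == "__main__":
    for ca, port in active_ib_ports():
        print("%s port %s" % (ca, port))
```

On a machine without InfiniBand hardware the directory does not exist and the function simply returns an empty list.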

Comment 9 Pavel Moravec 2018-04-03 06:09:26 UTC
B) shouldn't be a problem, but that info can be collected more easily: we don't need to identify and cat the files, we can simply grab the whole /sys/class/infiniband directory ;-) or just the /sys/class/infiniband/*/ports directories.

Is collecting every file matching the /sys/class/infiniband/*/ports mask sufficient? You can try it by updating /usr/lib/python2.7/site-packages/sos/plugins/infiniband.py to:


        self.add_copy_spec([
            "/etc/ofed/openib.conf",
            "/etc/ofed/opensm.conf",
            "/etc/rdma",                       # don't forget to add the comma here
            "/sys/class/infiniband/*/ports"    # add this line
        ])

Is this the only change required, please?

Comment 10 Honggang LI 2018-04-03 07:26:26 UTC
No, that will not work. Option B is not about collecting the /sys/class/infiniband directory. After we get the list of active ports from that directory, we run those four commands against each active port.

For example:

iblinkinfo -C mlx4_0 -P 1
iblinkinfo -C mlx4_0 -P 2

I will create a Python patch for you to review. But I'm not good at Python programming, so the patch will be very ugly.
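As a rough illustration of the idea (not the patch itself), the per-port command lines could be generated along these lines. The function name per_port_commands and the hard-coded port list are hypothetical; the four command names and the -C/-P options are the ones quoted earlier in this bug.

```python
# The four commands reported in this bug as querying only the first port.
PORT_COMMANDS = ["ibhosts", "iblinkinfo", "sminfo", "perfquery"]


def per_port_commands(active_ports, commands=PORT_COMMANDS):
    """Build one '<cmd> -C <CA> -P <port>' string per command and port.

    active_ports is a list of (ca, port) pairs supplied by the caller.
    """
    return [
        "%s -C %s -P %s" % (cmd, ca, port)
        for ca, port in active_ports
        for cmd in commands
    ]


if __name__ == "__main__":
    # Example input only; a real caller would pass the ports it found active.
    for line in per_port_commands([("mlx4_0", "1"), ("mlx4_0", "2")]):
        print(line)
```

Each resulting string could then be handed to whatever mechanism the plugin uses to run and capture a command.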

Comment 12 Honggang LI 2018-04-03 09:22:47 UTC
Created attachment 1416687 [details]
Loop all active IB ports

Comment 13 Pavel Moravec 2018-04-03 13:22:00 UTC
Thanks, with some minor improvements, I created PR https://github.com/sosreport/sos/pull/1262 for the same.

devel_ack+ for 7.6

Comment 14 Pavel Moravec 2018-06-11 14:54:59 UTC
Hello,
same question here - could you please do OtherQE here as well? :)

Comment 15 Mike Stowell 2018-06-11 17:44:22 UTC
(In reply to Pavel Moravec from comment #14)
> Hello,
> same question here - could you please do OtherQE here as well? :)

infiniband-qe@ will take it

Comment 17 zguo 2018-07-31 10:28:05 UTC
[root@rdma-virt-01 infiniband]$ ls
ibhosts_-C_mlx4_0_-P_1  iblinkinfo_-C_mlx4_0_-P_1  ibstat    ibv_devices     perfquery_-C_mlx4_0_-P_1  sminfo_-C_mlx4_0_-P_1
ibhosts_-C_mlx4_0_-P_2  iblinkinfo_-C_mlx4_0_-P_2  ibstatus  ibv_devinfo_-v  perfquery_-C_mlx4_0_-P_2  sminfo_-C_mlx4_0_-P_2
[root@rdma-virt-01 infiniband]$ cat sminfo_-C_mlx4_0_-P_1 
sminfo: sm lid 2 sm guid 0xf4521403007be131, activity count 12743948 priority 15 state 3 SMINFO_MASTER
[root@rdma-virt-01 infiniband]$ cat sminfo_-C_mlx4_0_-P_2
sminfo: sm lid 34 sm guid 0x1175000078b68e, activity count 1484999 priority 15 state 3 SMINFO_MASTER
[root@rdma-virt-01 infiniband]$ cat ibstat
CA 'mlx4_0'
	CA type: MT4103
	Number of ports: 2
	Firmware version: 2.40.7000
	Hardware version: 0
	Node GUID: 0xe41d2d03001d6790
	System image GUID: 0xe41d2d03001d6793
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 102
		LMC: 1
		SM lid: 2
		Capability mask: 0x02514868
		Port GUID: 0xe41d2d03001d6791
		Link layer: InfiniBand
	Port 2:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 74
		LMC: 1
		SM lid: 34
		Capability mask: 0x02514868
		Port GUID: 0xe41d2d03001d6792
		Link layer: InfiniBand
CA 'mlx4_1'
	CA type: MT4103
	Number of ports: 1
	Firmware version: 2.40.7000
	Hardware version: 0
	Node GUID: 0x0002c90300ee5860
	System image GUID: 0x0002c90300ee5863
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x04010000
		Port GUID: 0x0202c9fffeee5860
		Link layer: Ethernet
[root@rdma-virt-01 infiniband]$ rpm -q sos
sos-3.6-3.el7.noarch

Comment 19 errata-xmlrpc 2018-10-30 10:31:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3144

