Bug 1535978

Summary: [RHEL-7.5/RDMA] opensm only honors the first item of mgroup_flags
Product: Red Hat Enterprise Linux 7
Reporter: Honggang LI <honli>
Component: opensm
Assignee: Honggang LI <honli>
Status: CLOSED ERRATA
QA Contact: Mike Stowell <mstowell>
Severity: unspecified
Priority: unspecified
Version: 7.5
CC: bhu, ddutile, honli, infiniband-qe, mstowell, rdma-dev-team
Target Milestone: rc
Target Release: 7.7
Hardware: Unspecified
OS: All
Fixed In Version: opensm-3.3.21-1.el7
Last Closed: 2019-08-06 12:46:08 UTC
Type: Bug
Bug Blocks: 1614004

Description Honggang LI 2018-01-18 11:35:53 UTC
Description of problem:

https://bugzilla.redhat.com/show_bug.cgi?id=1534869#c18

Version-Release number of selected component (if applicable):
1) RHEL7.4 in-box opensm-3.3.19-1.el7.x86_64
2) MLX-OFED opensm-4.9.1.MLNX20171001.1764298-0.1.42120.x86_64
3) latest upstream opensm
[root@rdma03 opensm]# git remote -v
origin	git://git.openfabrics.org/~halr/opensm.git (fetch)
origin	git://git.openfabrics.org/~halr/opensm.git (push)
[root@rdma03 opensm]# git log | head -n 1
commit d4479baff49f3d0e21cda27605f42b5999c37fd4

How reproducible:
always

Steps to Reproduce:
1. https://bugzilla.redhat.com/show_bug.cgi?id=1534869#c18

Comment 2 Don Dutile (Red Hat) 2018-06-21 23:00:35 UTC
(In reply to Honggang LI from comment #0)
c#18 is very long; it requires the reader to parse it, and every reader to reach the same conclusion.

A clean summary/description of the problem should be stated here.

Also, you mention the 7.4 opensm version, then Mellanox, then an upstream commit ID.
What point are you trying to make?
What is the reader supposed to conclude from the different versions? These versions almost always differ at any point in time.

Comment 3 Honggang LI 2018-06-22 01:21:49 UTC
(In reply to Don Dutile from comment #2)
> (In reply to Honggang LI from comment #0)
> c#18 is very long; it requires the reader to parse it, and every reader to
> reach the same conclusion.
> 
> A clean summary/description of the problem should be stated here.


Summary:

To set up the MTU for an MC group, 'mtu=xxx' must be the first field of mgroup_flag. Otherwise, opensm ignores it and sets the MTU to the default value, 2048.

Good configuration. MTU will be set to 4096. 'mtu=5' is the first field of mgroup_flag.
[root@rdma03 opensm]# cat partitions.conf.default 
Default=0x7fff,ipoib, mtu=5 rate=12:ALL=full;
                      ^^^^^

Bad configuration. MTU will be set to 2048. 'mtu=5' is the second field of mgroup_flag.
[root@rdma03 opensm]# cat partitions.conf.default 
Default=0x7fff,ipoib, rate=12 mtu=5:ALL=full;
                      ^^^^^^^
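To make the symptom concrete, here is a minimal Python sketch of the parsing behavior, assuming (as the instrumented output in comment 9 suggests) that flags are split only on commas and that the numeric value is read like C's strtoul(), which stops at the first non-digit. `parse_flags`, `leading_int` and `effective_mtu_code` are hypothetical helpers, not opensm code; the default code 4 (2048 bytes) comes from the summary above.

```python
import re

DEFAULT_MTU_CODE = 4  # 2048 bytes, the default reported in this bug


def leading_int(text):
    """Mimic strtoul(): read leading digits and ignore whatever follows."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else None


def parse_flags(flag_text):
    """Split flags on commas only (assumed behavior), then on the first '='."""
    flags = {}
    for token in flag_text.split(","):
        token = token.strip()
        if not token:
            continue
        key, _, value = token.partition("=")
        # "mtu=5 rate=12" yields key "mtu" with value "5 rate=12"
        flags[key] = value
    return flags


def effective_mtu_code(flag_text):
    """Return the MTU code that survives parsing, or the default."""
    value = parse_flags(flag_text).get("mtu")
    code = leading_int(value) if value else None
    return code if code is not None else DEFAULT_MTU_CODE


print(effective_mtu_code("ipoib, mtu=5 rate=12"))  # → 5
print(effective_mtu_code("ipoib, rate=12 mtu=5"))  # → 4
```

Under these assumptions, even the "good" ordering is only partly honored: rate=12 is swallowed into the mtu value and silently dropped, which matches the flval=(5 rate=12) line in the instrumented output.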

> Also, you mention the 7.4 opensm version, then Mellanox, then an upstream commit ID.
> What point are you trying to make?

The issue can be reproduced with all of those opensm versions.

Comment 4 Don Dutile (Red Hat) 2018-06-22 03:39:29 UTC
(In reply to Honggang LI from comment #3)
Thanks for all that clarification!

Now I understand your remark about the first field... I didn't get that before.

So, is upstream fixed yet, or do you have to post a patch for it?

Comment 9 Honggang LI 2018-08-29 12:30:56 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1534869#c21

Copying the explanation here.


> [root@rdma-master ~]$ cat /etc/rdma/partitions-ib0.conf  | grep mtu
> # mtu = 
> Default=0x7fff, rate=6 mtu=5 scope=2, defmember=full:
> Default=0x7fff, ipoib, rate=6 mtu=5 scope=2:
                         ^^^^^^^^^^^^^^^^^^^^

Because of two issues, we failed to set the MTU to 4K.
1) The configuration file is wrong. There MUST BE a comma (,) between the mgroup_flag flags.

Default=0x7fff, ipoib, rate=6 mtu=5 scope=2:

should be:

Default=0x7fff, ipoib, rate=6, mtu=5, scope=2:
                             ^      ^
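Existing configuration files could be normalized mechanically. The sketch below is a hypothetical helper, not part of opensm or rdma-core: it assumes that whitespace before a `key=value` token inside a partition definition is always a missing separator, and inserts a comma there unless one is already present. It does not handle a bare flag such as `ipoib` that follows another flag without a comma, so the result should still be checked by hand.

```python
import re

# Whitespace preceded by a non-comma character and followed by "key=".
_MISSING_COMMA = re.compile(r"(?<=[^,\s])\s+(?=[A-Za-z_]\w*=)")


def add_missing_commas(line):
    """Turn 'rate=6 mtu=5 scope=2' into 'rate=6, mtu=5, scope=2'."""
    return _MISSING_COMMA.sub(", ", line)


print(add_missing_commas("Default=0x7fff, ipoib, rate=6 mtu=5 scope=2:"))
# → Default=0x7fff, ipoib, rate=6, mtu=5, scope=2:
```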


I believe we had been misled by the example configuration file "/etc/rdma/partitions.conf" and the upstream doc source file "opensm-top-dir/doc/partition-config.txt". Neither document emphasizes that the fields of mgroup_flag must be split with a comma.

We should update these two files.

2) The function "parse_name_token" is error-prone. It returns a wrong 'flval' when a malformed configuration is passed to it; it should raise an error instead.


I instrumented the upstream opensm source code. Here is the output with the wrong configuration file:
----------------------------
osm_prtn_config_parse_file open /etc/opensm/partitions.conf
osm_prtn_config_parse_file read line (1) (# Bad configuration, ib0's mtu will be 2044
)
osm_prtn_config_parse_file read line (2) (# Default=0x7fff,ipoib, rate=12 mtu=5:ALL=full;
)
osm_prtn_config_parse_file read line (3) (
)
osm_prtn_config_parse_file read line (4) (# Good configuration, ib0's mtu will be 4092
)
osm_prtn_config_parse_file read line (5) (Default=0x7fff,ipoib, mtu=5 rate=12:ALL=full;
)

===>  parse_name_token return ret=(15) name=(Default), id=(0x7fff)

===>  parse_name_token return ret=(6) flag=(ipoib), flval=((null))

===>  parse_name_token return ret=(15) flag=(mtu), flval=(5 rate=12) <=====
                                                   ^^^^^^^^^^^^^^^^^

IT SHOULD RAISE AN ERROR HERE, BECAUSE A WRONG 'FLVAL' IS RETURNED. THAT IS WHY OPENSM ONLY HONORS THE FIRST FIELD OF MGROUP_FLAG.

===>  parse_name_token return ret=(9) name=(ALL), flag=(full)
----------------------------
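A stricter parse, as proposed above, would reject any flag value with leftover text instead of silently truncating it. The following is a hypothetical Python sketch of that idea (the name `parse_flag_int` is illustrative, not opensm's); in C the equivalent check is whether strtoul()'s end pointer reached the end of the value. The upstream patch 04d2a8be0305 listed in comment 10 takes the softer route of logging a suspicious value rather than failing.

```python
import re

# A whole value must be a decimal or 0x-prefixed hexadecimal integer.
_INTEGER = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")


def parse_flag_int(name, value):
    """Parse an integer flag value, rejecting anything left over.

    A lenient strtoul()-style parse of "5 rate=12" returns 5 and discards
    " rate=12"; this version raises ValueError instead.
    """
    text = value.strip()
    if _INTEGER.fullmatch(text) is None:
        raise ValueError(
            f"suspicious value for '{name}': {value!r} "
            "(missing comma between flags?)"
        )
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)
```

With this check, `parse_flag_int("mtu", "5")` returns 5, while `parse_flag_int("mtu", "5 rate=12")` raises ValueError and points at the missing comma.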


> ib0_2=0x0002, rate=7 mtu=5 scope=2, defmember=full:
> ib0_2=0x0002, ipoib, rate=7 mtu=5 scope=2:
> ib0_4=0x0004, rate=3 mtu=5 scope=2, defmember=full:
> ib0_4=0x0004, ipoib, rate=3 mtu=5 scope=2:
> ib0_6=0x0006, rate=12 mtu=5 scope=2, defmember=full:
> ib0_6=0x0006, ipoib, rate=12 mtu=5 scope=2:
> 
> 
> mtu=5 means MTU==4K.
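For reference, the numeric mtu= value is the InfiniBand MTU enumeration, not a byte count. The mapping below follows the IBTA encoding; subtracting 4 bytes models the IPoIB encapsulation header in datagram mode, which is consistent with the 2044 and 4092 figures in the configuration comments shown in the instrumented output. The names are illustrative only.

```python
# InfiniBand MTU enumeration (IBTA encoding) mapped to bytes.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

IPOIB_HEADER_BYTES = 4  # IPoIB encapsulation header in datagram mode


def ipoib_datagram_mtu(mtu_code):
    """Interface MTU seen on an IPoIB device (e.g. ib0) for a group MTU code."""
    return IB_MTU_BYTES[mtu_code] - IPOIB_HEADER_BYTES


print(IB_MTU_BYTES[5], ipoib_datagram_mtu(5))  # → 4096 4092
print(IB_MTU_BYTES[4], ipoib_datagram_mtu(4))  # → 2048 2044
```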

Comment 10 Honggang LI 2018-09-05 11:39:49 UTC
024fe73e4481 opensm.8.in:  Emphasize that the fields of mgroup_flag must be split with "comma"
1f82c22a1237 partition-config.txt: Emphasize that the fields of mgroup_flag must be split with "comma"
04d2a8be0305 osm_prtn_config.c: parse_group_flag log suspicious group flag value

These upstream patches fix this issue.

Comment 17 errata-xmlrpc 2019-08-06 12:46:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2100