Bug 1727272

Summary: [RHEL-8.1/hfi1] ib_srpt Rejected login for initiator
Product: Red Hat Enterprise Linux 8
Component: rdma-core
Version: 8.1
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Target Milestone: rc
Target Release: 8.2
Hardware: Unspecified
OS: Unspecified
Reporter: zguo <zguo>
Assignee: Honggang LI <honli>
QA Contact: Infiniband QE <infiniband-qe>
Docs Contact:
CC: honli, hwkernel-mgr, rdma-dev-team
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1728091 (view as bug list) Environment:
Last Closed: 2019-07-09 12:45:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1728091    

Description zguo 2019-07-05 10:14:32 UTC
Description of problem:


Version-Release number of selected component (if applicable):

$ rpm -q kernel rdma-core
kernel-4.18.0-107.el8.x86_64
rdma-core-22-2.el8.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up an SRP target.
2. On the initiator, try to discover the target and log in to it.

Actual results:

- On target, dmesg

ib_srpt Rejected login for initiator 0011:7501:0167:1ecf: ret = -13.
ib_srpt Rejecting login with reason 0x10006

- On initiator, dmesg
[ 1375.130289] scsi host11: ib_srp: Path record query failed: sgid fe80:0000:0000:0000:0011:7501:0167:1ecf, dgid fe80:0000:0000:0000:0011:7501:010
[ 1375.151244] scsi host11: ib_srp: Connection 0/2 to fe80:0000:0000:0000:0011:7501:0167:10f0 failed
[ 1379.548716] scsi host11: ib_srp: SRP_LOGIN_REJ: requested max_it_iu_len too large
[ 1379.557951] scsi host11: ib_srp: Connection 0/2 to fe80:0000:0000:0000:0011:7501:0109:6c5d failed
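
For readers who want the numeric codes in the target log spelled out, here is a minimal Python sketch that decodes them. The errno lookup uses the standard library. The reject-reason table is an assumption: it is copied from my reading of the SRP_LOGIN_REJ_* constants in the Linux kernel header include/scsi/srp.h and should be checked against the running kernel. The function names are hypothetical.

```python
import errno

# SRP login reject reasons, as I understand them from include/scsi/srp.h.
# Assumption: verify against the kernel source actually in use.
SRP_LOGIN_REJ_REASONS = {
    0x00010000: "SRP_LOGIN_REJ_UNABLE_ESTABLISH_CHANNEL",
    0x00010001: "SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES",
    0x00010002: "SRP_LOGIN_REJ_REQ_IT_IU_LENGTH_TOO_LARGE",
    0x00010003: "SRP_LOGIN_REJ_UNABLE_ASSOCIATE_CHANNEL",
    0x00010004: "SRP_LOGIN_REJ_UNSUPPORTED_DESCRIPTOR_FMT",
    0x00010005: "SRP_LOGIN_REJ_MULTI_CHANNEL_UNSUPPORTED",
    0x00010006: "SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED",
}


def decode_errno(ret):
    """Map a negative kernel return value such as -13 to its errno name."""
    return errno.errorcode.get(-ret, "unknown")


def decode_reject_reason(reason):
    """Map an SRP login reject reason code to its symbolic name."""
    return SRP_LOGIN_REJ_REASONS.get(reason, "unknown")


print(decode_errno(-13))              # EACCES
print(decode_reject_reason(0x10006))  # SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED
```

The sketch only translates the codes printed in dmesg; it does not explain why the target chose them.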

Expected results:

Initiator can connect to target successfully.

Additional info:

This issue only occurs on hfi1:

fail hfi1 https://beaker.engineering.redhat.com/jobs/3650489

pass mlx5 https://beaker.engineering.redhat.com/jobs/3643876

The same test passes on RHEL-8.0.0 on hfi1 - https://beaker.engineering.redhat.com/jobs/3650265

Comment 2 Honggang LI 2019-07-09 04:19:10 UTC
ib_srpt rejects the login request because of a bad SRP client configuration.

1) The beaker case, kernel-kernel-infiniband-srp-srp_multi-0.1-7.noarch, needs an update.
The SRP target configuration script goes wrong when the bnxt RoCE device is enabled.

===== RHEL-8.1 =======

[root@rdma-dev-16 srp_multi]$  ibstat
CA 'bnxt_re0'
	CA type: Broadcom NetXtreme-C/E RoCE Driver HCA
	Number of ports: 1
	Firmware version: 215.0.170.0
	Hardware version: 0x14e4
	Node GUID: 0xb22628fffe4bcd80
	System image GUID: 0xb22628fffe4bcd80
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x001d0000
		Port GUID: 0xb22628fffe4bcd80
		Link layer: Ethernet
CA 'bnxt_re1'
	CA type: Broadcom NetXtreme-C/E RoCE Driver HCA
	Number of ports: 1
	Firmware version: 215.0.170.0
	Hardware version: 0x14e4
	Node GUID: 0xb22628fffe4bcd81
	System image GUID: 0xb22628fffe4bcd81
	Port 1:
		State: Down
		Physical state: Disabled
		Rate: 100
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x001d0000
		Port GUID: 0xb22628fffe4bcd81
		Link layer: Ethernet
CA 'hfi1_0'
	CA type: 
	Number of ports: 1
	Firmware version: 1.27.0
	Hardware version: 10
	Node GUID: 0x0011750101671ecf
	System image GUID: 0x0011750101671ecf
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 6
		LMC: 0
		SM lid: 8
		Capability mask: 0x00490022
		Port GUID: 0x0011750101671ecf
		Link layer: InfiniBand

(In reply to zguo from comment #0)


RHEL-8.1, targetcli configuration.

[root@rdma-dev-16 srp_multi]$ targetcli ls / | grep -C 1 ib
  o- srpt ............................................................. [Targets: 3]
    o- ib.fe800000000000000011750101671ecf ........................... [no-gen-acls]
    | o- acls ............................................................ [ACLs: 1]
    | | o- ib.b22628fffe4bcd800011750101670ffa ................... [Mapped LUNs: 10]
    | |   o- mapped_lun0 ............................ [lun0 fileio/backstore-1 (rw)]
--
    |   o- lun9 ........... [fileio/backstore-10 (/srp/block-10) (default_tg_pt_gp)]
    o- ib.fe80000000000000b22628fffe4bcd80 ........................... [no-gen-acls]
    | o- acls ............................................................ [ACLs: 1]
    | | o- ib.b22628fffe4bcd800011750101670ffa ................... [Mapped LUNs: 10]
    | |   o- mapped_lun0 ............................ [lun0 fileio/backstore-1 (rw)]
--
    |   o- lun9 ........... [fileio/backstore-10 (/srp/block-10) (default_tg_pt_gp)]
    o- ib.fe80000000000000b22628fffe4bcd81 ........................... [no-gen-acls]
      o- acls ............................................................ [ACLs: 1]
      | o- ib.b22628fffe4bcd800011750101670ffa ................... [Mapped LUNs: 10]
      |   o- mapped_lun0 ............................ [lun0 fileio/backstore-1 (rw)]


Look at the client ACL IDs: they are wrong. For example, in ib.b22628fffe4bcd800011750101670ffa
the leading part, starting with "b226", comes from the bnxt device. You can confirm this by comparing
it with the port GUIDs of the bnxt devices (bnxt_re0 has port GUID 0xb22628fffe4bcd80).
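
The comparison described above can be sketched in Python. This is a minimal illustration, assuming that an ACL ID has the form "ib." followed by 32 hex digits and can be read as two 64-bit halves. The port GUIDs are copied from the ibstat output on rdma-dev-16, and the helper names are hypothetical.

```python
# Port GUIDs copied from the ibstat output on rdma-dev-16 above.
PORT_GUIDS = {
    "bnxt_re0": 0xB22628FFFE4BCD80,
    "bnxt_re1": 0xB22628FFFE4BCD81,
    "hfi1_0": 0x0011750101671ECF,
}


def split_acl_id(acl_id):
    """Split 'ib.<32 hex digits>' into its two 64-bit halves."""
    prefix, _, hex_part = acl_id.partition(".")
    if prefix != "ib" or len(hex_part) != 32:
        raise ValueError(f"unexpected ACL ID format: {acl_id!r}")
    return int(hex_part[:16], 16), int(hex_part[16:], 16)


def owners(acl_id, port_guids=PORT_GUIDS):
    """For each half, name the listed device whose port GUID equals it."""
    by_guid = {guid: name for name, guid in port_guids.items()}
    return tuple(by_guid.get(half) for half in split_acl_id(acl_id))


bad = "ib.b22628fffe4bcd800011750101670ffa"   # ACL ID seen on RHEL-8.1
good = "ib.0011750101670ffa0011750101671ecf"  # ACL ID seen on RHEL-8.0
print(owners(bad))   # ('bnxt_re0', None)
print(owners(good))  # (None, 'hfi1_0')
```

A half reported as None simply is not one of the three GUIDs listed for rdma-dev-16. The point of the check is that the RHEL-8.1 ID contains a bnxt GUID, while the RHEL-8.0 ID contains none.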

===== RHEL-8.0 =======

RHEL-8.0 does not enable the bnxt RoCE device on rdma-dev-15/16.

> The same test passes on RHEL-8.0.0 on hfi1 -
> https://beaker.engineering.redhat.com/jobs/3650265

(server side log)
http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/07/36502/3650265/7089527/95869449/439391730/resultoutputfile.log

    o- ib.fe800000000000000011750101670ffa .............................. [no-gen-acls]
      o- acls ........................................................... [ACLs: 1]
      | o- ib.0011750101670ffa0011750101671ecf ........................ [Mapped LUNs: 10]
      |   o- mapped_lun0 ............................... [lun0 fileio/backstore-1 (rw)]


The ACL ID is good.

2) Besides the bad ib_srpt/targetcli configuration, the SRP client side writes wrong options to the
/sys/class/infiniband_srp/srp-XXX/add_target file. If "initiator_ext" is written to the "add_target"
file, the login request fails. There is a workaround for this, but it is NOT a final solution.

$ diff -Nurp /usr/lib/systemd/system/srp_daemon_port@.service.old /usr/lib/systemd/system/srp_daemon_port@.service
--- /usr/lib/systemd/system/srp_daemon_port@.service.old	2019-07-08 11:13:02.999657559 -0400
+++ /usr/lib/systemd/system/srp_daemon_port@.service	2019-07-08 11:12:48.727649800 -0400
@@ -23,7 +23,7 @@ BindsTo=srp_daemon.service
 
 [Service]
 Type=simple
-ExecStart=/usr/sbin/srp_daemon --systemd -e -c -n -j %I -R 60
+ExecStart=/usr/sbin/srp_daemon --systemd -e -c  -j %I -R 60
 MemoryDenyWriteExecute=yes
 PrivateNetwork=yes
 PrivateTmp=yes
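
To make the effect of the workaround concrete, the following Python sketch shows what leaving "initiator_ext" out of an add_target option string amounts to. It assumes the string is a comma-separated list of key=value pairs, which is my understanding of the ib_srp add_target interface. Every value in the sample line is hypothetical and not taken from the failing systems, and the helper names are invented for illustration.

```python
def parse_target_options(line):
    """Parse 'key=value,key=value,...' into an ordered dict of options."""
    options = {}
    for field in line.strip().split(","):
        key, sep, value = field.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed option: {field!r}")
        options[key] = value
    return options


def drop_option(line, name="initiator_ext"):
    """Return the option string with one key removed, order preserved."""
    options = parse_target_options(line)
    return ",".join(f"{k}={v}" for k, v in options.items() if k != name)


# Hypothetical sample line: all values are placeholders.
sample = (
    "id_ext=0000000000000001,ioc_guid=0000000000000002,"
    "dgid=fe800000000000000000000000000003,pkey=ffff,"
    "service_id=0000000000000004,initiator_ext=0000000000000005"
)
print(drop_option(sample))
```

The sketch only manipulates text. It does not write to sysfs, and it is not how srp_daemon builds the string internally.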

Comment 3 Honggang LI 2019-07-09 12:45:03 UTC
(In reply to Honggang LI from comment #2)

> 2) Besides the bad ib_srpt/targetcli configuration, the SRP client side
> writes wrong options to the
> /sys/class/infiniband_srp/srp-XXX/add_target file. If "initiator_ext"
> is written to the "add_target"
> file, the login request fails. There is a workaround for this, but it is
> NOT a final solution.

When the OPA subnet manager was running on rdma-qe-14/15, the workaround was needed for
SRP login. With the SM running on rdma-dev-15/16 and the bnxt_re/en modules removed,
the workaround is unnecessary and SRP works again.

I'm closing this bug as NOTABUG.