Bug 1727272
| Summary: | [RHEL-8.1/hfi1] ib_srpt Rejected login for initiator | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | zguo <zguo> |
| Component: | rdma-core | Assignee: | Honggang LI <honli> |
| Status: | CLOSED NOTABUG | QA Contact: | Infiniband QE <infiniband-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.1 | CC: | honli, hwkernel-mgr, rdma-dev-team |
| Target Milestone: | rc | | |
| Target Release: | 8.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1728091 | Environment: | |
| Last Closed: | 2019-07-09 12:45:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1728091 | | |
Description
zguo
2019-07-05 10:14:32 UTC
ib_srpt rejects the login request because of bad SRP client configuration.

1) The Beaker case, kernel-kernel-infiniband-srp-srp_multi-0.1-7.noarch, needs an update. The SRP target configuration script goes wrong when the bnxt RoCE device is enabled.

===== RHEL-8.1 =======

    [root@rdma-dev-16 srp_multi]$ ibstat
    CA 'bnxt_re0'
        CA type: Broadcom NetXtreme-C/E RoCE Driver HCA
        Number of ports: 1
        Firmware version: 215.0.170.0
        Hardware version: 0x14e4
        Node GUID: 0xb22628fffe4bcd80
        System image GUID: 0xb22628fffe4bcd80
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 100
            Base lid: 0
            LMC: 0
            SM lid: 0
            Capability mask: 0x001d0000
            Port GUID: 0xb22628fffe4bcd80
            Link layer: Ethernet
    CA 'bnxt_re1'
        CA type: Broadcom NetXtreme-C/E RoCE Driver HCA
        Number of ports: 1
        Firmware version: 215.0.170.0
        Hardware version: 0x14e4
        Node GUID: 0xb22628fffe4bcd81
        System image GUID: 0xb22628fffe4bcd81
        Port 1:
            State: Down
            Physical state: Disabled
            Rate: 100
            Base lid: 0
            LMC: 0
            SM lid: 0
            Capability mask: 0x001d0000
            Port GUID: 0xb22628fffe4bcd81
            Link layer: Ethernet
    CA 'hfi1_0'
        CA type:
        Number of ports: 1
        Firmware version: 1.27.0
        Hardware version: 10
        Node GUID: 0x0011750101671ecf
        System image GUID: 0x0011750101671ecf
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 100
            Base lid: 6
            LMC: 0
            SM lid: 8
            Capability mask: 0x00490022
            Port GUID: 0x0011750101671ecf
            Link layer: InfiniBand

(In reply to zguo from comment #0)

RHEL-8.1, targetcli configuration.

    [root@rdma-dev-16 srp_multi]$ targetcli ls / | grep -C 1 ib
    o- srpt ............................................................. [Targets: 3]
    o- ib.fe800000000000000011750101671ecf ........................... [no-gen-acls]
    | o- acls ............................................................ [ACLs: 1]
    | | o- ib.b22628fffe4bcd800011750101670ffa ................... [Mapped LUNs: 10]
    | | o- mapped_lun0 ............................ [lun0 fileio/backstore-1 (rw)]
    --
    | o- lun9 ........... [fileio/backstore-10 (/srp/block-10) (default_tg_pt_gp)]
    o- ib.fe80000000000000b22628fffe4bcd80 ........................... [no-gen-acls]
    | o- acls ............................................................ [ACLs: 1]
    | | o- ib.b22628fffe4bcd800011750101670ffa ................... [Mapped LUNs: 10]
    | | o- mapped_lun0 ............................ [lun0 fileio/backstore-1 (rw)]
    --
    | o- lun9 ........... [fileio/backstore-10 (/srp/block-10) (default_tg_pt_gp)]
    o- ib.fe80000000000000b22628fffe4bcd81 ........................... [no-gen-acls]
    o- acls ............................................................ [ACLs: 1]
    | o- ib.b22628fffe4bcd800011750101670ffa ................... [Mapped LUNs: 10]
    | o- mapped_lun0 ............................ [lun0 fileio/backstore-1 (rw)]

Look at the client ACL IDs: they are wrong. For example, in ib.b22628fffe4bcd800011750101670ffa the leading part, "b226", comes from the bnxt device. You can confirm this by comparing it with the port GUID of the bnxt devices.

==== RHEL-8.0

RHEL-8.0 does not enable the bnxt RoCE device of rdma-dev-15/16.

> The same test pass on RHEL-8.0.0 on hfi1 -
> https://beaker.engineering.redhat.com/jobs/3650265

(server side log)
http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/07/36502/3650265/7089527/95869449/439391730/resultoutputfile.log

    o- ib.fe800000000000000011750101670ffa .............................. [no-gen-acls]
    o- acls ........................................................... [ACLs: 1]
    | o- ib.0011750101670ffa0011750101671ecf ........................ [Mapped LUNs: 10]
    | o- mapped_lun0 ............................... [lun0 fileio/backstore-1 (rw)]

The ACL ID is good.

2) Besides the bad ib_srpt/targetcli configuration, the SRP client side writes wrong options to the /sys/class/infiniband_srp/srp-XXX/add_target file. If "initiator_ext" is written to the "add_target" file, the login request fails. There is a workaround for this, but it is NOT a final solution.
    $ diff -Nurp /usr/lib/systemd/system/srp_daemon_port@.service.old /usr/lib/systemd/system/srp_daemon_port@.service
    --- /usr/lib/systemd/system/srp_daemon_port@.service.old 2019-07-08 11:13:02.999657559 -0400
    +++ /usr/lib/systemd/system/srp_daemon_port@.service 2019-07-08 11:12:48.727649800 -0400
    @@ -23,7 +23,7 @@ BindsTo=srp_daemon.service
     [Service]
     Type=simple
    -ExecStart=/usr/sbin/srp_daemon --systemd -e -c -n -j %I -R 60
    +ExecStart=/usr/sbin/srp_daemon --systemd -e -c -j %I -R 60
     MemoryDenyWriteExecute=yes
     PrivateNetwork=yes
     PrivateTmp=yes

(In reply to Honggang LI from comment #2)
> 2) Besides the bad ib_srpt/targetcli configuration, the SRP client side
> writes wrong options to the
> /sys/class/infiniband_srp/srp-XXX/add_target file. If "initiator_ext"
> is written to the "add_target"
> file, the login request fails. There is a workaround for this, but it is
> NOT a final solution.

When the OPA subnet manager was running on rdma-qe-14/15, the workaround was needed for SRP login. When the SM runs on rdma-dev-15/16 and the bnxt_re/en modules are removed, the workaround is unnecessary and SRP works again.

I'm closing this bug as NOTABUG.
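
The manual check from point 1), comparing the client ACL ID against the port GUIDs of the bnxt devices, can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `split_acl_id` and `roce_guids_in_acl` are hypothetical, and the idea that an ACL ID is "ib." followed by two 64-bit hex halves is inferred from the IDs shown in this bug, not taken from the ib_srpt or targetcli documentation.

```python
def normalize_guid(guid):
    """Lower-case a GUID string and drop a leading '0x' if present."""
    guid = guid.lower()
    return guid[2:] if guid.startswith("0x") else guid


def split_acl_id(acl_id):
    """Split an 'ib.<32 hex digits>' ACL ID into two 16-digit halves.

    Assumption: the layout is inferred from the IDs quoted in this bug.
    """
    prefix = "ib."
    if not acl_id.startswith(prefix):
        raise ValueError("ACL ID does not start with 'ib.': %r" % acl_id)
    hex_part = acl_id[len(prefix):].lower()
    if len(hex_part) != 32 or any(c not in "0123456789abcdef" for c in hex_part):
        raise ValueError("expected 32 hex digits after 'ib.': %r" % acl_id)
    return hex_part[:16], hex_part[16:]


def roce_guids_in_acl(acl_id, roce_port_guids):
    """Return the halves of the ACL ID that equal a known RoCE port GUID."""
    known = {normalize_guid(g) for g in roce_port_guids}
    return [half for half in split_acl_id(acl_id) if half in known]


# Port GUIDs of bnxt_re0/bnxt_re1 as reported by ibstat in this bug.
bnxt_guids = ["0xb22628fffe4bcd80", "0xb22628fffe4bcd81"]

# ACL ID seen on RHEL-8.1 (reported as wrong) and on RHEL-8.0 (reported as good).
print(roce_guids_in_acl("ib.b22628fffe4bcd800011750101670ffa", bnxt_guids))
print(roce_guids_in_acl("ib.0011750101670ffa0011750101671ecf", bnxt_guids))
```

A non-empty result means part of the ACL ID was taken from a RoCE device, which is the symptom described for the RHEL-8.1 configuration.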