Bug 645547 - opensm 3.3.3 breaks SRP (and some ib connections)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: opensm
Version: 5.5
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Jay Fenlason
QA Contact: Infiniband QE
Keywords: ZStream
Depends On:
Blocks: 650925
Reported: 2010-10-21 15:24 EDT by Ed Wahl
Modified: 2014-08-31 19:30 EDT

See Also:
Fixed In Version: 3.3.3-2.el5
Doc Type: Bug Fix
Last Closed: 2011-07-21 02:35:36 EDT

Description Ed Wahl 2010-10-21 15:24:27 EDT
Description of problem:
SRP breaks and fails to connect to any targets under opensm 3.3.3-* on RHEL 5.x,
due to an upstream bug (since fixed upstream).

Version-Release number of selected component (if applicable):
opensm-3.3.3-1.el5

How reproducible:
Configure an SRP target on IB and initiate with an earlier or later version of opensm. Switch to 3.3.3-*. SCSI requests now fail to return.

Steps to Reproduce:
1. Use opensm 3.3.3.
2. Run srp_daemon to scan IB for available targets.
3. Requests time out.
  
Actual results:
2010-10-18T15:42:36.001831-04:00 xio67 kernel: scsi1 : SRP.T10:50001FF500050208
2010-10-18T15:42:41.502644-04:00 xio67 kernel: host1: SRP abort called
2010-10-18T15:42:46.501661-04:00 xio67 kernel: host1: SRP reset_device called
2010-10-18T15:42:51.500682-04:00 xio67 kernel: host1: ib_srp: SRP reset_host called state 0 qp_err 0
2010-10-18T15:43:11.501750-04:00 xio67 kernel: host1: SRP abort called
2010-10-18T15:43:16.500763-04:00 xio67 kernel: host1: SRP reset_device called
2010-10-18T15:43:21.500780-04:00 xio67 kernel: host1: ib_srp: SRP reset_host called state 0 qp_err 0
2010-10-18T15:43:31.501804-04:00 xio67 kernel: scsi 1:0:0:0: scsi: Device offlined - not ready after error recovery
2010-10-18T15:43:31.501813-04:00 xio67 kernel: scsi 1:0:0:0: timing out command, waited 22s
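
For anyone triaging a similar host, the failure signature above can be picked out of a syslog file mechanically. The following is only a rough sketch: the pattern list is copied from the messages in this report and is not exhaustive, the function name `find_srp_failures` is made up for illustration, and it assumes each kernel message sits on a single unwrapped line.

```python
import re

# Kernel messages seen in this report when SRP commands time out.
# Assumption: this list is taken from the excerpt above and is not exhaustive.
SRP_FAILURE_PATTERNS = [
    re.compile(r"SRP abort called"),
    re.compile(r"SRP reset_device called"),
    re.compile(r"SRP reset_host called"),
    re.compile(r"Device offlined"),
    re.compile(r"timing out command"),
]


def find_srp_failures(lines):
    """Return (line_number, text) pairs for lines matching a failure pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SRP_FAILURE_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Sample lines adapted from the logs in this report (timestamps dropped).
sample = [
    "xio67 kernel: scsi1 : SRP.T10:50001FF500050208",
    "xio67 kernel: host1: SRP abort called",
    "xio67 kernel: host1: SRP reset_device called",
    "xio67 kernel: host1: ib_srp: SRP reset_host called state 0 qp_err 0",
    "xio67 kernel: sdb: Write Protect is off",
]

for number, text in find_srp_failures(sample):
    print(number, text)
```

The same function can be fed an open log file object, since it only iterates over lines.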


Expected results:
2010-10-19T16:36:11.833770-04:00 xio67 kernel: scsi1 : SRP.T10:50001FF500050208
2010-10-19T16:36:11.835364-04:00 xio67 kernel:  Vendor: IBM       Model: DCS9900           Rev: 6.05
2010-10-19T16:36:11.835373-04:00 xio67 kernel:  Type:   Direct-Access                      ANSI SCSI revision: 05
2010-10-19T16:36:11.835722-04:00 xio67 kernel: sdb : very big device. try to use READ CAPACITY(16).
2010-10-19T16:36:11.835844-04:00 xio67 kernel: SCSI device sdb: 15627665408 512-byte hdwr sectors (8001365 MB)
2010-10-19T16:36:11.835996-04:00 xio67 kernel: sdb: Write Protect is off
2010-10-19T16:36:11.836202-04:00 xio67 kernel: SCSI device sdb: drive cache: write back w/ FUA
2010-10-19T16:36:11.836429-04:00 xio67 kernel: sdb : very big device. try to use READ CAPACITY(16).
2010-10-19T16:36:11.836534-04:00 xio67 kernel: SCSI device sdb: 15627665408 512-byte hdwr sectors (8001365 MB)
2010-10-19T16:36:11.836639-04:00 xio67 kernel: sdb: Write Protect is off
2010-10-19T16:36:11.836851-04:00 xio67 kernel: SCSI device sdb: drive cache: write back w/ FUA
2010-10-19T16:36:11.854145-04:00 xio67 kernel: sdb: unknown partition table


Additional info:

This was broken by upstream patch 3d20f82edd3246879063b77721d0bcef927bdc48 in opensm 3.3.3 and has since been fixed in versions released after 12/16/09. In the meantime this forces us to run later, hand-built versions of opensm. It only appears to break SRP calls and some other minor IB traffic.

The fix is to include patch 520af849615e7ee603b96498da9f3bc554470c06, which is in opensm 3.3.5 and later.
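
A quick way to tell whether a RHEL 5 host is running an affected package is to compare its version-release string against the values in this report (3.3.3-1.el5 affected, 3.3.3-2.el5 listed under "Fixed In Version"). The sketch below is an illustration only: the helper name `is_affected_opensm` is hypothetical, it assumes input shaped like the output of `rpm -q --queryformat '%{VERSION}-%{RELEASE}' opensm`, and it only covers the RHEL 5 package numbering, not hand-built upstream versions.

```python
import re


def is_affected_opensm(version_release):
    """Return True for opensm 3.3.3 packages older than release 2.

    Assumption: the input looks like '3.3.3-1.el5' (version, dash, release).
    Other versions are reported as unaffected, matching the report's
    statement that earlier and later versions work.
    """
    match = re.match(r"^(\d+(?:\.\d+)*)-(\d+)", version_release)
    if match is None:
        raise ValueError(f"unrecognised version-release: {version_release!r}")
    version = match.group(1)
    release = int(match.group(2))
    return version == "3.3.3" and release < 2


for candidate in ("3.3.3-1.el5", "3.3.3-2.el5", "3.3.5-1.el5"):
    print(candidate, is_affected_opensm(candidate))
```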
Comment 8 errata-xmlrpc 2011-07-21 02:35:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0969.html
