Bug 645547 - opensm 3.3.3 breaks SRP (and some ib connections)
Summary: opensm 3.3.3 breaks SRP (and some ib connections)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: opensm
Version: 5.5
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jay Fenlason
QA Contact: Infiniband QE
URL:
Whiteboard:
Depends On:
Blocks: 650925
Reported: 2010-10-21 19:24 UTC by Ed Wahl
Modified: 2018-11-14 16:39 UTC (History)

Fixed In Version: 3.3.3-2.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 06:35:36 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2011:0969 | 0 | normal | SHIPPED_LIVE | opensm bug fix update | 2011-07-20 15:45:28 UTC

Description Ed Wahl 2010-10-21 19:24:27 UTC
Description of problem:
SRP breaks and fails to connect to any targets under opensm 3.3.3-* on RHEL 5.?
because of an upstream bug (since fixed upstream).

Version-Release number of selected component (if applicable):
opensm-3.3.3-1.el5

How reproducible:
Configure an SRP target on IB and initiate with an earlier or later version of opensm, which works. Switch to 3.3.3-* and SCSI requests no longer return.

Steps to Reproduce:
1. Use opensm 3.3.3.
2. Run srp_daemon to scan IB for available targets.
3. Requests time out.
  
Actual results:
2010-10-18T15:42:36.001831-04:00 xio67 kernel: scsi1 : SRP.T10:50001FF500050208
2010-10-18T15:42:41.502644-04:00 xio67 kernel: host1: SRP abort called
2010-10-18T15:42:46.501661-04:00 xio67 kernel: host1: SRP reset_device called
2010-10-18T15:42:51.500682-04:00 xio67 kernel: host1: ib_srp: SRP reset_host called state 0 qp_err 0
2010-10-18T15:43:11.501750-04:00 xio67 kernel: host1: SRP abort called
2010-10-18T15:43:16.500763-04:00 xio67 kernel: host1: SRP reset_device called
2010-10-18T15:43:21.500780-04:00 xio67 kernel: host1: ib_srp: SRP reset_host called state 0 qp_err 0
2010-10-18T15:43:31.501804-04:00 xio67 kernel: scsi 1:0:0:0: scsi: Device offlined - not ready after error recovery
2010-10-18T15:43:31.501813-04:00 xio67 kernel: scsi 1:0:0:0: timing out command, waited 22s
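
For anyone checking whether a host shows the same symptom, the kernel messages above can be searched for mechanically. The following is only a minimal sketch in Python: the function name find_srp_failures and the pattern list are illustrative assumptions built from the messages quoted in this report, not part of opensm or ib_srp, and other kernel versions may word these messages differently.

```python
import re

# Failure signatures copied from the kernel messages quoted in this report.
# Assumption: other kernels may phrase these differently.
SRP_FAILURE_PATTERNS = [
    re.compile(r"SRP abort called"),
    re.compile(r"SRP reset_device called"),
    re.compile(r"SRP reset_host called"),
    re.compile(r"Device offlined - not ready after error recovery"),
    re.compile(r"timing out command"),
]


def find_srp_failures(lines):
    """Return (line_number, line) pairs for lines matching any failure pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SRP_FAILURE_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    sample = [
        "2010-10-18T15:42:36.001831-04:00 xio67 kernel: scsi1 : SRP.T10:50001FF500050208",
        "2010-10-18T15:42:41.502644-04:00 xio67 kernel: host1: SRP abort called",
        "2010-10-18T15:42:46.501661-04:00 xio67 kernel: host1: SRP reset_device called",
    ]
    for number, line in find_srp_failures(sample):
        print(number, line)
```

The same function can be fed an open syslog file object, since it only iterates over lines.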


Expected results:
2010-10-19T16:36:11.833770-04:00 xio67 kernel: scsi1 : SRP.T10:50001FF500050208
2010-10-19T16:36:11.835364-04:00 xio67 kernel:  Vendor: IBM       Model: DCS9900           Rev: 6.05
2010-10-19T16:36:11.835373-04:00 xio67 kernel:  Type:   Direct-Access                      ANSI SCSI revision: 05
2010-10-19T16:36:11.835722-04:00 xio67 kernel: sdb : very big device. try to use READ CAPACITY(16).
2010-10-19T16:36:11.835844-04:00 xio67 kernel: SCSI device sdb: 15627665408 512-byte hdwr sectors (8001365 MB)
2010-10-19T16:36:11.835996-04:00 xio67 kernel: sdb: Write Protect is off
2010-10-19T16:36:11.836202-04:00 xio67 kernel: SCSI device sdb: drive cache: write back w/ FUA
2010-10-19T16:36:11.836429-04:00 xio67 kernel: sdb : very big device. try to use READ CAPACITY(16).
2010-10-19T16:36:11.836534-04:00 xio67 kernel: SCSI device sdb: 15627665408 512-byte hdwr sectors (8001365 MB)
2010-10-19T16:36:11.836639-04:00 xio67 kernel: sdb: Write Protect is off
2010-10-19T16:36:11.836851-04:00 xio67 kernel: SCSI device sdb: drive cache: write back w/ FUA
2010-10-19T16:36:11.854145-04:00 xio67 kernel: sdb: unknown partition table


Additional info:

This was broken by upstream patch 3d20f82edd3246879063b77721d0bcef927bdc48 for opensm 3.3.3 and has since been fixed upstream in versions after 12/16/09. This forces us to run later, hand-built versions of opensm. It appears to break only SRP calls and some other minor IB traffic.

The fix is to include patch 520af84* from opensm 3.3.5 and later (full hash: 520af849615e7ee603b96498da9f3bc554470c06).
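
To check an installed RHEL 5 package against the information in this report, a version-release string can be compared with the "Fixed In Version" value, 3.3.3-2.el5. The following is a rough sketch in Python, not an official tool: the helper name is_affected_el5_build is hypothetical, the simple parsing assumes strings shaped like "3.3.3-1.el5", and it encodes only what this report states (3.3.3 builds before release 2 are affected; earlier and later opensm versions are not). A hand-built upstream 3.3.3 without the fix would be affected regardless of its release number.

```python
import re

# Release number taken from "Fixed In Version: 3.3.3-2.el5" in this report.
FIXED_RELEASE = 2


def is_affected_el5_build(version_release):
    """Return True for a 3.3.3 package build older than 3.3.3-2.el5.

    Hypothetical helper: expects a string such as "3.3.3-1.el5".
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-(\d+)(?:\..*)?", version_release)
    if match is None:
        raise ValueError(f"unrecognized version-release: {version_release!r}")
    version = match.group(1)
    release = int(match.group(2))
    return version == "3.3.3" and release < FIXED_RELEASE


if __name__ == "__main__":
    for candidate in ("3.3.3-1.el5", "3.3.3-2.el5"):
        print(candidate, is_affected_el5_build(candidate))
```

The installed string can be obtained with rpm -q --qf '%{VERSION}-%{RELEASE}\n' opensm and passed to the function.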

Comment 8 errata-xmlrpc 2011-07-21 06:35:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0969.html

