Bug 645547

Summary: opensm 3.3.3 breaks SRP (and some ib connections)
Product: Red Hat Enterprise Linux 5 Reporter: Ed Wahl <ewahl>
Component: opensm Assignee: Jay Fenlason <fenlason>
Status: CLOSED ERRATA QA Contact: Infiniband QE <infiniband-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.5 CC: gozen, jfeeney, jhanson, kzhang, martinez
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 3.3.3-2.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-07-21 02:35:36 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Bug Depends On:    
Bug Blocks: 650925    

Description Ed Wahl 2010-10-21 15:24:27 EDT
Description of problem:
SRP breaks and fails to connect to any targets under opensm 3.3.3-* on RHEL 5.x
due to an upstream bug (since fixed upstream).

Version-Release number of selected component (if applicable):
opensm-3.3.3-1.el5

How reproducible:
Configure an SRP target on IB and initiate with an earlier or later version of opensm. Switch to 3.3.3-*. SCSI requests now fail to return.

Steps to Reproduce:
1. Use opensm 3.3.3
2. Have srp_daemon scan IB for available targets
3. Requests time out
  
Actual results:
2010-10-18T15:42:36.001831-04:00 xio67 kernel: scsi1 : SRP.T10:50001FF500050208
2010-10-18T15:42:41.502644-04:00 xio67 kernel: host1: SRP abort called
2010-10-18T15:42:46.501661-04:00 xio67 kernel: host1: SRP reset_device called
2010-10-18T15:42:51.500682-04:00 xio67 kernel: host1: ib_srp: SRP reset_host called state 0 qp_err 0
2010-10-18T15:43:11.501750-04:00 xio67 kernel: host1: SRP abort called
2010-10-18T15:43:16.500763-04:00 xio67 kernel: host1: SRP reset_device called
2010-10-18T15:43:21.500780-04:00 xio67 kernel: host1: ib_srp: SRP reset_host called state 0 qp_err 0
2010-10-18T15:43:31.501804-04:00 xio67 kernel: scsi 1:0:0:0: scsi: Device offlined - not ready after error recovery
2010-10-18T15:43:31.501813-04:00 xio67 kernel: scsi 1:0:0:0: timing out command, waited 22s
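
For anyone triaging a similar setup, the failure signature in the log above can be picked out mechanically. The following is only a rough sketch: the function name find_srp_failures and the pattern list are hypothetical, built solely from the kernel messages quoted in this report, and they assume the log lines are unwrapped as the kernel emits them. Other kernels or ib_srp versions may word these messages differently.

```python
import re

# Patterns copied from the kernel messages quoted in this report.
# This list is an assumption for illustration, not an exhaustive
# catalogue of ib_srp error messages.
SRP_FAILURE_PATTERNS = [
    re.compile(r"SRP abort called"),
    re.compile(r"SRP reset_device called"),
    re.compile(r"SRP reset_host called"),
    re.compile(r"Device offlined - not ready after error recovery"),
    re.compile(r"timing out command"),
]


def find_srp_failures(lines):
    """Return (line_number, line) pairs matching any failure pattern.

    `lines` is any iterable of log lines, e.g. an open file object.
    Line numbers are 1-based.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SRP_FAILURE_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open log file (for example the system's kernel log, wherever syslog writes it on the host) to find_srp_failures returns the matching lines with their positions. An empty result after switching opensm versions would be consistent with the "Expected results" below, though it does not by itself prove the target connected.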


Expected results:
2010-10-19T16:36:11.833770-04:00 xio67 kernel: scsi1 : SRP.T10:50001FF500050208
2010-10-19T16:36:11.835364-04:00 xio67 kernel:  Vendor: IBM       Model: DCS9900           Rev: 6.05
2010-10-19T16:36:11.835373-04:00 xio67 kernel:  Type:   Direct-Access                      ANSI SCSI revision: 05
2010-10-19T16:36:11.835722-04:00 xio67 kernel: sdb : very big device. try to use READ CAPACITY(16).
2010-10-19T16:36:11.835844-04:00 xio67 kernel: SCSI device sdb: 15627665408 512-byte hdwr sectors (8001365 MB)
2010-10-19T16:36:11.835996-04:00 xio67 kernel: sdb: Write Protect is off
2010-10-19T16:36:11.836202-04:00 xio67 kernel: SCSI device sdb: drive cache: write back w/ FUA
2010-10-19T16:36:11.836429-04:00 xio67 kernel: sdb : very big device. try to use READ CAPACITY(16).
2010-10-19T16:36:11.836534-04:00 xio67 kernel: SCSI device sdb: 15627665408 512-byte hdwr sectors (8001365 MB)
2010-10-19T16:36:11.836639-04:00 xio67 kernel: sdb: Write Protect is off
2010-10-19T16:36:11.836851-04:00 xio67 kernel: SCSI device sdb: drive cache: write back w/ FUA
2010-10-19T16:36:11.854145-04:00 xio67 kernel: sdb: unknown partition table


Additional info:

This was broken by upstream patch 3d20f82edd3246879063b77721d0bcef927bdc48 in opensm 3.3.3. It has since been fixed in versions after 12/16/09. This forces us to run later, hand-built versions of opensm. It only appears to break SRP calls and some other minor IB traffic.

Fix is to include patch 520af84* from opensm 3.3.5 and later (full hash: 520af849615e7ee603b96498da9f3bc554470c06).
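
As a quick way to tell whether a given RHEL 5 box carries the affected package, something like the sketch below could be used. It is an illustration only: is_affected is a hypothetical helper, and the rule it encodes (version 3.3.3 with a release number below 2) is an assumption drawn from the affected package named above (opensm-3.3.3-1.el5) and the "Fixed In Version: 3.3.3-2.el5" field of this report. It says nothing about hand-built upstream versions.

```python
import re

AFFECTED_VERSION = "3.3.3"   # upstream version named in this report
FIRST_FIXED_RELEASE = 2      # from "Fixed In Version: 3.3.3-2.el5"


def is_affected(package):
    """Return True if a package string such as 'opensm-3.3.3-1.el5'
    predates the fixed RHEL 5 build.

    Assumes the usual name-version-release layout; anything after the
    leading release number (dist tag, architecture) is ignored.
    """
    match = re.fullmatch(
        r"opensm-(\d+(?:\.\d+)*)-(\d+)(?:\..*)?", package.strip()
    )
    if match is None:
        raise ValueError(f"unrecognized package string: {package!r}")
    version = match.group(1)
    release = int(match.group(2))
    return version == AFFECTED_VERSION and release < FIRST_FIXED_RELEASE
```

The input string would typically be whatever the package manager reports for the installed opensm; is_affected("opensm-3.3.3-1.el5") is true under these assumptions, while the 3.3.3-2.el5 build is not.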
Comment 8 errata-xmlrpc 2011-07-21 02:35:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0969.html