Bug 640115

Summary: [Broadcom 6.1 bug] Race condition in the INVALID_HOST path
Product: Red Hat Enterprise Linux 6 Reporter: Mike Christie <mchristi>
Component: iscsi-initiator-utils Assignee: Andy Grover <agrover>
Status: CLOSED ERRATA QA Contact: Storage QE <storage-qe>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 CC: aaswath, albertt, anilgv, bdonahue, benlu, coughlan, cward, eddie.wai, edwardn, enarvaez, gideonn, mchristi, qcai, rwilliam, syeghiay
Target Milestone: rc Keywords: OtherQA
Target Release: 6.1   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
The ISCSI_ERR_INVALID_HOST error event was not being handled correctly, leaving iSCSI sessions in memory when the iSCSI driver was attempting to shut down. This resulted in the driver failing to respond during shutdown of sessions that used the Broadcom NetXtreme II Network Adapter driver.
Story Points: ---
Clone Of: 640111 Environment:
Last Closed: 2011-05-19 14:14:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 640111    
Bug Blocks:    

Description Mike Christie 2010-10-04 20:59:53 UTC
+++ This bug was initially created as a clone of Bug #640111 +++

Description of problem:
A race condition can be observed in the ISCSI_ERR_INVALID_HOST handling in open-iscsi when the event is issued immediately after the ISCSI_ERR_CONN_FAILED netlink (nl) message has been received and has triggered the re-open path. The race is related to how the single-threaded scheduler handles the INVALID_HOST message asynchronously from within the re-open path.

The end result is that the actor responsible for the INVALID_HOST handling gets flushed by the re-open path.
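
For readers unfamiliar with the actor model, here is a minimal sketch of the failure mode described above. It is a toy model, not iscsid code: the Scheduler class, its schedule/flush/run methods, and the teardown/reopen callbacks are all hypothetical names, and the assumption that the re-open path flushes every pending actor for the connection is taken only from the description in this report.

```python
from collections import deque


class Scheduler:
    """Toy single-threaded actor queue (hypothetical; not the iscsid scheduler)."""

    def __init__(self):
        self.queue = deque()

    def schedule(self, conn_id, name, callback):
        self.queue.append((conn_id, name, callback))

    def flush(self, conn_id):
        # Drop every pending actor for this connection, whatever its purpose.
        self.queue = deque(a for a in self.queue if a[0] != conn_id)

    def run(self):
        # Actors run one at a time, in the order they were queued.
        while self.queue:
            _conn_id, _name, callback = self.queue.popleft()
            callback()


def simulate():
    sched = Scheduler()
    sessions = {1}  # one outstanding session, keyed by connection id
    log = []

    def teardown():
        # Stand-in for the INVALID_HOST handling: destroy the session.
        sessions.discard(1)
        log.append("teardown")

    def reopen():
        # Stand-in for the re-open path: it cancels pending work first.
        log.append("reopen")
        sched.flush(1)

    # ISCSI_ERR_CONN_FAILED arrives first and queues the re-open.
    sched.schedule(1, "reopen", reopen)
    # ISCSI_ERR_INVALID_HOST arrives before the re-open actor has run.
    sched.schedule(1, "teardown", teardown)
    sched.run()
    return sessions, log
```

In this model the teardown actor is discarded before it ever runs, so the session set never empties. That matches the symptom reported here, although the real code paths are more involved.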

Version-Release number of selected component (if applicable):
open-iscsi-2.0.871.1 - From inbox

How reproducible:
Within a few iterations, depending on the timing of the procedure's execution

Steps to Reproduce:
1. For every active session, issue an ISCSI_ERR_CONN_FAILED nl message
   (This will eventually put all active connections into the actor_list ready
   to execute the session_conn_reopen procedure)
2. Asynchronously after a few seconds, call the iscsi_host_remove procedure
   (This will notify iscsid with the ISCSI_ERR_INVALID_HOST nl message)
  
Actual results:
The number of outstanding sessions will not converge to 0 after the INVALID_HOST handling.

Expected results:
The number of outstanding sessions should converge to 0 after the INVALID_HOST handling.

Additional info:
The iscsi_host_remove() in libiscsi will wait indefinitely if this race problem occurs.
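
To illustrate why a lost teardown turns into a hang, here is a rough sketch of a "wait until the session count reaches zero" pattern. It is an assumption about the general shape of such a wait, not the libiscsi implementation: the Host class and its methods are hypothetical, and the timeout exists only so the demonstration terminates (the wait described in this report has no such bound).

```python
import threading


class Host:
    """Hypothetical host object that tracks outstanding sessions."""

    def __init__(self, num_sessions):
        self.num_sessions = num_sessions
        self.cond = threading.Condition()

    def session_destroyed(self):
        # Called when a session teardown actually completes.
        with self.cond:
            self.num_sessions -= 1
            self.cond.notify_all()

    def remove(self, timeout):
        """Return True if all sessions went away before the timeout expired."""
        with self.cond:
            return self.cond.wait_for(
                lambda: self.num_sessions == 0, timeout=timeout
            )
```

If the teardown for a session is never executed, session_destroyed() is never called and remove() can only return by timing out. Without a timeout, it would block forever.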

--- Additional comment from eddie.wai on 2010-10-04 16:39:35 EDT ---

Created attachment 451527 [details]
ISCSID: Fixed a race condition in the INVALID_HOST path

This should fix the race condition presented.

Comment 1 Mike Christie 2011-02-01 08:45:40 UTC
This is fixed in iscsi-initiator-utils-6.2.0.872-14.el6. You can download it here:
http://people.redhat.com/mchristi/iscsi/rhel6.1/iscsi-initiator-utils/

Comment 4 Chris Ward 2011-04-06 11:02:36 UTC
~~ Partners and Customers ~~

This bug was included in RHEL 6.1 Beta. Please confirm the status of this request as soon as possible.

If you're having problems accessing 6.1 bits, are delayed in your test execution or find in testing that the request was not addressed adequately, please let us know.

Thanks!

Comment 5 edwardn 2011-05-11 17:13:28 UTC
This issue has been verified with the RHEL 6.1 RC build (kernel
2.6.32-131.0.13) and iscsi-initiator-utils-6.2.0.872-21.

Comment 7 Laura Bailey 2011-05-12 07:45:40 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The ISCSI_ERR_INVALID_HOST error event was not being handled correctly, leaving iSCSI sessions in memory when the iSCSI driver was attempting to shut down. This resulted in the driver failing to respond during shutdown of sessions that used the Broadcom NetXtreme II Network Adapter driver.

Comment 9 errata-xmlrpc 2011-05-19 14:14:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0733.html