Bug 218310 - scsi midlayer race condition : scan vs block/unblock deadlocks sdev
Summary: scsi midlayer race condition : scan vs block/unblock deadlocks sdev
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: ---
Assignee: Mike Christie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 217215
Reported: 2006-12-04 16:20 UTC by James Smart
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-01-04 03:35:19 UTC
Target Upstream Version:
Embargoed:



Description James Smart 2006-12-04 16:20:45 UTC
Description of problem:

We recently identified a bug in the upstream kernel, also present in SLES10.
Please include this patch in SLES10 SP1.

http://marc.theaimsgroup.com/?l=linux-scsi&m=116474894126200&w=2

Our testing has hit a race between sdev initialization/scanning and the
sdev block/unblock behavior. What we have seen is that new target
detection kicks off a scan, and an sdev is in the creation process with
the state SDEV_CREATED. At this point a link event occurs, which blocks
the sdev, changes its state to SDEV_BLOCK, and stops its request queue.
However, the creation thread is still executing, and it transitions the
sdev state to SDEV_RUNNING. Note that the request queue is still stopped.
The sdev then gets unblocked: the unblock routine attempts to change the
state to SDEV_RUNNING, which fails because the state is already
SDEV_RUNNING, so the unblock routine bypasses the call to
blk_start_queue() and the queue is never restarted.
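
The interleaving above can be replayed in a small user-space model. This is
only a sketch, not kernel code: every name in it (model_sdev, model_block,
and so on) is hypothetical, and the transition rule is reduced to the one
property the report relies on, namely that a transition to the state the
device is already in is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the three sdev states named in the report. */
enum model_state { MODEL_CREATED, MODEL_RUNNING, MODEL_BLOCK };

struct model_sdev {
    enum model_state state;
    bool queue_stopped;      /* stand-in for the stopped request queue */
};

/* Assumed rule: a transition to the current state fails with -1. */
static int model_set_state(struct model_sdev *sdev, enum model_state next)
{
    if (sdev->state == next)
        return -1;
    sdev->state = next;
    return 0;
}

/* Link event: move to BLOCK and stop the queue. */
static int model_block(struct model_sdev *sdev)
{
    if (model_set_state(sdev, MODEL_BLOCK) != 0)
        return -1;
    sdev->queue_stopped = true;
    return 0;
}

/* Unblock: the queue is restarted only if the state change succeeds. */
static int model_unblock(struct model_sdev *sdev)
{
    if (model_set_state(sdev, MODEL_RUNNING) != 0)
        return -1;           /* early return: queue stays stopped */
    sdev->queue_stopped = false;
    return 0;
}

/* Unpatched creation path: unconditionally moves the sdev to RUNNING. */
static int model_finish_create(struct model_sdev *sdev)
{
    return model_set_state(sdev, MODEL_RUNNING);
}

/* Replays the reported ordering; true means RUNNING with a stopped queue. */
static bool model_replay_race(void)
{
    struct model_sdev sdev = { MODEL_CREATED, false };
    int rc;

    rc = model_block(&sdev);          /* link event during creation */
    assert(rc == 0);
    rc = model_finish_create(&sdev);  /* creation thread: BLOCK -> RUNNING */
    assert(rc == 0);
    rc = model_unblock(&sdev);        /* rejected: already RUNNING */
    assert(rc == -1);
    (void)rc;

    return sdev.state == MODEL_RUNNING && sdev.queue_stopped;
}
```

In this model, model_replay_race() returns true: the device reports
RUNNING while its queue is still stopped, which is the stuck state
described above.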

This patch modifies the creation path so that it only changes to SDEV_RUNNING
if the state is SDEV_CREATED. This allows the block/unblock to work 
appropriately. It does have a side effect that unblock could early-transition 
the sdev to SDEV_RUNNING.
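
The idea behind the change, and its side effect, can be sketched with a
user-space model. This is an illustration under assumptions, not the patch
itself: all names (fix_sdev, fix_finish_create, and so on) are hypothetical,
and the only transition rule modeled is that a transition to the current
state is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the three sdev states named in the report. */
enum fix_state { FIX_CREATED, FIX_RUNNING, FIX_BLOCK };

struct fix_sdev {
    enum fix_state state;
    bool queue_stopped;      /* stand-in for the stopped request queue */
};

/* Assumed rule: a transition to the current state fails with -1. */
static int fix_set_state(struct fix_sdev *sdev, enum fix_state next)
{
    if (sdev->state == next)
        return -1;
    sdev->state = next;
    return 0;
}

static int fix_block(struct fix_sdev *sdev)
{
    if (fix_set_state(sdev, FIX_BLOCK) != 0)
        return -1;
    sdev->queue_stopped = true;
    return 0;
}

static int fix_unblock(struct fix_sdev *sdev)
{
    if (fix_set_state(sdev, FIX_RUNNING) != 0)
        return -1;
    sdev->queue_stopped = false;
    return 0;
}

/* Guarded creation path: touch the state only if it is still CREATED. */
static int fix_finish_create(struct fix_sdev *sdev)
{
    if (sdev->state != FIX_CREATED)
        return 0;            /* someone else owns the state now */
    return fix_set_state(sdev, FIX_RUNNING);
}

/* Same ordering as the reported race; returns whether the queue is stuck. */
static bool fix_queue_stuck_after_race(void)
{
    struct fix_sdev sdev = { FIX_CREATED, false };
    int rc;

    rc = fix_block(&sdev);            /* link event during creation */
    assert(rc == 0);
    rc = fix_finish_create(&sdev);    /* no-op: state is BLOCK */
    assert(rc == 0);
    rc = fix_unblock(&sdev);          /* BLOCK -> RUNNING, queue restarted */
    assert(rc == 0);
    (void)rc;

    return sdev.queue_stopped;
}

/* Side effect: unblock can reach RUNNING before creation has finished. */
static enum fix_state fix_state_after_early_unblock(void)
{
    struct fix_sdev sdev = { FIX_CREATED, false };

    fix_block(&sdev);
    fix_unblock(&sdev);               /* creation path has not run yet */
    return sdev.state;
}
```

In this model, fix_queue_stuck_after_race() returns false, while
fix_state_after_early_unblock() returns FIX_RUNNING even though the creation
path never ran, which corresponds to the early transition mentioned above.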


Version-Release number of selected component (if applicable):

RHEL5 Beta & RC kernels

How reproducible:

Cable pull testing: pull the cable immediately after the device is first
presented to the OS.

Comment 4 Tom Coughlan 2006-12-12 18:56:41 UTC
Do you expect this to be accepted upstream soon? 

Despite this:

> It does have a side effect that unblock could early-transition 
> the sdev to SDEV_RUNNING.

?

What is the impact? 

Comment 5 James Smart 2006-12-12 20:27:20 UTC
Yes. As to when, you know how this works - it's under James B's control.
I'm pinging him to see why it hasn't been in rc-fixes yet.

Impact should be nothing. I haven't validated via code review, but I would
guess that:
a) there should be no i/o for the device as we're still in scsi_scan, so a
transition to RUNNING shouldn't matter;

b) I believe we're only at risk if something is validating the state while not
successfully completing scsi_scan. First, it is *very* rare we would not
complete scsi_scan successfully. Second, it makes little sense, in scsi_scan, to
validate sdev state.

c) The only risk I see is that scsi_scan can be a long process. There may be
multiple scans outstanding, so reuse/discovery of the sdev could be at risk.
However, I feel I'm being very hypothetical to even think of this.

Comment 6 Jay Turner 2006-12-14 02:38:35 UTC
QE ack for RHEL5.

Comment 7 Tom Coughlan 2007-01-02 14:39:40 UTC
Posted Fri, 22 Dec 2006

Comment 8 Tom Coughlan 2007-01-04 03:35:19 UTC
Mike Christie pointed out some issues with this patch. It turns out that it is
not needed in RHEL 5. This request is withdrawn. 
