Red Hat Bugzilla – Bug 218310
scsi midlayer race condition : scan vs block/unblock deadlocks sdev
Last modified: 2007-11-30 17:07:38 EST
Description of problem:
We recently identified a bug in the upstream kernel, also present in SLES10.
Please include this patch in SLES10 SP1.
Our testing has hit a race between sdev initialization/scanning and the
sdev block/unblock behavior. What we have seen is that new target
detection kicks off a scan, and an sdev is in the creation process with
the state SDEV_CREATED. At this point a link event occurs, which blocks
the sdev, changes its state to SDEV_BLOCK, and stops its request queue.
However, the creation thread is still executing, and it transitions the
sdev state to SDEV_RUNNING.
Note that the request queue is still blocked. The sdev then gets unblocked,
attempting to change the state to SDEV_RUNNING, which fails as it is already
SDEV_RUNNING, which causes the unblock routine to bypass the call to
This patch modifies the creation path so that it only changes to SDEV_RUNNING
if the state is SDEV_CREATED. This allows the block/unblock to work
appropriately. It does have a side effect that unblock could early-transition
the sdev to SDEV_RUNNING.
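
The ordering described above can be replayed with a small user-space model. This is only a sketch, not the kernel code or the actual patch: the toy_* names and simulate_race() are hypothetical, and the transition check is simplified to "a transition to the current state fails", which is the only rule the description relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* The three states named in the report. */
enum sdev_state { SDEV_CREATED, SDEV_RUNNING, SDEV_BLOCK };

/* Hypothetical stand-in for a SCSI device and its request queue. */
struct toy_sdev {
    enum sdev_state state;
    bool queue_stopped;
};

/* Simplified assumption: changing to the state we are already in fails. */
static bool toy_set_state(struct toy_sdev *sdev, enum sdev_state next)
{
    if (sdev->state == next)
        return false;
    sdev->state = next;
    return true;
}

/* Link event: mark the device blocked and stop its request queue. */
static void toy_block(struct toy_sdev *sdev)
{
    if (toy_set_state(sdev, SDEV_BLOCK))
        sdev->queue_stopped = true;
}

/* Unblock: if the state change fails, the queue restart is skipped. */
static void toy_unblock(struct toy_sdev *sdev)
{
    if (!toy_set_state(sdev, SDEV_RUNNING))
        return;
    sdev->queue_stopped = false;
}

/* Creation path. Unpatched: always move to SDEV_RUNNING.
 * Patched: only move to SDEV_RUNNING from SDEV_CREATED. */
static void toy_finish_creation(struct toy_sdev *sdev, bool patched)
{
    if (patched && sdev->state != SDEV_CREATED)
        return;
    toy_set_state(sdev, SDEV_RUNNING);
}

/* Replays: block during creation, creation finishes, then unblock.
 * Returns true if the request queue is left stopped at the end. */
static bool simulate_race(bool patched)
{
    struct toy_sdev sdev = { SDEV_CREATED, false };

    toy_block(&sdev);                 /* link event while still creating */
    assert(sdev.queue_stopped);       /* queue is stopped at this point */
    toy_finish_creation(&sdev, patched);
    toy_unblock(&sdev);               /* link comes back */
    return sdev.queue_stopped;
}
```

In this model simulate_race(false) leaves the queue stopped with the state at SDEV_RUNNING, which matches the deadlock described, while simulate_race(true) lets the unblock restart the queue.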
Version-Release number of selected component (if applicable):
RHEL5 Beta & RC kernels
Cable-pull testing: pull the cable immediately after first presentation to the OS.
Do you expect this to be accepted upstream soon?
> It does have a side effect that unblock could early-transition
> the sdev to SDEV_RUNNING.
What is the impact?
Yes. As to when, you know how this works - it's under James B's control.
I'm pinging him to see why it hasn't been in rc-fixes yet.
Impact should be nothing. I haven't validated via code review, but I would
guess that:
a) there should be no i/o for the device as we're still in scsi_scan, so a
transition to RUNNING shouldn't matter;
b) I believe we're only at risk if something validates the state when scsi_scan
does not complete successfully. First, it is *very* rare we would not
complete scsi_scan successfully. Second, it makes little sense, in scsi_scan, to
validate sdev state.
c) The only risk I see is that scsi_scan can be a long process. There may be
multiple scans outstanding, so reuse/discovery of the sdev could be at risk.
However, I feel I'm being very hypothetical to even think of this.
QE ack for RHEL5.
Posted Fri, 22 Dec 2006
Mike Christie pointed out some issues with this patch. It turns out that it is
not needed in RHEL 5. This request is withdrawn.