Description of problem:

We recently identified a bug in the upstream kernel, also present in SLES10. Please include this patch in SLES10 SP1.

http://marc.theaimsgroup.com/?l=linux-scsi&m=116474894126200&w=2

Our testing has encountered an error between sdev initialization/scanning and the sdev block/unblock behavior. What we have seen is that new target detection will kick off a scan, and that an sdev will be in the creation process with the state SDEV_CREATED. At this point a link event occurs, which blocks the sdev, changes its state to SDEV_BLOCK, and stops its request queue. However, the creation thread is still executing, and decides to transition the sdev state to SDEV_RUNNING. Note that the request queue is still blocked. The sdev then gets unblocked, attempting to change the state to SDEV_RUNNING, which fails as it is already SDEV_RUNNING, which causes the unblock routine to bypass the call to blk_start_queue().

This patch modifies the creation path so that it only changes to SDEV_RUNNING if the state is SDEV_CREATED. This allows the block/unblock to work appropriately. It does have a side effect that unblock could early-transition the sdev to SDEV_RUNNING.

Version-Release number of selected component (if applicable):
RHEL5 Beta & RC kernels

How reproducible:
Cable pull testing - pull immediately after first presentation to OS.
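The sequence described above can be replayed with a small user-space model. This is only a sketch under stated assumptions, not the kernel code or the patch itself: the `toy_*` names, the `queue_stopped` flag, and the simplified transition rule (a transition to the state the device already holds is rejected) are all hypothetical stand-ins chosen to mirror the report's wording.

```c
#include <stdbool.h>

/* Toy model only: state names echo the report; nothing here is kernel code. */
enum sdev_state { SDEV_CREATED, SDEV_RUNNING, SDEV_BLOCK };

struct toy_sdev {
    enum sdev_state state;
    bool queue_stopped;   /* stands in for the stopped request queue */
};

/*
 * Assumed rule set: moving to the state already held, or back to
 * SDEV_CREATED, is rejected. Returns 0 on success, -1 on rejection.
 */
static int toy_set_state(struct toy_sdev *sdev, enum sdev_state next)
{
    if (sdev->state == next || next == SDEV_CREATED)
        return -1;
    sdev->state = next;
    return 0;
}

/* Link event: block the device and stop its queue. */
static void toy_block(struct toy_sdev *sdev)
{
    if (toy_set_state(sdev, SDEV_BLOCK) == 0)
        sdev->queue_stopped = true;
}

/* Link recovery: the queue is restarted only if the state change succeeds. */
static int toy_unblock(struct toy_sdev *sdev)
{
    if (toy_set_state(sdev, SDEV_RUNNING) != 0)
        return -1;        /* queue restart is bypassed */
    sdev->queue_stopped = false;
    return 0;
}

/*
 * End of the creation path. With patched == true, the move to
 * SDEV_RUNNING happens only when the device is still SDEV_CREATED.
 */
static void toy_finish_creation(struct toy_sdev *sdev, bool patched)
{
    if (patched && sdev->state != SDEV_CREATED)
        return;
    toy_set_state(sdev, SDEV_RUNNING);
}

/*
 * Replays the reported ordering: block during creation, creation
 * completes, then unblock. Returns true if the queue is left stopped.
 */
bool toy_replay_race(bool patched)
{
    struct toy_sdev sdev = { SDEV_CREATED, false };

    toy_block(&sdev);
    toy_finish_creation(&sdev, patched);
    toy_unblock(&sdev);
    return sdev.queue_stopped;
}
```

In this model, `toy_replay_race(false)` leaves the queue stopped, matching the failure described, while `toy_replay_race(true)` lets the unblock step restart it.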
Do you expect this to be accepted upstream soon?

Despite this:

> It does have a side effect that unblock could early-transition
> the sdev to SDEV_RUNNING.

What is the impact?
Yes. As to when, you know how this works - it's under James B's control. I'm pinging him to see why it hasn't been in rc-fixes yet.

Impact should be nothing. I haven't validated via code review, but I would guess that:

a) There should be no I/O for the device as we're still in scsi_scan, so a transition to RUNNING shouldn't matter.

b) I only believe we're at risk if something is validating the state while not successfully completing scsi_scan. First, it is *very* rare we would not complete scsi_scan successfully. Second, it makes little sense, in scsi_scan, to validate sdev state.

c) My only risk is that scsi_scan can be a long process. There may be multiple scans outstanding, so reuse/discovery of the sdev could be at risk. However, I feel I'm being very hypothetical to even think of this.
QE ack for RHEL5.
Posted Fri, 22 Dec 2006
Mike Christie pointed out some issues with this patch. It turns out that it is not needed in RHEL 5. This request is withdrawn.