There appears to be a race condition in the sg_read() routine of the generic SCSI driver (sg.c) supplied with Red Hat Linux 6.0. Interrupts are not disabled when the check is made whether or not to put the process to sleep awaiting input, so data can arrive between the check and the sleep. This is the race condition described on page 209 of Alessandro Rubini's "Linux Device Drivers". The appropriate fix is described in the book: basically, the sleep code needs a save_flags()/cli()/restore_flags() sequence inserted in the appropriate place.

Note that there may well be other locations in the file with a similar problem. The only one that was affecting us was in the read routine, so we limited our patch to sg_read().

The latent bug most often manifested itself when reading back very short SCSI messages, at least on our system. Since the race condition is highly dependent on overall system timing, it will affect different systems in different ways.

Interestingly enough, the equivalent driver code under Slackware 3.5 (fairly old now) included the interrupt code. It looks like someone unintentionally cleaned out some critical code at some point.

It also appears that the generic SCSI driver has been significantly rewritten in the most recent kernel available at the Red Hat site. Comments in the code indicate that logic exists to prevent race conditions in the regions where we observed them. So the problem should already be fixed in future Red Hat releases.
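For anyone who wants to see the shape of the race without kernel sources at hand, here is a minimal user-space sketch of the same idea. It is an analogy only, not the sg.c patch: a pthread mutex stands in for the save_flags()/cli()/restore_flags() bracket, a condition variable stands in for the kernel wait queue, and every name in it (producer, reader_wait, run_demo, data_available, data_value) is hypothetical. The point it illustrates is that the "is there data?" check and the decision to sleep must happen as one atomic step, so that a wakeup arriving in between cannot be lost.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is taken from sg.c. */
static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
static int data_available = 0;
static int data_value = 0;

/* Plays the role of the interrupt handler: publish data, wake the reader. */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    data_value = 42;
    data_available = 1;
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Plays the role of the read routine. */
static int reader_wait(void)
{
    int v;

    /* Roughly where save_flags(); cli(); would go in the driver. */
    pthread_mutex_lock(&lock);

    /* The check and the sleep are atomic with respect to the producer:
     * pthread_cond_wait() releases the mutex and blocks in one step,
     * so the producer cannot signal between the check and the sleep. */
    while (!data_available)
        pthread_cond_wait(&ready, &lock);

    v = data_value;
    data_available = 0;

    /* Roughly where restore_flags() would go in the driver. */
    pthread_mutex_unlock(&lock);
    return v;
}

/* Start the producer, wait for its data, and return the value read. */
int run_demo(void)
{
    pthread_t t;
    int v;

    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    v = reader_wait();
    pthread_join(t, NULL);
    return v;
}
```

If the lock were taken only after the check, the producer could set data_available and signal in the gap, and the reader would then sleep with nothing left to wake it. That is the user-space counterpart of what we believe was happening in the read routine. To try the sketch, call run_demo() from a main() and build with pthread support.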
assigned to dledford
Current 2.2.14 kernels should not have this problem due to the rewritten sg driver mentioned in the original comments. Bug closed.