From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
This is the RHEL2.1 version of bugzilla 116900.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Run a test program which uses synchronous I/O (O_SYNC).
2. During run time, the storage array purposely injects errors by:
   * throwing away data and
   * returning CHECK CONDITION (ABORTED COMMAND) or CHECK CONDITION (HARDWARE ERROR) to the OS.
3. The user mode application can't detect the error condition and keeps going.

Additional info:
The RedHat support team has recreated this problem using a simpler user mode write test case combined with a kernel SCSI error injection debug patch, as follows:

1. Add a new ioctl to allow user mode programs to signal the kernel to start and stop the experiment.

2. Add a kernel debug patch that places trap code within scsi_softirq_handler(), after the device interrupts the OS for command completion. If the kernel has been signalled (via ioctl) and the target is our experiment device, the trap code replaces the SCSI status code as follows:

    if (SCpnt->target == OurExperimentDevice) {
        SCpnt->result = 0x02;           /* CHECK_CONDITION */
        SCpnt->sense_buffer[0] = 0x70;  /* sense valid */
        SCpnt->sense_buffer[2] = 0xeb;  /* ABORTED_COMMAND */
    }

3. The replacement is always performed until the kernel is signalled to stop (via ioctl).

4. The expectation is that during this interval, any read/write to this particular device would either be blocked or return with an error if the file is opened with the O_SYNC option.

5. We then wrote a simple test program and expected the write to either block or return with an error. Unfortunately, that doesn't happen: write returns with success.
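For reference, a user mode write test of the kind described in step 5 might look roughly like the sketch below. This is not the support team's actual test program; the helper name sync_write_check and the 0xA5 fill pattern are made up for illustration. It opens the file with O_SYNC and checks every call where an I/O error could be reported back (write, fsync, close).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original report): write `len` bytes
 * of a fixed pattern to `path` with O_SYNC and check every call that
 * could report an I/O error.
 * Returns 0 if write(), fsync() and close() all succeed, -1 otherwise. */
int sync_write_check(const char *path, size_t len)
{
    char buf[4096];
    size_t done = 0;
    int rc = 0;

    int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    memset(buf, 0xA5, sizeof buf);

    while (done < len) {
        size_t chunk = (len - done < sizeof buf) ? len - done : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the same chunk */
            perror("write");        /* an injected error should show up here */
            rc = -1;
            break;
        }
        done += (size_t)n;          /* handle short writes */
    }

    /* With O_SYNC this should be redundant, but it is a second chance
     * for the kernel to report a failed write. */
    if (fsync(fd) < 0) {
        perror("fsync");
        rc = -1;
    }
    if (close(fd) < 0) {
        perror("close");
        rc = -1;
    }
    return rc;
}
```

Calling sync_write_check() on a file that lives on the experiment device while the injection is active would be expected to return -1. The behaviour reported here corresponds to it returning 0.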
From the kernel log (/var/log/messages), it can be seen that the driver retries 4 more times, then EXT2 does log the error condition, but the error never gets propagated back to the user mode application.

Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0 --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- first error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel: I/O error: dev 08:22, sector 4152
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0 --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- 2nd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel: I/O error: dev 08:22, sector 32
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0 --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- 3rd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel: I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0 --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- 4th error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel: I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: EXT2-fs error (device sd(8,34)): ext2_write_inode: unable to read inode block - inode=13, block=4
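To help read the log above, here is a small sketch that decodes the two values involved: the "return code" printed by the SCSI disk driver (hexadecimal, so 8000002 is 0x08000002) and byte 2 of fixed-format sense data. The struct and function names are made up for illustration. The byte layout of the result word (driver, host, message, status from high to low) and the bit layout of sense byte 2 (Filemark, EOM, ILI in the top three bits, sense key in the low nibble) are the standard ones. Note that the value 0xeb written by the debug patch carries sense key 0x0b (Aborted Command) in its low nibble and also sets the top three bits, which would account for the "FMK EOM ILI" prefix in the log.

```c
#include <stdio.h>

/* Illustrative types, not kernel definitions. */
struct scsi_result_bytes {
    unsigned driver;   /* bits 31..24, 0x08 means sense data is available */
    unsigned host;     /* bits 23..16 */
    unsigned msg;      /* bits 15..8  */
    unsigned status;   /* bits  7..0, 0x02 is CHECK CONDITION */
};

struct sense_byte2 {
    int filemark;      /* bit 7, printed as FMK */
    int eom;           /* bit 6, printed as EOM */
    int ili;           /* bit 5, printed as ILI */
    unsigned key;      /* bits 3..0, 0x0b is Aborted Command */
};

/* Split the 32-bit SCSI result word into its four bytes. */
struct scsi_result_bytes decode_result(unsigned int result)
{
    struct scsi_result_bytes r;
    r.driver = (result >> 24) & 0xff;
    r.host   = (result >> 16) & 0xff;
    r.msg    = (result >> 8)  & 0xff;
    r.status = result & 0xff;
    return r;
}

/* Split byte 2 of fixed-format sense data into flags and sense key. */
struct sense_byte2 decode_sense_byte2(unsigned char b)
{
    struct sense_byte2 s;
    s.filemark = (b >> 7) & 1;
    s.eom      = (b >> 6) & 1;
    s.ili      = (b >> 5) & 1;
    s.key      = b & 0x0f;
    return s;
}

/* Print both decodings in a compact form. */
void print_decoded(unsigned int result, unsigned char sense2)
{
    struct scsi_result_bytes r = decode_result(result);
    struct sense_byte2 s = decode_sense_byte2(sense2);

    printf("driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n",
           r.driver, r.host, r.msg, r.status);
    printf("FMK=%d EOM=%d ILI=%d sense key=0x%x\n",
           s.filemark, s.eom, s.ili, s.key);
}
```

For the values in this report, print_decoded(0x08000002, 0xeb) gives driver=0x08, status=0x02 and sense key 0xb with all three flag bits set.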
This issue is beyond the scope of the current support status of RHEL2.1. No fix is planned.