Bug 152001

Summary: RHEL2.1 Write system call returns success on partial writes
Product: Red Hat Enterprise Linux 2.1 Reporter: Marty Wesley <mwesley>
Component: kernel Assignee: Jim Paradis <jparadis>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.1 CC: jbaron, peterm, riel, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-12-10 00:10:45 UTC Type: ---

Description Marty Wesley 2005-03-24 05:07:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
This is the RHEL2.1 version of bugzilla 116900.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run a test program that uses synchronous I/O (O_SYNC).
2. During the run, the storage array deliberately injects errors by:
   * discarding data and
   * returning CHECK CONDITION (ABORTED COMMAND) or
     CHECK CONDITION (HARDWARE ERROR) to the OS.
3. The user-mode application cannot detect the error condition and
   keeps running.

Additional info:

The Red Hat support team has reproduced this problem using a simpler
user-mode write test case combined with a kernel SCSI error-injection
debug patch, as follows:

1. Add a new ioctl to allow user-mode programs to signal the kernel to
start and stop the experiment.
2. Add a kernel debug patch that places trap code within
scsi_softirq_handler(), after the device interrupts the OS for command
completion. If the kernel has been signalled (via ioctl) and the target
is our experiment device, the trap code replaces the SCSI status and
sense data as follows:

  if (SCpnt->target == OurExperimentDevice) {
      SCpnt->result = 0x02;          /* CHECK_CONDITION status */
      SCpnt->sense_buffer[0] = 0x70; /* valid sense data: current error */
      SCpnt->sense_buffer[2] = 0xeb; /* sense key ABORTED_COMMAND (0x0b);
                                        the 0xe0 bits also set FMK, EOM, ILI */
  }

3. The replacement is performed on every command until the kernel is
signalled to stop (via ioctl).
4. The expectation is that, during this interval, any read or write to
this particular device is either blocked or returns an error if the
file was opened with the O_SYNC option.
5. We then wrote a simple test program and expected the write to either
block or return an error. Unfortunately, that does not happen: write
returns success.
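
The check that the test program in step 5 is expected to make can be
sketched as below. This is only an illustration, not the test case used
by the support team: the helper name sync_write_all() and its return
convention are assumptions. It opens the file with O_SYNC, retries short
writes, and reports any error from write(), fsync() or close(). On the
affected kernel such a check would still pass, because the kernel itself
reports success.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the report): write len bytes to path
 * with O_SYNC, continuing after short writes until everything is
 * written or an error occurs.
 * Returns 0 on success, -1 on failure with errno set. */
int sync_write_all(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int saved;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            goto fail;
        }
        if (n == 0) {
            errno = EIO;        /* no progress: report as an I/O error */
            goto fail;
        }
        p += n;                 /* short write: continue with the rest */
        len -= (size_t)n;
    }

    if (fsync(fd) < 0)          /* second chance to see a deferred error */
        goto fail;
    if (close(fd) < 0)
        return -1;
    return 0;

fail:
    saved = errno;              /* keep the original error across close() */
    close(fd);
    errno = saved;
    return -1;
}
```

A caller would treat a return value of -1 as the error the reporter
expected to see while the injection was active, for example by printing
strerror(errno) and exiting with a non-zero status.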

The kernel log (/var/log/messages) shows that the driver retries 4 more
times and that EXT2 then logs the error condition, but the error is
never propagated back to the user-mode application.

Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
    --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002  -- first error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 4152
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
    --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- 2nd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 32
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
    --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- 3rd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
    --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002  -- 4th error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: EXT2-fs error (device sd(8,34)): ext2_write_inode: unable to read inode block - inode=13, block=4
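
To help read the log above, the following sketch decodes the two values
that appear in it: the hexadecimal "return code = 8000002" and the
injected sense byte 0xeb. It assumes the usual layout of the Linux SCSI
result word (status, message, host and driver bytes, low to high) and
byte 2 of fixed-format sense data. The struct and function names are
hypothetical and are not kernel APIs.

```c
#include <stdio.h>

/* Hypothetical types for illustration only; not kernel definitions. */
struct scsi_result_fields {
    unsigned status;   /* bits 0-7:   SCSI status, 0x02 = CHECK CONDITION */
    unsigned msg;      /* bits 8-15:  message byte */
    unsigned host;     /* bits 16-23: host adapter byte */
    unsigned driver;   /* bits 24-31: driver byte, 0x08 = sense available */
};

struct sense_byte2 {
    int filemark;      /* bit 7, printed as FMK */
    int eom;           /* bit 6, printed as EOM */
    int ili;           /* bit 5, printed as ILI */
    unsigned key;      /* bits 0-3: sense key, 0x0b = ABORTED COMMAND */
};

/* Split the 32-bit result word into its four bytes. */
struct scsi_result_fields split_result(unsigned int result)
{
    struct scsi_result_fields f;
    f.status = result & 0xff;
    f.msg    = (result >> 8) & 0xff;
    f.host   = (result >> 16) & 0xff;
    f.driver = (result >> 24) & 0xff;
    return f;
}

/* Split byte 2 of fixed-format sense data into flags and sense key. */
struct sense_byte2 split_sense_byte2(unsigned char b)
{
    struct sense_byte2 s;
    s.filemark = (b & 0x80) != 0;
    s.eom      = (b & 0x40) != 0;
    s.ili      = (b & 0x20) != 0;
    s.key      = b & 0x0f;
    return s;
}

/* Print both decodings in a compact form. */
void print_decoded(unsigned int result, unsigned char sb2)
{
    struct scsi_result_fields f = split_result(result);
    struct sense_byte2 s = split_sense_byte2(sb2);

    printf("status=0x%02x msg=0x%02x host=0x%02x driver=0x%02x\n",
           f.status, f.msg, f.host, f.driver);
    printf("FMK=%d EOM=%d ILI=%d key=0x%x\n",
           s.filemark, s.eom, s.ili, s.key);
}
```

Calling print_decoded(0x08000002, 0xeb) gives status 0x02 with driver
byte 0x08, and sense key 0x0b with FMK, EOM and ILI all set, which
matches the "FMK EOM ILI ... sense key Aborted Command" lines in the
log.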

Comment 1 Jim Paradis 2005-12-10 00:10:45 UTC
This issue is beyond the scope of the current support status of RHEL2.1.  No fix
is planned.