Bug 152002 - RHEL4 write system call returns success on partial writes
Summary: RHEL4 write system call returns success on partial writes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-03-24 05:10 UTC by Marty Wesley
Modified: 2007-11-30 22:07 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-29 15:34:02 UTC
Target Upstream Version:
Embargoed:



Description Marty Wesley 2005-03-24 05:10:39 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
This is the RHEL4 version of bugzilla 116900.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run a test program that uses synchronous I/O (O_SYNC).
2. During the run, the storage array deliberately injects errors by:
   * throwing away data and
   * returning CHECK CONDITION (ABORTED COMMAND) or
               CHECK CONDITION (HARDWARE ERROR) to the OS.
3. The user-mode application cannot detect the error condition and keeps
going.

Additional info:

The Red Hat support team has recreated this problem using a simpler
user-mode write test case combined with a kernel SCSI error injection
debug patch, as follows:

1. Add a new ioctl to allow user-mode programs to signal the kernel to
start and stop the experiment.
2. Add a kernel debug patch that places trap code within
scsi_softirq_handler(), after the device interrupts the OS for command
completion. If the kernel has been signalled (via ioctl) and the target is
our experiment device, the trap code replaces the SCSI status code
as follows:

  if (SCpnt->target == OurExperimentDevice) {
      SCpnt->result = 0x02;          /* status byte: CHECK CONDITION */
      SCpnt->sense_buffer[0] = 0x70; /* fixed-format sense, current error */
      SCpnt->sense_buffer[2] = 0xeb; /* sense key 0xb (ABORTED COMMAND) with FMK, EOM, ILI bits set */
  }

3. The replacement is performed on every command until the kernel is
signalled to stop (via ioctl).
4. The expectation is that during this interval, any read/write to
this particular device either blocks or returns an error
if the file is opened with the O_SYNC option.
5. We then wrote a simple test program and expected the write to
either block or return an error. Unfortunately, that does not
happen: write returns success.
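
The original test program is not attached to this report. As a rough illustration of the kind of user-mode check described in step 5, here is a minimal sketch, assuming a plain POSIX environment. The names write_all() and osync_write_test() are hypothetical and do not come from the support team's test case. It opens a file with O_SYNC, writes one buffer, and reports an error if write() fails or stops making progress.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write len bytes from buf to fd, retrying on short writes and EINTR.
 * Returns 0 if every byte was accepted, -1 (errno set) otherwise.
 */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* real failure, e.g. EIO or ENOSPC */
        }
        if (n == 0) {
            errno = EIO;    /* no progress: treated as an error in this sketch */
            return -1;
        }
        done += (size_t)n;  /* fewer bytes than requested is a partial write */
    }
    return 0;
}

/*
 * Hypothetical stand-in for the test case: open path with O_SYNC,
 * write one buffer, and report the outcome. Returns 0 on success.
 */
static int osync_write_test(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    int rc = write_all(fd, buf, len);
    if (rc < 0)
        perror("write");

    if (close(fd) < 0) {
        perror("close");
        rc = -1;
    }
    return rc;
}
```

With the error injection active, a check of this kind is what the report says still sees success, because write() itself returns the full byte count.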

From the kernel log (/var/log/messages), it can be seen that the
driver retries 4 more times, then EXT2 does log the error condition,
but the error never gets propagated back to the user-mode application.

Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0  --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002  -- first error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 4152
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0  --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- 2nd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 32
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0  --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002 -- 3rd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0  --- replace status and sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 8000002  -- 4th error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: EXT2-fs error (device sd(8,34)): ext2_write_inode: unable to read inode block - inode=13, block=4
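
For readers decoding the log: read as hex, the logged return code 8000002 is 0x08000002. The sketch below unpacks it and the injected sense byte 0xeb, assuming the conventional Linux SCSI result layout (status, message, host, and driver bytes from low to high) and the fixed-format sense layout, in which byte 2 carries the FMK, EOM, and ILI bits plus the sense key. The struct and function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The four bytes packed into a SCSI command result word. */
struct scsi_result {
    unsigned status;  /* bits 0-7:   SCSI status byte, 0x02 = CHECK CONDITION */
    unsigned msg;     /* bits 8-15:  message byte */
    unsigned host;    /* bits 16-23: host adapter byte */
    unsigned driver;  /* bits 24-31: driver byte, 0x08 = sense data present */
};

static struct scsi_result decode_result(unsigned int result)
{
    struct scsi_result r;

    r.status = result & 0xff;
    r.msg    = (result >> 8) & 0xff;
    r.host   = (result >> 16) & 0xff;
    r.driver = (result >> 24) & 0xff;
    return r;
}

/* Flags and sense key carried in byte 2 of fixed-format sense data. */
struct sense_flags {
    int filemark;     /* bit 7, printed as FMK in the log */
    int eom;          /* bit 6, end of medium */
    int ili;          /* bit 5, incorrect length indicator */
    unsigned key;     /* bits 0-3, 0x0b = ABORTED COMMAND */
};

static struct sense_flags decode_sense(const unsigned char *sense, size_t len)
{
    struct sense_flags f;

    assert(sense != NULL && len > 2);  /* byte 2 must be present */
    f.filemark = (sense[2] & 0x80) != 0;
    f.eom      = (sense[2] & 0x40) != 0;
    f.ili      = (sense[2] & 0x20) != 0;
    f.key      = sense[2] & 0x0f;
    return f;
}

/* Print both decodings in one line for quick inspection. */
static void print_decoded(unsigned int result,
                          const unsigned char *sense, size_t len)
{
    struct scsi_result r = decode_result(result);
    struct sense_flags f = decode_sense(sense, len);

    printf("status=0x%02x msg=0x%02x host=0x%02x driver=0x%02x "
           "fmk=%d eom=%d ili=%d key=0x%x\n",
           r.status, r.msg, r.host, r.driver,
           f.filemark, f.eom, f.ili, f.key);
}
```

Decoded this way, the injected 0xeb accounts for the "FMK EOM ILI" prefix that appears on each "sense key Aborted Command" line.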

Comment 1 Tom Coughlan 2005-06-27 21:29:45 UTC
> This is the RHEL4 version of bugzilla 116900.

I don't see anything in bugzilla 116900 that says it will occur in RHEL 4. Maybe
that should be tested though.

Stephen, I'll assign this to you, since you own 116900. Assign it back to me if
it is something I should handle. 

Comment 2 Eric Sandeen 2007-08-28 22:03:20 UTC
> The RedHat support team has recreated this problem using a simpler
> user mode write test case combining with a kernel scsi error injection
> debug patch

Hmmmm 2.5 years later I don't suppose that patch is still around anywhere..? :)

Comment 3 Eric Sandeen 2007-08-28 23:29:38 UTC
This is actually expected.  Well, intended.  Perhaps not expected.  :)

If we look in generic_file_buffered_write() in mm/filemap.c:



        /*
         * For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
         */
        if (likely(status >= 0)) { 
                if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
                        if (!a_ops->writepage || !is_sync_kiocb(iocb))
                                status = generic_osync_inode(inode, mapping,
                                                OSYNC_METADATA|OSYNC_DATA);
                }
        }


"For now" extends to current kernels as well, FWIW.

Also those 2 flags, OSYNC_METADATA|OSYNC_DATA, do *not* sync the inode.

So the inode gets written out only in writeback, long after write() has
returned to the application.
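
An application that needs to hear about a failed inode writeout before it moves on can ask for it explicitly. The following is a minimal sketch, assuming POSIX fsync() semantics; the function name write_file_durably() is hypothetical, and this thread does not confirm that fsync() surfaces this particular error on a given RHEL4 kernel.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write buf to path with O_SYNC, then call fsync() so that the inode is
 * pushed out as well and any error is reported before returning.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int write_file_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    ssize_t n = write(fd, buf, len);
    if (n < 0) {
        rc = -1;
    } else if ((size_t)n != len) {
        errno = EIO;        /* this sketch treats a short write as a failure */
        rc = -1;
    } else if (fsync(fd) < 0) {
        rc = -1;            /* possibly EIO if the inode could not be written */
    }

    int saved = errno;      /* close() may overwrite errno */
    if (close(fd) < 0 && rc == 0)
        return -1;
    if (rc < 0)
        errno = saved;
    return rc;
}
```

The extra fsync() costs another round trip to storage on every call, which is the trade-off for not waiting until writeback to learn about the failure.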

Although I'm not really fond of it, and I'm not sure of the historical reasons
for it, I'm tempted to mark this NOTABUG because it's actually working as
designed...

-Eric

Comment 4 Eric Sandeen 2007-08-28 23:34:17 UTC
Oh, and for what it's worth, the data write itself probably *was* successful,
but your inode writeout was not.

Comment 5 Eric Sandeen 2007-08-29 15:34:02 UTC
I almost hate to do this, but because this is how Linux has been -
intentionally, it seems - for at least 5 or 6 years, I'm going to close this as
NOTABUG, because things are in fact working as designed and as intended.

-Eric

