Bug 116900

Summary: RHEL3_U4 Data corruption in spite of using O_SYNC
Product: Red Hat Enterprise Linux 3 Reporter: Wendy Cheng <nobody+wcheng>
Component: kernelAssignee: Stephen Tweedie <sct>
Status: CLOSED ERRATA QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: 3.0CC: halligan, jparadis, kmori, lwang, mwaite, mwesley, peterm, petrides, poelstra, riel, rkenna, rperkins, smorin, sunil.mushran, tao, tburke, yushio
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2005-663 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-09-28 14:20:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 156321    
Attachments:
Description Flags
Patch submitted via IT#39672.
none
Fix for O_SYNC EIO propagation none

Description Wendy Cheng 2004-02-26 05:53:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
Hitachi reports data lost/corruption occurs in spite of using O_SYNC

GSS issue tracker number        :  34543
RedHat Technical Account Manager:  Keiichi Mori (kmori)
File System                     :  ext2

Original report:

Data corruption occurs in spite of using O_SYNC. Because kernel cannot
return write error to an application, the application keep the
transaction.
                                                                     
         
kernel: kernel-2.4.21-4.EL
filesystem: ext2
Reproducibility: everytime
Steps to reproduce:
1. run a test program which use syncronous I/O.
2. after about 15sec, storage array is returning an error during 2 minutes
  only for the write requests (read requests sucess).
  returned error: CHECK CONDITION (ABORTED COMMAND) or
                  CHECK CONDITION (HARDWARE ERROR)
3. the test program detected data corruption when test program
compared data.
                                                                     
         
Test program do:
 0. open file with O_SYNC.
 1. write data
 2. read data
 3. compare the data of 1. to the data of 2.
 4. repeat 1.-3.
 (Note: when kernel return an error for read() and write(),
        the test program stop.)

Test environment:
+---------+
| Server  | Fiber   +-----------+
|         |---------|           |
+---------+ Channel |Storage    |
                   |    Array  |
                   +-----------+
                                                                     
         
  start test program
    |
    |-----+-------------------+--------->
    |     |  returning error* |
    |     |<----------------->|
-----+-----+-------------------+------------->
    <-----><------------------>         elapsed time
     about   about 2min
     15sec
                                                                    


Version-Release number of selected component (if applicable):
2.4.21-9.EL

How reproducible:
Always

Steps to Reproduce:
1. run a test program which use syncronous I/O (O_SYNC).
2. during run time, storage array purposely injects errors by:
   * throwing away data and
   * returning CHECK CONDITION (ABORTED COMMAND) or
               CHECK CONDITION (HARDWARE ERROR) to OS.
3. the user mode application can't detect the error condition but keep
going.



Actual Results:  Application write system call returns success. 

Expected Results:  Application write should receive -EIO as the return
code.

Additional info:

The RedHat support team has recreated this problem using a simpler
user mode write test case combining with a kernel scsi error injection
debug patch as the following: 

1. Add a new ioctl to allow user mode programs to signal the kernel to
start and stop the experiment.
2. Add a kernel debug patch that places trap code within
scsi_softirq_handler() after the device interrupts OS for command
completion. If the kerne is signalled (via ioctl) and if the target is
our experiment device, the trap code will replace the SCSI status code
as following:

  if (SCpnt->target == OurExperimentDevice) {
      SCpnt->result = 0x02;   /* CHECK_CONDITION */
      SCpnt->sense_buffer[0] = 0x70; /* sense valid */
      SCpnt->sense_buffer[2] = 0xeb; /* ABORTED_COMMAND */
      }

3. The replacement will always be performed until the kernel is
signalled to stop (via ioctl).
4. The expectation is that during this interval, any read/write to
this particular device would either be blocked or returned with error
if file is opened with O_SYNC option.
5. We then write a simple test program and expect the write to be
either blocked and/or returns with error. It, unfortunely,  doesn't
happen - write returns with success.

From the kernel log (/var/log/messages), it can be seen that the
driver retries 4 more times, then EXT2 does log the error condition
but the error never gets propogated back to user mode appplication.

Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
                                                                     
                                           --- replace status and
sense data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2
lun 0 return code = 8000002  -- first error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key
Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 4152
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
                                                                     
                                          --- replace status and sense
data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2
lun 0 return code = 8000002 -- 2nd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key
Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 32
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
                                                                     
                                         --- replace status and sense
data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2
lun 0 return code = 8000002 -- 3rd error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key
Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: target=2,lun=0,channel=0
Feb 19 22:15:47 perf82 kernel: GSS_DEBUG: cmd->result=0,sb[0]=0,sb[2]=0
                                                                     
                                         --- replace status and sense
data here
Feb 19 22:15:47 perf82 kernel: SCSI disk error : host 0 channel 0 id 2
lun 0 return code = 8000002  -- 4th error
Feb 19 22:15:47 perf82 kernel: FMK EOM ILI Current sd08:22: sense key
Aborted Command
Feb 19 22:15:47 perf82 kernel:  I/O error: dev 08:22, sector 0
Feb 19 22:15:47 perf82 kernel: EXT2-fs error (device sd(8,34)):
ext2_write_inode: unable to read inode block - inode=13, block=4

Comment 1 Wendy Cheng 2004-02-26 06:01:52 UTC
Experiments ran on RedHat Performance Lab using aic7xxx controller. If
you need machine access, let me know.

Comment 2 Wendy Cheng 2004-03-08 22:27:58 UTC
There are places that the error return code within the file op paths
are not checked (e.g. the caller of ext2_update_inode()). However, the
most obvious one is probably the following:

linux-2.4.21-9.EL/mm/filemap.c:

   3313 ssize_t
   3314 do_generic_file_write(struct file *file,const char *buf,size_t
  count, loff_t *ppos)
   3315 {
          
   .......

   3411         if (status >= 0) {
   3412                 if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
   3413                         status = generic_osync_inode(inode,
OSYNC_METADATA|OSYNC_DATA);
   3414         }
   3415
   3416         err = written ? written : status;
   3417 out:
   3418
   3419         return err;

Note that the "status" is the (partial) number of bytes written when
the flow comes down to #3411 (so it's >=0). Since the f_flags is
O_SYNC, we go for generic_osync_inode() (which is mostly ok, except
the bottom portion of the code where the write_inode_now() doesn't
check the return code). With this particular test case, the #3413
returns -EIO but #3416 ignores this status.If we hack the code as:  t

#if defined(GSS_HACK)
        if (status) err = status;
        else err = written;
#else
        err = written ? written : status;
#endif

The test program show the OS correctly report "-1" with errno=5 back
to the user. 

In general, I see the clean up job could be quite messy (e.g. the 
struct inode_operations that has both "truncate" and "delete_inode"
prototyped as "void" - implies that there is no way to return the
error back to user mode). Not sure whether we really should fix this
though.




Comment 3 Wendy Cheng 2004-03-08 22:45:28 UTC
I didn't check the "fsync" path. However, hope the write(2) man page
could give you some flexibility (for not doing extensive clean up):

NOTES 
A successful return from write does not make any guarantee that data
has been committed to disk. In fact, on some buggy implementations, it
does not even guarantee that space has successfully been reserved for
the data. The only way to be sure is to call fsync(2) after you are
done writing all your data.

Comment 4 Wendy Cheng 2004-03-08 22:50:27 UTC
Well, my "GSS_HACK" above is not quite right - but I think you can get
the point.

Comment 5 Wendy Cheng 2004-06-11 21:11:23 UTC
Created attachment 101071 [details]
Patch submitted via IT#39672.

Comment 16 Stephen Tweedie 2004-09-16 20:01:43 UTC
Getting somewhere: I've put together a test environment for this,
using a modified loop device driver which allows me to inject
controlled errors into the device.

The very first attempt at that using the provided patch ran into an
unexpected problem:

  td_start_io: forcing read IO error on bh f649a610 07:00(0)
  lost async page write due to I/O error on 07:00

The first line is the "testdrive" driver forcing an error on a read
(this was a simple read from the device via "dd".)  The second line is
the patch's change to end_buffer_io_async complaining about a write error.

So clearly we've got some work to do here.  It's not immediately
obvious what the correct answer is here; end_buffer_io_async cannot
easily tell if it was triggered by a read or write request right now.
 But we probably do want to make that distinction somehow.  

I'll see if there's an easy way to divine it using buffer_req() and
buffer_new().

Comment 29 Josef Bacik 2005-01-07 20:46:28 UTC
Signed-off-by: Hisashi Hifumi  <hifumi.hisashi.co.jp>


diff -Nru linux-2.6.10-bk6/fs/buffer.c linux-2.6.10-bk6_fix/fs/buffer.c
--- linux-2.6.10-bk6/fs/buffer.c	2004-12-25 06:34:58.000000000 +0900
+++ linux-2.6.10-bk6_fix/fs/buffer.c	2005-01-04 19:58:48.000000000 +0900
@@ -311,10 +311,10 @@
  {
  	struct inode * inode = dentry->d_inode;
  	struct super_block * sb;
-	int ret;
+	int ret, err;

  	/* sync the inode to buffers */
-	write_inode_now(inode, 0);
+	ret = write_inode_now(inode, 0);

  	/* sync the superblock to buffers */
  	sb = inode->i_sb;
@@ -324,7 +324,9 @@
  	unlock_super(sb);

  	/* .. finally sync the buffers to disk */
-	ret = sync_blockdev(sb->s_bdev);
+	err = sync_blockdev(sb->s_bdev);
+	if (!ret)
+		ret = err;
  	return ret;
  }

diff -Nru linux-2.6.10-bk6/fs/fs-writeback.c
linux-2.6.10-bk6_fix/fs/fs-writeback.c
--- linux-2.6.10-bk6/fs/fs-writeback.c	2004-12-25 06:35:49.000000000 +0900
+++ linux-2.6.10-bk6_fix/fs/fs-writeback.c	2005-01-04
19:58:48.000000000 +0900
@@ -557,8 +557,9 @@
   *	dirty. This is primarily needed by knfsd.
   */

-void write_inode_now(struct inode *inode, int sync)
+int write_inode_now(struct inode *inode, int sync)
  {
+	int err = 0;
  	struct writeback_control wbc = {
  		.nr_to_write = LONG_MAX,
  		.sync_mode = WB_SYNC_ALL,
@@ -569,10 +570,11 @@

  	might_sleep();
  	spin_lock(&inode_lock);
-	__writeback_single_inode(inode, &wbc);
+	err = __writeback_single_inode(inode, &wbc);
  	spin_unlock(&inode_lock);
  	if (sync)
  		wait_on_inode(inode);
+	return err;
  }
  EXPORT_SYMBOL(write_inode_now);

@@ -641,8 +643,11 @@
  		need_write_inode_now = 1;
  	spin_unlock(&inode_lock);

-	if (need_write_inode_now)
-		write_inode_now(inode, 1);
+	if (need_write_inode_now) {
+		err2 = write_inode_now(inode, 1);
+		if (!err)
+			err = err2;
+	}
  	else
  		wait_on_inode(inode);

diff -Nru linux-2.6.10-bk6/fs/inode.c linux-2.6.10-bk6_fix/fs/inode.c
--- linux-2.6.10-bk6/fs/inode.c	2004-12-25 06:35:40.000000000 +0900
+++ linux-2.6.10-bk6_fix/fs/inode.c	2005-01-04 19:58:48.000000000 +0900
@@ -1035,7 +1035,7 @@
  		spin_unlock(&inode_lock);
  		if (!sb || (sb->s_flags & MS_ACTIVE))
  			return;
-		write_inode_now(inode, 1);
+		(void) write_inode_now(inode, 1);
  		spin_lock(&inode_lock);
  		inodes_stat.nr_unused--;
  		hlist_del_init(&inode->i_hash);
diff -Nru linux-2.6.10-bk6/fs/jbd/commit.c
linux-2.6.10-bk6_fix/fs/jbd/commit.c
--- linux-2.6.10-bk6/fs/jbd/commit.c	2004-12-25 06:35:27.000000000 +0900
+++ linux-2.6.10-bk6_fix/fs/jbd/commit.c	2005-01-04 19:58:48.000000000
+0900
@@ -341,6 +341,9 @@
  	}
  	spin_unlock(&journal->j_list_lock);

+	if (err)
+		__journal_abort_hard(journal);
+
  	journal_write_revoke_records(journal, commit_transaction);

  	jbd_debug(3, "JBD: commit phase 2\n"); diff -Nru
linux-2.6.10-bk6/include/linux/fs.h
linux-2.6.10-bk6_fix/include/linux/fs.h
--- linux-2.6.10-bk6/include/linux/fs.h	2004-12-25 06:34:27.000000000
+0900
+++ linux-2.6.10-bk6_fix/include/linux/fs.h	2005-01-04
19:58:48.000000000 +0900
@@ -1341,7 +1341,7 @@
  		invalidate_inode_pages(inode->i_mapping);
  }
  extern void invalidate_inode_pages2(struct address_space *mapping);
-extern void write_inode_now(struct inode *, int);
+extern int write_inode_now(struct inode *, int);
  extern int filemap_fdatawrite(struct address_space *);
  extern int filemap_flush(struct address_space *);
  extern int filemap_fdatawait(struct address_space *);






diff -Nru linux-2.4.29-pre3-bk2/fs/ext3/fsync.c
linux-2.4.29-pre3-bk2_fix/fs/ext3/fsync.c
--- linux-2.4.29-pre3-bk2/fs/ext3/fsync.c	2002-11-29
08:53:15.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/fs/ext3/fsync.c	2005-01-04
19:58:32.000000000 +0900
@@ -69,7 +69,7 @@
  	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
  		ret |= fsync_inode_data_buffers(inode);

-	ext3_force_commit(inode->i_sb);
+	ret |= ext3_force_commit(inode->i_sb);

  	return ret;
  }
diff -Nru linux-2.4.29-pre3-bk2/fs/ext3/super.c
linux-2.4.29-pre3-bk2_fix/fs/ext3/super.c
--- linux-2.4.29-pre3-bk2/fs/ext3/super.c	2004-11-17
20:54:21.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/fs/ext3/super.c	2005-01-04
19:58:32.000000000 +0900
@@ -1608,12 +1608,13 @@

  static int ext3_sync_fs(struct super_block *sb)
  {
+	int err;
  	tid_t target;

  	sb->s_dirt = 0;
  	target = log_start_commit(EXT3_SB(sb)->s_journal, NULL);
-	log_wait_commit(EXT3_SB(sb)->s_journal, target);
-	return 0;
+	err = log_wait_commit(EXT3_SB(sb)->s_journal, target);
+	return err;
  }

  /*
diff -Nru linux-2.4.29-pre3-bk2/fs/jbd/checkpoint.c
linux-2.4.29-pre3-bk2_fix/fs/jbd/checkpoint.c
--- linux-2.4.29-pre3-bk2/fs/jbd/checkpoint.c	2002-11-29
08:53:15.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/fs/jbd/checkpoint.c	2005-01-04 
19:58:32.000000000 +0900
@@ -142,7 +142,7 @@
  			spin_unlock(&journal_datalist_lock);
  			log_start_commit(journal, transaction);
  			unlock_journal(journal);
-			log_wait_commit(journal, tid);
+			(void) log_wait_commit(journal, tid);
  			goto out_return_1;
  		}

diff -Nru linux-2.4.29-pre3-bk2/fs/jbd/commit.c
linux-2.4.29-pre3-bk2_fix/fs/jbd/commit.c
--- linux-2.4.29-pre3-bk2/fs/jbd/commit.c	2004-02-18
22:36:31.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/fs/jbd/commit.c	2005-01-04
19:58:32.000000000 +0900
@@ -92,7 +92,7 @@
  	struct buffer_head *wbuf[64];
  	int bufs;
  	int flags;
-	int err;
+	int err = 0;
  	unsigned long blocknr;
  	char *tagp = NULL;
  	journal_header_t *header;
@@ -299,6 +299,8 @@
  			spin_unlock(&journal_datalist_lock);
  			unlock_journal(journal);
  			wait_on_buffer(bh);
+			if (unlikely(!buffer_uptodate(bh)))
+				err = -EIO;
  			/* the journal_head may have been removed now */
  			lock_journal(journal);
  			goto write_out_data;
@@ -326,6 +328,8 @@
  			spin_unlock(&journal_datalist_lock);
  			unlock_journal(journal);
  			wait_on_buffer(bh);
+			if (unlikely(!buffer_uptodate(bh)))
+				err = -EIO;
  			lock_journal(journal);
  			spin_lock(&journal_datalist_lock);
  			continue;	/* List may have changed */
@@ -351,6 +355,9 @@
  	}
  	spin_unlock(&journal_datalist_lock);

+	if (err)
+		__journal_abort_hard(journal);
+
  	/*
  	 * If we found any dirty or locked buffers, then we should have
  	 * looped back up to the write_out_data label.  If there weren't @@
-541,6 +548,8 @@
  		if (buffer_locked(bh)) {
  			unlock_journal(journal);
  			wait_on_buffer(bh);
+			if (unlikely(!buffer_uptodate(bh)))
+				err = -EIO;
  			lock_journal(journal);
  			goto wait_for_iobuf;
  		}
@@ -602,6 +611,8 @@
  		if (buffer_locked(bh)) {
  			unlock_journal(journal);
  			wait_on_buffer(bh);
+			if (unlikely(!buffer_uptodate(bh)))
+				err = -EIO;
  			lock_journal(journal);
  			goto wait_for_ctlbuf;
  		}
@@ -650,6 +661,8 @@
  		bh->b_end_io = journal_end_buffer_io_sync;
  		submit_bh(WRITE, bh);
  		wait_on_buffer(bh);
+		if (unlikely(!buffer_uptodate(bh)))
+			err = -EIO;
  		put_bh(bh);		/* One for getblk() */
  		journal_unlock_journal_head(descriptor);
  	}
@@ -661,6 +674,9 @@

  skip_commit: /* The journal should be unlocked by now. */

+	if (err)
+		__journal_abort_hard(journal);
+
  	/* Call any callbacks that had been registered for handles in this
  	 * transaction.  It is up to the callback to free any allocated
  	 * memory.
diff -Nru linux-2.4.29-pre3-bk2/fs/jbd/journal.c
linux-2.4.29-pre3-bk2_fix/fs/jbd/journal.c
--- linux-2.4.29-pre3-bk2/fs/jbd/journal.c	2004-11-17
20:54:21.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/fs/jbd/journal.c	2005-01-04 
19:58:32.000000000 +0900
@@ -582,8 +582,10 @@
   * Wait for a specified commit to complete.
   * The caller may not hold the journal lock.
   */
-void log_wait_commit (journal_t *journal, tid_t tid)
+int log_wait_commit (journal_t *journal, tid_t tid)
  {
+	int err = 0;
+
  	lock_kernel();
  #ifdef CONFIG_JBD_DEBUG
  	lock_journal(journal);
@@ -600,6 +602,12 @@
  		sleep_on(&journal->j_wait_done_commit);
  	}
  	unlock_kernel();
+
+	if (unlikely(is_journal_aborted(journal))) {
+		printk(KERN_EMERG "journal commit I/O error\n");
+		err = -EIO;
+	}
+	return err;
  }

  /*
@@ -1326,7 +1334,7 @@

  	/* Wait for the log commit to complete... */
  	if (transaction)
-		log_wait_commit(journal, transaction->t_tid);
+		err = log_wait_commit(journal, transaction->t_tid);

  	/* ...and flush everything in the log out to disk. */
  	lock_journal(journal);
diff -Nru linux-2.4.29-pre3-bk2/fs/jbd/transaction.c
linux-2.4.29-pre3-bk2_fix/fs/jbd/transaction.c
--- linux-2.4.29-pre3-bk2/fs/jbd/transaction.c	2004-08-08 
08:26:05.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/fs/jbd/transaction.c	2005-01-04 
19:58:32.000000000 +0900
@@ -1484,7 +1484,7 @@
  		 * to wait for the commit to complete.
  		 */
  		if (handle->h_sync && !(current->flags & PF_MEMALLOC))
-			log_wait_commit(journal, tid);
+			err = log_wait_commit(journal, tid);
  	}
  	kfree(handle);
  	return err;
@@ -1509,7 +1509,7 @@
  		goto out;
  	}
  	handle->h_sync = 1;
-	journal_stop(handle);
+	ret = journal_stop(handle);
  out:
  	unlock_kernel();
  	return ret;
diff -Nru linux-2.4.29-pre3-bk2/include/linux/jbd.h
linux-2.4.29-pre3-bk2_fix/include/linux/jbd.h
--- linux-2.4.29-pre3-bk2/include/linux/jbd.h	2004-11-17
20:54:22.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/include/linux/jbd.h	2005-01-04 
19:58:32.000000000 +0900
@@ -848,7 +848,7 @@

  extern int	log_space_left (journal_t *); /* Called with journal
locked */
  extern tid_t	log_start_commit (journal_t *, transaction_t *);
-extern void	log_wait_commit (journal_t *, tid_t);
+extern int	log_wait_commit (journal_t *, tid_t);
  extern int	log_do_checkpoint (journal_t *, int);

  extern void	log_wait_for_space(journal_t *, int nblocks);
diff -Nru linux-2.4.29-pre3-bk2/mm/filemap.c
linux-2.4.29-pre3-bk2_fix/mm/filemap.c
--- linux-2.4.29-pre3-bk2/mm/filemap.c	2004-11-17 20:54:22.000000000 +0900
+++ linux-2.4.29-pre3-bk2_fix/mm/filemap.c	2005-01-04
19:58:32.000000000 +0900
@@ -3268,7 +3268,12 @@
  			status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
  	}

-	err = written ? written : status;
+	/*
+	 * generic_osync_inode always returns 0 or negative value.
+	 * So 'status < written' is always true, and written should
+	 * be returned if status >= 0.
+	 */
+	err = (status < 0) ? status : written;
  out:

  	return err;

Comment 38 Michael Waite 2005-03-04 16:52:11 UTC
EMC asked me to see if I can help move this bugzilla entry along but I am
confused about the last posting:
I see kernels 2.6.10 and 2.4.29 referenced above. Which versions of Red Hat
Enterprise Linux are you running?

Comment 58 Stephen Tweedie 2005-07-12 19:41:26 UTC
Created attachment 116678 [details]
Fix for O_SYNC EIO propagation

Proposed patch for this problem.

Comment 59 Stephen Tweedie 2005-07-12 19:43:37 UTC
The earlier patches do not cover all cases; the problem is that any
__refile_buffer() can remove buffers that fsync_buffer_list() is waiting for, so
that IO errors on individual buffers get missed.  Propagating the error upwards
in __refile_buffer() itself works around this problem.

Comment 61 Ernie Petrides 2005-07-20 07:44:07 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-33.EL).


Comment 62 Ernie Petrides 2005-07-20 09:27:15 UTC
The fix has been bumped from the RHEL3 U6 patch pool
(from kernel version 2.4.21-33.EL) because it broke
the build.  We anticipate a revised fix later in U6.

Reverting to ASSIGNED state.


Comment 64 Ernie Petrides 2005-07-29 02:01:33 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-34.EL).


Comment 71 Red Hat Bugzilla 2005-09-28 14:20:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html