Bug 159765

Summary: RHEL4 Data corruption in spite of using O_SYNC
Product: Red Hat Enterprise Linux 4 Reporter: David Milburn <dmilburn>
Component: kernelAssignee: Stephen Tweedie <sct>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0CC: aviro, davej, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2005-514 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-10-05 13:23:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 156323    
Attachments:
Description Flags
2.6.10 Fix O_SYNC speedup for generic_file_write_nolock patch none

Description David Milburn 2005-06-07 20:43:10 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
When the block device is opened with O_SYNC option, the write() system call
returns with 0 (success) even if the I/O error has been occured. For example,
even if the device has been forcely removed while writing to it, write()
system call returns with 0 (success). In such a case, write() system call
has to return with -1 (error) and EIO has to be set to errno.


Version-Release number of selected component (if applicable):
2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Connect USB pen drive to system.
2. # dd if=<any file is OK> of=aaa bs=512 count=10000   
3. # dd if=/dev/zero of=/dev/sdb1 bs=512 count=10000 (clear usb device)
4. # ./reproduce -i aaa -o /dev/sdb1
5. Remove the USB device while reproducer program is running
6. Note XXXX bytes copied.
7. Reboot
8. # dd if=/dev/sdb1 of=bbb bs=512 count=YYYY (YYYY is XXXX/512)
9. # dd if=aaa of=ccc bs=512 count=YYYY
10.# cmp bbb ccc
 
  

Actual Results:  bbb ccc differ

The last write is returning successfully eventhough
the data was not written to disk.

Expected Results:  The last write should return -1 and set errno to EIO,
bbb and ccc should contain same data.

Additional info:

RH support has not been able to reproduce the problem since applying
the 2.6.10 "Fix O_SYNC speedup for generic_file_write_nolock" to RHEL4; 
however, customer states they must run 2.6.12-rc2-mm3 to have complete 
success.

Comment 2 David Milburn 2005-06-07 20:46:35 UTC
Created attachment 115199 [details]
2.6.10 Fix O_SYNC speedup for generic_file_write_nolock patch

Comment 9 Stephen Tweedie 2005-07-07 17:50:39 UTC
I've verified this by running "verify-data"
(http://people.redhat.com/sct/src/verify-data/) in write-once O_SYNC mode
against a block device, forcing a reboot halfway and recording how far the write
proceeded, then doing a verify-data read to check that the data that far is
genuinely uptodate on disk.

Without the patch, this fails: the actual write progress on disk is several 10s
of MB behind the progress of the writing task.  With the patch, the writing
proceeds correctly as the task proceeds.

Comment 14 Red Hat Bugzilla 2005-10-05 13:23:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html