Bug 159765 - RHEL4 Data corruption in spite of using O_SYNC
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686, OS: Linux
Priority: medium, Severity: medium
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Depends On:
Blocks: 156323
 
Reported: 2005-06-07 16:43 EDT by David Milburn
Modified: 2010-10-21 23:03 EDT
CC List: 3 users

Fixed In Version: RHSA-2005-514
Doc Type: Bug Fix
Last Closed: 2005-10-05 09:23:17 EDT


Attachments
2.6.10 Fix O_SYNC speedup for generic_file_write_nolock patch (4.47 KB, patch)
2005-06-07 16:46 EDT, David Milburn

Description David Milburn 2005-06-07 16:43:10 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
When the block device is opened with the O_SYNC option, the write() system call
returns success even if an I/O error has occurred. For example, even if the
device has been forcibly removed while it is being written to, write() still
reports success. In such a case, write() should return -1 (error) and set
errno to EIO.


Version-Release number of selected component (if applicable):
2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Connect USB pen drive to system.
2. # dd if=<any file is OK> of=aaa bs=512 count=10000   
3. # dd if=/dev/zero of=/dev/sdb1 bs=512 count=10000 (clear usb device)
4. # ./reproduce -i aaa -o /dev/sdb1
5. Remove the USB device while reproducer program is running
6. Note the number of bytes reported as copied (XXXX).
7. Reboot
8. # dd if=/dev/sdb1 of=bbb bs=512 count=YYYY (YYYY is XXXX/512)
9. # dd if=aaa of=ccc bs=512 count=YYYY
10. # cmp bbb ccc
 
  

Actual Results:  bbb ccc differ

The last write() returns successfully even though
the data was not written to disk.

Expected Results:  The last write should return -1 and set errno to EIO,
and bbb and ccc should contain the same data.

Additional info:

RH support has not been able to reproduce the problem since applying
the 2.6.10 "Fix O_SYNC speedup for generic_file_write_nolock" patch to RHEL4;
however, the customer states that they must run 2.6.12-rc2-mm3 to see
complete success.
Comment 2 David Milburn 2005-06-07 16:46:35 EDT
Created attachment 115199
2.6.10 Fix O_SYNC speedup for generic_file_write_nolock patch
Comment 9 Stephen Tweedie 2005-07-07 13:50:39 EDT
I've verified this by running "verify-data"
(http://people.redhat.com/sct/src/verify-data/) in write-once O_SYNC mode
against a block device, forcing a reboot halfway and recording how far the write
proceeded, then doing a verify-data read to check that the data up to that point
is genuinely up to date on disk.

Without the patch, this fails: the actual write progress on disk lags several
tens of MB behind the progress of the writing task. With the patch, the on-disk
progress keeps pace with the writing task.
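The write-once check described in Comment 9 can be illustrated with a small sketch. verify-data's actual on-disk format is not described in this report, so the block layout below (each 512-byte block stamped with a value derived from its own index) and the names stamp_block() and verified_prefix() are hypothetical. The idea is that the writer stamps block N before writing it; after the forced reboot, reading the device back and counting the leading blocks whose stamps match gives the true on-disk progress, which can then be compared with the progress the writer had recorded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 512

/* Hypothetical stamp format: the 64-bit value (index + 1) repeated across
 * the block. The +1 keeps every stamp non-zero, so a block that was only
 * zeroed (e.g. by dd if=/dev/zero) never passes verification. */
static void stamp_block(unsigned char *buf, uint64_t index)
{
    uint64_t v = index + 1;
    for (size_t off = 0; off < BLOCK; off += sizeof v)
        memcpy(buf + off, &v, sizeof v);
}

/* Return how many leading blocks of img (nblocks blocks long) carry their
 * expected stamp, i.e. how far the data genuinely reached the disk. */
static uint64_t verified_prefix(const unsigned char *img, uint64_t nblocks)
{
    unsigned char expect[BLOCK];
    for (uint64_t b = 0; b < nblocks; b++) {
        stamp_block(expect, b);
        if (memcmp(img + b * BLOCK, expect, BLOCK) != 0)
            return b;
    }
    return nblocks;
}
```

In use, img would be filled by reading the device back after the reboot. With the bug present, verified_prefix() would come out well short of the block count the writer had reported as completed; with the fix, the two should agree.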
Comment 14 Red Hat Bugzilla 2005-10-05 09:23:17 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html
