Bug 159765 - RHEL4 Data corruption in spite of using O_SYNC
Summary: RHEL4 Data corruption in spite of using O_SYNC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 156323
 
Reported: 2005-06-07 20:43 UTC by David Milburn
Modified: 2010-10-22 03:03 UTC (History)
CC List: 3 users

Fixed In Version: RHSA-2005-514
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-05 13:23:17 UTC
Target Upstream Version:
Embargoed:


Attachments
2.6.10 Fix O_SYNC speedup for generic_file_write_nolock patch (4.47 KB, patch)
2005-06-07 20:46 UTC, David Milburn


Links
System: Red Hat Product Errata
ID: RHSA-2005:514
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 2
Last Updated: 2005-10-05 04:00:00 UTC

Description David Milburn 2005-06-07 20:43:10 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
When a block device is opened with the O_SYNC flag, the write() system call
returns success even if an I/O error has occurred. For example, even if the
device is forcibly removed while it is being written to, write() still
reports success. In such a case, write() should return -1 and set errno
to EIO.
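For reference, the caller-side check the reporter expects can be sketched as follows (the helper name write_all is hypothetical, not from the bug). It retries short writes and EINTR, and otherwise hands a failure such as EIO straight back to the caller. The point of the bug is that on 2.6.9-5.EL the kernel reported success instead, so no amount of checking in user space could catch the lost data.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd.
 * Retries on short writes and EINTR. Returns 0 on success, or -1 with
 * errno set (for example EIO) on failure. With O_SYNC on fd, a 0 return
 * is supposed to mean the data has reached the device. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;          /* leave errno (e.g. EIO) for the caller */
        }
        p += n;                 /* short write: advance and write the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would treat a -1 return with errno == EIO as "this data is not on disk", which is exactly the signal the reporter says was missing.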


Version-Release number of selected component (if applicable):
2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Connect USB pen drive to system.
2. # dd if=<any file is OK> of=aaa bs=512 count=10000   
3. # dd if=/dev/zero of=/dev/sdb1 bs=512 count=10000 (clear usb device)
4. # ./reproduce -i aaa -o /dev/sdb1
5. Remove the USB device while reproducer program is running
6. Note XXXX bytes copied.
7. Reboot
8. # dd if=/dev/sdb1 of=bbb bs=512 count=YYYY (YYYY is XXXX/512)
9. # dd if=aaa of=ccc bs=512 count=YYYY
10.# cmp bbb ccc
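The source of the reproduce program used in step 4 is not attached to this bug. Assuming it is a plain copy loop that opens the output with O_SYNC and counts the bytes write() acknowledged, a hypothetical stand-in (copy_sync is an illustrative name) might look like this:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for "reproduce" (its real source is unknown).
 * Copies in_path to out_path in bs-byte chunks with the output opened
 * O_WRONLY | O_SYNC, stopping at the first failed write(). Returns the
 * number of bytes write() acknowledged ("XXXX bytes copied" in step 6),
 * or -1 if a file could not be opened. *err receives the errno of the
 * failure, or 0 if the whole input was copied. */
long long copy_sync(const char *in_path, const char *out_path,
                    size_t bs, int *err)
{
    char buf[65536];
    long long copied = 0;
    int in, out;

    *err = 0;
    if (bs == 0 || bs > sizeof buf) {
        *err = EINVAL;
        return -1;
    }
    if ((in = open(in_path, O_RDONLY)) < 0) {
        *err = errno;
        return -1;
    }
    /* No O_CREAT: the target is an existing block device such as /dev/sdb1. */
    if ((out = open(out_path, O_WRONLY | O_SYNC)) < 0) {
        *err = errno;
        close(in);
        return -1;
    }
    while (*err == 0) {
        ssize_t n = read(in, buf, bs);
        if (n == 0)
            break;                      /* end of input: copy complete */
        if (n < 0) {
            if (errno != EINTR)
                *err = errno;           /* read error ends the loop */
            continue;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                *err = errno;           /* expected: EIO once the device is gone */
                break;
            }
            off += w;
            copied += w;                /* with O_SYNC, acknowledged should mean on disk */
        }
    }
    /* close() can also report an error from earlier writes. */
    if (close(out) < 0 && *err == 0)
        *err = errno;
    close(in);
    return copied;
}
```

A main() would call copy_sync("aaa", "/dev/sdb1", 512, &err), print the return value as the byte count for step 6, and report err if it is nonzero. On a correct kernel, pulling the device should end the loop with err == EIO, and the first XXXX bytes read back in step 8 should match the source.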

Actual Results:  bbb and ccc differ.

The last write returns successfully even though
the data was not written to disk.

Expected Results:  The last write should return -1 and set errno to EIO,
and bbb and ccc should contain the same data.

Additional info:

RH support has not been able to reproduce the problem since applying
the 2.6.10 "Fix O_SYNC speedup for generic_file_write_nolock" patch to RHEL4;
however, the customer states that they must run 2.6.12-rc2-mm3 to see
complete success.

Comment 2 David Milburn 2005-06-07 20:46:35 UTC
Created attachment 115199 [details]
2.6.10 Fix O_SYNC speedup for generic_file_write_nolock patch

Comment 9 Stephen Tweedie 2005-07-07 17:50:39 UTC
I've verified this by running "verify-data"
(http://people.redhat.com/sct/src/verify-data/) in write-once O_SYNC mode
against a block device, forcing a reboot halfway through and recording how far
the write had proceeded, then doing a verify-data read to check that the data
up to that point is genuinely up to date on disk.

Without the patch, this fails: the actual write progress on disk is several tens
of MB behind the progress of the writing task. With the patch, the on-disk
progress keeps pace with the writing task.
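The check in Comment 9 can be pictured with a small sketch. This is not verify-data's actual code or on-disk format; it only assumes the general idea of stamping every block with its index and a per-run seed, so that a read-back after the crash can tell exactly how far genuine data extends. The names stamp_block and count_valid_prefix, and the pattern itself, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 512   /* block size; matches bs=512 in the reproduction steps */

/* Hypothetical pattern: fill one BLK-byte block with data unique to
 * (seed, index). A 16-byte header holds both values in host byte order,
 * followed by xorshift64 noise, so zeroed or stale blocks do not match. */
void stamp_block(unsigned char *blk, uint64_t seed, uint64_t index)
{
    uint64_t x = seed ^ (index * 0x9E3779B97F4A7C15ULL);

    if (x == 0)
        x = 0x2545F4914F6CDD1DULL;      /* xorshift state must be nonzero */
    memcpy(blk, &seed, sizeof seed);
    memcpy(blk + 8, &index, sizeof index);
    for (size_t i = 16; i < BLK; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        blk[i] = (unsigned char)x;
    }
}

/* Return how many leading blocks of buf (nblocks long, as read back from
 * the device) carry the expected pattern for this seed. */
size_t count_valid_prefix(const unsigned char *buf, size_t nblocks,
                          uint64_t seed)
{
    unsigned char expect[BLK];
    size_t i;

    for (i = 0; i < nblocks; i++) {
        stamp_block(expect, seed, i);
        if (memcmp(buf + i * BLK, expect, BLK) != 0)
            break;                      /* first stale or corrupt block */
    }
    return i;
}
```

In these terms, the writer stamps block i, writes it with O_SYNC, and logs i once write() returns. After the forced reboot, count_valid_prefix over the device contents should be at least the last logged index plus one; Comment 9 reports that without the patch the on-disk prefix instead trailed the writer by several tens of MB.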

Comment 14 Red Hat Bugzilla 2005-10-05 13:23:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html


