Bug 101555 - Kernel doesn't report write errors to applications
Status: CLOSED WONTFIX
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 8.0
Hardware: i686
OS: Linux
Priority: high
Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2003-08-03 13:43 EDT by yuval yeret
Modified: 2005-10-31 17:00 EST

Doc Type: Bug Fix
Last Closed: 2004-09-30 11:41:23 EDT


Attachments
rtest is an easy test to reproduce the problem. (440.16 KB, application/octet-stream)
2003-08-03 13:47 EDT, yuval yeret

Description yuval yeret 2003-08-03 13:43:49 EDT
Description of problem:

Writes to block devices, or to filesystems mounted on block devices, can fail
if the block device suddenly becomes unreachable (e.g. link loss to a RAID
array). The kernel should not acknowledge synced writes from userspace in this
situation, but, surprisingly, that is exactly what it does.

Version-Release number of selected component (if applicable):
2.4.18-24 (Red Hat errata kernel)


How reproducible:
Every time. 


Steps to Reproduce:
1. From userspace code, do synced writes to a block device file (e.g. /dev/sdd)
   that is actually a RAID LUN or a disconnectable SCSI disk. Use strace to
   track the write progress.
2. Disconnect the cable or disk.
3. Keep tracking the write progress.
    
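The synced-write loop from step 1 can be sketched in C as below. This is not the attached rtest (its source is not shown here); `synced_write_test` is a hypothetical helper, and the use of one `write(2)` per block on a descriptor opened with `O_SYNC`, the block size and the fill pattern are all assumptions. It times every write, which makes the stall described under "Actual results" visible without strace, and stops at the first error. Pointed at a real device such as /dev/sdd, it overwrites that device's data.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e3 +
           (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Hypothetical test helper: write `count` blocks of `block_size` bytes to
 * `path` opened with O_SYNC, logging the duration and result of each write.
 * Returns the number of blocks the kernel acknowledged; *err receives the
 * errno of the first failure (0 if every write and the final close succeeded).
 * WARNING: on a real block device this overwrites its contents.
 */
long synced_write_test(const char *path, long count, size_t block_size,
                       int *err)
{
    *err = 0;
    int fd = open(path, O_WRONLY | O_SYNC);
    if (fd < 0) {
        *err = errno;
        return 0;
    }
    char *buf = malloc(block_size);
    if (buf == NULL) {
        *err = ENOMEM;
        close(fd);
        return 0;
    }

    long done = 0;
    for (; done < count; done++) {
        /* A per-block pattern makes missing data easy to spot on read-back. */
        memset(buf, 'A' + (int)(done % 26), block_size);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        ssize_t n = write(fd, buf, block_size);
        int saved_errno = errno;  /* capture before logging can clobber it */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        fprintf(stderr, "block %ld: write returned %zd after %.1f ms\n",
                done, n, elapsed_ms(&t0, &t1));
        if (n < 0) {
            *err = saved_errno;
            fprintf(stderr, "block %ld: %s\n", done, strerror(saved_errno));
            break;
        }
        if ((size_t)n != block_size) {
            *err = EIO;  /* assumption: count a short write as a failure */
            break;
        }
    }

    free(buf);
    if (close(fd) < 0 && *err == 0)
        *err = errno;  /* close(2) can also report a deferred error */
    return done;
}
```

For example, `synced_write_test("/dev/sdd", 1000, 4096, &err)` loosely mirrors the rtest invocation from the attachment comment, with an assumed 4 KiB block size. On a kernel that reports the failure, the loop should end early with `err` set to `EIO` (or a similar code) instead of counting every block as written.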
Actual results:
The last write, which started before the disconnection, hangs for some time.
After a while that write returns successfully, and all subsequent writes
return successfully as well. After rebooting the machine, the data is of
course missing.

Expected results:
1. The first write that cannot complete because of the disconnection should
   hang forever (the process should stay in D state until a hard kill). This
   is how the SuSE 2.4.20 kernel behaves.
2. Better: return EIO or another error code to userspace and let the
   application handle or report the error as it sees fit.
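Expected result 2 puts the burden on the application. A minimal sketch of the userspace side, assuming only POSIX `write(2)`: the hypothetical `write_all` helper finishes partial writes, retries `EINTR`, and otherwise returns -1 with `errno` left as the kernel set it, so an `EIO` from a lost device reaches the caller instead of being silently dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * Returns 0 on success, or -1 with errno set by the failing write(2),
 * e.g. EIO when the device behind `fd` has gone away.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal before writing: retry */
            return -1;     /* EIO, ENOSPC, ...: errno is left for the caller */
        }
        /* Partial write: advance past what was accepted and continue. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would then refuse to acknowledge the data, e.g. `if (write_all(fd, rec, sizeof rec) < 0) perror("write");` (with `rec` a hypothetical record buffer), rather than treating the record as stored.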

Additional info:
This was discussed on the linux-kernel mailing list some time ago:
[PATCH 2.4] Report write errors to applications -
http://lists.insecure.org/lists/linux-kernel/2003/Jan/7178.html (contains a
suggested patch against Marcelo's 2.4.21pre3 kernel)

http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21pre5aa1/
9999_fsync-msync-async-errors-1 - contains the patch for the aa kernel
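The name of the aa patch suggests that it makes errors from asynchronous writeback show up when `fsync(2)` or `msync(2)` is called; that reading is an assumption, since the patch itself is not reproduced here. Under that assumption, applications that use ordinary buffered writes or shared mappings must check those calls too. A minimal POSIX sketch with two hypothetical helpers:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: buffered write, then fsync(2) to force it to the device.
   An error that happens during later writeback can only be seen here. */
int write_and_fsync(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n < 0)
        return -1;
    if ((size_t)n != len) {
        errno = EIO;  /* assumption: treat a short write as an I/O error */
        return -1;
    }
    return fsync(fd);  /* -1 with errno (e.g. EIO) if writeback failed */
}

/* Hypothetical: overwrite the first `len` bytes of a file through a shared
   mapping and flush them with msync(MS_SYNC). The file must already be at
   least `len` bytes long. */
int fill_mapped_and_msync(int fd, size_t len, char fill)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, fill, len);
    int rc = msync(p, len, MS_SYNC);  /* -1 with errno if writeback failed */
    int saved = errno;
    munmap(p, len);
    errno = saved;  /* report msync's error, not munmap's */
    return rc;
}
```

Both helpers return -1 with `errno` set (for instance `EIO`) when the kernel reports that the data did not reach the device; a caller that ignores those return values is back to the silent data loss described in this report.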
Comment 1 yuval yeret 2003-08-03 13:47:45 EDT
Created attachment 93368
rtest is an easy test to reproduce the problem.

Use synced writes.

Example usage:

rtest -filename=/dev/sdd -count=1000 -sync=1
Comment 2 Bugzilla owner 2004-09-30 11:41:23 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
