Bug 101555

Summary: Kernel doesn't report write errors to applications
Product: [Retired] Red Hat Linux
Reporter: yuval yeret <yuval>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 8.0
CC: riel, yuval
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:41:23 UTC
Attachments:
rtest is an easy test to reproduce the problem. (flags: none)

Description yuval yeret 2003-08-03 17:43:49 UTC
Description of problem:

Writes to a block device, or to a filesystem mounted on one, can fail if the 
device suddenly becomes unreachable (e.g. link loss to a RAID array). The kernel 
should not acknowledge synced writes from userspace in this situation, but 
surprisingly that's exactly what it does. 

Version-Release number of selected component (if applicable):
2.4.18-24 Red Hat errata kernel


How reproducible:
Every time. 


Steps to Reproduce:
1. From userspace code, do synced writes to a block device file (e.g. /dev/sdd) 
that is actually a RAID LUN or a disconnectable SCSI disk. Use strace to track 
the write progress.
2. Disconnect the cable or the disk.
3. Keep tracking the write progress.
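
For readers without the attached rtest tool, the write loop in step 1 can be approximated as follows. This is only a minimal sketch, not the rtest source: the function name `synced_write_loop`, the block size, and the fill pattern are assumptions made for illustration. It opens the target with O_SYNC, issues a fixed number of writes, and records the result and duration of each one, so a hung or silently "successful" write after the disconnect stands out.

```python
import os
import time


def synced_write_loop(path, count, block_size=4096):
    """Write `count` blocks to `path` with O_SYNC and record each outcome.

    Returns a list of (index, bytes_written, errno_or_None, seconds).
    Hypothetical helper for illustration; not the attached rtest tool.
    """
    results = []
    # O_SYNC asks the kernel to return from write() only once the data
    # has been handed to the device, i.e. a "synced write".
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        block = b"\xa5" * block_size
        for i in range(count):
            start = time.monotonic()
            try:
                written = os.write(fd, block)
                err = None
            except OSError as exc:
                # This is the branch the report expects to be taken
                # after the device disappears.
                written = 0
                err = exc.errno
            results.append((i, written, err, time.monotonic() - start))
    finally:
        os.close(fd)
    return results
```

Pointing it at a scratch device, e.g. `synced_write_loop("/dev/sdd", 1000)`, overwrites data on that device, so it should only be run against a disk whose contents do not matter.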
    
Actual results:
The last write that started before the disconnection hangs for some time. 
After a while that write returns OK, and all subsequent writes return OK as 
well.
After rebooting the machine, the data is of course missing.

Expected results:
1. The first write that cannot complete because of the disconnection should 
hang forever (the process should stay in D state until a hard kill). 
This is how the SuSE 2.4.20 kernel behaves.
2. Better: return EIO or another error code to userspace and let it 
handle/report the error as it sees fit. 
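
To illustrate expected result 2 from the application's side, here is a minimal sketch of a write path that would surface such an error if the kernel reported it. The helper name `write_durably` is hypothetical. It relies only on the standard behaviour that a failing write() or fsync() sets errno, which Python exposes as `OSError.errno`.

```python
import errno
import os


def write_durably(fd, data):
    """Write all of `data` to `fd`, then fsync; return None or an errno.

    Hypothetical helper for illustration only.
    """
    view = memoryview(data)
    try:
        while len(view) > 0:
            n = os.write(fd, view)  # may be a short write; loop until done
            view = view[n:]
        os.fsync(fd)  # ask the kernel to flush the data to the device
    except OSError as exc:
        return exc.errno
    return None
```

A caller would then compare the return value against `errno.EIO` and decide whether to retry, log, or abort. With the behaviour described under "Actual results", this function would return None even though the data never reached the disk, which is exactly the problem being reported.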

Additional info:
This was discussed on the linux-kernel mailing list some time ago:
[PATCH 2.4] Report write errors to applications - 
http://lists.insecure.org/lists/linux-kernel/2003/Jan/7178.html (contains a 
suggested patch to Marcelo's 2.4.21pre3 kernel)

http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21pre5aa1/9999_fsync-msync-async-errors-1 
- contains a patch for the aa kernel

Comment 1 yuval yeret 2003-08-03 17:47:45 UTC
Created attachment 93368
rtest is an easy test to reproduce the problem. 

Use synced writes.

Example usage:

rtest -filename=/dev/sdd -count=1000 -sync=1

Comment 2 Bugzilla owner 2004-09-30 15:41:23 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/