Bug 198749 - Data corruption after IO error on swap
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Peter Zijlstra
QA Contact: Brian Brock
Blocks: 198694 198868 215771 216194
Reported: 2006-07-13 05:17 EDT by Bryn M. Reeves
Modified: 2014-08-11 01:40 EDT
CC List: 2 users

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-07 22:40:49 EDT


Attachments
Patch to correct swap IO error handling (1.27 KB, patch)
2006-07-13 05:17 EDT, Bryn M. Reeves
upstream patch to solve the issue there. RHEL4 also needs the patch from the bk link (1.52 KB, patch)
2006-07-27 09:03 EDT, Peter Zijlstra
update the warning and add it on the read side. (1.26 KB, patch)
2006-07-31 03:41 EDT, Peter Zijlstra

Description Bryn M. Reeves 2006-07-13 05:17:16 EDT
IT# 97208
When an IO error occurs while writing a page to swap, the IO completion
function marks the page with SetPageError() but does not re-mark the page
dirty with SetPageDirty().

Since the writeback was unsuccessful, this is wrong: the page is now clean, so
it may subsequently be discarded from memory, and incorrect data is then read
when the page is later faulted back in.

In the read case, we need to check PageUptodate to ensure the IO completed
without error and return VM_FAULT_SIGBUS if it did not. In the write case, an
additional call to SetPageDirty() is placed immediately after the call to
SetPageError().
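The failure sequence and the fix described above can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the RHEL4 or upstream kernel code: `struct model_page` (whose booleans stand in for PG_error, PG_dirty and PG_uptodate) and every `model_*` function are hypothetical names, and `MODEL_FAULT_SIGBUS` only mirrors the role of VM_FAULT_SIGBUS. The model assumes writeback has already cleared the dirty bit before the IO is submitted.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the relevant page state; NOT the
 * kernel's struct page or its PG_* flag helpers. */
struct model_page {
    bool error;     /* stands in for PG_error */
    bool dirty;     /* stands in for PG_dirty (assumed cleared at writeback start) */
    bool uptodate;  /* stands in for PG_uptodate */
};

/* Hypothetical fault results; only the SIGBUS role mirrors the kernel. */
enum fault_result { MODEL_FAULT_MINOR, MODEL_FAULT_SIGBUS };

/* Write completion: on failure, flag the error AND re-dirty the page,
 * because the swap copy was never written and must not be trusted. */
static void model_end_swap_write(struct model_page *page, bool io_ok)
{
    if (!io_ok) {
        page->error = true;
        page->dirty = true;   /* the fix: keep the in-memory copy alive */
    }
}

/* Read completion: only an error-free read leaves the page up to date. */
static void model_end_swap_read(struct model_page *page, bool io_ok)
{
    if (io_ok) {
        page->uptodate = true;
    } else {
        page->error = true;
        page->uptodate = false;
    }
}

/* Swap-in fault: refuse to map a page whose read did not complete. */
static enum fault_result model_swap_fault(const struct model_page *page)
{
    return page->uptodate ? MODEL_FAULT_MINOR : MODEL_FAULT_SIGBUS;
}

/* Reclaim may only drop a clean page. */
static bool model_can_discard(const struct model_page *page)
{
    return !page->dirty;
}
```

For example, a zero-initialised page passed to `model_end_swap_write(&page, false)` ends up with `dirty` set, so `model_can_discard()` returns false. Without the re-dirtying line, the same page would be discardable and the next swap-in would read whatever is on disk.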

This is fixed upstream by this changeset:
http://linux.bkbits.net:8080/linux-2.6/cset@1.3031

Customer reported the issue in the above IT and provided a patch based on this
changeset.
Comment 1 Bryn M. Reeves 2006-07-13 05:17:17 EDT
Created attachment 132357
Patch to correct swap IO error handling
Comment 2 Peter Zijlstra 2006-07-27 09:03:47 EDT
Created attachment 133147
upstream patch to solve the issue there. RHEL4 also needs the patch from the bk link

I'll take the upstream part, but the addition is not quite nice.
It really needs a big fat warning printed to user-space. Also, the
way it keeps the page dirty is not quite OK. The new patch is against
upstream; that is, RHEL4 also needs the patch from the BK link.
Comment 4 Peter Zijlstra 2006-07-31 03:41:06 EDT
Created attachment 133307
update the warning and add it on the read side.

On request, I also added the bio->bi_sector field to the warning, and added an
equivalent message to the read side.
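The write- and read-side warnings discussed above are meant to tell the administrator where the failed IO landed. A minimal sketch of such a message as a userspace helper: `format_swap_io_warning()` and its exact wording are hypothetical, and reporting the device as major:minor is an assumption; only the sector (bio->bi_sector) is named in the comment. In the kernel, the text would be emitted with printk() rather than formatted into a caller's buffer.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: formats one warning line for a failed swap IO.
 * The wording and the major:minor device field are assumptions; the
 * sector corresponds to bio->bi_sector in the discussion. Returns the
 * snprintf() result, i.e. the length the full message needs. */
static int format_swap_io_warning(char *buf, size_t len, bool is_write,
                                  unsigned int major, unsigned int minor,
                                  unsigned long long sector)
{
    /* One format for both directions keeps the read and write
     * messages equivalent, differing only in the leading word. */
    return snprintf(buf, len, "%s-error on swap-device (%u:%u:%llu)",
                    is_write ? "Write" : "Read", major, minor, sector);
}
```

Sharing a single helper between both completion paths is one way to keep the two messages in step if the format changes later.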
Comment 5 Keiichi Mori 2006-08-22 21:18:07 EDT
Could we get devel_ack here? Peter created a fix patch and Fujitsu
has already verified it.

Comment 7 Jay Turner 2006-08-23 23:58:32 EDT
QE ack for 4.5.
Comment 8 RHEL Product and Program Management 2006-09-07 16:48:20 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 9 Jason Baron 2006-10-23 15:14:50 EDT
committed in stream U5 build 42.20. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 11 Mike Gahagan 2007-03-27 17:56:39 EDT
I can confirm that the patch is in; however, I don't really have any way to test
this. Do we have any test results from partners who might have equipment to
create disk errors, etc.?

Comment 12 Bryn M. Reeves 2007-03-28 16:23:33 EDT
We can test this using device-mapper, which has infrastructure for injecting
faults into a block device. Let me know if this is needed and I'll put something
together.
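One way to set up the device-mapper test mentioned above is a table that maps most of a disk with the `linear` target and routes one sector range to the `error` target, so any swap IO touching that range fails. The sketch below only builds the table text. `build_error_table()`, the device path and the sector numbers are hypothetical, and it assumes 512-byte sectors with the bad range lying strictly inside the device.

```c
#include <stdio.h>

/* Hypothetical helper: writes a three-line device-mapper table.
 *   [0, bad_start)                 -> linear onto dev
 *   [bad_start, bad_start+bad_len) -> error target (every IO fails)
 *   [bad_start+bad_len, total)     -> linear onto dev at the same offset
 * Line format: <start> <length> <target> [<args>], all in 512-byte
 * sectors. Assumes 0 < bad_start and bad_start + bad_len < total.
 * Returns the snprintf() result (length of the full table). */
static int build_error_table(char *buf, size_t len, const char *dev,
                             unsigned long long total,
                             unsigned long long bad_start,
                             unsigned long long bad_len)
{
    unsigned long long tail = bad_start + bad_len;

    return snprintf(buf, len,
                    "0 %llu linear %s 0\n"
                    "%llu %llu error\n"
                    "%llu %llu linear %s %llu\n",
                    bad_start, dev,
                    bad_start, bad_len,
                    tail, total - tail, dev, tail);
}
```

For instance, `build_error_table(buf, sizeof buf, "/dev/sdb1", 1000, 100, 10)` yields the lines `0 100 linear /dev/sdb1 0`, `100 10 error` and `110 890 linear /dev/sdb1 110`. Such a table could be loaded with `dmsetup create`, and the resulting /dev/mapper device prepared with mkswap and enabled with swapon. The swap header at sector 0 then stays readable while pages swapped into the error range fail. Treat this as an untested outline rather than a verified RHEL4 procedure.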
Comment 16 Red Hat Bugzilla 2007-05-07 22:40:50 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
