Red Hat Bugzilla – Bug 198749
Data corruption after IO error on swap
Last modified: 2014-08-11 01:40:18 EDT
When an IO error occurs while writing a page to swap, the IO completion
function marks the page with SetPageError() but fails to re-mark the page with
Since the writeback was unsuccessful this is in error. The page may subsequently
be discarded from memory as it is now clean, resulting in incorrect data being
read when the page is later faulted back in.
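
The failure mode described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_page, swap_slot, and every function name below are hypothetical stand-ins, and the redirty_on_error switch exists purely to contrast the two behaviours. The model assumes that writeback clears the dirty flag before the IO is submitted and that reclaim only discards clean pages.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a page and its flags; NOT the kernel's struct page. */
struct model_page {
    bool dirty;     /* contents are newer than the copy on swap */
    bool error;     /* the last IO on this page failed */
    bool resident;  /* the page is still held in memory */
    char data[16];
};

/* Simulated swap slot; holds old contents until a write succeeds. */
static char swap_slot[16];

/* Modelling assumption: the dirty flag is cleared when writeback starts. */
static void start_swap_write(struct model_page *p)
{
    p->dirty = false;
}

/* Model of the write-completion handler. */
static void end_swap_write(struct model_page *p, bool io_ok, bool redirty_on_error)
{
    if (io_ok) {
        memcpy(swap_slot, p->data, sizeof swap_slot);
        return;
    }
    p->error = true;
    if (redirty_on_error)
        p->dirty = true;  /* keep the only good copy from looking clean */
}

/* Model of reclaim: only a clean page may be dropped from memory. */
static bool try_reclaim(struct model_page *p)
{
    if (p->dirty)
        return false;
    p->resident = false;
    return true;
}

/* Model of a later access: use memory if resident, else read back from swap. */
static const char *fault_in(const struct model_page *p)
{
    return p->resident ? p->data : swap_slot;
}

/* Runs one failed swap write and reports whether the process still sees its data. */
static bool data_survives_failed_write(bool redirty_on_error)
{
    struct model_page p = {
        .dirty = true, .error = false, .resident = true, .data = "fresh"
    };

    strcpy(swap_slot, "stale");
    start_swap_write(&p);
    end_swap_write(&p, false, redirty_on_error);  /* the IO fails */
    try_reclaim(&p);
    return strcmp(fault_in(&p), "fresh") == 0;
}
```

In this model, data_survives_failed_write(false) corresponds to the reported behaviour: the page looks clean, is reclaimed, and the stale swap contents are read back. data_survives_failed_write(true) keeps the page dirty, so reclaim leaves it in memory.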
In the read case, we need to check PageUptodate to ensure the IO completed
without error and return VM_FAULT_SIGBUS if it did not. In the write case, an
additional call to SetPageDirty() is placed immediately after the call to
This is fixed upstream by this changeset:
Customer reported the issue in the above IT and provided a patch based on this
Created attachment 132357 [details]
Patch to correct swap IO error handling
Created attachment 133147 [details]
upstream patch to solve the issue there. RHEL4 also needs the patch from the bk link
I'll take the upstream part, but the addition is not quite nice.
It really needs a big fat warning printed to user-space. Also the
way to keep the page dirty is not quite OK. New patch is against upstream, that
is, RHEL4 also needs the patch from the BK-link.
Created attachment 133307 [details]
update the warning and add it on the read side.
On request I also added the bio->bi_sector field to the warning, and added an
equivalent message to the read side of things.
Could we get devel_ack here? Peter created a fix patch and Fujitsu
has already verified it.
QE ack for 4.5.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
committed in stream U5 build 42.20. A test kernel with this patch is available
I can confirm that the patch is in; however, I don't really have any way to test
this. Do we have any test results from partners who might have equipment to
create disk errors etc?
We can test this using device-mapper. There is infrastructure for injecting
faults in a block device. Let me know if this is needed and I'll put something
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.