Bug 198749

Summary: Data corruption after IO error on swap
Product: Red Hat Enterprise Linux 4
Reporter: Bryn M. Reeves <bmr>
Component: kernel
Assignee: Peter Zijlstra <pzijlstr>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: high
Version: 4.0
CC: jbaron, lwang
Hardware: All
OS: Linux
Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-08 02:40:49 UTC
Bug Blocks: 198694, 198868, 215771, 216194

Attachments:

Description	Flags
Patch to correct swap IO error handling	none
upstream patch to solve the issue there. RHEL4 also needs the patch from the bk link	none
update the warning and add it on the read side.	none

Description Bryn M. Reeves 2006-07-13 09:17:16 UTC

IT# 97208
When an IO error occurs while writing a page to swap, the IO completion
function marks the page with SetPageError() but fails to re-mark the page with
SetPageDirty().

Since the writeback was unsuccessful this is in error. The page may subsequently
be discarded from memory as it is now clean, resulting in incorrect data being
read when the page is later faulted back in.

In the read case, we need to check PageUptodate to ensure the IO completed
without error and return VM_FAULT_SIGBUS if it did not. In the write case, an
additional call to SetPageDirty() is placed immediately after the call to
SetPageError().
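
To make the two cases above easier to follow, here is a minimal user-space sketch of the flag handling. It is not the kernel patch: `struct model_page` and every `model_*` function are hypothetical stand-ins for the kernel's struct page, page flags, and IO completion paths, and the reclaim rule is an assumed simplification.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a page; NOT the kernel's struct page. */
struct model_page {
    bool dirty;      /* memory copy is newer than the copy on swap */
    bool error;      /* an IO error was recorded (cf. SetPageError) */
    bool uptodate;   /* memory copy is valid (cf. PageUptodate) */
    bool writeback;  /* a write to swap is in flight */
};

/* Stand-ins for the fault handler's return codes. */
enum model_fault { MODEL_FAULT_OK, MODEL_FAULT_SIGBUS };

/*
 * Write completion. On failure the page is marked with an error AND
 * re-marked dirty, so it is not mistaken for a page that reached swap.
 */
static void model_end_swap_write(struct model_page *page, bool io_ok)
{
    if (!io_ok) {
        page->error = true;
        page->dirty = true;   /* the step missing in the reported bug */
    }
    page->writeback = false;
}

/* Assumed reclaim rule: only clean pages not under writeback may be dropped. */
static bool model_can_discard(const struct model_page *page)
{
    return !page->dirty && !page->writeback;
}

/* Read completion: the page becomes valid only if the IO succeeded. */
static void model_end_swap_read(struct model_page *page, bool io_ok)
{
    if (io_ok) {
        page->uptodate = true;
    } else {
        page->error = true;
        page->uptodate = false;
    }
}

/* Swap-in fault: refuse to map a page whose read did not complete. */
static enum model_fault model_swapin_fault(const struct model_page *page)
{
    return page->uptodate ? MODEL_FAULT_OK : MODEL_FAULT_SIGBUS;
}
```

In this model, a page that starts out clean and under writeback and then sees a failed write ends up dirty, so `model_can_discard()` returns false. Without the re-dirty step it would return true, which is the window in which stale data could later be faulted back in.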

This is fixed upstream by this changeset:
http://linux.bkbits.net:8080/linux-2.6/cset@1.3031

Customer reported the issue in the above IT and provided a patch based on this
changeset.

Comment 1 Bryn M. Reeves 2006-07-13 09:17:17 UTC

Created attachment 132357 [details]
Patch to correct swap IO error handling

Comment 2 Peter Zijlstra 2006-07-27 13:03:47 UTC

Created attachment 133147 [details]
upstream patch to solve the issue there. RHEL4 also needs the patch from the bk link

I'll take the upstream part, but the addition is not quite nice.
It really needs a big fat warning printed to user-space. Also the
way to keep the page dirty is not quite OK. New patch is against upstream, that
is, RHEL4 also needs the patch from the BK-link.

Comment 4 Peter Zijlstra 2006-07-31 07:41:06 UTC

Created attachment 133307 [details]
update the warning and add it on the read side.

On request I also added the bio->bi_sector field to the warning, and added an
equivalent message to the read side of things.

Comment 5 Keiichi Mori 2006-08-23 01:18:07 UTC

Could we get devel_ack here? Peter created a fix patch and Fujitsu
has already verified it.

Comment 7 Jay Turner 2006-08-24 03:58:32 UTC

QE ack for 4.5.

Comment 8 RHEL Program Management 2006-09-07 20:48:20 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Jason Baron 2006-10-23 19:14:50 UTC

Committed in stream U5 build 42.20. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Comment 11 Mike Gahagan 2007-03-27 21:56:39 UTC

I can confirm that the patch is in; however, I don't really have any way to test
this. Do we have any test results from partners who might have equipment to
create disk errors, etc.?

Comment 12 Bryn M. Reeves 2007-03-28 20:23:33 UTC

We can test this using device-mapper. There is infrastructure for injecting
faults in a block device. Let me know if this is needed and I'll put something
together.
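
One possible shape for such a test, sketched under assumptions: the comment does not say which mechanism is meant, so this assumes device-mapper's `error` target, which fails all IO sent to the sectors it covers. The hypothetical helper below builds a three-segment table in device-mapper table format (`start length target args`, in 512-byte sectors) that passes most of a backing device through the `linear` target and maps one range to `error`. The function name, the device path, and the sizes are illustrative only.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a device-mapper table into buf that maps
 * sectors [bad_start, bad_start + bad_len) of a device to the "error"
 * target and everything else to "linear" over the backing device.
 * Returns the number of characters written, or -1 on bad arguments
 * or if buf is too small.
 */
static int model_fault_table(char *buf, size_t len, const char *dev,
                             unsigned long long total_sectors,
                             unsigned long long bad_start,
                             unsigned long long bad_len)
{
    /* Require non-empty linear segments before and after the bad range. */
    if (bad_start == 0 || bad_len == 0 || bad_start >= total_sectors ||
        bad_len >= total_sectors - bad_start)
        return -1;

    unsigned long long tail_start = bad_start + bad_len;
    int n = snprintf(buf, len,
                     "0 %llu linear %s 0\n"
                     "%llu %llu error\n"
                     "%llu %llu linear %s %llu\n",
                     bad_start, dev,
                     bad_start, bad_len,
                     tail_start, total_sectors - tail_start, dev, tail_start);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting text could be loaded with dmsetup to create a mapped device, which would then be set up as swap so that pages written to or read from the bad range hit IO errors. The exact procedure on a RHEL4 system is not described in this report.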

Comment 16 Red Hat Bugzilla 2007-05-08 02:40:50 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html