Bug 185785 - [RHEL4 U3] device-mapper mirror: Data corruption by temporal errors during recovery.
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alasdair Kergon
Blocks: 181409 186476
Reported: 2006-03-17 23:48 UTC by Kiyoshi Ueda
Modified: 2013-04-02 23:51 UTC (History)

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 22:47:26 UTC


Links:
Red Hat Product Errata RHSA-2006:0575 (private: 0, priority: normal, status: SHIPPED_LIVE, last updated: 2006-08-10 04:00:00 UTC)
  Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4

Description Kiyoshi Ueda 2006-03-17 23:48:50 UTC
Description of problem:
Temporary (transient) device errors during mirror recovery cause data
corruption, because all errors that occur during recovery are ignored.


Version-Release number of selected component:
kernel-2.6.9-34.EL


How reproducible:
Always


Steps to Reproduce:
 1. Prepare several PVs (more than 2) and create a VG from them.
    Example)
      - /dev/sda, /dev/sdb, /dev/sdc as PVs
      - vg0 contains these 3 PVs
 2. Create a mirror LV and activate it.
      # lvcreate -L 200M -n lv0 -m 1 vg0
 3. Make a filesystem on the mirror LV.
      # mke2fs -j /dev/mapper/vg0-lv0
 4. Disconnect one of the PVs used by the mirror LV.
      # echo offline > /sys/block/sdb/device/state
    This step must be completed before the recovery has finished.
 5. Reconnect the PV.
      # echo running > /sys/block/sdb/device/state
 6. Wait until the recovery has finished.
 7. Check whether the filesystem is intact.
    Repeat this check many times, because read balancing across the
    mirror legs means a single pass may not read the corrupted copy.
      # while true; do e2fsck -f /dev/mapper/vg0-lv0; done
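Step 6 can be scripted. The following is a minimal sketch, assuming that
the 'dmsetup status' line for the mirror target carries a
"<synced regions>/<total regions>" field; the helper names (sync_ratio,
mirror_in_sync, wait_for_recovery) are hypothetical and not part of any
tool. Note that with this bug the mirror reports full sync even though
the copy failed, so the loop only tells you when to start Step 7.

```shell
#!/bin/sh
# Print the first "digits/digits" field of a status line, or nothing.
# Device numbers such as 253:2 contain a colon and therefore do not match.
sync_ratio() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\/[0-9]+$/) { print $i; exit }
    }'
}

# Succeed only if the status line reports every region as synced.
mirror_in_sync() {
    ratio=$(sync_ratio "$1")
    [ -n "$ratio" ] && [ "${ratio%/*}" = "${ratio#*/}" ]
}

# Poll 'dmsetup status' once per second until the mirror reports full sync.
#   $1: device-mapper name of the mirror, e.g. vg0-lv0
wait_for_recovery() {
    until mirror_in_sync "$(dmsetup status "$1")"; do
        sleep 1
    done
}
```

For the example above, the call would be 'wait_for_recovery vg0-lv0',
run as root between Step 5 and Step 7.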


Actual results:
e2fsck reports filesystem errors, although no error is recorded in the
kernel log and 'dmsetup status' shows no failure on the mirror.
This happens because the kernel ignores the temporary PV failure
between Step 4 and Step 5.


Expected results:
e2fsck should not detect any errors.
On the kernel side, errors during recovery should be handled.


Additional info:

Comment 1 Kiyoshi Ueda 2006-03-20 22:47:38 UTC
Additional info:
The error handler should mark the failed device as "failed".
In addition, any region that hits an error during recovery should stay
out-of-sync until the failed device is either removed from the mirror
map or restored and recovered correctly.

If such a region is instead marked "in-sync", data corruption can still
occur, for example:
  - Errors caused by a temporary device failure are detected during recovery.
  - The errors are handled and the failed device is marked as "failed",
    but the corresponding regions are marked as "in-sync".
  - The system goes down before dmeventd takes action.
  - The device recovers from the temporary failure while the system is down.
  - The system boots and the mirror map is activated with "no error" and
    "no recovery".
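
The rule proposed in this comment can be illustrated with a toy model.
This is only a sketch of the intended policy, not the kernel code and not
the committed patch; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical model: state a region should be left in once its
# recovery copy completes.
#   $1: result of the recovery copy, "ok" or "error"
region_state_after_recovery() {
    if [ "$1" = "ok" ]; then
        echo in-sync
    else
        echo out-of-sync
    fi
}

# Hypothetical model of activation after a reboot: recovery is needed
# if at least one region was left out-of-sync.
#   $@: list of region states
needs_recovery() {
    for state in "$@"; do
        if [ "$state" = "out-of-sync" ]; then
            return 0
        fi
    done
    return 1
}
```

With this rule, 'needs_recovery in-sync "$(region_state_after_recovery error)"'
succeeds, so the region is recovered again after the reboot instead of
being treated as clean.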


Comment 4 Jason Baron 2006-05-09 17:14:09 UTC
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 7 Red Hat Bugzilla 2006-08-10 22:47:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html


