Bug 185785 - [RHEL4 U3] device-mapper mirror: Data corruption by temporal errors during recovery.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Alasdair Kergon
Depends On:
Blocks: 181409 186476
Reported: 2006-03-17 18:48 EST by Kiyoshi Ueda
Modified: 2013-04-02 19:51 EDT (History)

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 18:47:26 EDT
Attachments: None
Description Kiyoshi Ueda 2006-03-17 18:48:50 EST
Description of problem:
Temporary device errors during mirror recovery cause data corruption,
because all errors that occur during recovery are ignored.


Version-Release number of selected component:
kernel-2.6.9-34.EL


How reproducible:
Always


Steps to Reproduce:
 1. Prepare some PVs (more than 2) and create a VG from them.
    Example)
      - /dev/sda, /dev/sdb, /dev/sdc as PVs
      - vg0 contains these 3 PVs
 2. Create a mirror LV and activate it.
      # lvcreate -L 200M -n lv0 -m 1 vg0
 3. Make a filesystem on the mirror LV.
      # mke2fs -j /dev/mapper/vg0-lv0
 4. Disconnect one of the PVs used for the mirror LV.
      # echo offline > /sys/block/sdb/device/state
    This step must be completed before the recovery has finished.
 5. Reconnect the PV.
      # echo running > /sys/block/sdb/device/state
 6. Wait until the recovery has finished.
 7. Check whether the filesystem is intact.
    Repeat this check many times, because with read balancing
    a single run may not detect the errors.
      # while true; do e2fsck -f /dev/mapper/vg0-lv0; done
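
A minimal sketch of why Step 7 has to be repeated. It assumes, purely for illustration, that reads alternate round-robin between the two mirror legs; the actual balancing policy in the kernel is not described in this report, and `make_balanced_reader` and the sample data are hypothetical.

```python
from itertools import cycle


def make_balanced_reader(legs):
    """Return a read(block) function that alternates between mirror legs."""
    turn = cycle(range(len(legs)))

    def read(block):
        # Each call is served by the next leg in round-robin order.
        return legs[next(turn)][block]

    return read


# Two legs of a hypothetical 4-block mirror; block 2 on the second leg
# was never copied during recovery and holds stale data.
good_leg = ["a", "b", "c", "d"]
stale_leg = ["a", "b", "?", "d"]
read = make_balanced_reader([good_leg, stale_leg])

# Each check reads block 2 once and compares it with the expected content.
results = [read(2) == "c" for _ in range(4)]
print(results)  # [True, False, True, False]
```

Only the reads served by the stale leg expose the problem, so a single check can pass even though the mirror is inconsistent.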


Actual results:
e2fsck reports filesystem errors, yet no error is recorded in the
kernel log and 'dmsetup status' shows no failure on the mirror.
This happens because the temporary PV failure between Step 4 and
Step 5 is ignored by the kernel.
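
The effect can be sketched with a toy model of the recovery loop. This is an illustration under stated assumptions, not the kernel code: `recover`, `offline_regions` and `ignore_errors` are hypothetical names, and a "region" is reduced to a single list element.

```python
def recover(primary, secondary, offline_regions, ignore_errors):
    """Copy each region from primary to secondary.

    Regions listed in offline_regions fail to copy, which models a
    device that is briefly offline. Returns the set of regions that
    end up marked in-sync.
    """
    in_sync = set()
    for region, data in enumerate(primary):
        copied = region not in offline_regions
        if copied:
            secondary[region] = data
        # With ignore_errors=True a failed copy is still marked in-sync.
        if copied or ignore_errors:
            in_sync.add(region)
    return in_sync


primary = ["a", "b", "c", "d"]

# Behaviour described in this report: copy errors are ignored.
buggy_leg = [None] * 4
buggy_sync = recover(primary, buggy_leg, offline_regions={1, 2}, ignore_errors=True)

# Expected behaviour: a region whose copy failed stays out-of-sync.
fixed_leg = [None] * 4
fixed_sync = recover(primary, fixed_leg, offline_regions={1, 2}, ignore_errors=False)

# Regions that claim to be in-sync but do not match the primary.
silently_stale = {r for r in buggy_sync if buggy_leg[r] != primary[r]}
print(sorted(silently_stale))  # [1, 2]
print(sorted(fixed_sync))      # [0, 3]
```

In the first run every region is reported in-sync although regions 1 and 2 were never copied, which matches the symptom of a clean 'dmsetup status' with a corrupted filesystem.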


Expected results:
e2fsck should not detect any errors.
On the kernel side, errors during recovery should be handled.


Additional info:
Comment 1 Kiyoshi Ueda 2006-03-20 17:47:38 EST
Additional info:
The error handler should mark the failed device as "failed".
In addition, a region that hit errors during recovery should stay
out-of-sync until the failed device is removed from the mirror map
or is restored and recovered correctly.
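
A minimal sketch of why the recorded region state matters across a reboot. It assumes, as an illustration only, that the per-region log is the only state that survives a reboot and that the "failed" mark on the device is lost; `activate` and the two log dictionaries are hypothetical.

```python
def activate(region_log):
    """Model mirror activation after a reboot.

    Only the region log is consulted here; any "failed" mark that was
    set on a device before the reboot is assumed to be gone.
    Returns the regions that will be recovered again.
    """
    return sorted(r for r, state in region_log.items() if state == "out-of-sync")


# Region 2 hit an error during recovery, then the system went down.
log_marked_in_sync = {0: "in-sync", 1: "in-sync", 2: "in-sync"}
log_kept_out_of_sync = {0: "in-sync", 1: "in-sync", 2: "out-of-sync"}

print(activate(log_marked_in_sync))    # [] : nothing is recovered, stale data stays
print(activate(log_kept_out_of_sync))  # [2] : region 2 is recovered again
```

Keeping the region out-of-sync leaves a durable record that recovery is still owed, independent of whether the device failure was acted on before the system went down.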

If the region is marked "in-sync" instead, a different data corruption
can occur, as follows:
  - Errors from a temporary device failure are detected during recovery.
  - The errors are handled and the failed device is marked as "failed",
    but the corresponding regions are marked as "in-sync".
  - The system goes down before dmeventd takes action.
  - The device recovers from the temporary failure while the system is down.
  - On boot, the mirror map is activated with "no error" and
    "no recovery".
Comment 4 Jason Baron 2006-05-09 13:14:09 EDT
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 7 Red Hat Bugzilla 2006-08-10 18:47:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html
