Bug 185782 - [RHEL4 U3] device-mapper mirror: Data corruption if the default mirror fails during recovery.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 181409 186476
 
Reported: 2006-03-17 23:15 UTC by Kiyoshi Ueda
Modified: 2013-04-02 23:51 UTC
11 users

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 22:46:54 UTC
Target Upstream Version:
Embargoed:


Attachments
Don't switch default mirror if mirror set is out-of-sync (863 bytes, patch)
2006-03-29 16:27 UTC, Jun'ichi NOMURA
Read retry even if default isn't ok (1.35 KB, patch)
2006-03-29 16:36 UTC, Jun'ichi NOMURA


Links
System: Red Hat Product Errata
ID: RHSA-2006:0575
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4
Last Updated: 2006-08-10 04:00:00 UTC

Description Kiyoshi Ueda 2006-03-17 23:15:12 UTC
Description of problem:
Silent data corruption occurs if the default mirror fails during recovery.
It happens in this way:
  - During recovery, all mirrors can be out-of-sync except for
    the default mirror.
    (For a region in the RH_RECOVERING state, writes go only to the default mirror.)
  - If the default mirror fails, the other mirror is chosen as new default.
  - At this point, writes that were done only to the original default mirror are lost.
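The failure sequence above can be illustrated with a toy model in C. This is a minimal sketch, not the dm-raid1 code: `struct toy_mirror_set`, `toy_write()`, `toy_fail_leg_naive()` and `toy_read()` are hypothetical names, each leg holds a single region's data as an `int`, and the write rule simply encodes the report's statement that out-of-sync writes reach only the default mirror.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: NR_LEGS mirror legs, one region of data each. */
#define NR_LEGS 3

struct toy_mirror_set {
    int data[NR_LEGS];     /* contents of the single region on each leg */
    bool failed[NR_LEGS];  /* true once a leg has failed */
    int default_leg;       /* index of the default (master) leg */
    bool region_in_sync;   /* false while the region is being recovered */
};

/* While the region is out of sync, a write reaches only the default leg
 * (as described in the report); otherwise it goes to every live leg. */
static void toy_write(struct toy_mirror_set *ms, int value)
{
    if (!ms->region_in_sync) {
        ms->data[ms->default_leg] = value;
        return;
    }
    for (int i = 0; i < NR_LEGS; i++)
        if (!ms->failed[i])
            ms->data[i] = value;
}

/* Naive failure handling: mark the leg failed and promote the first
 * surviving leg to default WITHOUT checking whether it is in sync. */
static void toy_fail_leg_naive(struct toy_mirror_set *ms, int leg)
{
    assert(leg >= 0 && leg < NR_LEGS);
    ms->failed[leg] = true;
    if (leg != ms->default_leg)
        return;
    for (int i = 0; i < NR_LEGS; i++) {
        if (!ms->failed[i]) {
            ms->default_leg = i;
            return;
        }
    }
}

/* Reads are served from whatever leg is currently the default. */
static int toy_read(const struct toy_mirror_set *ms)
{
    return ms->data[ms->default_leg];
}
```

For example, starting with every leg holding 1, an out-of-sync write of 2 lands only on leg 0; after `toy_fail_leg_naive(&ms, 0)` promotes leg 1, `toy_read()` returns the stale value 1, which is the silent corruption the report describes.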


Version-Release number of selected component:
kernel-2.6.9-34.EL


How reproducible:
Always


Steps to Reproduce:
 1. Prepare some PVs (more than 3) and create a VG from them.
    Example)
      - /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd as PVs
      - vg0 contains these 4 PVs
 2. Create a mirror LV and activate it.
      # lvcreate -L 200M -n lv0 -m 2 vg0
 3. Make a filesystem on the mirror LV.
      # mke2fs -j /dev/mapper/vg0-lv0
 4. Disconnect the default mirror PV of the mirror LV.
    Example) If /dev/sda is used for the default mirror of vg0-lv0:
      # echo offline > /sys/block/sda/device/state
    This step must be completed before recovery has finished.
 5. Wait for recovery to finish.
 6. Remove the failed PV if it isn't automatically removed.
      # vgreduce --removemissing vg0
 7. Check if the filesystem is fine.
      # e2fsck -f /dev/mapper/vg0-lv0


Actual results:
e2fsck reports filesystem errors.
In general, an out-of-sync device is used as the new default mirror
and data is corrupted silently.


Expected results:
e2fsck should not detect any error.
In general, data loss should be avoided whenever possible.
If it cannot be avoided, the mirror should report an error and stop operating.


Additional info:

Comment 1 Jun'ichi NOMURA 2006-03-29 16:27:12 UTC
Created attachment 127000
Don't switch default mirror if mirror set is out-of-sync

At a minimum, we can avoid data corruption by not switching
the default mirror while the set is out-of-sync.

To complete this fix, we need to fix error handling
during recovery (BZ#185785).

Additionally, to recover from a failure of the default mirror,
we need support for multiple master mirrors.
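The policy in this comment can be sketched with a small toy model in C. This is a hypothetical illustration, not the contents of attachment 127000: `struct toy_mirror_set`, `toy_fail_leg()` and the single set-wide `set_in_sync` flag are assumptions, and returning `false` stands in for the "emit error and stop operation" behaviour requested under Expected results.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a mirror set with NR_LEGS legs. */
#define NR_LEGS 3

struct toy_mirror_set {
    bool failed[NR_LEGS];  /* true once a leg has failed */
    int default_leg;       /* index of the default (master) leg */
    bool set_in_sync;      /* true only once every region is recovered */
};

/* Handle a leg failure. Returns true if the set can keep running, or
 * false if it should report an error because no trustworthy default
 * remains. */
static bool toy_fail_leg(struct toy_mirror_set *ms, int leg)
{
    assert(leg >= 0 && leg < NR_LEGS);
    ms->failed[leg] = true;
    if (leg != ms->default_leg)
        return true;             /* a non-default leg failed: keep going */
    if (!ms->set_in_sync)
        return false;            /* survivors may be stale: do not promote */
    for (int i = 0; i < NR_LEGS; i++) {
        if (!ms->failed[i]) {
            ms->default_leg = i; /* in sync: any survivor holds valid data */
            return true;
        }
    }
    return false;                /* no surviving leg at all */
}
```

The key design choice is that an out-of-sync set keeps its failed default rather than silently promoting a stale leg; this trades availability for correctness, which is why the comment notes that multiple master mirrors would be needed to actually recover.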

Comment 2 Jun'ichi NOMURA 2006-03-29 16:36:15 UTC
Created attachment 127001
Read retry even if default isn't ok

If we do not allow switching the default mirror while the set is
out-of-sync, a situation can arise where "the default isn't OK, but the
region is in-sync, so we can read from another mirror".

If the default fails in an out-of-sync mirror, we lose data anyway.
But reading as much data as possible may help with salvaging it.

This patch will do that.
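The read-side rule described here can be sketched in the same toy terms. This is a hypothetical model, not attachment 127001: `struct toy_mirror_set` and `toy_choose_read_leg()` are invented names, the caller is assumed to know whether the region being read is in sync, and `-EIO` follows the kernel convention of returning negative errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of a mirror set with NR_LEGS legs. */
#define NR_LEGS 3

struct toy_mirror_set {
    bool failed[NR_LEGS];  /* true once a leg has failed */
    int default_leg;       /* index of the default (master) leg */
};

/* Pick the leg to read a region from. region_in_sync says whether every
 * leg holds the same data for this region. Returns a leg index, or
 * -EIO if no trustworthy copy is available. */
static int toy_choose_read_leg(const struct toy_mirror_set *ms,
                               bool region_in_sync)
{
    assert(ms->default_leg >= 0 && ms->default_leg < NR_LEGS);
    if (!ms->failed[ms->default_leg])
        return ms->default_leg;  /* normal case: read from the default */
    if (!region_in_sync)
        return -EIO;             /* only the failed default was current */
    for (int i = 0; i < NR_LEGS; i++)
        if (!ms->failed[i])
            return i;            /* in-sync region: any survivor is valid */
    return -EIO;                 /* every leg has failed */
}
```

With a failed default, reads of in-sync regions are still served from a surviving leg, while reads of out-of-sync regions fail instead of returning stale data; that is the salvaging behaviour the comment describes.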

Comment 5 Jason Baron 2006-05-09 17:11:28 UTC
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 8 Red Hat Bugzilla 2006-08-10 22:46:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html


