
Bug 185782

Summary: [RHEL4 U3] device-mapper mirror: Data corruption if the default mirror fails during recovery.
Product: Red Hat Enterprise Linux 4 Reporter: Kiyoshi Ueda <kueda>
Component: kernel Assignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: agk, christophe.varoqui, coughlan, egoggin, jbaron, jbrassow, jnomura, lmb, mbroz, tao, tranlan
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0575 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-08-10 22:46:54 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 181409, 186476    
Attachments:
Description                                                 Flags
Don't switch default mirror if mirror set is out-of-sync    none
Read retry even if default isn't ok                         none

Description Kiyoshi Ueda 2006-03-17 23:15:12 UTC
Description of problem:
Silent data corruption occurs if the default mirror fails during recovery.
It happens as follows:
  - During recovery, all mirrors except the default mirror can be
    out-of-sync.
    (For an RH_RECOVERING region, writes go only to the default mirror.)
  - If the default mirror fails, another mirror is chosen as the new default.
  - At this point, writes that went only to the original default mirror are lost.
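A minimal Python model of the sequence above may help make it concrete. It is a toy simulation, not dm-raid1 code: the `MirrorSet` class, its methods and the `RegionState` values are hypothetical, each leg is modelled as a dict from region number to data, and the pre-fix behaviour is assumed to be "mark the default failed and promote the first surviving leg, regardless of sync state".

```python
from enum import Enum


class RegionState(Enum):
    # Simplified stand-ins for the kernel's region states; only the
    # "in sync" vs "recovering" distinction matters in this model.
    IN_SYNC = "in-sync"
    RECOVERING = "recovering"


class MirrorSet:
    """Toy mirror set: each leg maps region number -> data."""

    def __init__(self, legs, regions):
        self.legs = [dict() for _ in range(legs)]
        self.failed = [False] * legs
        self.default = 0
        # Right after creation every region is still being recovered.
        self.state = {r: RegionState.RECOVERING for r in range(regions)}

    def write(self, region, data):
        if self.state[region] is RegionState.RECOVERING:
            # As described in the report: while a region is recovering,
            # writes go only to the default mirror.
            self.legs[self.default][region] = data
        else:
            # In-sync region: write to every surviving leg.
            for i, leg in enumerate(self.legs):
                if not self.failed[i]:
                    leg[region] = data

    def fail_default_and_switch(self):
        # Assumed pre-fix behaviour: promote the first surviving leg
        # without checking whether it is in sync.
        self.failed[self.default] = True
        survivors = [i for i, f in enumerate(self.failed) if not f]
        if not survivors:
            raise RuntimeError("no surviving mirror leg")
        self.default = survivors[0]

    def read(self, region):
        # None means the new default never received this region's data.
        return self.legs[self.default].get(region)


if __name__ == "__main__":
    m = MirrorSet(legs=3, regions=4)
    m.write(1, "new data")       # recovering region: lands on leg 0 only
    m.fail_default_and_switch()  # leg 1 becomes the new default
    print(m.read(1))             # the write never reached leg 1
```

The read after the switch returns nothing for region 1, which is the silent loss the report describes: no error is raised, the new default simply does not have the data.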


Version-Release number of selected component:
kernel-2.6.9-34.EL


How reproducible:
Always


Steps to Reproduce:
 1. Prepare some PVs (4 or more) and create a VG from them.
    Example)
      - /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd as PVs
      - vg0 contains these 4 PVs
 2. Create a mirror LV and activate it.
      # lvcreate -L 200M -n lv0 -m 2 vg0
 3. Make filesystem on the mirror LV.
      # mke2fs -j /dev/mapper/vg0-lv0
 4. Disconnect the PV that holds the default mirror of the mirror LV.
    Example) If /dev/sda is used for the default mirror of vg0-lv0:
      # echo offline > /sys/block/sda/device/state
    This step must be completed before recovery has finished.
 5. Wait for the recovery to finish.
 6. Remove the failed PV if it isn't removed automatically.
      # vgreduce --removemissing vg0
 7. Check whether the filesystem is consistent.
      # e2fsck -f /dev/mapper/vg0-lv0


Actual results:
e2fsck reports filesystem errors.
In general, an out-of-sync device is used as the new default mirror
and silently corrupts data.


Expected results:
e2fsck should not detect any errors.
In general, data loss should be avoided whenever possible.
If that is not possible, the mirror should report an error and stop operating.


Additional info:

Comment 1 Jun'ichi NOMURA 2006-03-29 16:27:12 UTC
Created attachment 127000 [details]
Don't switch default mirror if mirror set is out-of-sync

At minimum, we can avoid data corruption by not switching
the default mirror when the set is out-of-sync.

To complete this fix, we need to fix error handling
during recovery (BZ#185785).

Additionally, to recover from a failure of the default mirror,
we need support for multiple master mirrors.
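The policy proposed in attachment 127000 can be sketched as a single decision function. This is a hypothetical Python illustration, not the patch itself: `choose_new_default` and its parameters are invented names, and keeping the failed default is assumed to mean that I/O to it then fails with an error instead of returning stale data from another leg.

```python
def choose_new_default(default, failed, in_sync):
    """Return the leg index to use as the default mirror, or None.

    Hypothetical sketch of the policy from attachment 127000:
    only promote another leg when the whole set is in sync.
    `failed` is a list of booleans, one per leg.
    """
    if not in_sync:
        # Do not switch: the other legs may hold out-of-sync data,
        # so keep the (failed) default and let I/O to it report errors.
        return default
    for i, is_failed in enumerate(failed):
        if not is_failed:
            return i
    return None  # no usable leg left


if __name__ == "__main__":
    failed = [True, False, False]  # leg 0 was the default and failed
    print(choose_new_default(0, failed, in_sync=False))  # keeps leg 0
    print(choose_new_default(0, failed, in_sync=True))   # promotes leg 1
```

The design choice is to trade availability for correctness: while the set is out-of-sync, a visible I/O error is preferred over silently serving data from a leg that never received the latest writes.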

Comment 2 Jun'ichi NOMURA 2006-03-29 16:36:15 UTC
Created attachment 127001 [details]
Read retry even if default isn't ok

If we don't allow switching the default mirror when the set is out-of-sync,
there can be a situation where "the default isn't ok, but the region
is in-sync, so we can read from another mirror".

If the default fails in an out-of-sync mirror, we lose data anyway.
But reading as much data as possible may help salvage it.

This patch will do that.
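The read-retry idea from attachment 127001 can be sketched the same way. This is again a hypothetical Python illustration: `read_region` and `ReadError` are invented names, legs are modelled as dicts from region number to data, and `region_in_sync` stands in for whatever per-region state lookup the real code performs.

```python
class ReadError(Exception):
    """Raised when no leg can safely serve the read."""


def read_region(legs, failed, default, region, region_in_sync):
    """Read one region, retrying on another leg if the default failed.

    Hypothetical sketch: even when the default mirror is not OK (and
    has not been switched), a region known to be in sync can still be
    read from any surviving leg.
    """
    if not failed[default]:
        return legs[default][region]
    if not region_in_sync:
        # Only the failed default is guaranteed to hold current data.
        raise ReadError(f"region {region}: default failed, region out of sync")
    for i, leg in enumerate(legs):
        if i != default and not failed[i]:
            return leg[region]
    raise ReadError(f"region {region}: no surviving leg")


if __name__ == "__main__":
    legs = [{0: "a", 1: "new"}, {0: "a", 1: "stale"}]
    failed = [True, False]  # leg 0 (the default) has failed
    print(read_region(legs, failed, 0, 0, region_in_sync=True))
    try:
        read_region(legs, failed, 0, 1, region_in_sync=False)
    except ReadError as e:
        print("error:", e)
```

Region 0 is served from the surviving leg, while region 1 raises an error rather than returning the stale copy, which matches the report's expectation that the mirror should report an error instead of corrupting data.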

Comment 5 Jason Baron 2006-05-09 17:11:28 UTC
Committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 8 Red Hat Bugzilla 2006-08-10 22:46:56 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html