Red Hat Bugzilla – Bug 185782
[RHEL4 U3] device-mapper mirror: Data corruption if the default mirror fails during recovery.
Last modified: 2013-04-02 19:51:29 EDT
Description of problem:
Silent data corruption occurs if the default mirror fails during recovery.
It happens in this way:
- During recovery, all mirrors can be out-of-sync except for
the default mirror.
(For an RH_RECOVERING region, writes are done only to the default mirror.)
- If the default mirror fails, another mirror is chosen as the new default.
- At this point, writes done to the original default mirror are lost.
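The sequence above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the dm-raid1 kernel code: the class names, the per-region list and the demo() helper are all hypothetical, and every region is assumed to be still recovering.

```python
# Toy model of the failure described above; NOT the dm-raid1 kernel code.
# All names here (Mirror, MirrorSet, demo) are hypothetical.

class Mirror:
    def __init__(self, name, nregions):
        self.name = name
        self.failed = False
        self.data = [None] * nregions  # one value per region


class MirrorSet:
    def __init__(self, names, nregions):
        self.mirrors = [Mirror(n, nregions) for n in names]
        self.default = self.mirrors[0]
        # Regions not yet copied from the default to the other mirrors.
        self.recovering = set(range(nregions))

    def write(self, region, value):
        if region in self.recovering:
            # Recovering region: only the default mirror receives the write.
            self.default.data[region] = value
        else:
            for m in self.mirrors:
                if not m.failed:
                    m.data[region] = value

    def read(self, region):
        return self.default.data[region]

    def fail_default_and_switch(self):
        # The problematic behaviour: another mirror becomes the default
        # even though it has not been synchronised yet.
        self.default.failed = True
        self.default = next(m for m in self.mirrors if not m.failed)


def demo():
    ms = MirrorSet(["sda", "sdb", "sdc"], nregions=4)
    ms.write(0, "journal block")   # lands on sda only
    ms.fail_default_and_switch()   # sdb becomes the default
    return ms.default.name, ms.read(0)


if __name__ == "__main__":
    print(demo())
```

In this model the read after the switch returns the stale contents of the new default, with no error reported to the caller.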
Version-Release number of selected component:
Steps to Reproduce:
1. Prepare some PVs (more than 3) and create a VG from them.
- /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd as PVs
- vg0 contains these 4 PVs
2. Create a mirror LV and activate it.
# lvcreate -L 200M -n lv0 -m 2 vg0
3. Make a filesystem on the mirror LV.
# mke2fs -j /dev/mapper/vg0-lv0
4. Disconnect the default mirror PV of the mirror LV.
Example: if /dev/sda is used for the default mirror of vg0-lv0:
# echo offline > /sys/block/sda/device/state
This step must be completed before the recovery has finished.
5. Wait until the recovery has finished.
6. Remove the failed PV if it isn't automatically removed.
# vgreduce --removemissing vg0
7. Check if the filesystem is fine.
# e2fsck -f /dev/mapper/vg0-lv0
Actual results:
e2fsck reports filesystem errors.
In general, an out-of-sync device is used as the new default mirror
and corrupts data silently.
Expected results:
e2fsck should not detect any errors.
In general, data loss should be avoided as far as possible.
If that is not possible, the mirror should emit an error and stop operating.
Created attachment 127000
Don't switch default mirror if mirror set is out-of-sync
At the very least, we can avoid data corruption by not switching
the default mirror when the set is out-of-sync.
To complete this fix, we need to fix error handling
during recovery (BZ#185785).
Additionally, to recover from a failure of the default mirror,
we need support for multiple master mirrors.
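The policy described for this attachment can be sketched as follows. This is a minimal illustration, not the patch itself: the function name pick_new_default and its arguments are hypothetical, and the real change lives in the kernel mirror code.

```python
# Hypothetical sketch of the "don't switch when out-of-sync" policy.
# Not the actual patch; names and data shapes are assumptions.

def pick_new_default(mirrors, failed, set_in_sync):
    """Return the name of the mirror to promote, or None to refuse.

    mirrors     -- ordered list of mirror names
    failed      -- set of names of mirrors that have failed
    set_in_sync -- True only if every region of the set is in sync
    """
    if not set_in_sync:
        # The other mirrors may hold stale data, so promoting one of
        # them could silently corrupt data. Keep the old default.
        return None
    for name in mirrors:
        if name not in failed:
            return name
    return None  # no usable mirror left
```

For example, pick_new_default(["sda", "sdb", "sdc"], {"sda"}, False) returns None, while the same call with set_in_sync=True returns "sdb".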
Created attachment 127001
Read retry even if default isn't ok
When we don't allow switching the default mirror while the set is
out-of-sync, there can be a situation where "the default isn't ok but
the region is in-sync, so we can read from another mirror".
If the default fails in an out-of-sync mirror set, we lose data anyway.
But reading as much data as possible may help with salvaging.
This patch does that.
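The read-retry idea can be sketched like this. It is an illustration under stated assumptions, not the attached patch: the function name choose_read_mirror and the per-region in-sync flag passed by the caller are hypothetical.

```python
# Hypothetical sketch of read retry when the default mirror has failed.
# Not the actual patch; names and data shapes are assumptions.

def choose_read_mirror(mirrors, failed, default, region_in_sync):
    """Return the mirror name to read a region from, or None on failure.

    mirrors        -- ordered list of mirror names
    failed         -- set of names of mirrors that have failed
    default        -- name of the current default mirror
    region_in_sync -- True if the region being read is in sync
    """
    if default not in failed:
        return default
    if region_in_sync:
        # Every live mirror holds the same data for this region,
        # so a read from any of them is safe.
        for name in mirrors:
            if name not in failed:
                return name
    # Out-of-sync region with a failed default: the data is gone.
    return None
```

With the default "sda" failed, an in-sync region is read from "sdb", while an out-of-sync region yields None so the caller can report an I/O error instead of returning stale data.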
Committed in stream U4 build 34.26. A test kernel with this patch is available
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.