Red Hat Bugzilla – Bug 456575
Mirror corruption after one of three legs fails simultaneously on more than 1 mirror
Last modified: 2011-01-13 17:48:56 EST
Mirror corruption issues were found in the cluster logging code and fixed in 5.3. During the investigation, another issue was identified in the kernel, and it is still present. It does not need fixing until device-mapper mirror failures are handled differently (which is planned for the future). Currently, when a mirror device fails, it is removed. Later releases will only remove the failed device if the failure is persistent.
Description of what will cause the failure:
In drivers/md/dm-raid1.c, after a leg fails and a write returns, '__bio_mark_nosync' is used to mark the region out-of-sync. This state is stored in a region structure that remains in the region hash. It is not removed from the region hash until the mirror is destroyed because it never goes on the clean_regions list. Right now, this is not a problem because when a device fails, the mirror is destroyed and a new mirror is created w/o the failed device. In the future, when we wish to handle transient failures, we would simply suspend and resume to restart recovery. In that case, some machines in the cluster would only write to the primary for regions that are cached as not-in-sync - due to the '__bio_mark_nosync'. The fix is to simply clear out the region hash when a mirror is suspended.
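The caching behaviour described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the dm-raid1 code: the class RegionHash, its methods, and the leg names are hypothetical and chosen for illustration. The only thing taken from the kernel description is the idea that a region marked out-of-sync after a failed write stays cached until the cache is cleared.

```python
from enum import Enum


class RegionState(Enum):
    DIRTY = "dirty"    # write in flight
    NOSYNC = "nosync"  # a leg failed during a write to this region


class RegionHash:
    """Toy per-node cache of region states (hypothetical model, not kernel code)."""

    def __init__(self):
        self._regions = {}  # region number -> RegionState

    def start_write(self, region):
        # Cache the region as DIRTY unless a state is already cached for it.
        self._regions.setdefault(region, RegionState.DIRTY)

    def end_write(self, region, leg_failed):
        if leg_failed:
            # Analogue of '__bio_mark_nosync': the entry is pinned as NOSYNC
            # and is never queued for removal.
            self._regions[region] = RegionState.NOSYNC
        elif self._regions.get(region) is RegionState.DIRTY:
            # A write that completed on all legs leaves nothing cached.
            del self._regions[region]

    def write_targets(self, region, legs):
        # A region cached as NOSYNC is written to the primary leg only.
        if self._regions.get(region) is RegionState.NOSYNC:
            return legs[:1]
        return list(legs)

    def suspend(self, clear_cache):
        # The proposed fix corresponds to clear_cache=True.
        if clear_cache:
            self._regions.clear()

    def cached_regions(self):
        return dict(self._regions)


legs = ["leg0", "leg1", "leg2"]
node = RegionHash()

# A write to region 7 sees a leg failure, so the region is pinned as NOSYNC.
node.start_write(7)
node.end_write(7, leg_failed=True)

# Suspend/resume without clearing: the stale state survives.
node.suspend(clear_cache=False)
stale_targets = node.write_targets(7, legs)

# Suspend/resume with the cache cleared: writes go to every leg again.
node.suspend(clear_cache=True)
fresh_targets = node.write_targets(7, legs)

print(stale_targets, fresh_targets)
```

In this model, a node that keeps the stale entry across a suspend and resume continues to write region 7 to the primary only, even if recovery has brought the other legs back in sync. That is the divergence the comment warns about once transient failures are handled by suspend/resume instead of by rebuilding the mirror.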
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Updating PM score.
Handling of device-mapper mirror failures has not changed, and therefore, no change is required for kernel code at this time. Pushing out. (See comment #1 for more detail.)
I'll try to figure this out. These bugs take a long time to decipher, so it will have to be 'conditional nack - capacity' vs. devel_ack. Note the configuration when setting severity/priority scores.
Please verify that this bug still exists with the latest RHEL 5.5 kernel and userspace packages. Many things have changed that would have a direct impact on this bug:
1) kernel handles write failures differently now
2) userspace cleans up LVs on an individual basis now vs on a VG scale
This bug is no longer reproducible with the latest rpms. Marking verified.
lvm2-2.02.74-1.el5 BUILT: Fri Oct 15 10:26:21 CDT 2010
lvm2-cluster-2.02.74-1.el5 BUILT: Fri Oct 15 10:27:02 CDT 2010
device-mapper-1.02.55-1.el5 BUILT: Fri Oct 15 06:15:55 CDT 2010
cmirror-1.1.39-10.el5 BUILT: Wed Sep 8 16:32:05 CDT 2010
kmod-cmirror-0.1.22-3.el5 BUILT: Tue Dec 22 13:39:47 CST 2009
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Data corruption could occur when using 3 or more mirrors. With this update, the underlying cluster code has been modified to address this issue, and the data corruption no longer occurs.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.