Bug 813948

Summary:	DM RAID: Reintegrating RAID1 devices causes fullsync even when partial would do
Product:	Red Hat Enterprise Linux 6	Reporter:	Jonathan Earl Brassow <jbrassow>
Component:	kernel	Assignee:	Jonathan Earl Brassow <jbrassow>
Status:	CLOSED ERRATA	QA Contact:	Corey Marthaler <cmarthal>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	6.2	CC:	borgan, cmarthal
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	kernel-2.6.32-269.el6	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-06-20 08:47:43 UTC	Type:	Bug

Description Jonathan Earl Brassow 2012-04-18 20:23:06 UTC

LVM can temporarily split off a RAID1 image.  When the image is merged back into the array, only the changes made to the array in the meantime need to be synced to the device.  However, a bug causes the entire device to be resynced.

This bug also affects devices that fail only transiently (i.e. go missing for a short period of time).

To reproduce the issue:
1) lvcreate --type raid1 -m2 -L 5G -n lv vg
2) # Wait for sync
3) lvconvert --splitmirrors 1 --trackchanges vg/lv
4) dd if=/dev/zero of=/dev/vg/lv bs=4M count=1
5) sync
6) lvconvert --merge vg/lv_rimage_2
*) Resync should take a few seconds - not minutes.  (It should be obvious if the entire device is being resynced vs. just the portions that have changed.)

Comment 1 Jonathan Earl Brassow 2012-04-18 20:25:07 UTC

Still iterating upstream over the proper fix....

See the comments around patch 5 of 5:
https://www.redhat.com/archives/dm-devel/2012-April/msg00043.html

Comment 2 RHEL Program Management 2012-04-18 20:30:17 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 3 Jonathan Earl Brassow 2012-04-18 23:42:22 UTC

The output below illustrates the success of the patch.  The script 'test_raid.sh' does the following:
1) create a RAID1 LV
2) wait for sync
3) split a mirror image with --trackchanges
4) write the displayed amount to the original LV (this is the amount of data that will need to be recovered)
5) merge the image back into the array
6) display the time taken to resync the changes.
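
The script itself is not attached to this report.  The following is a minimal sketch of what a script following the steps above might look like; it is not the original test_raid.sh.  The names VG, LV, LV_SIZE and RUN and the helpers mib_to_count, wait_for_sync and run_case are hypothetical, the 4G LV size is only inferred from the "All I/O (4G)" label in the output, and polling copy_percent through lvs is an assumed way of waiting for sync.  By default it only prints the commands (RUN=echo); run it as root with RUN set to an empty value to execute them.

```shell
#!/bin/sh
# Sketch only: NOT the original test_raid.sh, which is not included in this report.
# Assumed: VG "vg" has at least 3 PVs; LV size 4G (guessed from the "All I/O (4G)" label).
VG=${VG-vg}
LV=${LV-lv}
LV_SIZE=${LV_SIZE-4G}
RUN=${RUN-echo}   # dry run by default: commands are printed, not executed

# Convert MiB to a dd block count for bs=4M, rounding up.
mib_to_count() {
    echo $(( ($1 + 3) / 4 ))
}

# Block until lvs reports the LV as fully in sync (only announces the wait in dry-run mode).
wait_for_sync() {
    if [ -n "$RUN" ]; then
        echo "# (dry run) wait for $VG/$LV to reach 100% sync"
        return 0
    fi
    until lvs --noheadings -o copy_percent "$VG/$LV" | grep -q '100\.00'; do
        sleep 1
    done
}

# One test case: $1 is the amount of I/O in MiB written while the image is split off.
run_case() {
    echo "${1}M of I/O"
    $RUN lvcreate --type raid1 -m2 -L "$LV_SIZE" -n "$LV" "$VG"
    wait_for_sync
    $RUN lvconvert --splitmirrors 1 --trackchanges "$VG/$LV"
    if [ "$1" -gt 0 ]; then
        $RUN dd if=/dev/zero of="/dev/$VG/$LV" bs=4M count="$(mib_to_count "$1")"
        $RUN sync
    fi
    $RUN lvconvert --merge "$VG/${LV}_rimage_2"
    start=$(date +%s)
    wait_for_sync
    echo "resync took $(( $(date +%s) - start ))s"
    $RUN lvremove -f "$VG/$LV"
}

for mib in 0 4 40 400 4096; do
    run_case "$mib"
done
```

This sketch times the resync with whole-second resolution from date, whereas the output in this comment comes from time.  A real run may also need to confirm that the resync has started before polling for completion.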

Before the patch, the amount written makes no difference: resync takes the full amount of time because the entire device is always synced.  After the patch, the recovery time increases with the amount of data that was written before the device was merged back into the array.

Before patch:
[root@bp-01 ~]# ./test_raid.sh 
No I/O

real    0m0.013s
user    0m0.004s
sys     0m0.008s

4M of I/O

real    2m13.611s
user    0m0.292s
sys     0m1.230s

40M of I/O

real    2m12.612s
user    0m0.279s
sys     0m1.264s

400M of I/O

real    2m12.634s
user    0m0.301s
sys     0m1.244s

All I/O (4G)

real    2m16.784s
user    0m0.288s
sys     0m1.349s


After patch:
[root@bp-01 ~]# ./test_raid.sh 
No I/O

real    0m0.013s
user    0m0.005s
sys     0m0.008s

4M of I/O

real    0m1.027s
user    0m0.005s
sys     0m0.019s

40M of I/O

real    0m2.041s
user    0m0.010s
sys     0m0.026s

400M of I/O

real    0m14.187s
user    0m0.032s
sys     0m0.142s

All I/O (4G)

real    2m16.727s
user    0m0.290s
sys     0m1.280s
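
To compare the two runs more directly, the "real" values can be converted to seconds and divided.  The following is a small sketch of that arithmetic; the helper names to_seconds and speedup are made up for illustration, and the timings are copied from the output above (the "No I/O" case is omitted since it is identical in both runs).

```shell
#!/bin/sh
# Convert a time-style "XmY.YYYs" value to seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two durations in seconds, to one decimal place.
speedup() {
    LC_ALL=C awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

# "real" values from the before/after runs: label, before, after.
while read -r label before after; do
    b=$(to_seconds "$before")
    a=$(to_seconds "$after")
    echo "$label: ${b}s -> ${a}s (before/after ratio $(speedup "$b" "$a"))"
done <<'EOF'
4M 2m13.611s 0m1.027s
40M 2m12.612s 0m2.041s
400M 2m12.634s 0m14.187s
4G 2m16.784s 2m16.727s
EOF
```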

Comment 4 Jarod Wilson 2012-05-02 16:20:12 UTC

Patch(es) available on kernel-2.6.32-269.el6

Comment 8 Corey Marthaler 2012-06-14 22:25:26 UTC

Marking verified (SanityOnly) based on test results in comment #3.

Comment 10 errata-xmlrpc 2012-06-20 08:47:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html