Bug 166541

Summary: mdadm --grow infinite resync
Product: Red Hat Enterprise Linux 4
Reporter: Wendy Cheng <nobody+wcheng>
Component: kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.0
CC: coughlan, jbaron, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 21:15:48 UTC
Bug Blocks: 181409, 185624    
Attachments:
upstream patch that fixes the issue. (Flags: none)

Description Wendy Cheng 2005-08-23 05:00:03 UTC
Description of problem:

Customer reports "mdadm --grow" command goes into an infinite loop of resync.

Version-Release number of selected component (if applicable):
2.6.9-16.EL

Steps to Reproduce:
0. Create 4 partitions, say sdc1, sdc2, sdc7, sdc8, where sdc7 and sdc8 are
larger than sdc1 and sdc2 (the customer uses 53 GB vs. 19 GB; I use 1.5 GB vs. 800 MB).
1. mdadm -Cv /dev/md7 -l1 -n2 /dev/sdc1 /dev/sdc2 
2. mkfs.ext3 /dev/md7
3. mkdir /mnt/tmp 
4. mount /dev/md7 /mnt/tmp
5. mdadm /dev/md7 -f /dev/sdc1 (fail the device)
6. mdadm /dev/md7 -r /dev/sdc1 (remove the device)
7. mdadm /dev/md7 -a /dev/sdc7 (mirror to the bigger device, wait for sync to
complete)
8. mdadm /dev/md7 -f /dev/sdc2 (fail device)
9. mdadm /dev/md7 -r /dev/sdc2 (remove device)
10. mdadm /dev/md7 -a /dev/sdc8 (wait for sync)
11. mdadm --grow /dev/md7 -z size (say 200KB; make sure sdc7 and sdc8 have enough space)
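
The steps above can be sketched as a small script. This is only a dry-run outline, not a tested reproducer: by default it prints each command instead of running it, and it executes them only when RUN=1 is set (as root, on disposable partitions). The helper names (run, wait_for_sync, swap_member, repro) and the MD, GROW_SIZE, and RUN variables are made up for illustration; GROW_SIZE is left as the placeholder "size" because the report does not give a definite value.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. Device names come from the report;
# MD, GROW_SIZE, RUN and the helper functions are hypothetical names.

MD=${MD:-/dev/md7}
GROW_SIZE=${GROW_SIZE:-size}   # placeholder: the report only says "size"

run() {
    # Print the command; execute it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

wait_for_sync() {
    # In dry-run mode there is nothing to wait for.
    [ "${RUN:-0}" = "1" ] || return 0
    # Poll until /proc/mdstat no longer mentions a resync or recovery.
    while grep -Eq 'resync|recovery' /proc/mdstat; do sleep 5; done
}

swap_member() {
    # swap_member OLD NEW: fail and remove OLD, add NEW, wait for the sync.
    run mdadm "$MD" -f "$1"
    run mdadm "$MD" -r "$1"
    run mdadm "$MD" -a "$2"
    wait_for_sync
}

repro() {
    run mdadm -Cv "$MD" -l1 -n2 /dev/sdc1 /dev/sdc2
    run mkfs.ext3 "$MD"
    run mkdir /mnt/tmp
    run mount "$MD" /mnt/tmp
    swap_member /dev/sdc1 /dev/sdc7
    swap_member /dev/sdc2 /dev/sdc8
    run mdadm --grow "$MD" -z "$GROW_SIZE"
}

repro
```

Run without RUN=1, the script just lists the eleven commands in order, which is a cheap way to check the sequence before touching any devices.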

/proc/mdstat then shows that the resync hangs.
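
One way to watch for the hang described above is to pull the progress line for a single array out of /proc/mdstat. The function below is a minimal sketch: resync_status is a hypothetical helper name, and it assumes the usual mdstat layout (an "mdN : ..." header line followed by indented detail lines and a blank line), which may differ between kernel versions.

```shell
#!/bin/sh
# resync_status NAME [FILE]
# Print the resync/recovery progress line(s) for array NAME, or "idle" if the
# array's stanza has none. FILE defaults to /proc/mdstat (hypothetical helper).
resync_status() {
    awk -v md="$1" '
        $1 == md && $2 == ":"      { in_md = 1; next }   # header of our array
        in_md && NF == 0           { in_md = 0 }          # blank line ends the stanza
        in_md && /resync|recovery/ { sub(/^[ \t]+/, ""); print; found = 1 }
        END { if (!found) print "idle" }
    ' "${2:-/proc/mdstat}"
}
```

Calling `resync_status md7` every few seconds after step 11 should make the problem visible: on an affected kernel the progress line would presumably never give way to "idle".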

Additional info:
Issue also discussed in:

http://www.issociate.de/board/post/233625/RAID_5_Grow.html

Comment 1 Wendy Cheng 2005-08-23 05:00:03 UTC
Created attachment 117988
upstream patch that fixes the issue.

Comment 4 Wendy Cheng 2005-08-23 18:36:00 UTC
Acknowledgment goes to Tom Callahan (the customer), who brought this issue
(and the patch) to Red Hat.

Comment 8 Doug Ledford 2006-03-22 09:13:01 UTC
The patch posted here is not what was finally accepted upstream.  I'm making a
new patch that handles Stephen's questions and matches upstream.  Once testing
is complete, I'll post for review.

Comment 9 Doug Ledford 2006-03-23 22:15:18 UTC
I've completed my testing, and the problem, as well as another related problem,
is now fixed.  I'm submitting the revised patch internally for review/inclusion
in the next update release.

Comment 10 Doug Ledford 2006-04-03 04:07:34 UTC
The one-line change to the fit variable was accepted upstream, so this patch now
very closely mirrors the final upstream version and has also been integrated into
the latest kernel builds.

Comment 11 Jason Baron 2006-04-03 17:50:06 UTC
Committed in stream U4 build 34.11. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 13 Bob Johnson 2006-04-11 16:53:08 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.

Comment 17 Red Hat Bugzilla 2006-08-10 21:15:54 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html