Description of problem:
Customer reports that the "mdadm --grow" command goes into an infinite resync loop.

Version-Release number of selected component (if applicable):
2.6.9-16.EL

Steps to Reproduce:
0. Create 4 partitions, say sdc1, sdc2, sdc7, sdc8, where sdc7 and sdc8 are larger than sdc1 and sdc2 (customer uses 53 GB vs. 19 GB, I use 1.5 GB vs. 800 MB).
1. mdadm -Cv /dev/md7 -l1 -n2 /dev/sdc1 /dev/sdc2
2. mkfs.ext3 /dev/md7
3. mkdir /mnt/tmp
4. mount /dev/md7 /mnt/tmp
5. mdadm /dev/md7 -f /dev/sdc1 (fail the device)
6. mdadm /dev/md7 -r /dev/sdc1 (remove the device)
7. mdadm /dev/md7 -a /dev/sdc7 (mirror to the bigger device, wait for sync to complete)
8. mdadm /dev/md7 -f /dev/sdc2 (fail the device)
9. mdadm /dev/md7 -r /dev/sdc2 (remove the device)
10. mdadm /dev/md7 -a /dev/sdc8 (wait for sync)
11. mdadm --grow /dev/md7 -z size (say 200KB, make sure sdc7/8 have enough space)

/proc/mdstat then shows the resync hanging.

Additional info:
Issue also discussed in:
http://www.issociate.de/board/post/233625/RAID_5_Grow.html
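The reproduction steps above can be sketched as a small shell script. This is only an illustration, not something the reporter ran: the helper names (step, wait_for_sync, replace_member, reproduce) and the RUN and SIZE variables are hypothetical, and the device names are the placeholders from the report. By default it only prints the commands; nothing is executed unless RUN=1 is set, which needs root and scratch partitions.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps from this report.
# Device names are the report's placeholders; override them via the environment.
MD=${MD:-/dev/md7}
SMALL_A=${SMALL_A:-/dev/sdc1}
SMALL_B=${SMALL_B:-/dev/sdc2}
BIG_A=${BIG_A:-/dev/sdc7}
BIG_B=${BIG_B:-/dev/sdc8}
MNT=${MNT:-/mnt/tmp}

# Print a command, and execute it only when RUN=1 (hypothetical switch).
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Poll /proc/mdstat until it no longer reports a resync or recovery.
# In dry-run mode this only prints a marker line.
wait_for_sync() {
    if [ "${RUN:-0}" != "1" ]; then
        echo "+ (wait for sync on $MD)"
        return 0
    fi
    while grep -Eq 'resync|recovery' /proc/mdstat; do
        sleep 5
    done
}

# Swap one mirror member ($1) for a larger partition ($2).
replace_member() {
    step mdadm "$MD" -f "$1"   # fail the old device
    step mdadm "$MD" -r "$1"   # remove it from the array
    step mdadm "$MD" -a "$2"   # add the larger device
    wait_for_sync
}

reproduce() {
    step mdadm -Cv "$MD" -l1 -n2 "$SMALL_A" "$SMALL_B"
    step mkfs.ext3 "$MD"
    step mkdir -p "$MNT"
    step mount "$MD" "$MNT"
    replace_member "$SMALL_A" "$BIG_A"
    replace_member "$SMALL_B" "$BIG_B"
    # SIZE is the new per-device size in KiB; the report gives no exact value,
    # so it must be supplied by the caller.
    step mdadm --grow "$MD" -z "${SIZE:?set SIZE to the new component size in KiB}"
    step cat /proc/mdstat
}
```

To preview the command sequence, source the file and call reproduce with SIZE set to the intended component size. On an affected kernel, the final /proc/mdstat output would be expected to show the resync that never completes.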
Created attachment 117988
Upstream patch that fixes the issue.
Acknowledgment goes to Tom Callahan (the customer), who brought this issue (and the patch) to Red Hat's attention.
The patch posted here is not what was finally accepted upstream. I'm making a new patch that handles Stephen's questions and matches upstream. Once testing is complete, I'll post for review.
I've completed my testing, and the problem, along with another related problem, is now fixed. I'm submitting the revised patch internally for review and inclusion in the next update release.
The one-line change to the fit variable was accepted upstream, so this patch now very closely mirrors the final upstream version. It has also been integrated into the latest kernel builds.
Committed in stream U4 build 34.11. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 4.4 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 4.4 release.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html