Bug 1026902 - Recovery of RAID0/10 volume does not continue after stopping and reassembling or reboot
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Triaged
Reported: 2013-11-05 10:42 EST by Pawel Baldysiak
Modified: 2013-11-24 09:33 EST
CC List: 10 users

Doc Type: Bug Fix
Clone Of: 1014102
Last Closed: 2013-11-24 09:33:09 EST
Type: Bug


Description Pawel Baldysiak 2013-11-05 10:42:32 EST
+++ This bug was initially created as a clone of Bug #1014102 +++

Description of problem:
If we stop an ongoing recovery of a RAID1 or RAID10 volume, the recovery does not continue after the volume is reassembled. The volume is shown in normal/active state, as if the recovery had finished correctly. The same defect occurs if the OS is rebooted during an ongoing recovery: the recovery does not continue after the OS boots.
This causes data corruption, because the partially rebuilt disk is treated as fully in sync.

Version-Release number of selected component (if applicable):
mdadm-3.2.6-3.el6.x86_64 and
mdadm-3.2.6-4.el6.x86_64
with kernels:
kernel-2.6.32-407.el6.x86_64 or
kernel-2.6.32-415.el6.x86_64 or
kernel-2.6.32-419.el6.x86_64

How reproducible:
Always

Steps to reproduce:
1) Create an IMSM container and a RAID1 volume with 2 disks:
# mdadm -C /dev/md/imsm -e imsm -n 2 /dev/sd[ab]
# mdadm -C /dev/md/raid1 -l 1 -n 2 /dev/sd[ab]
2) Wait for the resync to finish
3) Add a new disk to the container:
# mdadm --add /dev/md127 /dev/sdc
4) Turn off or fail one of the volume's disks (sda or sdb):
# mdadm -f /dev/md126 /dev/sda
5) Wait for the recovery to start
6) Stop the volume:
# mdadm -Ss
7) Reassemble the volume:
# mdadm -As
OR, instead of steps 6 and 7:
6) Reboot the OS
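Steps 2 and 5 (waiting for the resync to finish and for the recovery to start) can be scripted. The sketch below is a minimal example, assuming the md sysfs attribute /sys/block/<md>/md/sync_action, which reports one word such as idle, resync or recover. The helper names md_sync_action and wait_for_sync_action and the SYSFS_ROOT override are hypothetical and not part of mdadm.

```sh
#!/bin/sh
# Hypothetical helpers for steps 2 and 5; not part of mdadm.
# SYSFS_ROOT may be overridden (e.g. for testing); it defaults to /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Print the current sync action of an md device, e.g. "idle",
# "resync" or "recover", as reported by the md sysfs attribute.
md_sync_action() {
    cat "$SYSFS_ROOT/block/$1/md/sync_action"
}

# Poll once per second until <md-name> reports <action>.
# Usage: wait_for_sync_action <md-name> <action> [max-seconds]
# Returns 0 once the action is seen, 1 if max-seconds (default 3600) elapse.
wait_for_sync_action() {
    md=$1 want=$2 max=${3:-3600} waited=0
    while [ "$(md_sync_action "$md")" != "$want" ]; do
        if [ "$waited" -ge "$max" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, as root: wait_for_sync_action md126 idle for step 2 and wait_for_sync_action md126 recover for step 5. Because idle is also reported before any resync has started, the step 2 wait is only meaningful once the resync is already visible in /proc/mdstat.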

Actual result:
Recovery does not continue after reassembly/reboot.
The volume is in normal/active state in /proc/mdstat,
but the metadata still records a rebuild in progress:

Personalities : [raid1] 
md126 : active raid1 sdb[1] sda[0]
      47185920 blocks super external:/md127/0 [2/2] [UU]
md127 : inactive sdb[1](S) sda[0](S)
      6306 blocks super external:imsm
unused devices: <none>

# mdadm -E /dev/sda | grep -A 17 raid1
     [raid1]:
     UUID : 99ddea0a:a99b6376:ab0ce862:6cb93b3a
     RAID Level : 1 <-- 1
     Members : 2 <-- 2
     Slots : [UU] <-- [_U]
     Failed disk : 0
     This Slot : 1
     Array Size : 94371840 (45.00 GiB 48.32 GB)
     Per Dev Size : 94371840 (45.00 GiB 48.32 GB)
     Sector Offset : 0
     Num Stripes : 368640
     Chunk Size : 64 KiB <-- 64 KiB
     Reserved : 0
     Migrate State : rebuild
     Map State : normal <-- degraded
     Checkpoint : 23040 (512)
     Dirty State : clean
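The mismatch shown above (mdstat reports the volume as [UU] with no recovery running, while the IMSM metadata still records Migrate State : rebuild) can be detected with a small script. This is a hypothetical check, not an mdadm feature. It assumes the mdadm -E output has been saved to a file and that the container holds a single volume, since it searches the whole examine output for the rebuild state.

```sh
#!/bin/sh
# Hypothetical check for the symptom in this bug; not an mdadm feature.
# Usage: check_stale_rebuild <mdstat-file> <examine-file> <md-name>
# Returns 1 (and prints a message) if mdstat shows <md-name> fully in sync
# with no recovery running while the saved "mdadm -E" output still records
# "Migrate State : rebuild"; returns 0 otherwise, 2 if <md-name> is absent.
check_stale_rebuild() {
    mdstat_file=$1 examine_file=$2 md=$3

    # Keep only this device's stanza: its "mdNNN :" line plus the
    # indented lines that follow it.
    stanza=$(awk -v md="$md" '
        $1 == md && $2 == ":" { inside = 1; print; next }
        inside && /^[ \t]/ && NF > 0 { print; next }
        { inside = 0 }
    ' "$mdstat_file")

    if [ -z "$stanza" ]; then
        echo "$md not found in $mdstat_file" >&2
        return 2
    fi

    # "[UU]" (every member up) and no "recovery" progress line means
    # mdstat considers the volume in sync.
    if printf '%s\n' "$stanza" | grep -q '\[UU*\]' &&
        ! printf '%s\n' "$stanza" | grep -q 'recovery' &&
        grep -Eq 'Migrate State[[:space:]]*:[[:space:]]*rebuild' "$examine_file"
    then
        echo "$md: in sync per mdstat, but metadata still records a rebuild"
        return 1
    fi
    return 0
}
```

For example: mdadm -E /dev/sda > /tmp/examine.txt, then check_stale_rebuild /proc/mdstat /tmp/examine.txt md126.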

Expected result:
Recovery continues after reassembly/reboot.

Additional info:
The defect also reproduces with upstream kernel 3.11.1 and upstream mdadm.
The defect does not reproduce with kernel-2.6.32-358.el6.x86_64.

--- Additional comment from Lukasz Dorau on 2013-10-07 10:40:48 EDT ---

We have found a fix for this bug. The attached patch has just been sent upstream:
http://marc.info/?l=linux-raid&m=138115595902520&w=4

Comment 1 Pawel Baldysiak 2013-11-05 10:49:26 EST
This fix is already upstream, but is not present in 3.11.7-200.fc19 kernel:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61e4947c99c4494336254ec540c50186d186150b

Comment 2 Josh Boyer 2013-11-05 11:01:44 EST
(In reply to pawel.baldysiak from comment #1)
> This fix is already upstream, but is not present in 3.11.7-200.fc19 kernel:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61e4947c99c4494336254ec540c50186d186150b

Thank you for the pointer.  That patch was just queued for 3.11.8 this morning.  We'll look at getting it into Fedora soon.

Comment 3 Michele Baldessari 2013-11-24 06:06:06 EST
kernel 3.11.9-200.fc19.x86_64.rpm for F19 and kernel-devel-3.11.9-300.fc20.x86_64
for F20 have been released.

I guess we can close this one now?

Thanks,
Michele

Comment 4 Josh Boyer 2013-11-24 09:33:09 EST
Yep, thanks for the reminder.
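Comments 3 and 4 close this bug once the 3.11.9 kernel builds were released for F19 and F20. A minimal sketch for checking whether a running Fedora kernel is at least that version, assuming those builds (and later kernels) carry the upstream fix 61e4947c99c4494336254ec540c50186d186150b, and assuming GNU coreutils sort -V for version ordering. The function name version_ge is hypothetical, and the comparison says nothing about other kernel lines such as the RHEL 6 2.6.32 kernels listed in the description.

```sh
#!/bin/sh
# Hypothetical helper: succeed if version $1 sorts at or after version $2
# under GNU coreutils "sort -V" ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the Fedora 3.11.9 kernel builds from comment 3 (and later
# kernels) carry the upstream fix. Only meaningful for Fedora kernels.
fixed_min=3.11.9
running=$(uname -r)
if version_ge "$running" "$fixed_min"; then
    echo "$running: at or after $fixed_min"
else
    echo "$running: older than $fixed_min, fix may be missing"
fi
```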
