Bug 849243 - Systemd + udev failure with assembling non-perfect md raid array
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 17
Hardware: x86_64 Linux
Priority: unspecified, Severity: high
Assigned To: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-08-17 15:19 EDT by J.H.
Modified: 2013-05-03 04:04 EDT (History)

Doc Type: Bug Fix
Last Closed: 2013-05-03 04:04:44 EDT
Type: Bug


Description J.H. 2012-08-17 15:19:36 EDT
Description of problem:

Software RAID arrays are fairly common, particularly on cheaper servers or where the fastest possible I/O is wanted (software RAID can be dramatically faster than hardware RAID; it's just not as convenient).

With any type of RAID there are failure scenarios, particularly ones that require a rebuild of the array:

- Drive going bad
- Growing array
- Drive getting erroneously ejected and re-added
- Bad intermediate connection
- etc

In these cases an array rebuild occurs, and it can take a *VERY* long time. Case in point:

Growing a 3-disk RAID 5 array of 3 TB drives on SATA III (6 Gb/s) interfaces to a 4-disk array was estimated at 1200 minutes to rebuild (20 hours, or most of a day).
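
As a rough sanity check of that estimate, here is a minimal sketch of the arithmetic. It assumes "3T" means 3 TB in decimal units and that the rebuild has to pass over each member's full capacity once; neither assumption comes from the report, so treat the implied throughput as a ballpark only.

```python
# Back-of-the-envelope check of the rebuild estimate quoted in the report.
ESTIMATE_MINUTES = 1200        # figure from the report
DRIVE_BYTES = 3 * 10**12       # assumption: "3T" means 3 TB (decimal)

hours = ESTIMATE_MINUTES / 60

# Assumption: the rebuild traverses each member's full capacity once,
# so the estimate implies this sustained per-drive throughput.
implied_mb_per_s = DRIVE_BYTES / (ESTIMATE_MINUTES * 60) / 10**6

print(f"{hours:.0f} hours, roughly {implied_mb_per_s:.0f} MB/s per drive")
# prints: 20 hours, roughly 42 MB/s per drive
```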

If the machine is rebooted during the rebuild, whether intentionally or because of a power failure, an unrelated panic, etc., it should boot cleanly, assemble the now "degraded" array, and continue the rebuild. (This is the old, and obviously expected, behavior.)

This is not what happens now. During boot, systemd + udev find the array and attempt to assemble it, apparently notice it's degraded, and fail to assemble it. This then loops: the array is re-discovered, assembly is attempted, the array is found to be degraded, assembly fails, the array is re-discovered... ad infinitum.

Version-Release number of selected component (if applicable):


How reproducible:

Seems to be always

Steps to Reproduce:
1. Take 4 disks of any size (large enough that an array rebuild will take a little while)
2. Assemble a 3-disk RAID 5 array. Let this completely rebuild.
3. Reboot
4. Add the 4th disk and grow the array
5. While the array rebuild is ongoing, reboot the machine (`reboot`)
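
The steps above might translate into commands along the following lines. This is only a sketch: the array name `/dev/md0` and the partition names are hypothetical placeholders that do not come from the report, and the script prints the commands rather than running them, since they are destructive and need root.

```python
import shlex


def repro_commands(array, initial_members, extra_member):
    """Return the reproduction steps as argument lists (nothing is executed)."""
    grown = len(initial_members) + 1
    return [
        # Step 2: create the 3-disk RAID 5 array and wait for the initial sync.
        ["mdadm", "--create", array, "--level=5",
         f"--raid-devices={len(initial_members)}", *initial_members],
        ["mdadm", "--wait", array],
        # Step 3: first reboot, with the array fully built.
        ["reboot"],
        # Step 4: add the 4th disk and grow the array onto it.
        ["mdadm", "--add", array, extra_member],
        ["mdadm", "--grow", array, f"--raid-devices={grown}"],
        # Step 5: reboot while the rebuild is still in progress.
        ["reboot"],
    ]


# Hypothetical device names, for illustration only.
commands = repro_commands(
    "/dev/md0", ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"], "/dev/sde1"
)
for cmd in commands:
    print(shlex.join(cmd))
```

Rebuild progress can be watched in `/proc/mdstat` before issuing the final reboot.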
  
Actual results:

udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397]

udevd[<pid>]: timeout: killing '/sbin/mdadm --offroot -I /dev/sda1' [344]

udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397]

udevd[<pid>]: timeout: killing '/sbin/mdadm --offroot -I /dev/sda1' [344]

udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397]

udevd[<pid>]: timeout: killing '/sbin/mdadm --offroot -I /dev/sda1' [344]

... and so on, indefinitely
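
To make the loop easier to see in a longer log, the timeout messages can be tallied per command. The following is a minimal sketch that uses only the lines quoted above; the regular expression is inferred from those lines, not from any udev documentation, and reading the trailing bracketed number as the killed process's PID is an assumption.

```python
import re
from collections import Counter

# The lines quoted under "Actual results" (the same pair, three times).
LOG = (
    "udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397]\n"
    "udevd[<pid>]: timeout: killing '/sbin/mdadm --offroot -I /dev/sda1' [344]\n"
) * 3

# Pattern inferred from the quoted lines. Assumption: the trailing number
# in brackets is the PID of the process udevd killed.
TIMEOUT_RE = re.compile(
    r"udevd\[(?P<udevd_pid>[^\]]+)\]: timeout: killing "
    r"'(?P<command>[^']+)' \[(?P<killed_pid>\d+)\]"
)


def count_timeouts(text):
    """Count how many times udevd killed each command after a timeout."""
    return Counter(m.group("command") for m in TIMEOUT_RE.finditer(text))


counts = count_timeouts(LOG)
for command, n in counts.items():
    print(n, command)
```

A count that keeps growing for the same two commands is consistent with the re-discover/assemble/fail cycle described above.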

Expected results:

The system recognizes that the array is degraded, assembles it accordingly, and lets the array rebuild continue.

Additional info:
Comment 1 J.H. 2012-08-17 15:39:37 EDT
OK, additional detail: the system I'm running on uses encryption on top of the RAID array (which took some doing).

When it's asking for the encryption password, the RAID information seems to indicate that the array has already been reassembled properly and the rebuild will continue. However, when I just let things sit (no attempt to input the password), the following shows up:

udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397]
Comment 2 Michal Schmidt 2012-08-20 05:27:38 EDT
systemd does not assemble RAID arrays. It's done from mdadm's udev rules. Reassigning to mdadm.
Comment 3 Jes Sorensen 2012-09-03 08:30:25 EDT
Ok a couple of questions here:

1) Please provide mdadm/systemd/dracut/kernel versions you see this with
2) Does this happen using your described scenario, i.e. without encryption on
   top of the array?
3) Can you reproduce the problem _without_ the rebuild from growing the array,
   i.e. if you just force a rebuild of the array over the 3 disks?

Thanks,
Jes
Comment 4 Jes Sorensen 2012-10-08 10:10:04 EDT
J.H.

Ping?
Comment 5 Jes Sorensen 2013-05-03 04:04:44 EDT
No reply to request for info for 6 months - closing
