Description of problem:

Software RAID arrays are fairly common, particularly on cheaper servers or when an entity wants the fastest possible I/O speeds (software RAID is dramatically faster than hardware RAID; it's just not as convenient). With any type of RAID, failure scenarios exist, particularly ones that require a rebuild of the array:

- Drive going bad
- Growing the array
- Drive getting erroneously ejected and re-added
- Bad intermediate connection
- etc.

In these cases an array rebuild occurs, and these can take a *VERY* long time. Case in point: growing a 3-disk RAID 5 setup using 3T drives on SATA III 6G interfaces to a 4-disk array was estimating 1200 minutes (about a day) to rebuild.

Should one reboot the machine, either intentionally or due to power failures, unrelated panics, etc., the machine should reboot cleanly, assemble the now "degraded" array, and continue the rebuild. (This is the old, and obviously expected, behavior.)

This does not occur. The system boots, systemd + udev find the array, attempt to assemble it, apparently notice it is degraded, and fail to assemble. It then loops: re-discovers the array, attempts to assemble, notices it's degraded, assembly fails, re-discovers the array... ad infinitum.

Version-Release number of selected component (if applicable):

How reproducible:
Seems to be always

Steps to Reproduce:
1. Take 4 disks of any size (large enough that an array rebuild will take a little while).
2. Assemble a 3-disk RAID 5 array. Let this completely rebuild.
3. Reboot.
4. Add the 4th disk and grow the array.
5.
While the array rebuild is ongoing, reboot the machine (`reboot`).

Actual results:
udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397]
udevd[<pid>]: timeout: killing '/sbin/mdadm --offroot -I /dev/sda1' [344]
udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397]
udevd[<pid>]: timeout: killing '/sbin/mdadm --offroot -I /dev/sda1' [344]
... on to infinity

Expected results:
The system recognizes the array is degraded, assembles it accordingly, and lets the array rebuild continue.

Additional info:
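For reference, the reproduction steps above can be sketched with loop devices instead of real disks (a sketch only; device names, sizes, and paths here are hypothetical, and on small loop devices the reshape window for rebooting is much narrower than on 3T drives):

```shell
# Create four backing files and loop devices (needs root)
for i in 0 1 2 3; do
    truncate -s 1G /var/tmp/raidtest$i.img
    losetup /dev/loop$i /var/tmp/raidtest$i.img
done

# Step 2: build the initial 3-disk RAID 5 array and let it finish syncing
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
    /dev/loop0 /dev/loop1 /dev/loop2
cat /proc/mdstat        # wait here until the initial resync completes

# Steps 4-5: add the 4th disk, grow the array, then reboot mid-reshape
mdadm --add /dev/md0 /dev/loop3
mdadm --grow /dev/md0 --raid-devices=4
cat /proc/mdstat        # reshape should now be in progress
reboot

# Expected after boot: the degraded/reshaping array assembles and the
# reshape continues. The manual equivalent of what udev should achieve:
mdadm --assemble --run /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
```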
OK, additional detail: the system I'm running on is using encryption over the RAID array (which took some doing). However, when it's asking for the encryption password, the RAID information seems to indicate that it's already reassembled properly and the rebuild will continue. However, when I just let things sit (no attempt to input the password), the udevd[<pid>]: timeout: killing '/sbin/mdadm --detail --export /dev/md#' [397] messages show up.
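For context on the layering described above (a sketch; device and mapping names are hypothetical): the encrypted volume sits on top of the md device, so the array must already be assembled by the time the passphrase prompt appears, which matches the observation that the array looks healthy at that point:

```shell
# Hypothetical setup: LUKS on top of the RAID 5 array
cryptsetup luksFormat /dev/md0
cryptsetup luksOpen /dev/md0 cryptroot   # prompts for the passphrase
mkfs.ext4 /dev/mapper/cryptroot

# While sitting at the passphrase prompt, the array state can be checked
# from another console:
cat /proc/mdstat
mdadm --detail /dev/md0                  # should show the rebuild resuming
```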
systemd does not assemble RAID arrays. It's done from mdadm's udev rules. Reassigning to mdadm.
OK, a couple of questions here:

1) Please provide the mdadm/systemd/dracut/kernel versions you see this with.
2) Does this happen using your described scenario, i.e. without encryption on top of the array?
3) Can you reproduce the problem _without_ the rebuild from growing the array, i.e. if you just force a rebuild of the array over the 3 disks?

Thanks,
Jes
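For question 3, one way to force a rebuild without growing the array is to fail and re-add a member (a sketch; the device names are hypothetical and this should only be done on a test array):

```shell
# Mark one member failed, remove it, then add it back; mdadm will
# rebuild onto it (on a 3-disk RAID 5 this re-syncs the whole member)
mdadm /dev/md0 --fail /dev/sdc1
mdadm /dev/md0 --remove /dev/sdc1
mdadm /dev/md0 --add /dev/sdc1

cat /proc/mdstat    # shows the recovery progress
# Then reboot mid-rebuild to see whether the same udev loop occurs
```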
J.H. Ping?
No reply to the request for info in 6 months; closing.