Bug 1062309

Summary: System wouldn't boot after replacing a disk in a two-disk RAID1 array
Product: Fedora
Component: dracut
Version: 20
Status: CLOSED EOL
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: David Howells <dhowells>
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, dledford, dracut-maint-list, harald, Jes.Sorensen, jonathan
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-06-29 15:02:57 UTC

Description David Howells 2014-02-06 15:59:20 UTC
Description of problem:

I have a system with two disks, each with four partitions.  The corresponding partitions on the two disks are RAID1'd together, giving me four md devices.

One of the disks in the array died.  I put a new disk in a USB-attached SATA disk toaster and sync'd two of the RAID devices onto it, but decided that syncing over USB rather than SATA was just too slow, so I went ahead and rebooted.

It started to boot, and after churning for quite a few minutes, dracut dropped to the emergency mode shell complaining about a missing disk.  cat'ing /proc/mdstat showed only two of the four md devices present.

    dracut:/# cat /proc/mdstat
    Personalities : [raid1]
    md1 : active raid1 sdb2[2] sda2[0]
          102398908 blocks super 1.1 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk

    md0 : active raid1 sdb1[2] sda1[0]
          511988 blocks super 1.0 [2/2] [UU]

I manually assembled the md2 and md3 arrays with the -R flag, then added the partitions on the new disk to them.  After sync'ing had completed, I rebooted.
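
For reference, a minimal sketch of that recovery sequence, assuming the replacement disk showed up as /dev/sdc with the same partition layout as the failed one (the device names here are purely illustrative), would be something like the following; -R (--run) starts an array even though it is missing a member:

    dracut:/# mdadm --assemble -R /dev/md2
    dracut:/# mdadm --assemble -R /dev/md3
    dracut:/# mdadm --manage /dev/md2 --add /dev/sdc3
    dracut:/# mdadm --manage /dev/md3 --add /dev/sdc4
    dracut:/# cat /proc/mdstat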

The system came very slowly back to the same dracut prompt.  Viewing /proc/mdstat again showed only md0 and md1, though I could manually assemble md2 and md3 with:

    dracut:/# mdadm --assemble /dev/md2
    mdadm: /dev/md2 has been started with 2 drives.
    dracut:/# mdadm --assemble /dev/md3
    mdadm: /dev/md3 has been started with 2 drives.

It seems very odd that the system couldn't reassemble them by itself, either this time or after subsequent reboots (there were other problems too).

Version-Release number of selected component (if applicable):

mdadm-3.3-4.fc20.x86_64
dracut-034-64.git20131205.fc20.1.x86_64
systemd-208-9.fc20.x86_64
kernel-3.12.9-301.fc20.x86_64

Comment 1 Jes Sorensen 2014-02-08 09:06:27 UTC
I suspect this is actually a problem with dracut's handling of the raid
assembly, and not mdadm.

What are md2/md3 used for? I presume they have mount points?

What is in your /etc/mdadm.conf?

Thanks,
Jes

Comment 2 David Howells 2014-02-08 10:55:26 UTC
The contents of mdadm.conf:

# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f15b9681:b8f6126e:3108f1db:5daef8ec
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=d7919697:a7814731:82cdd3e4:60041633
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=ab7b2f77:613dcd9b:2db94b64:251c81c9
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=7c0679c6:5f4c4046:2feea17b:5667f7bd

In terms of mounting, md0, 1 & 3 are mounted directly:

/dev/md1 / ext4 rw,seclabel,relatime,data=ordered 0 0
/dev/md0 /boot ext4 rw,seclabel,relatime,data=ordered 0 0
/dev/md3 /data ext4 rw,seclabel,relatime,data=ordered 0 0

md2 contains a LUKS-encrypted volume, which is mounted like this:

/dev/mapper/luks-3f0dbacb-1539-4172-89f8-f9d93951c6e8 /home ext4 rw,seclabel,relatime,data=ordered 0 0
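
As a side note, one way to cross-check the ARRAY entries above against what is actually recorded on the member disks, assuming the arrays can be assembled at all, is:

    # mdadm --examine --scan
    # mdadm --detail --scan

--examine --scan reports ARRAY lines derived from the superblocks on the component devices, while --detail --scan reports the arrays that are currently assembled.  The UUIDs in both should match the UUID= values above; they are properties of the arrays themselves rather than of any particular member disk, so swapping a failed disk should not change them.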

Comment 3 Harald Hoyer 2014-04-02 08:50:54 UTC
What is your kernel command line?

What is the output of:
# dracut --print-cmdline

Did you try and regenerate the initramfs after you made your raid changes?
# dracut -f
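
The same rebuild can also be spelled out explicitly for a particular
kernel image and version, e.g.:
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)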

Comment 4 David Howells 2014-04-02 10:54:06 UTC
(In reply to Harald Hoyer from comment #3)
> What is your kernel command line?
> 
> What is the output of:
> # dracut --print-cmdline

warthog>sudo dracut --print-cmdline
 rd.md.uuid=f15b9681:b8f6126e:3108f1db:5daef8ec  rd.md.uuid=d7919697:a7814731:82cdd3e4:60041633 resume=UUID=4caa0bf2-ab69-4e1d-8e14-ba63dded2689 resume=UUID=e369caf1-5d6f-4323-a7c8-825c146ae458  root=UUID=42892800-af7a-46a2-a5dd-6f2920d54955 rootflags=rw,relatime,seclabel,data=ordered rootfstype=ext4

> Did you try and regenerate the initramfs after you made your raid changes?
> # dracut -f

No.  How would that help?  The only change to the RAID was a change in the member disks; the RAID configuration didn't change at all, as the physical components aren't actually listed there.  The problem only appeared once I'd physically replaced the dead disk with the new one, which required powering off the machine.  The next time I tried booting, these problems started occurring.
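
For what it is worth, the --print-cmdline output above contains rd.md.uuid entries for only two of the four arrays.  If md2 and md3 were also meant to be assembled from the initramfs command line, the corresponding entries, taking the UUIDs from the mdadm.conf quoted in comment 2, would presumably be:

    rd.md.uuid=ab7b2f77:613dcd9b:2db94b64:251c81c9 rd.md.uuid=7c0679c6:5f4c4046:2feea17b:5667f7bd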

Comment 5 Fedora End Of Life 2015-05-29 10:50:06 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately four weeks from now, Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 20 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Fedora End Of Life 2015-06-29 15:02:57 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora, please feel free to reopen it against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.