Bug 1015204

Summary: dracut doesn't copy etc/mdadm.conf into the initramfs image
Product: [Fedora] Fedora
Reporter: Alan Stern <stern>
Component: dracut
Assignee: dracut-maint
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 18
CC: dracut-maint, dwmw2, enrique.bonet, harald, h.reindl, info, Jes.Sorensen, jonathan, jr-redhatbugs2, kevin.hobbs.1, nerijus, ondrejj, tschweikle
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-02-05 23:25:13 UTC
Type: Bug
Attachments:
  System log for boot-up showing temporary failure to mount root filesystem (flags: none)
  grub2.cfg configuration file (flags: none)
  Patch to set the correct defaults for mdadmconf and lvmconf (flags: none)

Description Alan Stern 2013-10-03 16:01:43 UTC
Description of problem:

My newly installed kernel can't boot.  The initramfs image created by the RPM install script (and also images I created by hand using mkinitrd and dracut) doesn't include the /etc/mdadm.conf file.  Since my root filesystem is on an md raid1 volume, the kernel can't find it.

I manually added /etc/mdadm.conf to the initramfs image.  The kernel was then able to create the proper volume, but still didn't mount it.  I'm not sure why; perhaps a delay needs to be added.  We may need to come back to this later...

The /etc/dracut.conf file is unmodified; the lines saying

    # install local /etc/mdadm.conf
    #mdadmconf="no"

are still commented out.  Therefore, by default, dracut should have put the file in the image.
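To check whether a particular image actually contains the file, its contents can be listed with dracut's lsinitrd tool. The function below is a hypothetical helper (initramfs_has is an illustrative name, not part of dracut); it assumes each line of the listing ends with the path, without a leading slash, as in a cpio -tv style listing.

```shell
# Hypothetical helper: read an initramfs listing (e.g. from lsinitrd) on stdin
# and succeed only if PATH (given without a leading slash) is present.
# Assumes each entry's path is the last field on its line.
initramfs_has() {
    path=$1
    awk -v p="$path" '$NF == p { found = 1 } END { exit !found }'
}
```

For example: lsinitrd /boot/initramfs-$(uname -r).img | initramfs_has etc/mdadm.conf && echo present || echo missing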

Note: This seems to be similar to bug 921887.

Version-Release number of selected component (if applicable):

dracut-029-1.fc18.2.i686.  Before today's update, the installed package was dracut-018-60.git20120927.fc16.noarch, which worked okay.

How reproducible:  Always.

Comment 1 Jan ONDREJ 2013-10-04 11:57:43 UTC
Same problem here with the same version of dracut. After the latest kernel update, none of my systems with the default dracut config and root on an md device boot. :-(

Comment 2 Enrique V. Bonet Esteban 2013-10-04 16:15:58 UTC
I updated the kernel on nine computers with software RAID.

Seven of them boot, but their RAID devices are renamed, for example:

/dev/md0 -> /dev/md127
/dev/md1 -> /dev/md126
...

The other two don't boot and show the message:

dracut-initqueue[PID]: Warning: Could not boot.
dracut-initqueue[PID]: Warning /dev/md2 does not exist
...

I think it is the same problem, whether or not the computer boots.
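To compare how arrays are named before and after an update, it may help to record which member partitions sit under which mdN name. A minimal sketch, assuming the usual /proc/mdstat layout in which each array line looks like "md127 : active raid1 sdb5[1] sda5[0]"; md_map is a hypothetical helper name.

```shell
# Hypothetical helper: read /proc/mdstat on stdin and print one line per
# array: the md name followed by its member devices (role numbers stripped).
md_map() {
    awk '/^md[0-9]+ :/ {
        line = $1
        for (i = 3; i <= NF; i++) {
            dev = $i
            # member fields look like sda5[0] or sdb5[1](F)
            if (dev ~ /\[[0-9]+\]/) {
                sub(/\[.*$/, "", dev)
                line = line " " dev
            }
        }
        print line
    }'
}
```

Running md_map < /proc/mdstat once under the old kernel and once under the new one shows which names changed.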

Comment 3 Jan ONDREJ 2013-10-04 17:39:55 UTC
(In reply to Enrique V. Bonet Esteban from comment #2)
> I think that is the same problem, boot or not boot the computer.

Check how those 2 and 7 computers refer to your filesystems. If grub's kernel command line and /etc/fstab use LABEL=... or UUID=..., they can boot even when the array shows up as /dev/md127. If grub or fstab refer to /dev/md0, they don't boot.

But I think this is a bug either way, whether they boot or not.

Curiously, I can't see the same problem with dracut-029-2.fc19.x86_64 on Fedora 19. Is this already fixed there? I don't see anything in the F19 package changelog.
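A quick way to find the fstab entries described here is to list the lines whose device field is a bare /dev/mdN name. The function below is a hypothetical sketch (fstab_md_names is an illustrative name); it only looks at fstab, so the root= argument on the kernel command line still has to be checked separately.

```shell
# Hypothetical helper: read an fstab on stdin and print the device and mount
# point of every entry that names an array by its kernel name (/dev/mdN).
# Such entries stop matching if the array is assembled as e.g. /dev/md127.
fstab_md_names() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2 }'
}
```

For example, fstab_md_names < /etc/fstab; the kernel side can be inspected with grep -o 'root=[^ ]*' /proc/cmdline.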

Comment 4 Enrique V. Bonet Esteban 2013-10-04 18:26:41 UTC
Hi Jan,

The two computers that don't boot have /dev/md0 in their grub.cfg file (an old
configuration). I replaced it with the UUID and now those computers boot,
but their RAID devices are still renamed (/dev/md0 -> /dev/md127...).

Like you, I think this is a bug whether the computer boots or not.

Thanks,

Enrique

Comment 5 Alan Stern 2013-10-10 15:39:35 UTC
Created attachment 810611
System log for boot-up showing temporary failure to mount root filesystem

Here is a system log extract from a boot-up attempt, after I manually added etc/mdadm.conf to the initramfs image.  At about the 2.0 second point, the system failed to mount the root filesystem and dropped into an emergency shell.  As you can see, the md5 device was only partially assembled at that time (the sda5 mirror was bound but not the sdb5 mirror).

After looking through the system status for a while, I exited the shell.  At that point (about 50.7 seconds) the md5 assembly finished.  That's where the root filesystem is; the system found it and the rest of the bootup proceeded normally.

Questions:

   1. Why didn't the system wait for md5 to be fully assembled before trying to mount the root filesystem?

   2. Why did the assembly completion wait until after I exited the emergency shell?  Why didn't it go on while I was doing other things?
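Whether an array is only partly assembled at a given moment (as with md5 here, where sdb5 had not yet been added) shows up in /proc/mdstat as an underscore in the member-status field, e.g. [U_] instead of [UU]. A minimal sketch of a check, assuming that layout; md_degraded is a hypothetical name, and arrays with no status field (such as raid0) are never reported.

```shell
# Hypothetical helper: read /proc/mdstat on stdin and print the name of every
# array whose member-status field (e.g. [U_]) shows a missing member.
md_degraded() {
    awk '/^md[0-9]+ :/ { name = $1; next }
         name != "" && $NF ~ /^\[[U_]+\]$/ {
             if ($NF ~ /_/) print name
             name = ""
         }'
}
```

From the emergency shell, md_degraded < /proc/mdstat would list arrays still waiting for a member.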

Comment 6 Harald Hoyer 2013-10-14 09:01:49 UTC
(In reply to Enrique V. Bonet Esteban from comment #4)
> Hi Jan,
> 
> The two computers that don't boot have /dev/md0 in grub.cfg file (old 
> configuration). I change this value for UUID and the computers booting,
> but they change too the raid devices /dev/md0 -> /dev/md127...
> 
> I think like you, this is a bug, boot or don't boot.
> 
> Thanks,
> 
> Enrique

Hmm, you must not specify md0 in grub.cfg. Always use UUID or LABEL.

In addition, specify rd.md.uuid=<MD_UUID> on the kernel command line.

Comment 7 Harald Hoyer 2013-10-14 09:06:35 UTC
(In reply to Alan Stern from comment #5)
> Created attachment 810611
> System log for boot-up showing temporary failure to mount root filesystem
> 
> Here is a system log extract from a boot-up attempt, after I manually added
> etc/mdadm.conf to the initramfs image.  At about the 2.0 second point, the
> system failed to mount the root filesystem and dropped into an emergency
> shell.  As you can see, the md5 device was only partially assembled at that
> time (the sda5 mirror was bound but not the sdb5 mirror).
> 
> After looking through the system status for a while, I exited the shell.  At
> that point (about 50.7 seconds) the md5 assembly finished.  That's where the
> root filesystem is; the system found it and the rest of the bootup proceeded
> normally.
> 
> Questions:
> 
>    1. Why didn't the system wait for md5 to be fully assembled before trying
> to mount the root filesystem?
> 
>    2. Why did the assembly completion wait until after I exited the
> emergency shell?  Why didn't it go on while I was doing other things?

It seems the array had to "resync" because it was previously used without all of its members.

Please always add the MD UUID to the kernel command line.

rd.md.uuid=<MD_UUID>

To find the MD_UUID, run:

# mdadm --detail --export <yourmddevice> | grep -F MD_UUID
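To turn that output directly into the kernel argument, the MD_UUID= line can be rewritten with sed. A minimal sketch; md_uuid_arg is a hypothetical helper name, and it assumes the KEY=value lines that mdadm --detail --export prints.

```shell
# Hypothetical helper: read `mdadm --detail --export <device>` output on stdin
# and print the matching rd.md.uuid= kernel command-line argument.
md_uuid_arg() {
    sed -n 's/^MD_UUID=\(.*\)$/rd.md.uuid=\1/p'
}
```

For example, mdadm --detail --export /dev/md124 | md_uuid_arg. On Fedora with GRUB 2 the result can be appended to GRUB_CMDLINE_LINUX in /etc/default/grub, followed by grub2-mkconfig -o /boot/grub2/grub.cfg; each array that should be assembled in the initramfs needs its own rd.md.uuid= argument.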

Comment 8 Jan ONDREJ 2013-10-14 09:10:34 UTC
Whatever the real cause is, dracut ignoring mdadm.conf is still a problem. Harald, can you fix this?

(In reply to Harald Hoyer from comment #7)
> (In reply to Alan Stern from comment #5)
> Please always add the MD UUID to the kernel command line.
> 
> rd.md.uuid=<MD_UUID>

I think you should ask the anaconda developers to always add this parameter during installation.

Will this also fix the original problem, i.e. will the arrays get their proper /dev/mdX device numbers again?

Comment 9 Enrique V. Bonet Esteban 2013-10-14 12:51:47 UTC
(In reply to Harald Hoyer from comment #6)
> (In reply to Enrique V. Bonet Esteban from comment #4)
> > Hi Jan,
> > 
> > The two computers that don't boot have /dev/md0 in grub.cfg file (old 
> > configuration). I change this value for UUID and the computers booting,
> > but they change too the raid devices /dev/md0 -> /dev/md127...
> > 
> > I think like you, this is a bug, boot or don't boot.
> > 
> > Thanks,
> > 
> > Enrique
> 
> Hmm, you must not specify md0 in grub.cfg. Always use UUID or LABEL.
> 
> And even specify: rd.md.uuid=<MDUUID> on the kernel command line.

I tried your solution on a computer that has the problem. Its RAID device
mapping was:

md127 -> swap
md126 -> /home
md125 -> /tmp
md124 -> /

I added this option to the kernel command line:

rd.md.uuid=a81e44e9:22c39a51:24501111:aaf04870

where a81e44e9:22c39a51:24501111:aaf04870 is the MD_UUID printed by the command:

mdadm --detail --export /dev/md124 | grep -F MD_UUID

After a reboot, the new RAID device mapping is:

md2 -> /tmp
md3 -> swap
md0 -> /home
md127 -> /

The root directory is assigned to an incorrect RAID device.

I have attached the grub2.cfg file.

Comment 10 Enrique V. Bonet Esteban 2013-10-14 12:54:06 UTC
Created attachment 812002
grub2.cfg configuration file

Comment 11 Jes Sorensen 2013-10-14 13:57:54 UTC
*** Bug 1018272 has been marked as a duplicate of this bug. ***

Comment 12 David Woodhouse 2013-10-14 14:00:21 UTC
(In reply to Harald Hoyer from comment #6)
> Hmm, you must not specify md0 in grub.cfg. Always use UUID or LABEL.

This is nonsense, and breaks backward compatibility. You can't just introduce a rule like that in the middle of a stable release. This was *working* before the updates, and is now broken.

Even if it *wasn't* unacceptable to force people into using mount-by-UUID in the general case, it would be utterly insane to do this in a stable update.

The RAID code has the 'preferred minor' facility for a reason. It should be honoured.

Comment 13 Alan Stern 2013-10-18 15:53:39 UTC
(In reply to Harald Hoyer from comment #7)
> (In reply to Alan Stern from comment #5)

> > Questions:
> > 
> >    1. Why didn't the system wait for md5 to be fully assembled before trying
> > to mount the root filesystem?
> > 
> >    2. Why did the assembly completion wait until after I exited the
> > emergency shell?  Why didn't it go on while I was doing other things?
> 
> Seems like it had to "resync", because of previous usage without all parts.

If that's true, it would mean there's another bug in dracut: When an MD drive containing the root filesystem needs a resync, the system should wait for the resync to finish before trying to mount the root.

I'm not currently able to reproduce the behavior shown in the attachment.  However, the fact that it has happened twice is disturbing; this system needs to be able to boot without an operator present at the console.

> Please always add the MD UUID to the kernel command line.
> 
> rd.md.uuid=<MD_UUID>

It was already there.

Harald, quit trying to dodge the issue.  The basic fact is very simple: Dracut has a bug -- it doesn't copy /etc/mdadm.conf into the initramfs image when it should.  Just fix the bug; I'm sure it will be easier to do that than to go around telling lots of people to put rd.md.uuid=<MD_UUID> in their kernel command lines.

I agree with David Woodhouse's comment about backward compatibility.  What would Linus say?

Comment 14 Alan Stern 2013-11-04 18:51:22 UTC
Created attachment 819310
Patch to set the correct defaults for mdadmconf and lvmconf

Look guys, this doesn't require a huge intellectual investment.  The attached patch fixes the problem for me.  Please consider including it in a bug-fix release of dracut.

Comment 15 Jordan Russell 2013-12-12 04:10:03 UTC
Same problem here with md0 now showing up as md12X, breaking boot.

(In reply to Harald Hoyer from comment #6)
> Hmm, you must not specify md0 in grub.cfg. Always use UUID or LABEL.

I thought it was considered dangerous to use UUID/LABEL with MD RAID-1.
If the RAID devices don't get assembled properly on boot (due to a misconfiguration or bug), then a search by filesystem UUID/LABEL could find a RAID-1 member partition and mount it directly, bypassing RAID. This would lead to the mirrored array becoming out-of-sync/corrupted.

Specifying md0, as I understand, ensures that scenario can never occur, so that's what I've always done and what I would like to keep doing.

Comment 16 Jordan Russell 2013-12-12 04:43:12 UTC
Comment #14 appears to suggest that mdadmconf previously defaulted to "yes", but after this update now defaults to "no". Is that correct?

If so, then would creating a file called /etc/dracut.conf.d/my-md.conf with the line:

mdadmconf="yes"

and then installing an updated kernel package (to generate a fresh initramfs) be enough to solve the md0->md12X renaming issue?

(I'm hesitant to experiment as I reboot my machines remotely, and already got burned once...)

Finally: Do F19 and F20 also break the MD device naming? i.e. Will I need mdadmconf="yes" in all future releases if I want a fixed "md0" name?
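The drop-in approach described here can be scripted as follows. This is a sketch, not an official procedure: the function name and the file name 90-mdadmconf.conf are hypothetical, and CONF_DIR exists only so the function can be tried against a scratch directory. dracut reads drop-ins from /etc/dracut.conf.d only if their names end in .conf.

```shell
# Hypothetical helper: write a dracut drop-in that forces the host's
# /etc/mdadm.conf into newly generated initramfs images.
# CONF_DIR defaults to the real drop-in directory; override it to test.
write_mdadmconf_dropin() {
    conf_dir=${CONF_DIR:-/etc/dracut.conf.d}
    mkdir -p "$conf_dir" || return 1
    printf '%s\n' 'mdadmconf="yes"' > "$conf_dir/90-mdadmconf.conf"
}

# The setting only affects images built afterwards, so regenerate as root:
#   dracut -f                                          (running kernel)
#   dracut -f /boot/initramfs-<version>.img <version>  (a specific kernel)
```

Installing a new kernel package also builds a fresh image, which would pick up the drop-in.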

Comment 17 Fedora Update System 2013-12-12 11:34:00 UTC
dracut-029-1.fc18.3 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/dracut-029-1.fc18.3

Comment 18 Alan Stern 2013-12-12 17:56:59 UTC
Jordan: It's not just that mdadmconf previously defaulted to "yes".  The current documentation (man dracut.conf) still states that it defaults to "yes".

I did indeed create such a file as you suggested, and it fixed the immediate problem.  However, now I'm facing a different (although related) problem; maybe someone can suggest a solution.

This is probably the result of recent changes to the kernel, not dracut's fault at all.  Still, the easiest way to work around it seems to lie in the startup script.

As described in comment #5, my system tries to mount the md5 device, which contains the root filesystem, before it has been assembled.  Of course the mount fails, and the script drops into an emergency console shell.  Simply typing "exit" is enough to get things going again, but this means that unattended boots will get stuck and fail.

Can anyone suggest a simple way (like a boot command-line argument) to make the startup script pause for a few seconds before trying to mount the root device?  Or if the mount fails, retry it after a few seconds delay before dropping into an emergency shell?

Comment 19 Fedora Update System 2013-12-13 05:07:41 UTC
Package dracut-029-1.fc18.3:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dracut-029-1.fc18.3'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-23312/dracut-029-1.fc18.3
then log in and leave karma (feedback).

Comment 20 Enrique V. Bonet Esteban 2013-12-13 12:31:03 UTC
I updated dracut, added mdadmconf="yes" to /etc/dracut.conf, ran dracut -f,
and rebooted the system.

The root filesystem is now assigned to the correct RAID device, /dev/md1.

Comment 21 Harald Hoyer 2013-12-13 13:44:59 UTC
(In reply to Enrique V. Bonet Esteban from comment #20)
> I update dracut, add mdadmconf="yes" in the file /etc/dracut.conf and I run
> dracut -f command and reboot the system.
> 
> The root directory is assigned to the correct RAID device /dev/md1

Please test the update _without_ adding anything anywhere. The update should make the "mdadmconf=yes" config file change obsolete.

Comment 22 Enrique V. Bonet Esteban 2013-12-13 17:49:33 UTC
Hi Harald,

I removed the added line (mdadmconf="yes"), ran dracut -f again, and rebooted
the system; it works fine.

The update solves the problem.

Thanks,

Enrique

Comment 23 SpuyMore 2013-12-16 22:32:11 UTC
I think this would fix my bug 1024015 as well, except that lvmconf defaults to "no" instead of "yes", so lvm.conf is not included in the generated initramfs. Please fix that as well.

Comment 24 Jordan Russell 2013-12-17 03:05:08 UTC
Will an update be released for Fedora 19 as well, since it also ships dracut-029?
Or did this issue only ever affect the F18 package?

I'd like to know because I'm wondering whether I'll be able to safely upgrade from F18 to F19/F20 without the config setting, or if I'll be unable to boot again after the upgrade if I don't add it first.

Comment 25 Harald Hoyer 2013-12-17 09:23:37 UTC
(In reply to Jordan Russell from comment #24)
> Will an update be released for Fedora 19 as well, since it also ships
> dracut-029?
> Or did this issue only ever affect the F18 package?
> 
> I'd like to know because I'm wondering whether I'll be able to safely
> upgrade from F18 to F19/F20 without the config setting, or if I'll be unable
> to boot again after the upgrade if I don't add it first.

Only affects F18. In F19, the hostonly mode is the default.

Comment 26 Fedora End Of Life 2013-12-21 15:52:12 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 27 Nerijus Baliūnas 2014-02-03 23:20:20 UTC
It's a PITA that dracut-029-1.fc18.3 has not made it to the updates repository, as it fixes the unbootable kernel with root on mdraid. lvm is still missing.

Comment 28 Harald Hoyer 2014-02-04 11:58:24 UTC
(In reply to Nerijus Baliūnas from comment #27)
> It's a PITA that dracut-029-1.fc18.3 has not made it to the updates repository,
> as it fixes unbootable kernel with root on mdraid. lvm is still missing.

# echo 'mdadmconf="yes"' > /etc/dracut.conf.d/my-md.conf
# echo 'lvmconf="yes"' > /etc/dracut.conf.d/my-lvm.conf
# dracut -f

Should fix your issue on F18.

Or update to F19 or F20.

Comment 29 Fedora End Of Life 2014-02-05 23:25:13 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 30 Harald Reindl 2014-08-16 11:40:59 UTC
The same problem existed on F19/F20 on 2 out of 8 machines.
They are not "host-only", and I really don't understand
why dracut stopped copying /etc/mdadm.conf into the initrd.