Bug 203241 - PATCH: mkinitrd does not create /dev/dm-x devices for dmraid causing total boot failure
Keywords:
Status: CLOSED DUPLICATE of bug 204768
Alias: None
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: rawhide
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Peter Jones
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-08-19 18:25 UTC by Hans de Goede
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-31 21:04:46 UTC
Type: ---
Embargoed:


Attachments
PATCH fixing dmraid booting in mkinitrd (416 bytes, patch)
2006-08-19 18:25 UTC, Hans de Goede
PATCH: dmraid boot initrd script workaround (404 bytes, patch)
2006-08-19 18:27 UTC, Hans de Goede

Description Hans de Goede 2006-08-19 18:25:36 UTC
A friend of mine (chabotc) has been having problems booting his
Dell XPS with the factory default raid setup (which he does not want to change
because he doesn't want to remove all his data) ever since the dmraid alignment
checks (bug 186842) were added to the kernel.

Since many people are suffering from the same problem and this gives Linux /
Fedora a bad name, I decided last week to go and try to fix this.

So I've borrowed his PC and now after 8 full hours of debugging I've found the
problem and have a fix for it (actually 2).

Recent mkinitrd versions generate the correct:
rmparts sdb
rmparts sda
dm create nvidia_dabhfihh 0 976562432 striped 2 128 8:0 0 8:16 0
dm partadd nvidia_dabhfihh
lines (or similar lines for other dmraid setups). The problem is that the dm
code in nash > 5.0.46 creates the necessary device nodes under /dev/mapper, but
doesn't create the matching /dev/dm-x device nodes.

With some searching I managed to find SRPMS for most versions of mkinitrd
between 5.0.46 (which my friend found out works for him) and 5.1.9. After
trying most of them it was determined that the problem was introduced between
5.0.46 and 5.0.47. I've done a diff between these 2 versions and the problem is
the removal of a call to smartmknod() in block_sysfs_try_dir(). Re-adding this
call fixes this. Notice that an extra call to smartmknod() was added to
block_show_labels(), but that apparently doesn't work in the dmraid case.

The attached mkinitrd-5.1.9-test.patch re-adds the removed line, fixing this on
my friend's PC.

Another way of fixing this is adding an extra call to mkblkdevs to the init
script in the initrd after the "dm partadd xxxxx"

The attached mkinitrd.diff patch does this.

Please apply one of these 2 patches so that people with a dmraid setup can have
a booting FC-6; a non-booting OS is sort of bad PR.

Comment 1 Hans de Goede 2006-08-19 18:25:37 UTC
Created attachment 134511 [details]
PATCH fixing dmraid booting in mkinitrd

Comment 2 Hans de Goede 2006-08-19 18:27:49 UTC
Created attachment 134512 [details]
PATCH: dmraid boot initrd script workaround

Comment 3 Dwaine Garden 2006-08-20 06:08:16 UTC
Do we know when this patch will get into rawhide?

Comment 4 David Nielsen 2006-08-21 07:55:21 UTC
It might be worth mentioning that the updated kernel issued for FC5 does not
boot dmraid either, as I most painfully discovered a few minutes ago.

This definitely needs to get fixed soon.

Comment 5 Ron Courtright 2006-08-22 17:33:28 UTC
I am desperate to get past this problem.  I confess that I have so far relied
upon RPMs to update so I am hesitant about the patch process.  Would someone be 
so kind as to provide some directions here (or a pointer to some site that 
provides a walkthrough) as to the order and procedure of applying this patch?  
Thanks in advance.

Comment 6 Ron Courtright 2006-08-22 17:36:44 UTC
By the way, do both patches need to be applied?  The initial comment suggests 
that one might apply one OR the other.

Comment 7 Hans de Goede 2006-08-22 19:38:47 UTC
No, either one will work. In your case the workaround rather than the real fix
has the advantage that it will work without a recompile.

Instructions:
Save the second attachment as mkinitrd.diff
"cd /sbin"
"patch -p1 < [path-to]/mkinitrd.diff"
Where [path-to] should be replaced by the path to mkinitrd.diff

And then rerun mkinitrd:
"mkinitrd -f /boot/initrd-`uname -r`.img `uname -r`"

Notice that this patch and the entire diagnosis are based on FC6-test2 + rawhide
updates and may or may not apply to FC-5.

With this patch and mkinitrd >= 5.1.6 ("rpm -q mkinitrd" to find out) and no
usb-storage lines in /etc/modprobe.conf dmraid should work.
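
The steps above can be sketched as a small shell helper that backs up the script and the current initrd before patching. This is only a sketch under assumptions: the function names initrd_path and apply_workaround are made up for illustration, the diff is assumed to be given as an absolute path, and --dry-run assumes GNU patch. Nothing runs until apply_workaround is called.

```shell
#!/bin/sh
# Illustrative helpers; these names are not part of mkinitrd.

# Print the initrd image path for a given kernel version.
initrd_path() {
    printf '/boot/initrd-%s.img\n' "$1"
}

# Usage: apply_workaround /absolute/path/to/mkinitrd.diff
apply_workaround() {
    diff_file=$1
    kver=$(uname -r)
    initrd=$(initrd_path "$kver")

    # Keep copies so the change can be undone.
    cp -p /sbin/mkinitrd /sbin/mkinitrd.orig || return 1
    [ -f "$initrd" ] && cp -p "$initrd" "$initrd.bak"

    # Check that the diff applies cleanly before changing anything.
    (cd /sbin && patch --dry-run -p1 < "$diff_file") || return 1
    (cd /sbin && patch -p1 < "$diff_file") || return 1

    # Regenerate the initrd for the running kernel.
    mkinitrd -f "$initrd" "$kver"
}
```

If the new initrd does not boot, the saved .bak image and /sbin/mkinitrd.orig can be copied back from a rescue environment.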



Comment 8 Ron Courtright 2006-08-22 22:08:54 UTC
I am in no man's land, since I am still in FC5 and my mkinitrd == 5.0.32-1. I
suppose I will have to wait for a solution for that version.

Anyway, thanks for the directions.  I hope your efforts help others who are 
stuck in this most annoying predicament.

Comment 9 Dwaine Garden 2006-08-23 03:10:02 UTC
Good news.  Just syncd with rawhide and the new kernel booted with dm raid0
without any modifications. Looks like nash has been fixed.

I'm using the via_sata driver.

Dwaine



Comment 10 Hans de Goede 2006-08-23 04:09:30 UTC
(In reply to comment #9)
> Good news.  Just syncd with rawhide and the new kernel booted with dm raid0
> without any modifications. Looks like nash has been fixed.
> 
> I'm using the via_sata driver.
> 

Hmm, I just checked my mirror and the nash there isn't fixed. Maybe this bug
only applies to systems using nv_sata, although I have a hard time believing
that. Could you cut and paste or attach the contents of your /etc/fstab here?
Thanks!


Comment 11 Ron Courtright 2006-08-24 02:14:04 UTC
Would someone point me in the right direction to learning how I, too, can sync
FC5 to RawHide?  (Assuming that is possible; I also use via_sata.)  Thanks.

Comment 12 Hans de Goede 2006-08-24 07:51:42 UTC
(In reply to comment #11)
> Would someone point me in the right direction to learning how I, too, can sync
> FC5 to RawHide?  (Assuming that is possible; I also use via_sata.)  Thanks.

First of all this may make your system unbootable even with the older kernel!

Now with that said, edit:
/etc/yum.repos.d/fedora-core.repo

This file has 3 sections, of which only the top one is enabled by default, you
can see this because the top section contains the line:
enabled=1

Change this to:
enabled=0

If you've enabled any other sections yourself disable them too.

Do the same for:
/etc/yum.repos.d/fedora-updates.repo
and:
/etc/yum.repos.d/fedora-extras.repo

Now edit /etc/yum.repos.d/fedora-development.repo
and enable the top section, that is modify it so that it contains:
enabled=1

Do the same for:
/etc/yum.repos.d/fedora-extras-development.repo


Now your yum points to the development branch of Fedora. I think in this case it
is wise to do a piecemeal update as you're only interested in mkinitrd, so after
making the above changes type:
yum update mkinitrd

Do not use "yum -y update mkinitrd"!
Now once yum has done all the magic it will give a list of packages that it will
update. This will include mkinitrd, glibc(-xxx) and probably device-mapper and
mdraid. This list may be around 10 packages long; if it's much longer, please post
it here and press N to stop yum from doing the actual update.

If you are comfortable with the list press Y to continue, and once yum is done
you've got the new mkinitrd which is all you need.

After this you may revert the changes to /etc/yum.repos.d/*, if you don't revert
this and do a yum update later you will get updated to a full development system!
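
The enable/disable edits described above can also be scripted. The following is a minimal sketch, assuming each file writes the setting exactly as "enabled=0" or "enabled=1" with no spaces and that the top section holds the first such line; set_first_enabled is a made-up helper name, not a yum command.

```shell
#!/bin/sh
# set_first_enabled FILE VALUE
# Rewrite only the first "enabled=" line in FILE to "enabled=VALUE",
# leaving the other sections untouched.
set_first_enabled() {
    file=$1
    value=$2
    tmp=$(mktemp) || return 1
    if awk -v v="$value" '
        !seen && /^enabled=/ { print "enabled=" v; seen = 1; next }
        { print }
    ' "$file" > "$tmp"; then
        # Write back through cat so the file keeps its owner and mode.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

For example, "set_first_enabled /etc/yum.repos.d/fedora-core.repo 0" followed by "set_first_enabled /etc/yum.repos.d/fedora-development.repo 1" mirrors the manual steps; running the same commands with the values swapped reverts them. Copying the .repo files somewhere safe first is a cheap precaution.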


Comment 13 Hans de Goede 2006-08-24 09:12:47 UTC
pjones,

Can we get some progress on this? Maybe I can inspire some confidence in the
validity of the attached patches / diagnosis of the problem by explaining how I
came to these conclusions:

As said, a friend of mine has a Dell XPS, which by default comes with an nvidia
sata dmraid setup. With a kernel update some time ago this broke for him (and many
others). He had managed to manually fix this by adding the necessary "dm xxxx"
lines to the init scripts in his initrd, using an initrd generated by mkinitrd
5.0.46 as base.

With initrds generated by later mkinitrd versions his system broke once again,
both with the magic lines added manually (for the first few newer mkinitrd
versions) and with them added by mkinitrd itself (for later versions).

So I started by collecting mkinitrd versions 5.0.46 - 5.1.9 and managed to find
most, and by trial and error found out that this new breakage was introduced by
5.0.47, so 5.0.47 and newer do not work on his system even with the necessary
magic "dm xxxxxx" lines in place.

After pinpointing the exact version which broke I wanted to know where exactly
it broke, so I recompiled the Fedora busybox rpm to include the ash applet and I
inserted "busybox ash" lines between all the lines in the initrd init script.

This way I could closely observe the behaviour of nash / the init script during
the initrd stage of the boot.

This way I soon noticed that with 5.0.46 /dev/dm-x nodes showed up in /dev after
the magic "dm xxxxx" lines in the init script, whereas with 5.0.47 these didn't
show up. The absence of these devices in turn caused the "mkrootdev xxxxx" line
from the init script to fail, which in turn caused total boot failure.

I could fix the boot with 5.0.47 (and later) by doing a manual mknod from ash
for either /dev/dm-x or /dev/root .
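
To illustrate the manual mknod step, here is a sketch (not the commands actually typed during debugging) that derives the mknod arguments from sysfs. It assumes sysfs is mounted and that each dm-N directory under /sys/block has a "dev" file holding major:minor; print_dm_mknod is a hypothetical helper name. It only prints the commands, so they can be reviewed before being run.

```shell
#!/bin/sh
# print_dm_mknod [SYSFS_BLOCK_DIR] [DEV_DIR]
# For every dm-N entry under SYSFS_BLOCK_DIR, read its "dev" file
# (major:minor) and print the mknod command for the missing node.
print_dm_mknod() {
    sysblock=${1:-/sys/block}
    devdir=${2:-/dev}
    for entry in "$sysblock"/dm-*; do
        # Skip entries without a readable dev file (also covers no matches).
        [ -r "$entry/dev" ] || continue
        name=${entry##*/}
        IFS=: read -r major minor < "$entry/dev"
        # Nodes that already exist need no mknod.
        [ -e "$devdir/$name" ] && continue
        echo "mknod $devdir/$name b $major $minor"
    done
}
```

Calling print_dm_mknod with no arguments inspects /sys/block and /dev; the two optional arguments exist so the function can be pointed at a test tree.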

Then I first tried rerunning mkblkdevs after the "dm xxxxx" lines, which worked but
didn't seem pretty (this is what the second attached patch does). So I did a
"diff -ur" between the sources of 5.0.46 and 5.0.47 (huge diff, many internal
changes) and found the removal of the smartmknod() call which is re-added in the
first attached patch, which fixes this in a less ugly way.

I hope that explains how I arrived at these patches and why one of these patches
is needed. Now PLEASE apply one of these before FC-6 so that people with a
similar setup can have a working system out of the box.


Comment 14 Ron Courtright 2006-08-24 17:09:38 UTC
I hit a roadblock on the yum update mkinitrd trail.  Here is the output of that 
action:

Loading "installonlyn" plugin
Setting up Update Process
Setting up repositories
livna                                                                [1/6]
extras-development                                                   [2/6]
development                                                          [3/6]
gst-0.10-apps                                                        [4/6]
gst-0.10-deps                                                        [5/6]
gst-0.10-gst                                                         [6/6]
Reading repository metadata in from local files
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for mkinitrd to pack into transaction set.
mkinitrd-5.1.9-1.x86_64.r 100% |=========================|  49 kB    00:05
---> Package mkinitrd.x86_64 0:5.1.9-1 set to be updated
--> Running transaction check
--> Processing Dependency: rtld(GNU_HASH) for package: mkinitrd
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc to pack into transaction set.
glibc-2.4.90-23.x86_64.rp 100% |=========================| 135 kB    00:15
---> Package glibc.x86_64 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc-common = 2.4.90-23 for package: glibc
--> Processing Conflict: glibc-common conflicts glibc > 2.4
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc-common to pack into transaction set.
glibc-common-2.4.90-23.x8 100% |=========================| 707 kB    01:31
---> Package glibc-common.x86_64 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc-common = 2.4-8 for package: glibc
--> Processing Conflict: glibc-common conflicts glibc < 2.4.90
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc to pack into transaction set.
glibc-2.4.90-23.i686.rpm  100% |=========================| 134 kB    00:18
---> Package glibc.i686 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc = 2.4-8 for package: glibc-headers
--> Processing Dependency: glibc = 2.4-8 for package: glibc-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for glibc-devel to pack into transaction set.
glibc-devel-2.4.90-23.x86 100% |=========================| 100 kB    00:14
---> Package glibc-devel.x86_64 0:2.4.90-23 set to be updated
---> Downloading header for glibc-headers to pack into transaction set.
glibc-headers-2.4.90-23.x 100% |=========================| 133 kB    00:14
---> Package glibc-headers.x86_64 0:2.4.90-23 set to be updated
--> Running transaction check
--> Processing Dependency: glibc-headers = 2.4-8 for package: glibc-devel
--> Processing Dependency: glibc = 2.4-8 for package: glibc-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
--> Running transaction check
--> Processing Dependency: glibc-headers = 2.4-8 for package: glibc-devel
--> Processing Dependency: glibc = 2.4-8 for package: glibc-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
--> Running transaction check
Error: Unable to satisfy dependencies
Error: Package glibc-devel needs glibc-headers = 2.4-8, this is not available.
Error: Package glibc-devel needs glibc = 2.4-8, this is not available.

So, first let me apologize for being such a tyro, but here I am.  Do I need to 
go back and get these earlier versions of glibc?  Or is this a hiding for 
nothing?  Thanks to all.

Comment 15 Hans de Goede 2006-08-24 18:04:00 UTC
Hmm, going a bit offtopic for this bug, you could try:
yum update mkinitrd 'glibc*'
If that doesn't help please include the output of:
rpm -qa|grep glibc in your next comment


Comment 16 Ron Courtright 2006-08-24 23:09:43 UTC
Yes, updating glibc* and then mkinitrd did the trick.  Thanks to Messrs. de Goede
& Garden, and, for that matter, all the other cognoscenti who contributed to Bug
30241 & Bug 18642 for providing the magic recipes and incantations to work
through this problem.

Software engineering may not be the dismal science, but it sure travels some 
grim paths at times.

For the record, this, in brief, is my setup:
AMD 64 4200
RAID0 (2 x 250gb Western Digital ATA)
ASUS A8V

I know that this is not a production solution.  But if I wanted that I guess I 
would be using RHEL 4WS as I do at the office.  Again, thanks to all concerned 
for seeing me through the darkness.

Comment 17 Ron Courtright 2006-08-24 23:11:38 UTC
Sorry, I meant Bug 203241, you know, this one, and not 30241.  Stupid fingers.

Comment 18 Hans de Goede 2006-08-25 06:02:06 UTC
(In reply to comment #16)
> Yes, updating glibc* and then mkinitrd did the trick.

Did just the update do the trick, or did you also apply the second patch
attached to this bug?


Comment 19 Ron Courtright 2006-08-26 02:36:32 UTC
I first yum updated glibc* (to v2.4.90) and then yum updated mkinitrd (to
v5.1.9-1), both from the development repositories as you suggested above.

After those packages were installed, I re-enabled the standard repositories (and
disabled the development ones), applied the kernel update (to take my machine to
2.6.17-1.2174_FC5), and rebooted whilst holding my breath.  And so, here I am.

By the way, if memory serves, after the glibc update, the dependency list for
mkinitrd was that package only.  Also, all suggested updates, save the kernel,
were applied before I hybridized my system.  And as a further correction, my
RAID0 consists of SATA (not just ATA :-)) drives.  Again, thanks to all for
helping me through this.

Comment 20 Hans de Goede 2006-08-26 05:09:46 UTC
Hmm,

So you didn't use / apply any of the patches attached here and still have a
working setup; that makes you the second person. Could you attach / include in a
comment your /etc/fstab and the output of the "mount" command? Just the lines
concerning your / (root) filesystem will do. Thanks!


Comment 21 Hans de Goede 2006-08-29 21:00:02 UTC
Looks like we are getting somewhere, thanks Jesse Keating
See the transcript from irc / #fedora-devel below:

f13 Horray!  dm-raid still bust-o on rawhide (:
f13 pjones: strangely enough, rescue mode is able to mount it just fine.
hansg f13, maybe the patch I submitted here will fix this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203241
* _Zoltan_ has quit (Read error: 113 (No route to host))
hansg f13, the second patch (workaround) can be applied directly to
/sbin/mkinitrd and then recreate the initrd
hansg f13, if you can try this, it helps you and you can then convince pjones to
take a look at #203241 you'll be my hero
* f13 looks
pjones I don't have any argument against fixing it.  I just don't have any time,
either.
hansg I've done a lot of digging and I'm willing to do more if needed, but some
response showing that my work (around 12 hours so far) isn't going to /dev/null
would be appreciated
hansg f13, also make sure you are using the latest mkinitrd and that you do not
have any scsi_adapter usb-storage aliases in /etc/modprobe.conf
f13 hansg: so the problem I'm having is IO error reading sda2 or something like
that.
f13 I'll patch, we'll see.
* behdad has quit ("Leaving.")
hansg f13, then its most likely usb-storage aliases in /etc/modprobe.conf
* jwb grows tired of callion and dnielsen spotting on blogs
f13 hansg: I watched the mkinitrd creation, there were no usb modules added to
initrd.
f13 hansg: for rawhide do I need both the mkinitrd patch _and_ the initscripts
patch?
f13 n/m, I read it now
* f13 tries the /sbin/mkinitrd patch
hansg f13, you say no usb modules at all? Or just not usb-storage? If you've got
no usb-modules at all then you're using a pretty old mkinitrd (or a very new one
with which I'm not familiar yet)
hansg f13, rpm -q mkinitrd ?
f13 hansg: oh fun!  I updated to newest mkinitrd and now I get the usb modules
brought in.
f13 uhci-hcd, ohci-hcd, ehci-hcd
hansg f13, good! That might fix the sda2 error
f13 hrm,
f13 I should try this w/out your patch first.
hansg those are a normal erm "feature" of the newest mkinitrd, as long as
usb-storage isn't added things are ok
f13 nod
hansg yes testing without the patch first is a good idea I think
f13 its helpful when you don't get udev but you have a usb keyboard
* somegeek has quit (Read error: 104 (Connection reset by peer))
f13 peter and I kept hitting this on my ppc mini.
f13 udev would barf the box, but w/out udev we couldn't use the usb keyboard (:
f13 hansg: rebooting w/out your patch.
hansg yes they are I had the same problem when I added a static shell to the
initrd to debug this on a friends Pc, no keyboard
f13 hansg: so, with the unpatched new mkinitrd and a recreated initrd, it just works.
hansg thats good news, lots of people tell me that, but it doesn't work on my
friends PC without the patch :|
f13 suck.
hansg what does "mount" say for root?
hansg and /etc/fstab?
f13 /dev/dm-1 on /boot type ext3 (rw)
f13 /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
f13 /dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
f13 LABEL=/boot             /boot                   ext3    defaults        1 2
f13   --- Physical volume ---
f13   PV Name               /dev/dm-2
hansg Thanks, that's different from what my friend has; he has /dev/dm-X as root
device instead of /dev/mapper/VolGroup00-LogVol00, that's probably why the
missing /dev/dm-x nodes (missing from the /dev in the initrd) bite him
* tibbs_ has quit (Remote closed the connection)
f13 hansg: probably, is he not using LVM?
* stickster (n=pfrields@fedora/stickster) has joined #fedora-devel
hansg I'm pretty sure that if you change your fstab to contain LABEL=/ for root
and then rerun mkinitrd you will need my patch
hansg because when using a label the label gets translated to /dev/dm-X and not
/dev/mapper/XXXXXX
f13 hrm.
f13 all that is crackrock.  I hear pjones screaming about that through the
cubewall on a weekly basis.
f13 the stupid naming of crud that is
pjones we shouldn't _ever_ be mounting a dm-N device
pjones If we do, that's a bug.
pjones (but ugh, what a PITA)
f13 pjones: my box has it mounted for /boot/   :/
pjones So I see.
f13 /dev/dm-1 on /boot type ext3 (rw)
f13 ah
* somegeek (i=levin@tor/regular/somegeek) has joined #fedora-devel
f13 so the label translation stuff is getting it wrong again?
hansg then my patch is wrong and the real bug is that LABEL= lines can get
translated to /dev/dm-X stuff?


---

Chris (chabotc) can you try to change the line for your root filesystem in
/etc/fstab to use /dev/mapper/XXXXXp3 as device instead of LABEL=/ and then
recreate your initrd with a pristine (unpatched) mkinitrd?
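
A quick way to see which form the root entry in /etc/fstab currently uses is sketched below. This assumes the usual whitespace-separated fstab layout with the device in the first field and the mount point in the second; root_spec and root_uses_label are hypothetical helper names.

```shell
#!/bin/sh
# root_spec [FSTAB]
# Print the device field of the fstab entry mounted at "/".
root_spec() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1; exit }' "${1:-/etc/fstab}"
}

# root_uses_label [FSTAB]
# Succeed when the root filesystem is referenced by LABEL=.
root_uses_label() {
    case $(root_spec "$1") in
        LABEL=*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, 'root_uses_label && echo "root is mounted by label"' reports whether the entry would go through the LABEL translation discussed in the transcript.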


Comment 22 Hans de Goede 2006-08-29 21:20:44 UTC
some more irc logs:

hansg f13, pjones, If i understand correctly we've pretty much got the dmraid
problem confined / defined to wrong LABEL=xxx translation, right?
pjones I did say I haven't looked at it, right?
pjones but even if we get dm-1 instead of /dev/mapper/pdc_whatever , as long as
they're the same major:minor that shouldn't cause a failure
hansg pjones, it does because mkinitrd > 5.0.46 (nash > 5.0.46 actually) no
longer creates /dev/dm-x in the ramdisk /dev dir and does the mkrootdev line
from the init ramdisk script fail when it gets passed /dev/dm-x as a parameter
* Foolish has quit (Read error: 104 (Connection reset by peer))
hansg s/does/thus/
pjones yeah, but it shouldn't be getting /dev/dm-N as a parameter.
pjones if it is, there's another problem being missed
hansg pjones, agreed which seems to happen in the LABEL -> device translation
pjones taking patches ;)
hansg the strange thing is I did try putting /dev/mapper/XXXX in the
initrd-init script manually and that didn't work either, but maybe that was with
an older mkinitrd; when I was debugging this I've tried about 10 different
mkinitrd versions
hansg I've asked my friend to try it with /dev/mapper/XXXX as root in his fstab,
if that fixes things for him I'll take a stab at fixing the LABEL -> device creation


Comment 23 Hans de Goede 2006-08-31 21:04:46 UTC
It turns out that, although the problem is /dev/dm-x related, the patches
attached to this bug are completely wrong. The real problem (for normal setups)
is that booting by LABEL= from lvm or dmraid fails. Bug 204768 was created for
this problem and contains a proper patch, so I'm closing this one as a dup of 204768.


*** This bug has been marked as a duplicate of 204768 ***

