Bug 129673 - mkinitrd appears to over-rely on some Fedora kernels features
Summary: mkinitrd appears to over-rely on some Fedora kernels features
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeremy Katz
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: FC3Target FC3BugWeekQA
 
Reported: 2004-08-11 19:01 UTC by Michal Jaegermann
Modified: 2007-11-30 22:10 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-09-28 04:00:21 UTC
Type: ---
Embargoed:


Attachments
Patch that fixes many bugs (11.78 KB, patch)
2004-08-14 17:33 UTC, Steve Grubb
Patch that fixes many bugs (11.94 KB, patch)
2004-08-14 18:00 UTC, Steve Grubb
missing declarations in nash.c patch (704 bytes, patch)
2004-08-14 21:34 UTC, Michal Jaegermann
Final Patch (1.94 KB, patch)
2004-08-19 18:07 UTC, Steve Grubb

Description Michal Jaegermann 2004-08-11 19:01:59 UTC
Description of problem:

On an x86 box, apart from the "regular" Fedora test kernels, I also have
some custom kernels for other development work.
With initrd images made with mkinitrd-4.0.3-1 I can boot
both, say, 2.6.7-1.515 and my custom kernel if the command line
is like this (for example):

ro root=LABEL=/12 selinux=0 nousb

The picture changes considerably if I append " 1" or " 3" to the
string above.  Then 2.6.7-1.515 still boots and goes into the
desired runlevel, but with my custom kernel I see:

.....
Switching to new root
exec of init failed!!! 14
Kernel panic: Attempted to kill init!

This does not change if, with a custom kernel, I use a command
line like "ro root=LABEL=/12 3" or "ro root=LABEL=/12 1".  Only
leaving the runlevel specification out allows me to boot.

The situation was the same with the previous version of mkinitrd
(and I do not know about earlier ones after a change in a type
of produced images).

OTOH I may be just lucky with Fedora kernels as some reports
on fedora-test-list seem to suggest that with somewhat different
hardware other people have similar troubles without any custom
kernels in play and nothing added to command lines.  Details in
other cases are not that clear to me.

Version-Release number of selected component (if applicable):
mkinitrd-4.0.3-1

How reproducible:
Always with my custom kernel.

Comment 1 Jeremy Katz 2004-08-11 19:57:45 UTC
Could you try again with mkinitrd 4.0.4 (will be at
http://people.redhat.com/~katzj/mkinitrd/ as soon as it's done building)?

Comment 2 Michal Jaegermann 2004-08-11 23:32:03 UTC
I did; and not with very happy results. :-)

Regardless of which kernel and which command line options I am
using, if the initrd was made with that version of mkinitrd then I
invariably see this:
....
Mounting root filesystem.
mount: error 6 mouting ext3
Switching to new root
switchroot: mount failed: 22
Kernel panic: Attempted to kill init!

References to Catch-22?  Oops!

Comment 3 Kaj J. Niemi 2004-08-12 13:20:42 UTC
The identical error actually happened to me and I'm using Fedora kernels.

grub.conf looks as follows:

default=0
timeout=10
title Red Hat Linux (2.6.7-1.517smp)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.7-1.517smp ro root=LABEL=/ acpi=on elevator=deadline
        initrd /boot/initrd-2.6.7-1.517smp.img



Comment 4 Kaj J. Niemi 2004-08-12 13:35:48 UTC
mkinitrd got upgraded to 4.0.4 before I installed 2.6.7-517smp.

Comment 5 Jeremy Katz 2004-08-12 17:54:44 UTC
I blame lack of sleep or some such.
http://people.redhat.com/~katzj/mkinitrd/ has mkinitrd-4.0.5 now, which
really should be better.

Comment 6 Michal Jaegermann 2004-08-12 18:45:06 UTC
Indeed mkinitrd-4.0.5 produces images which allow me to boot
2.6.7-517 and my custom kernel too as opposed to 4.0.4 where
everything was blowing up.  OTOH it is not that different from
mkinitrd-4.0.3 in that adding a runlevel specification for my
custom kernel ends up with "exec of init failed!!! 14" although
doing that with 2.6.7-517 or 2.6.7-517smp does not have any
ill-effects.

Nasty nash. :-)

Comment 7 Jeremy Katz 2004-08-12 22:51:40 UTC
Hrmm... do you have an init=?  What does your grub.conf contain? 
Basically that's saying that the exec of init failed with -EFAULT
which seems strange to say the least.  

Also, you can grab http://people.redhat.com/~katzj/nash-test, cp
nash-test /sbin/nash and then remake your initrd to get a little bit
more information on what it's exec'ing as init.
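
For reference, the numeric codes printed in this report can be read as
errno values. Below is a minimal sketch, assuming Linux errno numbering
(EFAULT is 14, ENXIO is 6, EINVAL is 22). errno_name is a hypothetical
helper written for illustration and is not part of nash:

```c
#include <errno.h>

/*
 * Hypothetical helper (not from nash.c): map the few errno values that
 * show up in this report to their symbolic names.
 */
const char *errno_name(int err)
{
    switch (err) {
    case EFAULT: return "EFAULT";  /* bad address; 14 on Linux */
    case ENXIO:  return "ENXIO";   /* no such device or address; 6 on Linux */
    case EINVAL: return "EINVAL";  /* invalid argument; 22 on Linux */
    default:     return "unknown";
    }
}
```

For an exec call, EFAULT generally means that the path or one of the
pointers in the argument or environment vectors points outside the
accessible address space.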

Comment 8 Michal Jaegermann 2004-08-12 23:49:56 UTC
> Hrmm... do you have an init=?
In a command line?  No.  Options are listed in my original reports.
They are:

ro root=LABEL=/12 selinux=0 nousb

Are you suggesting that I have an init if I am using that, but it
magically disappears if I add "3" to the end?  Maybe this is the case
but that is weird.

I will see what I can get with 'nash-test' later.

Comment 9 Michal Jaegermann 2004-08-13 03:39:01 UTC
I replaced my nash with nash-test and put "id:3:initdefault" in
/etc/inittab. After remaking the initrd, this is what is on it
beyond /proc and /dev:

drwxr-xr-x   2 root     root            0 Aug 12 20:40 sysroot
drwxr-xr-x   2 root     root            0 Aug 12 20:40 sys
drwxr-xr-x   2 root     root            0 Aug 12 20:40 loopfs
lrwxrwxrwx   1 root     root            3 Aug 12 20:40 sbin -> bin
drwxr-xr-x   2 root     root            0 Aug 12 20:40 lib
-rw-r--r--   1 root     root        86700 Aug 12 20:40 lib/jbd.ko
-rw-r--r--   1 root     root       128732 Aug 12 20:40 lib/ext3.ko
drwxr-xr-x   2 root     root            0 Aug 12 20:40 bin
lrwxrwxrwx   1 root     root           10 Aug 12 20:40 bin/modprobe -> /sbin/nash
-rwxr-xr-x   1 root     root        66158 Aug 12 20:40 bin/nash
-rwxr-xr-x   1 root     root       152408 Aug 12 20:40 bin/insmod
-rwxr-xr-x   1 root     root          451 Aug 12 20:40 init
drwxr-xr-x   2 root     root            0 Aug 12 20:40 etc

so it hardly can be simpler.

A freshly remade initrd does not give much information, so I edited the
'init' script and replaced 'setquiet' with 'showlabels'.  With that, if
I boot with "ro root=LABEL=/12 selinux=0 nousb" in the kernel
options then I see this:

Red Hat nash version 4.0.5 starting
/dev/hda1 / e1ab2ed6-5e19-11d6-908d-b85d44f2b93d
/dev/hda5 /usr e71d092-5e19-11d6-80c1-c792d8497d9c
/dev/hda7 /home e258cc44-5e19-11d6-9b84-aa6a385944ae
/dev/hda8 spare12 32ade583-34ea-4e6d-8de6-20d2918b962a
/dev/hdb1 /boot1 eb4ba56e-2874-48fa-839c-16fbe8c4abae
/dev/hdb5 /1 f9ad2458-8dc2-476-8687-dd4a5c1812b5
/dev/hdb6 /usr1 0ac3cb0-1325-4981-b8aa-f6fa381ab8c
/dev/hdb7 /var1 5b3a6d74-db61-4e7-a595-91d5de92b38
/dev/hdb8 /home1 4f475734-474b-4195-a6fa-87f12e681af0
/dev/hdb9 /12 48b37e2c-7284-4643-aa9e-1645b2926e0
/dev/hdb10 /usr12 aabb63c0-2d93-44bb-a2e-fd7466e265f
Mounted /proc filesystem
Mounting sysfs
Loading jbd.ko module
Loading ext3.ko module
Creating block devices
Creating root device
Mounting root filesystem
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Switching to new root
INIT: version 2.85 booting

and the whole startup proceeds normally.

If I use "ro root=LABEL=/12 selinux=0 nousb 3" then everything
looks the same up to the "Switching ..." line.  Then I get:
....
Switching to new root
exec of init (/sbin/init) failed!!!: 14
Kernel panic: Attempted to kill init!

Not that much I can see from nash, I am afraid.


Comment 10 Michal Jaegermann 2004-08-13 03:48:12 UTC
BTW - using an explicit "ro root=/dev/hdb9 selinux=0 nousb 3"
instead of "ro root=LABEL=/12 selinux=0 nousb 3" does not help.
I just tried and ended up with the same "... failed!!!: 14".

Comment 11 Jeremy Katz 2004-08-13 17:18:18 UTC
Okay, I added even more debugging printfs to nash's exec of init.  New
nash at http://people.redhat.com/~katzj/nash-test-2.  If you could
grab that and remake your initrd and let me know what output you get,
that would be helpful.

Comment 12 Michal Jaegermann 2004-08-13 19:19:24 UTC
It now prints, in addition to what it printed before,
.....
Switching to new root
initargs[1]: ro
initargs[2]: root=LABEL=/12
initargs[3]: selinux=0
initargs[4]: nousb
INIT: version 2.85 booting

in a "good" case and
.....
Switching to new root
initargs[1]: ro
initargs[2]: root=LABEL=/12
initargs[3]: selinux=0
initargs[4]: nousb
initargs[5]: 3
exec of init (/sbin/init) failed!!!: 14
Kernel panic: Attempted to kill init!

when this "3" is added.  Otherwise not much changed.

It would probably be good to check whether it is lying about
mounting the real root and/or about the presence of /sbin/init, but
I am not sure how to do that from nash.  Dump /proc/mounts?


Comment 13 Steve Grubb 2004-08-14 17:33:57 UTC
Created attachment 102729
Patch that fixes many bugs

I gave mkinitrd a code review and found all kinds of bugs. There were
uninitialized variables getting used, memory leaks, negative array indexing,
important code for device numbers effectively commented out, and execv was
being called with stack variables. Please apply this patch. It does change some
of the errors reported, hopefully for the better.

Comment 14 Steve Grubb 2004-08-14 18:00:35 UTC
Created attachment 102734
Patch that fixes many bugs

I sent the second to last patch last time...sorry.

Comment 15 W. Michael Petullo 2004-08-14 18:11:22 UTC
The patch contained in comment #14 also seems to fix bug #129836 for me.

Comment 16 Michal Jaegermann 2004-08-14 21:34:23 UTC
Created attachment 102738
missing declarations in nash.c patch

Another small patch, on top of the previous one, to clean up missing
declarations in nash.c.  Putting a 'nash' recompiled with both patches
on my initrd does indeed clear the issue for me.

After those patches the remaining warnings are about implicitly dropping
the 'const' qualifier from some pointers.

Comment 17 Jeremy Katz 2004-08-16 17:16:37 UTC
Applied most of the patch here.  Some of it isn't quite right and the
use of the numeric errnos instead of strerror is intentional (avoids
bringing in the strings).

All in mkinitrd-4.0.6 -- thanks for the patches.
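
The choice described above, printing the raw errno rather than calling
strerror, might look like the sketch below. format_failure is a
hypothetical name and not a function from nash.c; the message layout
only mimics the "mount failed: 22" lines quoted earlier in this report.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from nash.c): format "<what> failed: <errno>"
 * with the number only, so no error-string table has to be linked in.
 * Returns the value of snprintf.
 */
int format_failure(char *buf, size_t len, const char *what, int err)
{
    return snprintf(buf, len, "%s failed: %d", what, err);
}
```

The cost is that whoever reads the message has to look the number up.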

Comment 18 Michal Jaegermann 2004-08-16 23:08:52 UTC
So far mkinitrd-4.0.6 is in "worksforme" category.  I could not
reproduce with it troubles I had with other versions.

Out of curiosity I looked at how much the strings bring in from dietlibc
and it does not look like a lot.  Many modules one may need
will be much bigger than that.  Still, I agree that the fewer bytes
on mkinitrd the better.

Comment 19 Steve Grubb 2004-08-19 18:07:18 UTC
Created attachment 102889
Final Patch

I reviewed the latest changes. Thanks for applying the bulk of them. The
original bug is fixed as far as I can tell.

In the interest of having clean code, I have one last set of patches that can
be applied against mkinitrd-4.0.6. They were in the original patch I sent.

In grubby.c @ 1536, there really is a potential memory leak. The call to free
fixes it.

In nash.c @ 59, I moved the allocation of memory in order to avoid free'ing the
memory if the open failed.

@ 216, rc needs to be initialized. Otherwise it can return whatever random
value happens to be on the stack. If this chunk is applied, the other place
where rc is set to 1 can be deleted.

@ 624, a const was added so that gcc knows not to create the array of strings
on the stack and to move it to the .rodata segment.

@ 679, initargs was previously malloc'ed. Its first element may need to be set
to NULL so exec doesn't dereference a bogus pointer.

@ 767, devNum really is an int. If it were unsigned, the test for (devNum < 0)
would never be true.

I'm continuing to apply this smaller patch against my tree. I don't think there
is anything here that is super critical. But it would be nice if we were in
sync so I can drop the patch.
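
The initargs point (@ 679) can be sketched as follows. execv requires
the argument vector to end with a NULL pointer. A malloc'ed vector
without one can make exec read garbage pointers and fail with EFAULT.
That would be consistent with the error 14 reported earlier, although
the thread does not confirm the exact cause. build_initargs is a
hypothetical helper, not code from nash.c:

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not from nash.c): build an argv-style vector of
 * init's path followed by nextra extra arguments. The caller frees the
 * returned vector; the strings themselves are not copied.
 */
char **build_initargs(const char *init, char *const extra[], size_t nextra)
{
    /* calloc zero-fills, so unused slots never hold stale pointers. */
    char **args = calloc(nextra + 2, sizeof(*args));
    size_t i;

    if (args == NULL)
        return NULL;

    args[0] = (char *)init;
    for (i = 0; i < nextra; i++)
        args[i + 1] = extra[i];
    args[nextra + 1] = NULL;  /* explicit terminator required by execv */

    return args;
}
```

The result could then be passed along as execv(args[0], args).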

Comment 20 Jeremy Katz 2004-08-24 19:14:57 UTC
Applying with the following caveat:
* nash.c:216 - rc should be initialized to 0, not 1.  Otherwise, we'd
just always end up returning 1 (which isn't the intent, 1 is an error)

Thanks again for the patches
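
The caveat about rc can be illustrated with a small sketch. run_steps
and its arguments are hypothetical and are not taken from nash.c. The
point is only that a status variable initialized to 0 reports success
unless some step fails, while initializing it to 1 would report failure
on every path:

```c
/*
 * Hypothetical example (not from nash.c): results[i] is the status of
 * step i, 0 meaning success. Returns 0 if every step succeeded and 1 if
 * any step failed.
 */
int run_steps(const int *results, int n)
{
    int rc = 0;  /* initialized, so the no-failure path returns 0 */
    int i;

    for (i = 0; i < n; i++) {
        if (results[i] != 0)
            rc = 1;  /* remember the failure, keep going */
    }
    return rc;
}
```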

Comment 21 Jef Spaleta 2004-09-28 04:00:21 UTC
Closing this out as resolved rawhide, since discussion in the report
has died off and the initial problem is confirmed by the original
reporter as being resolved in the comments.

-jef

