Bug 411691 - init[1] segfault on HP compaq nx9010 since kernel 2.6.23.1-49.fc8
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: 9
Hardware: i686
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Peter Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-12-05 09:34 UTC by David Wood
Modified: 2009-05-06 13:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-06 13:02:55 UTC
Type: ---
Embargoed:


Attachments
Output from mkinitrd -v (9.45 KB, text/plain)
2008-02-26 14:40 UTC, David Wood
Working initrd (as downloaded) (3.49 MB, application/octet-stream)
2008-02-28 10:09 UTC, David Wood
Segfaulting initrd (as rebuilt using mkinitrd) (3.08 MB, application/octet-stream)
2008-02-28 10:10 UTC, David Wood

Description David Wood 2007-12-05 09:34:01 UTC
Description of problem:
Updated kernel fails with the following message at boot:

init[1] segfault at 00000006 eip 00000006 esp bfbc70ec error 4

This message is repeated alternately with ...
printk:2900000 messages suppressed
.. where the actual number of messages varies slightly around this value

Version-Release number of selected component (if applicable):

2.6.23.1-49.fc8
2.6.23.8-63.fc8

Additional info:

2.6.23.1-42.fc8 works fine.
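
For reference, the "error 4" in the segfault line is the x86 page-fault error code. The sketch below decodes its low three bits (present, write, user); the function name decode_pf_error is made up for illustration. Code 4 comes out as a user-mode read of a page that is not present, and since the faulting address and eip are both 00000006, this probably means init jumped to a bogus address rather than dereferencing a bad data pointer.

```sh
# Decode the low three bits of an x86 page-fault error code, as printed in
# kernel "segfault at ... error N" messages.
# decode_pf_error is a hypothetical helper name, not an existing tool.
decode_pf_error() {
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $(( $1 & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # bit 1: 0 = read, 1 = write
    if [ $(( $1 & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $(( $1 & 4 )) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

decode_pf_error 4   # the code reported in this bug
decode_pf_error 6   # the code reported in comment 1
```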

Comment 1 Alan Ryan 2007-12-08 16:40:46 UTC
Hi, I'm not sure if this is where I should be adding this.  I am having the same
issue on a Dell vostro 200 with ICH9 SATA controller. The following is a copy of
a post I made describing the issue.

POST I:

I have vista (don't ask) installed on another partition, I used its native tool
to shrink my 300G drive to 2x150 partitions. To test the M$ tool, I used gparted
bootcd to format the new partition /dev/sda4 to ext2. All seemed fine.

The install was hit & miss, with acpi=off and the SATA set to IDE mode in the
BIOS it booted and loaded FC8. Grub installed fine and I ran update to get new
kernel etc..

Most things seem to work fine, however using kfpgrabber and gftp I have a
reproducible error - both apps crash trying to FTP a ~13MB dir structure to my
local drive. I get errors from dmesg saying that exceptions occurred in the eip &
eis registers.

POST II:

I had to switch the SATA options in the BIOS to RAID mode as opposed to IDE.
This allowed me to use the AHCI driver rather than the pix... (I think this is
the name of it) I have tried passing irqpoll with acpi=off to the kernel at boot
time. This makes no difference, in fact it may be making things worse for me.
Many of my apps are segfault-ing, eclipse, gdm, cisco vpn client, httpd, cupsd,
procmail, gftp...etc..

This happens with or without kernel params.

In general the dmesg looks like:
app[nnnn]: segfault at xxxxxxx eip xxxxxx esp xxxxxx error 6


I have seen something similar at
http://www.fedoraforum.org/forum/showthread.php?t=174130


I can't get output of lspci etc.. at the moment, the BCM4318 is not supported
so I can't get at the box.  I can get this info if you need it.

Thanks, 
Alan



Comment 2 Alan Ryan 2007-12-11 18:16:34 UTC
Hi, I spent a lot of time on this and it looks like it may have been more my
application of linux than the kernel.  I could not update the BIOS, fw kept
failing on me.  So yet another reinstall was undertaken. This time I left the
BIOS/SATA in IDE mode and passed ACPI=off irqpoll to the kernel.  It booted,
loaded the pix_... driver in no time and all went smoothly.  I am on the Fedora
werewolf distro, that is the 2.6.23.42 kernel as far as I know.  I haven't dared
upgrade.  Anyhow, no seg faults and the box is powering along.  (The irqpoll
param is not present when I boot now, maybe it's hardcoded into the kernel at
install)

Alan

Comment 3 Alan Ryan 2007-12-12 19:06:31 UTC
ref: irqpoll - I do have to pass this to the kernel - although I'm sure it
booted without it once? Puzzled

Comment 4 David Wood 2007-12-13 08:58:47 UTC
I've tried acpi=off and irqpoll, but neither fixes the problem on this laptop.

Comment 5 David Wood 2007-12-21 09:31:32 UTC
Just updated to kernel 2.6.23.9-85.fc8 and still no luck.  Only versions 
2.6.23.1-42.fc8 and earlier are bootable.  Any suggestions for providing better
diagnostic information gratefully received.  Unfortunately, all of my boot.log
files are empty.

Comment 6 Chuck Ebbert 2007-12-21 21:31:37 UTC
Try removing "quiet" and "rhgb" from the kernel options to see exactly where it
fails in bootup. (And add "debug".)
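
A minimal sketch of that edit, applied to a kernel command line held in a string. The function name debug_cmdline and the sample root= value are assumptions for illustration; the real line lives in the bootloader configuration and will differ per machine.

```sh
# Remove the "quiet" and "rhgb" words from a kernel command line and append
# "debug". debug_cmdline is a hypothetical helper, not a Fedora tool.
debug_cmdline() {
    out=""
    # Unquoted $1 is split on whitespace into individual kernel options.
    for word in $1; do
        case $word in
            quiet|rhgb) ;;                       # drop these two options
            *) out="$out${out:+ }$word" ;;       # keep everything else
        esac
    done
    echo "$out${out:+ }debug"
}

# Sample input only; the root= value here is an assumption.
debug_cmdline "ro root=/dev/VolGroup00/LogVol00 rhgb quiet"
```

The same change can be made by hand at the GRUB prompt for a single boot, which avoids touching the configuration file.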


Comment 7 David Wood 2008-01-03 11:49:05 UTC
OK. I've tried this and the first init[1] segfault message occurs immediately
after the "Write protecting the kernel read-only data: 844k" message.  This is
followed by a couple more routine messages, one from 'input' where the mouse is
detected and a couple from atkbd.c where it asks me (twice) to use 'setkeycodes'
to make 'e06e' known.  It then just endlessly repeats the "init[1] segfault ..."
and the "printk:2900000 messages suppressed" messages.

Comment 8 David Wood 2008-01-25 17:06:54 UTC
I'm still getting the same problem with kernel-2.6.23.14-107.fc8 with little
idea how to debug the problem further.  I see others have the same problem ...

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg210980.html

Is there any chance that the regression mentioned in the message below could be
affecting Fedora kernels beyond -42.fc8 ?

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg202676.html

Comment 9 David Wood 2008-02-19 12:30:48 UTC
Apologies for 'bumping' this bug report but the latest rawhide kernel offers
more info on this ongoing segfault problem:

init[1]: segfault at 6 ip 00000006 sp bf8cf2d8 error 4 in ld-2.7.90.so[110000+1f000]

The kernel version is 2.6.25-0.40.rc1.git2.fc9.i686

Comment 10 Chuck Ebbert 2008-02-25 22:43:40 UTC
Oh no, looks like maybe bug 336161 has returned.

Can you boot in rescue mode and rebuild the initrd?

# chroot /mnt/sysimage
# mkinitrd /boot/initrd-<kernelversion>.img <kernelversion>

See https://fedoraproject.org/wiki/KernelCommonProblems for more detailed
directions.


Comment 11 David Wood 2008-02-26 14:40:33 UTC
Created attachment 295919
Output from mkinitrd -v

As I can still boot into kernel 2.6.23.1-42.fc8 I presumed booting into rescue
mode wasn't necessary for this.

I've tried mkinitrd on both the latest kernel (2.6.25-0.65.rc2.git7.fc9) and
working kernel (2.6.23.1-42.fc8) versions.  Interestingly I get the same
segfault error from BOTH rebuilt initrd images.  Attached is a log from the
rebuild process.

Comment 12 Chuck Ebbert 2008-02-27 23:55:17 UTC
Can you attach a working and a broken initrd?

Comment 13 David Wood 2008-02-28 10:09:14 UTC
Created attachment 296178
Working initrd (as downloaded)

Comment 14 David Wood 2008-02-28 10:10:25 UTC
Created attachment 296179
Segfaulting initrd (as rebuilt using mkinitrd)

Comment 15 Chuck Ebbert 2008-02-28 23:55:37 UTC
Are you using the Fedora 8 mkinitrd or the rawhide one? And you appear to be
getting the rawhide glibc shared libraries in your Fedora 8 initrd...

Comment 16 David Wood 2008-02-29 12:20:14 UTC
I upgraded the laptop to rawhide once the F8 kernel updates started segfaulting
(in the hopes that it would be fixed in rawhide sooner).  So, yes, I have been
using the rawhide mkinitrd which I expect has been picking up the rawhide glibc
shared libraries.  I've got a few up-to-date F8 systems on which I've tried
creating new initrd images, but I then get into trouble with them not finding
the path to the root filesystem (which is starting to get beyond my Linux
knowledge).  For further info, my disk layout is:

/dev/sda2 is /boot with label "/boot"
/dev/sda3 is VolGroup00 with LogVol00 as / and LogVol01 as swap, no labels

Comment 17 Nate 2008-03-10 17:31:07 UTC
I had this exact problem. Here is how I fixed it:
1. boot into a rescue cd
2. chroot into root filesystem (chroot /mnt/sysimage)
3. remove old /lib/ld-linux.so.n (was not associated with any rpm)
4. rebuild initrd
     /sbin/mkinitrd -f -v /boot/initrd-2.6.23.1-42.fc8.img 2.6.23.1-42.fc8
5. reboot

So basically mkinitrd was picking up an old lib that was causing nash to segfault.


Comment 18 David Wood 2008-03-14 14:46:00 UTC
I don't think this is my problem.  I have /lib/ld-linux.so.2 linked to
/lib/ld-2.7.90.so and /lib/ld-lsb.so.3 linked to /lib/ld-linux.so.2
If I delete /lib/ld-linux.so.2 then the system won't even boot my working kernel!

However, I've just tried the latest kernel (2.6.25-0.113.rc5.git2.fc9) and
finally it doesn't segfault :)  But instead I get a kernel panic at exactly the
same point :(

/bin/nash: /lib/libc.so.6: version 'GLIBC_2.8' not found (required by
/lib/libglib-2.0.so.0)

As I presume this kernel works on other systems, does this explain the earlier
segfaults?  If so, what is the fix? (I've tried mkinitrd on this latest initrd
and I get exactly the same error).
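
One way to check whether a given libc file actually provides the version the error asks for is to look for the version name among the NUL-terminated strings in the file. This is a rough sketch, not a replacement for a proper ELF tool; the function name has_version is made up for illustration, and it only tells you the string is present in the file.

```sh
# Succeed if the file contains the exact NUL-terminated string given,
# e.g. a symbol version name such as GLIBC_2.8.
# has_version is a hypothetical helper name.
has_version() {
    # Turn NUL separators into newlines, then look for an exact whole-line
    # match so that GLIBC_2.8 does not match a longer name by accident.
    tr '\0' '\n' < "$1" | grep -aqxF -- "$2"
}

# Example: check the system libc if it exists at this path.
for f in /lib/libc.so.6; do
    [ -e "$f" ] || continue
    if has_version "$f" GLIBC_2.8; then
        echo "$f: has GLIBC_2.8"
    else
        echo "$f: lacks GLIBC_2.8"
    fi
done
```

Running the same check against the libc.so.6 inside an unpacked initrd would show whether the initrd got an older library than the one in /lib.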

Comment 19 denis ivanov 2008-03-23 17:24:58 UTC
/lib/libc.so.6: version 'GLIBC_2.8' not found

I wasted hours of head-scratching to find the reason for this problem.

Most of the current rawhide libs (glib, libselinux, libpam etc) require GLIBC_2.8
(glibc-2.7.90-9.i686 identifies as 2.8).

My system was "rawhided" a few years ago so some things are really old.

The problem was in /lib:

/lib/libc.so.6 was linked to /lib/libc.so.0 (some old file which is not owned by
any of the installed packages)

There is also libc-2.7.90.so (the actual glibc package).

Just re-link:

cd /lib && ln -sf libc-2.7.90.so libc.so.6

Now all ok !
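
Before re-linking, it may help to see what the link currently points at. A small sketch, assuming the library directory is passed as an argument (defaulting to /lib); the function name libc_target is made up for illustration.

```sh
# Print what <libdir>/libc.so.6 currently points at.
# libc_target is a hypothetical helper name.
libc_target() {
    link="${1:-/lib}/libc.so.6"
    if [ -L "$link" ]; then
        readlink "$link"          # symlink: print its target
    elif [ -e "$link" ]; then
        echo "regular file"       # exists but is not a symlink
    else
        echo "missing"
    fi
}

libc_target /lib
```

On the system described above this would have printed libc.so.0 before the fix and libc-2.7.90.so after it.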


Comment 20 David Wood 2008-03-25 11:41:16 UTC
I'm afraid this link already exists.  I've done ldconfig (to make sure the cache
is up-to-date) and then ldconfig -p | grep libc.so and I get

libc.so.6 (libc6, OS ABI: Linux 2.6.9) => /lib/libc.so.6

Should I expect the Linux version to be greater than 2.6.9 ?
Maybe it is still a missing link, but I've got no other libc.so link in /lib.  


Comment 21 denis ivanov 2008-03-25 18:59:14 UTC
$ ls -l /lib/libc.so.6
lrwxrwxrwx 1 root root 9 Mar 23 23:14 /lib/libc.so.6 -> libc-2.7.90.so # ---> this is ok

$ ldconfig -p|grep libc.so.6
        libc.so.6 (libc6, OS ABI: Linux 2.6.9) => /lib/libc.so.6

The problem was that /lib/libc.so.6 was previously linked to libc.so.0 (a too-old
version of libc not owned by any of the current packages)


Comment 22 David Wood 2008-03-31 10:35:18 UTC
FIX FOUND!!!
Segfaults seemed to have been due to picking up an old libc.so.6 as Chuck
guessed in comment #10.  The "version 'GLIBC_2.8' not found" error message
seemed to confirm this.
I finally discovered that mkinitrd was picking up the 2.7 libraries from a
directory called /lib/i686/nosegneg
I moved this directory out of the way and rebuilt the initrd and hey presto, no
boot problems.  I guess either the installer needs to check there aren't
obsolete libraries in this directory or mkinitrd shouldn't look in this
directory when building the initrd.
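
For anyone hitting the same thing, a quick way to spot extra libc copies in subdirectories such as /lib/i686/nosegneg is to list every libc file below the library root. This is only a sketch; the function name find_libc_copies is made up for illustration, and what counts as obsolete still has to be judged by hand (for example with rpm -qf on each path).

```sh
# List every libc file or symlink below a library root, one path per line.
# find_libc_copies is a hypothetical helper name.
find_libc_copies() {
    find "${1:-/lib}" \( -name 'libc-*.so' -o -name 'libc.so.*' \) -print 2>/dev/null | sort
}

find_libc_copies /lib
```

More than one libc-*.so version in the output is a hint that an old copy may be lying around where mkinitrd could pick it up.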

Comment 23 Bug Zapper 2008-05-14 04:05:29 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 24 Jeremy Katz 2009-05-05 21:42:29 UTC
Is anyone still seeing this with F10/rawhide?

Comment 25 David Wood 2009-05-06 08:26:44 UTC
Problem not seen here since I applied the fix in comment #22.

