Bug 411691 - init[1] segfault on HP compaq nx9010 since kernel 2.6.23.1-49.fc8
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: 9
Hardware: i686 Linux
Priority: low
Severity: high
Assigned To: Peter Jones
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-12-05 04:34 EST by David Wood
Modified: 2009-05-06 09:02 EDT
Last Closed: 2009-05-06 09:02:55 EDT

Attachments
Output from mkinitrd -v (9.45 KB, text/plain)
2008-02-26 09:40 EST, David Wood
Working initrd (as downloaded) (3.49 MB, application/octet-stream)
2008-02-28 05:09 EST, David Wood
Segfaulting initrd (as rebuilt using mkinitrd) (3.08 MB, application/octet-stream)
2008-02-28 05:10 EST, David Wood

Description David Wood 2007-12-05 04:34:01 EST
Description of problem:
Updated kernel fails with the following message at boot:

init[1] segfault at 00000006 eip 00000006 esp bfbc70ec error 4

This message is repeated alternately with ...
printk:2900000 messages suppressed
.. where the actual number of messages varies slightly around this value

Version-Release number of selected component (if applicable):

2.6.23.1-49.fc8
2.6.23.8-63.fc8

Additional info:

2.6.23.1-42.fc8 works fine.
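
For reference, the trailing "error N" in these segfault lines is the x86 page-fault error code. The sketch below decodes only its low three bits; the function name decode_pf_error is made up for illustration and is not part of any Fedora tool.

```shell
# Decode the low three bits of the x86 page-fault error code that the
# kernel prints at the end of a "segfault at ... error N" line.
# decode_pf_error is a hypothetical helper name.
decode_pf_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

decode_pf_error 4   # prints "user-mode read, page not present"
decode_pf_error 6   # prints "user-mode write, page not present"
```

So "error 4" at address 00000006 reads as a user-space access to an unmapped page near address zero, which fits a jump through a bad pointer rather than a hardware fault.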
Comment 1 Alan Ryan 2007-12-08 11:40:46 EST
Hi, I'm not sure if this is where I should be adding this.  I am having the same
issue on a Dell vostro 200 with ICH9 SATA controller. The following is a copy of
a post I made describing the issue.

POST I:

I have Vista (don't ask) installed on another partition. I used its native tool
to shrink my 300G drive to 2x150 partitions. To test the M$ tool, I used the gparted
boot CD to format the new partition /dev/sda4 to ext2. All seemed fine.

The install was hit & miss, with acpi=off and the SATA set to IDE mode in the
BIOS it booted and loaded FC8. Grub installed fine and I ran update to get new
kernel etc..

Most things seem to work fine; however, using kftpgrabber and gftp I have a
reproducible error: both apps crash trying to FTP a ~13MB dir structure to my
local drive. I get errors from dmesg saying that exceptions occurred in the eip &
eis registers.

POST II:

I had to switch the SATA options in the BIOS to RAID mode as opposed to IDE.
This allowed me to use the AHCI driver rather than the pix... (I think this is
the name of it). I have tried passing irqpoll with acpi=off to the kernel at boot
time. This makes no difference; in fact it may be making things worse for me.
Many of my apps are segfaulting: eclipse, gdm, cisco vpn client, httpd, cupsd,
procmail, gftp, etc.

This happens with or without kernel params.

In general the dmesg looks like:
app[nnnn]: segfault at xxxxxxx eip xxxxxx esp xxxxxx error 6


I have seen something similar at
http://www.fedoraforum.org/forum/showthread.php?t=174130


I can't get output of lspci etc. at the moment; the BCM4318 is not supported,
so I can't get at the box.  I can get this info if you need it.

Thanks, 
Alan

Comment 2 Alan Ryan 2007-12-11 13:16:34 EST
Hi, I spent a lot of time on this and it looks like it may have been more my
application of linux than the kernel.  I could not update the BIOS; the fw kept
failing on me.  So yet another reinstall was undertaken. This time I left the
BIOS/SATA in IDE mode and passed acpi=off irqpoll to the kernel.  It booted,
loaded the pix_... driver in no time and all went smoothly.  I am on the Fedora
werewolf distro, that is the 2.6.23.42 kernel as far as I know.  I haven't dared
upgrade.  Anyhow, no seg faults and the box is powering along.  (The irqpoll
param is not present when I boot now; maybe it's hardcoded into the kernel at
install.)

Alan
Comment 3 Alan Ryan 2007-12-12 14:06:31 EST
ref: irqpoll - I do have to pass this to the kernel - although I'm sure it
booted without it once? Puzzled
Comment 4 David Wood 2007-12-13 03:58:47 EST
I've tried acpi=off and irqpoll, but neither fixes the problem on this laptop.
Comment 5 David Wood 2007-12-21 04:31:32 EST
Just updated to kernel 2.6.23.9-85.fc8 and still no luck.  Only versions 
2.6.23.1-42.fc8 and earlier are bootable.  Any suggestions for providing better
diagnostic information gratefully received.  Unfortunately, all of my boot.log
files are empty.
Comment 6 Chuck Ebbert 2007-12-21 16:31:37 EST
Try removing "quiet" and "rhgb" from the kernel options to see exactly where it
fails in bootup. (And add "debug".)
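
The option change suggested above can be sketched as a small filter over a grub "kernel" line. This is only an illustration: the function name verbose_kernel_line and the sample line are made up, the result has to be pasted into the boot entry by hand, and leading whitespace is not preserved.

```shell
# Rewrite one grub "kernel" line: drop the quiet and rhgb options and
# append debug exactly once. verbose_kernel_line is a hypothetical helper.
# The unquoted $1 is split on whitespace on purpose; lines containing
# shell glob characters are not handled.
verbose_kernel_line() {
    out=""
    for word in $1; do
        case $word in
            quiet|rhgb|debug) ;;              # dropped; debug is re-added below
            *) out="${out:+$out }$word" ;;
        esac
    done
    echo "$out debug"
}

# Sample line (illustrative values only):
verbose_kernel_line "kernel /vmlinuz-2.6.23.1-49.fc8 ro root=/dev/VolGroup00/LogVol00 rhgb quiet"
```

The same edit can be made interactively at the grub menu, which avoids touching the configuration file at all.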
Comment 7 David Wood 2008-01-03 06:49:05 EST
OK. I've tried this and the first init[1] segfault message occurs immediately
after the "Write protecting the kernel read-only data: 844k" message.  This is
followed by a couple more routine messages, one from 'input' where the mouse is
detected and a couple from atkbd.c where it asks me (twice) to use 'setkeycodes'
to make 'e06e' known.  It then just endlessly repeats the "init[1] segfault ..."
and the "printk:2900000 messages suppressed" messages.
Comment 8 David Wood 2008-01-25 12:06:54 EST
I'm still getting the same problem with kernel-2.6.23.14-107.fc8 with little
idea how to debug the problem further.  I see others have the same problem ...

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg210980.html

Is there any chance that the regression mentioned in the message below could be
affecting Fedora kernels beyond -42.fc8 ?

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg202676.html
Comment 9 David Wood 2008-02-19 07:30:48 EST
Apologies for 'bumping' this bug report but the latest rawhide kernel offers
more info on this ongoing segfault problem:

init[1]: segfault at 6 ip 00000006 sp bf8cf2d8 error 4 in ld-2.7.90.so[110000+1f000]

The kernel version is 2.6.25-0.40.rc1.git2.fc9.i686
Comment 10 Chuck Ebbert 2008-02-25 17:43:40 EST
Oh no, looks like maybe bug 336161 has returned.

Can you boot in rescue mode and rebuild the initrd?

# chroot /mnt/sysimage
# mkinitrd /boot/initrd-<kernelversion>.img <kernelversion>

See https://fedoraproject.org/wiki/KernelCommonProblems for more detailed
directions.
Comment 11 David Wood 2008-02-26 09:40:33 EST
Created attachment 295919 [details]
Output from mkinitrd -v

As I can still boot into kernel 2.6.23.1-42.fc8 I presumed booting into rescue
mode wasn't necessary for this.

I've tried mkinitrd on both the latest kernel (2.6.25-0.65.rc2.git7.fc9) and
working kernel (2.6.23.1-42.fc8) versions.  Interestingly I get the same
segfault error from BOTH rebuilt initrd images.  Attached is a log from the
rebuild process.
Comment 12 Chuck Ebbert 2008-02-27 18:55:17 EST
Can you attach a working and a broken initrd?
Comment 13 David Wood 2008-02-28 05:09:14 EST
Created attachment 296178 [details]
Working initrd (as downloaded)
Comment 14 David Wood 2008-02-28 05:10:25 EST
Created attachment 296179 [details]
Segfaulting initrd (as rebuilt using mkinitrd)
Comment 15 Chuck Ebbert 2008-02-28 18:55:37 EST
Are you using the Fedora 8 mkinitrd or the rawhide one? And you appear to be
getting the rawhide glibc shared libraries in your Fedora 8 initrd...
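
One way to see which shared libraries ended up in an initrd is to list the archive and filter for library paths. The sketch below assumes the image is a gzip-compressed cpio archive, as mkinitrd images of this era generally are; initrd_libs is a made-up helper name and only filters a listing it is given.

```shell
# Filter a cpio listing (one path per line on stdin) down to
# shared-library entries. initrd_libs is a hypothetical helper.
initrd_libs() {
    grep -E '(^|/)lib(64)?/.*\.so(\.|$)'
}

# Typical use, assuming a gzip-compressed cpio image (path is a placeholder):
#   zcat /boot/initrd-<kernelversion>.img | cpio -it | initrd_libs
```

Comparing the output for a working and a rebuilt image shows at a glance whether the two carry different glibc versions.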
Comment 16 David Wood 2008-02-29 07:20:14 EST
I upgraded the laptop to rawhide once the F8 kernel updates started segfaulting
(in the hopes that it would be fixed in rawhide sooner).  So, yes, I have been
using the rawhide mkinitrd which I expect has been picking up the rawhide glibc
shared libraries.  I've got a few up-to-date F8 systems on which I've tried
creating new initrd images, but I then get into trouble with them not finding
the path to the root filesystem (which is starting to get beyond my Linux
knowledge).  For further info, my disk layout is:

/dev/sda2 is /boot with label "/boot"
/dev/sda3 is VolGroup00 with LogVol00 as / and LogVol01 as swap, no labels
Comment 17 Nate 2008-03-10 13:31:07 EDT
I had this exact problem. Here is how I fixed it:
1. boot into a rescue cd
2. chroot into root filesystem (chroot /mnt/sysimage)
3. remove old /lib/ld-linux.so.n (was not associated with any rpm)
4. rebuild initrd
     /sbin/mkinitrd -f -v /boot/initrd-2.6.23.1-42.fc8.img 2.6.23.1-42.fc8
5. reboot

So basically mkinitrd was picking up an old lib that was causing nash to segfault.
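
Step 3 depends on knowing which files in /lib belong to no package. A minimal sketch for that check is below; it parses the message that rpm -qf prints for unowned files, assumes an untranslated (C locale) message, and uses the made-up helper name unowned_files.

```shell
# Print the paths that "rpm -qf" reports as not belonging to any package.
# Reads rpm -qf output on stdin; lines naming a package are ignored.
# unowned_files is a hypothetical helper.
unowned_files() {
    sed -n 's/^file \(.*\) is not owned by any package$/\1/p'
}

# Typical use inside the chroot (globs are examples only):
#   LANG=C rpm -qf /lib/ld-*.so* /lib/libc*.so* 2>&1 | unowned_files
```

Anything this prints deserves a closer look before it is moved or removed, since a symlink that the working kernel still needs may be among the results.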
Comment 18 David Wood 2008-03-14 10:46:00 EDT
I don't think this is my problem.  I have /lib/ld-linux.so.2 linked to
/lib/ld-2.7.90.so and /lib/ld-lsb.so.3 linked to /lib/ld-linux.so.2
If I delete /lib/ld-linux.so.2 then the system won't even boot my working kernel!

However, I've just tried the latest kernel (2.6.25-0.113.rc5.git2.fc9) and
finally it doesn't segfault :)  But instead I get a kernel panic at exactly the
same point :(

/bin/nash: /lib/libc.so.6: version 'GLIBC_2.8' not found (required by
/lib/libglib-2.0.so.0)

As I presume this kernel works on other systems, does this explain the earlier
segfaults?  If so, what is the fix? (I've tried mkinitrd on this latest initrd
and I get exactly the same error).
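
Whether a given libc file actually provides the GLIBC_2.8 version tag can be checked by scanning it for version strings. This is a rough sketch, not a proper ELF parser: it assumes a grep that supports -a and -o, and glibc_version_tags is a made-up helper name.

```shell
# List the distinct GLIBC_x.y version tags that appear as strings in a file.
# glibc_version_tags is a hypothetical helper; it does a plain string scan.
glibc_version_tags() {
    grep -a -o 'GLIBC_[0-9][0-9.]*' "$1" | sort -u
}

# Typical use:
#   glibc_version_tags /lib/libc.so.6 | grep -Fxq GLIBC_2.8 && echo "2.8 present"
```

Running it against every libc copy that could be bundled into the initrd shows which of them is too old for the libraries that require GLIBC_2.8.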
Comment 19 denis ivanov 2008-03-23 13:24:58 EDT
/lib/libc.so.6: version 'GLIBC_2.8' not found

I wasted hours tracking down the cause of this problem.

Most current rawhide libs (glib, libselinux, libpam etc.) require GLIBC_2.8
(glibc-2.7.90-9.i686 identifies as 2.8).

My system was "rawhided" a few years ago, so some things are really old.

The problem was in /lib:

/lib/libc.so.6 was linked to /lib/libc.so.0 (some old file which is not owned by
any of the installed packages).

There is also libc-2.7.90.so (the actual glibc package).

Just re-link:

cd /lib && ln -sf libc-2.7.90.so libc.so.6

Now all is OK!
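
Before or after re-linking, the link target can be verified rather than assumed. The sketch below assumes a readlink that supports -f (GNU coreutils does); link_points_to is a made-up helper name.

```shell
# Succeed only if the symlink in $1 resolves to the same file as $2.
# link_points_to is a hypothetical helper; readlink -f is assumed available.
link_points_to() {
    actual=$(readlink -f "$1")
    expected=$(readlink -f "$2")
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Typical use:
#   link_points_to /lib/libc.so.6 /lib/libc-2.7.90.so || echo "libc.so.6 points elsewhere"
```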
Comment 20 David Wood 2008-03-25 07:41:16 EDT
I'm afraid this link already exists.  I've done ldconfig (to make sure the cache
is up-to-date) and then ldconfig -p | grep libc.so and I get

libc.so.6 (libc6, OS ABI: Linux 2.6.9) => /lib/libc.so.6

Should I expect the Linux version to be greater than 2.6.9 ?
Maybe it is still a missing link, but I've got no other libc.so link in /lib.  
Comment 21 denis ivanov 2008-03-25 14:59:14 EDT
$ ls -l /lib/libc.so.6
lrwxrwxrwx 1 root root 9 Mar 23 23:14 /lib/libc.so.6 -> libc-2.7.90.so # ---> this is ok

$ ldconfig -p|grep libc.so.6
        libc.so.6 (libc6, OS ABI: Linux 2.6.9) => /lib/libc.so.6

The problem was that /lib/libc.so.6 was previously linked to libc.so.0 (a too-old version of
libc not owned by any of the current packages).
Comment 22 David Wood 2008-03-31 06:35:18 EDT
FIX FOUND!!!
Segfaults seemed to have been due to picking up an old libc.so.6 as Chuck
guessed in comment #10.  The "version 'GLIBC_2.8' not found" error message
seemed to confirm this.
I finally discovered that mkinitrd was picking up the 2.7 libraries from a
directory called /lib/i686/nosegneg
I moved this directory out of the way and rebuilt the initrd and hey presto, no
boot problems.  I guess either the installer needs to check there aren't
obsolete libraries in this directory or mkinitrd shouldn't look in this
directory when building the initrd.
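
Stale copies like the ones in /lib/i686/nosegneg can be spotted by listing every libc under the library root, subdirectories included. This is a sketch only: find_libc_copies is a made-up helper name and the file name patterns are assumptions about how the copies are named.

```shell
# List every libc file or link under a library root, including
# hardware-capability subdirectories such as i686/nosegneg.
# find_libc_copies is a hypothetical helper.
find_libc_copies() {
    find "$1" \( -name 'libc-*.so' -o -name 'libc.so.*' \) -print | sort
}

# Typical use:
#   find_libc_copies /lib
```

More than one libc-*.so version in the output is a hint that an older copy may be picked up when the initrd is rebuilt.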
Comment 23 Bug Zapper 2008-05-14 00:05:29 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 24 Jeremy Katz 2009-05-05 17:42:29 EDT
Is anyone still seeing this with F10/rawhide?
Comment 25 David Wood 2009-05-06 04:26:44 EDT
Problem not seen here since I applied the fix in comment #22.
