Bug 336161 - init segfault at firstboot
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: 8
Hardware: All Linux
Priority: low
Severity: low
Assigned To: Peter Jones
QA Contact: Fedora Extras Quality Assurance
Duplicates: 333521 343951
Depends On:
Blocks: F8Blocker
Reported: 2007-10-17 09:46 EDT by Jean-Philippe Dionne
Modified: 2007-11-30 17:12 EST (History)

See Also:
Fixed In Version: 6.0.19-3.fc8
Doc Type: Bug Fix


Attachments
upgrade.log of fc6 to f8test3 (42.56 KB, text/plain)
2007-10-19 08:57 EDT, Jean-Philippe Dionne
upgrade.log.syslog of fc6 to f8test3 (1.37 KB, application/octet-stream)
2007-10-19 08:58 EDT, Jean-Philippe Dionne
patch for this issue (477 bytes, patch)
2007-10-20 00:08 EDT, Bill Nottingham
updated patch (738 bytes, patch)
2007-10-20 00:27 EDT, Bill Nottingham

Description Jean-Philippe Dionne 2007-10-17 09:46:59 EDT
Description of problem:

After an upgrade installation from Fedora Core 6 to Fedora 8 test 3 (x86_64), at
the first boot, an infinite loop with a segfault from init is printed:

init[1]: segfault at 0000000000000009 rip 0000003c3540cebe rsp 00007fff516bef90 error 4

The installation from scratch works fine. 

Hardware platform: Sun Ultra 20, Dual Core AMD Opteron(tm) Processor 180

How reproducible: always

Steps to Reproduce:
1. Upgrade from Fedora Core 6 to Fedora 8
2. First boot after the installation
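
Editorial note on the "error 4" field in the segfault line above: on x86 the kernel prints the hardware page-fault error code there, whose low three bits say whether the page was present, whether the access was a write, and whether it came from user mode. The helper below is only an illustrative sketch (the function name `decode_pf_error` is made up for this note) and decodes just those three bits.

```shell
#!/bin/sh
# Hypothetical helper: decode the low three bits of an x86 page-fault
# error code as printed after "error" in a kernel segfault message.
decode_pf_error() {
    code="$1"
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $(( $code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $(( $code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $(( $code & 4 )) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    printf '%s, %s, %s\n' "$cause" "$access" "$mode"
}

decode_pf_error 4
```

For the value reported here, 4, this reads as a user-mode read of an unmapped page, which fits a process dereferencing a near-null address such as 0x9.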
Comment 1 Bill Nottingham 2007-10-18 00:22:33 EDT
Nice. When during the boot process does this happen?
Comment 2 Jean-Philippe Dionne 2007-10-18 09:39:16 EDT
I can't say exactly when because the screen is flooded by printk messages about
the init segfault. What I can say is that it happens quickly after the kernel
initialization and before the /etc/rc scripts.  

The problem is the same with the 'single' kernel option on.

Comment 3 Bill Nottingham 2007-10-18 10:41:34 EDT
Does:

a) booting with init=/bin/bash work?
b) booting with selinux=0 work?
Comment 4 Jean-Philippe Dionne 2007-10-19 08:41:39 EDT
I have tested by starting from a new installation of fc6 (default packages from
the fc6 release, no yum update) then upgraded to f8test3 and the problem is the
same.  

The last lines before the init segfault are:
"Freeing unused kernel memory ..."
"Write protecting the kernel read only data: 1036k"

a) booting with init=/bin/bash work? No
b) booting with selinux=0 work? No

One more problem that might be related is that the fc6 kernel is the default
choice in grub (which does not boot of course).  That grub entry should be
removed...
Comment 5 Jean-Philippe Dionne 2007-10-19 08:57:07 EDT
Created attachment 232491 [details]
upgrade.log of fc6 to f8test3
Comment 6 Jean-Philippe Dionne 2007-10-19 08:58:06 EDT
Created attachment 232501 [details]
upgrade.log.syslog of fc6 to f8test3
Comment 7 Bill Nottingham 2007-10-19 12:36:31 EDT
How does booting with init=/bin/bash fail - what sort of errors?
Comment 8 Jean-Philippe Dionne 2007-10-19 13:08:26 EDT
Exactly the same error as shown in my first comment.  In fact, init=/anything
shows the same error.

To be able to access upgrade.log and upgrade.log.syslog, I have booted from a
livecd.  If needed, I can provide more log files.

Comment 9 Bill Nottingham 2007-10-19 13:29:11 EDT
If you boot from that livecd and chroot into the system, do bash, ls, etc. run?
Comment 10 Mikkel Lauritsen 2007-10-19 14:47:37 EDT
I have what might be the same problem; at least it's apparently related.

With kernel 2.6.23.1-23 on an i386 (nForce-based Asus A7N266-VM with an Athlon
XP 1800+, 1 GB ram) the boot process very quickly prints (copied by hand):

...
[Linux -initrd @ 0x37c72000, 0x37d681 bytes]

Booting: ABCDEUncompressing Linux... Ok, booting the kernel.
init[1]: segfault at 00000000 eip 00000000 esp bfbf891c error 4
init[1]: segfault at 00000000 eip 00000000 esp bfbf891c error 4
init[1]: segfault at 00000000 eip 00000000 esp bfbf891c error 4
...

The last line is printed 10 times very quickly, and then about once every 3
seconds.

The same thing happens on every boot. It's definitely the kernel and not init
that has a problem, as the originally installed kernel
(2.6.23-0.214.rc8.git2.fc8) works fine.
Comment 11 Jean-Philippe Dionne 2007-10-19 14:51:56 EDT
(In reply to comment #9)
> If you boot from that livecd and chroot into the system, do bash, ls, etc. run?

Chroot works well from the livecd:

[root@localhost ~]# mount /dev/sda3 /mnt
[root@localhost ~]# ls /mnt
bin   dev  halt  lib    lost+found  mnt  proc  sbin     srv  tmp  var
boot  etc  home  lib64  media       opt  root  selinux  sys  usr
[root@localhost ~]# chroot /mnt /bin/bash
[root@localhost /]# ls
bin   dev  halt  lib    lost+found  mnt  proc  sbin     srv  tmp  var
boot  etc  home  lib64  media       opt  root  selinux  sys  usr
[root@localhost /]# ls /mnt
[root@localhost /]# /sbin/init
Usage: init 0123456SsQqAaBbCcUu
[root@localhost /]#
Comment 12 Bill Nottingham 2007-10-19 15:11:12 EDT
OK, so:

- the fact that it happens with any init= (/sbin/init, /bin/bash) implies either
an issue with the base library set or possibly the kernel
- the fact that the same library set runs under chroot seems to implicate the kernel

Pushing there for now.
Comment 13 Chuck Ebbert 2007-10-19 15:42:22 EDT
Does specifying a statically-linked program as init work, like "init=/sbin/nash"?

And, boot the rescue disk and remove "quiet" and "rhgb" from the kernel line in
/etc/grub.conf. Then see what it prints just before the segfaults start.
Comment 14 Jean-Philippe Dionne 2007-10-19 16:47:55 EDT
(In reply to comment #13)
> Does specifying a statically-linked program as init work, like "init=/sbin/nash"?

Prints the same messages as in comment #0 and #4.  I cannot get more lines
before the segfault because it gets quickly flooded. 

> And, boot the rescue disk and remove "quiet" and "rhgb" from the kernel line in
> /etc/grub.conf. Then see what it prints just before the segfaults start.
> 

I removed the "quiet" and "rhgb" from the kernel line in the grub boot menu with
the 'e' key and nothing more is printed.

Comment 15 Bill Nottingham 2007-10-19 17:39:56 EDT
Does booting with 'vdso=0' help?
Comment 16 Chuck Ebbert 2007-10-19 17:45:37 EDT
(In reply to comment #14)
> (In reply to comment #13)
> > Does specifying a statically-linked program as init work, like
> > "init=/sbin/nash"?
> 
> Prints the same messages as in comment #0 and #4.  I cannot get more lines
> before the segfault because it gets quickly flooded. 
> 

Use boot_delay=500 to slow down the boot messages. Adjust the 500 as necessary
to get a reasonable slowdown.
Comment 17 Bill Nottingham 2007-10-19 21:19:15 EDT
Also, if you can tell, is this /sbin/init that is crashing, or the init from the
initrd?
Comment 18 Bill Nottingham 2007-10-19 23:07:13 EDT
OK, reproduced here - it's the initrd that's crashing. 
Comment 19 Bill Nottingham 2007-10-19 23:50:57 EDT
Jean-Philippe, Mikkel, I would suspect if you remake your initrd/initramfs, it
will boot fine.

So, what's happening on at least my test box is that the upgrade, done via
anaconda, installs new packages before removing the old packages. mkinitrd's ELF
dependency finder is pulling in the old copy of libc, instead of the new one. 
old libc + new ld-linux.so.2 == segfault.

Reassigning to mkinitrd.
Comment 20 Bill Nottingham 2007-10-19 23:55:12 EDT
Erm, reverse that. New libc, old ld-linux.so.2.
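
Editorial note: the mismatch described in comments 19 and 20 (a dynamic loader and a libc from different glibc builds ending up side by side) can be checked by hand in an unpacked initrd. The sketch below is not part of mkinitrd; the function name `glibc_pair_matches` is hypothetical, and it assumes the versioned file names glibc used at the time (`ld-<ver>.so` and `libc-<ver>.so`), with the unversioned names being symlinks.

```shell
#!/bin/sh
# Hypothetical check: do the loader and libc in one directory carry the
# same glibc version suffix? Assumes files named ld-<ver>.so and
# libc-<ver>.so; symlinks such as ld-linux.so.2 are not considered.
glibc_pair_matches() {
    dir="$1"
    ld_ver=""
    libc_ver=""
    for f in "$dir"/ld-*.so; do
        [ -f "$f" ] || continue          # also skips an unmatched glob
        [ -L "$f" ] && continue          # ignore symlinks
        v="${f##*/ld-}"
        ld_ver="${v%.so}"
    done
    for f in "$dir"/libc-*.so; do
        [ -f "$f" ] || continue
        [ -L "$f" ] && continue
        v="${f##*/libc-}"
        libc_ver="${v%.so}"
    done
    if [ -z "$ld_ver" ] || [ -z "$libc_ver" ]; then
        echo "could not find both ld-<ver>.so and libc-<ver>.so in $dir"
        return 2
    fi
    if [ "$ld_ver" = "$libc_ver" ]; then
        echo "ok: loader and libc are both $ld_ver"
        return 0
    fi
    echo "mismatch: loader $ld_ver, libc $libc_ver"
    return 1
}

# Example (the path is an assumption): glibc_pair_matches /tmp/initrd-unpacked/lib
```

If several versions of either file are present, the sketch only reports the last one the glob yields, so it is a quick sanity check rather than a full audit.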
Comment 21 Bill Nottingham 2007-10-20 00:08:13 EDT
Created attachment 233291 [details]
patch for this issue

In the presence of multiple ld.so, we need to iterate over all of them,
preferring the latest, not the first. In doing so, we also need to ignore
symlinks.

Tested, this seems to DTRT.
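
Editorial note: the approach described in comment 21 might look roughly like the sketch below. This is not the attached patch, whose contents are not reproduced on this page; the function name `pick_ldso` and its `root` argument are hypothetical. It walks every loader candidate, skips symlinks, asks each loader whether it can handle the binary via glibc's `ld.so --verify`, and keeps the last one that passes instead of stopping at the first.

```shell
#!/bin/sh
# Hypothetical sketch: choose the dynamic loader for a binary when more
# than one ld.so is installed under a root directory.
pick_ldso() {
    root="$1"
    bin="$2"
    chosen=""
    for ldso in "$root"/lib*/ld*.so*; do
        [ -L "$ldso" ] && continue       # ignore symlinks such as ld-linux.so.2
        [ -x "$ldso" ] || continue       # skip non-executables and unmatched globs
        # Let the loader itself say whether it can handle this binary.
        "$ldso" --verify "$bin" >/dev/null 2>&1 || continue
        chosen="$ldso"                   # no break: a later candidate wins
    done
    if [ -n "$chosen" ]; then
        printf '%s\n' "$chosen"
    else
        return 1
    fi
}

# Example (root path is an assumption): pick_ldso "" /sbin/nash
```

Note that "latest" here means last in the shell's glob order, which is lexical: `ld-2.7.so` sorts after `ld-2.5.so`, but a name like `ld-2.10.so` would sort before both, so a real implementation may need a proper version comparison.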
Comment 22 Bill Nottingham 2007-10-20 00:27:22 EDT
Created attachment 233301 [details]
updated patch

Here's a fixed patch. Oops.
Comment 23 Bill Nottingham 2007-10-20 00:36:58 EDT
Built as 6.0.19-3.fc8.
Comment 24 Bill Nottingham 2007-10-22 15:03:23 EDT
*** Bug 343951 has been marked as a duplicate of this bug. ***
Comment 25 Jeremy Katz 2007-10-23 09:55:08 EDT
*** Bug 333521 has been marked as a duplicate of this bug. ***
