Bug 336161

Summary: init segfault at firstboot
Product: [Fedora] Fedora
Reporter: Jean-Philippe Dionne <jp>
Component: mkinitrd
Assignee: Peter Jones <pjones>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 8
CC: chris.sorisio, jeff, notting, pjones, renard
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 6.0.19-3.fc8
Doc Type: Bug Fix
Last Closed: 2007-10-20 04:36:58 UTC
Bug Blocks: 235703    
Attachments:
Description                           Flags
upgrade.log of fc6 to f8test3         none
upgrade.log.syslog of fc6 to f8test3  none
patch for this issue                  none
updated patch                         none

Description Jean-Philippe Dionne 2007-10-17 13:46:59 UTC
Description of problem:

After an upgrade installation from Fedora Core 6 to Fedora 8 test 3 (x86_64),
the first boot prints an init segfault message in an infinite loop:

init[1]: segfault at 0000000000000009 rip 0000003c3540cebe rsp 00007fff516bef90 error 4

The installation from scratch works fine. 

Hardware platform: Sun Ultra 20, Dual Core AMD Opteron(tm) Processor 180

How reproducible: always

Steps to Reproduce:
1. Upgrade from Fedora Core 6 to Fedora 8
2. First boot after the installation

Comment 1 Bill Nottingham 2007-10-18 04:22:33 UTC
Nice. When during the boot process does this happen?

Comment 2 Jean-Philippe Dionne 2007-10-18 13:39:16 UTC
I can't say exactly when because the screen is flooded by printk messages about
the init segfault. What I can say is that it happens shortly after the kernel
initialization and before the /etc/rc scripts.

The problem is the same with the 'single' kernel option on.



Comment 3 Bill Nottingham 2007-10-18 14:41:34 UTC
Does:

a) booting with init=/bin/bash work?
b) booting with selinux=0 work?

Comment 4 Jean-Philippe Dionne 2007-10-19 12:41:39 UTC
I have tested by starting from a new installation of fc6 (default packages from
the fc6 release, no yum update) then upgraded to f8test3 and the problem is the
same.  

The last lines before the init segfault are:
"Freeing unused kernel memory ..."
"Write protecting the kernel read only data: 1036k"

a) booting with init=/bin/bash work? No
b) booting with selinux=0 work? No

One more problem that might be related is that the fc6 kernel is the default
choice in grub (which does not boot of course).  That grub entry should be
removed...


Comment 5 Jean-Philippe Dionne 2007-10-19 12:57:07 UTC
Created attachment 232491
upgrade.log of fc6 to f8test3

Comment 6 Jean-Philippe Dionne 2007-10-19 12:58:06 UTC
Created attachment 232501
upgrade.log.syslog of fc6 to f8test3

Comment 7 Bill Nottingham 2007-10-19 16:36:31 UTC
How does booting with init=/bin/bash fail - what sort of errors?

Comment 8 Jean-Philippe Dionne 2007-10-19 17:08:26 UTC
Exactly the same error as shown in my first comment.  In fact, init=/anything
shows the same error.

To be able to access upgrade.log and upgrade.log.syslog, I have booted from a
livecd.  If needed, I can provide more log files.



Comment 9 Bill Nottingham 2007-10-19 17:29:11 UTC
If you boot from that livecd and chroot into the system, do bash, ls, etc. run?

Comment 10 Mikkel Lauritsen 2007-10-19 18:47:37 UTC
I have what might be the same problem; at least it's apparently related.

With kernel 2.6.23.1-23 on an i386 (nForce-based Asus A7N266-VM with an Athlon
XP 1800+, 1 GB ram) the boot process very quickly prints (copied by hand):

...
[Linux -initrd @ 0x37c72000, 0x37d681 bytes]

Booting: ABCDEUncompressing Linux... Ok, booting the kernel.
init[1]: segfault at 00000000 eip 00000000 esp bfbf891c error 4
init[1]: segfault at 00000000 eip 00000000 esp bfbf891c error 4
init[1]: segfault at 00000000 eip 00000000 esp bfbf891c error 4
...

The last line is printed 10 times very quickly, and then about once every 3
seconds.

The same thing happens on every boot. It's definitely the kernel and not init
that has a problem, as the originally installed kernel
(2.6.23-0.214.rc8.git2.fc8) works fine.

Comment 11 Jean-Philippe Dionne 2007-10-19 18:51:56 UTC
(In reply to comment #9)
> If you boot from that livecd and chroot into the system, do bash, ls, etc. run?

Chroot works well from the livecd:

[root@localhost ~]# mount /dev/sda3 /mnt
[root@localhost ~]# ls /mnt
bin   dev  halt  lib    lost+found  mnt  proc  sbin     srv  tmp  var
boot  etc  home  lib64  media       opt  root  selinux  sys  usr
[root@localhost ~]# chroot /mnt /bin/bash
[root@localhost /]# ls
bin   dev  halt  lib    lost+found  mnt  proc  sbin     srv  tmp  var
boot  etc  home  lib64  media       opt  root  selinux  sys  usr
[root@localhost /]# ls /mnt
[root@localhost /]# /sbin/init
Usage: init 0123456SsQqAaBbCcUu
[root@localhost /]#

Comment 12 Bill Nottingham 2007-10-19 19:11:12 UTC
OK, so:

- the fact that it happens with any init= (/sbin/init, /bin/bash) implies either
an issue with the base library set or possibly the kernel
- the fact that the same library set runs under chroot seems to implicate the kernel

Pushing there for now.

Comment 13 Chuck Ebbert 2007-10-19 19:42:22 UTC
Does specifying a statically-linked program as init work, like "init=/sbin/nash"?

And, boot the rescue disk and remove "quiet" and "rhgb" from the kernel line in
/etc/grub.conf. Then see what it prints just before the segfaults start.


Comment 14 Jean-Philippe Dionne 2007-10-19 20:47:55 UTC
(In reply to comment #13)
> Does specifying a statically-linked program as init work, like "init=/sbin/nash"?

Prints the same messages as in comment #0 and #4.  I cannot get more lines
before the segfault because it gets quickly flooded. 

> And, boot the rescue disk and remove "quiet" and "rhgb" from the kernel line in
> /etc/grub.conf. Then see what it prints just before the segfaults start.
> 

I removed the "quiet" and "rhgb" from the kernel line in the grub boot menu with
the 'e' key and nothing more is printed.



Comment 15 Bill Nottingham 2007-10-19 21:39:56 UTC
Does booting with 'vdso=0' help?

Comment 16 Chuck Ebbert 2007-10-19 21:45:37 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Does specifying a statically-linked program as init work, like "init=/sbin/nash"?
> 
> Prints the same messages as in comment #0 and #4.  I cannot get more lines
> before the segfault because it gets quickly flooded. 
> 

Use boot_delay=500 to slow down the boot messages. Adjust the 500 as necessary
to get a reasonable slowdown.


Comment 17 Bill Nottingham 2007-10-20 01:19:15 UTC
Also, if you can tell, is this /sbin/init that is crashing, or the init from the
initrd?

Comment 18 Bill Nottingham 2007-10-20 03:07:13 UTC
OK, reproduced here - it's the initrd that's crashing. 

Comment 19 Bill Nottingham 2007-10-20 03:50:57 UTC
Jean-Philippe, Mikkel, I would suspect if you remake your initrd/initramfs, it
will boot fine.

So, what's happening on at least my test box is that the upgrade, done via
anaconda, installs new packages before removing the old packages. mkinitrd's ELF
dependency finder is pulling in the old copy of libc, instead of the new one. 
old libc + new ld-linux.so.2 == segfault.

Reassigning to mkinitrd.
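
The initrd rebuild suggested above could be done from the livecd chroot shown
in comment 11 roughly as follows. This is a minimal sketch, not a tested
recovery procedure: the helper name initrd_rebuild_cmd is hypothetical, the
/boot/initrd-<version>.img path is assumed, and the kernel version must be the
one installed on disk (inside a chroot, `uname -r` reports the livecd's kernel
instead). The helper only prints the command so it can be reviewed before
being run as root.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) the mkinitrd command that
# regenerates the initrd for the given installed kernel version.
# Assumption: the initrd lives at /boot/initrd-<version>.img.
initrd_rebuild_cmd() {
    local kver=$1
    printf 'mkinitrd -f /boot/initrd-%s.img %s\n' "$kver" "$kver"
}

# Example using a kernel version string quoted in comment 10; substitute
# the version actually installed under /lib/modules on the affected system.
initrd_rebuild_cmd 2.6.23-0.214.rc8.git2.fc8
```

The -f flag lets mkinitrd overwrite the existing, broken image.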

Comment 20 Bill Nottingham 2007-10-20 03:55:12 UTC
Erm, reverse that. New libc, old ld-linux.so.2.
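
The mismatch described here (new libc paired with an old dynamic loader) can
be illustrated with a small consistency check. A minimal sketch, assuming
glibc's versioned file names of the form ld-X.Y.so and libc-X.Y.so; the
function names and the example paths are hypothetical and are not part of
mkinitrd.

```shell
#!/bin/bash
# Extract "X.Y" from a name such as ld-2.5.so or libc-2.7.so
# (assumed naming scheme; any leading directory is ignored).
glibc_version_of() {
    local base=${1##*/}      # strip directory
    base=${base%.so}         # strip trailing .so
    printf '%s\n' "${base##*-}"
}

# Succeed only when the loader and libc carry the same version string.
same_glibc_version() {
    [ "$(glibc_version_of "$1")" = "$(glibc_version_of "$2")" ]
}

# Illustrative pairing of an old loader with a new libc.
if ! same_glibc_version /lib64/ld-2.5.so /lib64/libc-2.7.so; then
    echo "loader and libc versions differ" >&2
fi
```

A check like this only compares file names; it does not inspect the ELF
contents of either file.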

Comment 21 Bill Nottingham 2007-10-20 04:08:13 UTC
Created attachment 233291
patch for this issue

In the presence of multiple ld.so, we need to iterate over all of them,
preferring the latest, not the first. In doing so, we also need to ignore
symlinks.

Tested; this seems to do the right thing.
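
The selection rule described in this comment (consider every ld.so candidate,
skip symlinks, prefer the latest) might look like the following. This is a
sketch, not the attached patch: the function name newest_ldso is hypothetical,
and "latest" is assumed here to mean the highest version in the file name as
ordered by GNU `sort -V`.

```shell
#!/bin/bash
# Hypothetical helper: print the newest real dynamic loader found in
# directory $1, ignoring symlinks such as ld-linux.so.2.
newest_ldso() {
    local dir=$1 f
    for f in "$dir"/ld-*.so*; do
        # Skip unmatched globs, non-regular files, and symlinks.
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        printf '%s\n' "$f"
    done | sort -V | tail -n 1   # version sort, keep the highest
}

# Usage (prints nothing if the directory holds no loader):
# newest_ldso /lib64
```

Sorting by version rather than taking the first glob match is what avoids
picking up a stale loader left behind by a not-yet-removed old package.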

Comment 22 Bill Nottingham 2007-10-20 04:27:22 UTC
Created attachment 233301
updated patch

Here's a fixed patch. Oops.

Comment 23 Bill Nottingham 2007-10-20 04:36:58 UTC
Built as 6.0.19-3.fc8.

Comment 24 Bill Nottingham 2007-10-22 19:03:23 UTC
*** Bug 343951 has been marked as a duplicate of this bug. ***

Comment 25 Jeremy Katz 2007-10-23 13:55:08 UTC
*** Bug 333521 has been marked as a duplicate of this bug. ***