Bug 651585 - Boot procedure goes into endless loop
Summary: Boot procedure goes into endless loop
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: plymouth
Version: 14
Hardware: x86_64
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 679758
 
Reported: 2010-11-09 20:58 UTC by Wolfgang Denk
Modified: 2012-08-16 20:18 UTC (History)

Fixed In Version:
Clone Of:
: 679758 (view as bug list)
Environment:
Last Closed: 2012-08-16 20:18:08 UTC
Type: ---
Embargoed:


Attachments
Console log of endlessly looping dracut (8.81 KB, application/x-gzip)
2010-11-09 20:58 UTC, Wolfgang Denk
Console log of other system with similar issue: i386 now. (10.26 KB, application/x-gzip)
2010-11-09 21:44 UTC, Wolfgang Denk
Console log with bash -x in /etc/rc.sysinit ; x86_64 (14.61 KB, application/x-gzip)
2010-11-11 18:41 UTC, Wolfgang Denk
Console log with bash -x in /etc/rc.sysinit ; i386 (9.89 KB, application/x-gzip)
2010-11-11 18:42 UTC, Wolfgang Denk
Unfiltered console log of error case,showing the ESC-%-G sequences. (23.68 KB, application/x-gzip)
2010-11-14 20:25 UTC, Wolfgang Denk
Ramdisk image built under Fedora 14, fails (7.20 MB, application/octet-stream)
2010-11-14 20:28 UTC, Wolfgang Denk
Ramdisk image build under Fedora 13, works (7.11 MB, application/octet-stream)
2010-11-14 20:31 UTC, Wolfgang Denk
dmesg output during boot (120.53 KB, text/plain)
2011-01-22 17:41 UTC, Gary Miller


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 652005 0 low CLOSED Login fails on serial console: cannot enter password 2021-02-22 00:41:40 UTC

Internal Links: 652005

Description Wolfgang Denk 2010-11-09 20:58:50 UTC
Created attachment 459240 [details]
Console log of endlessly looping dracut

Description of problem:

I have issues booting the stock Fedora 14 kernel on some of my systems that
were recently updated from F13 to F14.

Dracut seems to enter some endless (?) loop...

Version-Release number of selected component (if applicable):

dracut-006-3.fc14.noarch

(but I'm not sure if really dracut is to blame)

How reproducible:

Always :-(

Steps to Reproduce:
1. Try to boot a stock Fedora 14 kernel.
2.
3.
  
Actual results:

See boot log below.

Expected results:

Normal boot up.

Additional info:

Running a vanilla mainline kernel with an initramfs image created
using mkinitrd works fine.

See console log below.

Comment 1 Wolfgang Denk 2010-11-09 21:44:36 UTC
Created attachment 459271 [details]
Console log of other system with similar issue: i386 now.

Comment 2 Wolfgang Denk 2010-11-09 21:45:09 UTC
Eventually the boot completes - at least on another, simpler system I have here.

Differences to previous example:

- old i386 [Pentium II (Deschutes)] instead of x86_64
- much simpler setup: no RAID, no LVM, just a plain HDD

See new console log (console-2.log)

Comment 3 Harald Hoyer 2010-11-10 13:05:32 UTC
wow... I have never seen something like this!

"Running a vanilla mainline kernel with an initramfs image created
using mkinitrd works fine."

mkinitrd is just a wrapper, which calls dracut, unless you really used the Fedora 11 mkinitrd.

Comment 4 Harald Hoyer 2010-11-10 13:06:44 UTC
(In reply to comment #0)
> Dracut seems to enter some endless (?) loop...

Why dracut? The loop happens after 

[   15.136116] dracut: Switching root

Comment 5 Wolfgang Denk 2010-11-10 19:05:32 UTC
The working image (Linux 2.6.35.5 based) was built before the update to Fedora 14, i.e. when still running Fedora 13 (with all updates installed):

# lsinitrd initramfs-2.6.35.img  | grep dracut
dracut-005-3.fc13
-rw-r--r--   1 root     root         5167 Apr 20  2010 lib/dracut-lib.sh
-rw-r--r--   1 root     root           18 Apr 20  2010 dracut-005-3.fc13
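
The check above can be wrapped in a small helper so that a working and a failing image are inspected the same way. This is only a sketch: the function name is made up for illustration, and it assumes the image listing contains a version marker named like dracut-005-3.fc13, as in the output above.

```shell
# Hypothetical helper: print the dracut version marker(s) found in an
# lsinitrd listing read from standard input.
dracut_version_from_listing() {
    # Match "dracut-<digit>..." only, so lib/dracut-lib.sh is not reported.
    grep -o 'dracut-[0-9][0-9A-Za-z._-]*' | sort -u
}

# Usage (needs lsinitrd from the dracut package):
#   lsinitrd initramfs-2.6.35.img | dracut_version_from_listing
```

Running it against both ramdisk images should make a version difference between them visible at a glance.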

Comment 6 Bill Nottingham 2010-11-10 20:25:11 UTC
If you do 'sysrq -p' or 'sysrq -t', what's it doing?

Comment 7 Wolfgang Denk 2010-11-10 21:07:11 UTC
Please excuse the probably stupid question - but how would I do that?

When using the normal console + keyboard, I can press the SysRq key, but all output scrolls out so quickly I cannot read / log it.  And when using the serial console I'm connected through some terminal server, and I have no idea how to send a break...

Your notation indicates that 'sysrq -p' or 'sysrq -t' might be commands I could enter somehow?
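
For reference, SysRq functions do not have to come from the keyboard. On a serial console the kernel accepts a BREAK followed by the command key, and from a running shell the same functions can be triggered by writing the key to /proc/sysrq-trigger. A minimal sketch, assuming a kernel built with magic SysRq support; the function name and its optional second argument are illustrative only.

```shell
# Hypothetical helper: trigger a SysRq function from a shell.
# $1 = SysRq key ('p' = show registers, 't' = show task list)
# $2 = trigger file; defaults to the kernel interface, overridable for dry runs
sysrq_send() {
    key=$1
    trigger=${2:-/proc/sysrq-trigger}
    case $key in
        [a-z0-9]) ;;    # exactly one key character
        *) echo "usage: sysrq_send <key> [trigger-file]" >&2; return 2 ;;
    esac
    printf '%s' "$key" > "$trigger"
}

# Usage (as root): sysrq_send t    # the task dump goes to the kernel log
```

This only helps while a shell is still responsive; on a machine that is already looping, the BREAK route through the terminal server is the remaining option.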

Comment 8 Bill Nottingham 2010-11-10 22:27:06 UTC
Ugh. OK, that may not be practical.

Does booting with 'init=/bin/bash' work?

Comment 9 Wolfgang Denk 2010-11-11 06:30:10 UTC
Yes, this works nicely:

...
[   11.276497] dracut: Found volume group "triton" using metadata type lvm2
[   12.224845] dracut: 8 logical volume(s) in volume group "triton" now active
[   12.270732] dracut: Autoassembling MD Raid
[   12.342339] SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
[   12.397166] SGI XFS Quota Management subsystem
[   12.493997] XFS mounting filesystem dm-0
[   12.698150] dracut: Remounting /dev/disk/by-uuid/774674dc-7af1-4cde-ac60-df73445e8adb with -o noatime,ro
[   12.775213] XFS mounting filesystem dm-0
[   12.843524] dracut: Mounted root filesystem /dev/mapper/triton-root
[   12.961893] dracut: Loading SELinux policy
[   13.268275] SELinux:  Disabled at runtime.
[   13.294079] type=1404 audit(1289456481.319:2): selinux=0 auid=4294967295 ses=4294967295
[   13.431188] dracut: /sbin/load_policy: Can't load policy file /etc/selinux/targeted/policy/policy.15: No such file or directory
[   13.565372] dracut: Switching root
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-4.1# 
bash-4.1# 
bash-4.1#

Comment 10 Bill Nottingham 2010-11-11 14:46:14 UTC
OK. If you edit /etc/rc.sysinit and add '-x' to the bash invocation, where does it hang?
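
One way to make that edit reversibly is sketched below. It assumes the script starts with a plain "#!/bin/bash" line (an assumption about the local file, not something stated in this report); the function name is made up for illustration.

```shell
# Hypothetical helper: append -x to a "#!/bin/bash" shebang, keeping a backup.
# $1 = script to modify (e.g. /etc/rc.d/rc.sysinit)
enable_bash_trace() {
    script=$1
    cp -p "$script" "$script.orig" || return 1
    # Only line 1 is touched, and only if it is exactly "#!/bin/bash".
    sed '1s|^#!/bin/bash[[:space:]]*$|#!/bin/bash -x|' "$script.orig" > "$script"
}

# Usage (as root): enable_bash_trace /etc/rc.d/rc.sysinit
# Undo:            mv /etc/rc.d/rc.sysinit.orig /etc/rc.d/rc.sysinit
```

With tracing on, the last command echoed on the console before the hang shows where the script stopped.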

Comment 11 Wolfgang Denk 2010-11-11 18:41:30 UTC
Created attachment 459820 [details]
Console log with bash -x in /etc/rc.sysinit ; x86_64

Comment 12 Wolfgang Denk 2010-11-11 18:42:04 UTC
Created attachment 459821 [details]
Console log with bash -x in /etc/rc.sysinit ; i386

Comment 13 Wolfgang Denk 2010-11-11 18:43:29 UTC
When I add '-x' to /etc/rc.sysinit the behaviour changes: the systems (both x86_64 and i386) do not come up any more at all, instead they hang hard.
See new console logs.

Comment 14 Bill Nottingham 2010-11-11 19:42:18 UTC
Moving to kernel... it's locking up when starting udev, so that usually means some driver.

Comment 15 Wolfgang Denk 2010-11-11 20:57:56 UTC
Good point. Especially since that problem goes immediately and completely away when I boot either a Fedora 13 or a plain vanilla mainline kernel.

Hm... your comment started me thinking again. I have another issue on one of these systems (on the i386 one): that's bug 652005.

So I removed support for the serial console from the grub.conf file
(both the entries for grub:
< serial --port=0xBC00 --speed=19200 --word=8 --parity=no --stop=1
< terminal --timeout=10 serial console
and the "console=ttyS0,19200" boot argument), and ...

...voila: both systems boot fine.

Seems there is an issue with the serial driver?  I'm using an old terminal server, which does not support higher baud rates than 19200 bps.  Maybe the driver has not sufficiently been tested with serial console at low baud rates?
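
The change described above can be expressed as a filter over grub.conf. This is a sketch under the assumption of GRUB legacy syntax as quoted in this comment; the function name is hypothetical, and the result should be reviewed before it replaces the real file.

```shell
# Hypothetical filter: read a GRUB legacy grub.conf on stdin and write a copy
# with serial console support disabled.
strip_serial_console() {
    # Comment out "serial" and "terminal" directives, and drop any
    # console=ttyS... argument from kernel lines.
    sed -e 's/^[[:space:]]*serial[[:space:]]/#&/' \
        -e 's/^[[:space:]]*terminal[[:space:]]/#&/' \
        -e '/^[[:space:]]*kernel[[:space:]]/s/[[:space:]]console=ttyS[^[:space:]]*//g'
}

# Usage:
#   strip_serial_console < /boot/grub/grub.conf > /tmp/grub.conf.noserial
```

Commenting the directives out rather than deleting them keeps the original serial settings at hand for re-enabling later.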

Comment 16 Wolfgang Denk 2010-11-12 22:20:53 UTC
How can we go on from here? Normally I would now run "git bisect", but the problem is only with the Fedora kernels. It does not happen with vanilla mainline kernels - I tried 2.6.35, 2.6.36, and 2.6.38.

Or is there somewhere a git repository for the Fedora kernel with all the patches that get applied during build? (I can't imagine, given the fact that there is a number of conditionals in the spec file, but I might be wrong?)

Comment 17 Wolfgang Denk 2010-11-14 20:23:44 UTC
Things become more complicated. I noticed that even mainline kernels don't work any more on these boards; I could not reproduce my earlier results. Then I realized that I'm building the ramdisk images in the new environment. The problem is not with the kernel - the very same kernel image works fine with the old ramdisk image (built under Fedora 13), but fails with a ramdisk image built under Fedora 14.

I downgraded dracut to the same version as was used before on the Fedora 13 box (

Also, I confirm that the issue reported in bug 652005 behaves the same.

Finally, I have a detail to add. The console logs in failure mode contain
a large number of 'ESC % G' sequences. Originally I thought this was some noise
caused by my terminal server, but now I realize this might be an important detail.

I'll attach 3 new files: 1) the unfiltered console log of a vanilla 2.6.35.5 kernel booting with a ramdisk image built under Fedora 14; 2) the ramdisk image
that causes the problems, built under Fedora 14; and 3) the old ramdisk image that works fine, built under Fedora 13.
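
ESC % G is the console escape sequence that selects the UTF-8 character set (see console_codes(4)). To put a number on "a large number", the sequences in an uncompressed log can be counted with something like the sketch below; the function name is made up for illustration.

```shell
# Hypothetical helper: count "ESC % G" sequences in a raw console log.
# $1 = log file (uncompressed)
count_esc_percent_g() {
    LC_ALL=C awk -v seq="$(printf '\033%%G')" '
        { n += gsub(seq, "") }    # gsub returns the number of replacements
        END { print n + 0 }
    ' "$1"
}

# Usage: count_esc_percent_g console.log
```

Comparing the count for a failing boot with that of a working boot would show whether the sequences are specific to the failure case.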

Comment 18 Wolfgang Denk 2010-11-14 20:25:43 UTC
Created attachment 460405 [details]
Unfiltered console log of error case,showing the ESC-%-G sequences.

Comment 19 Wolfgang Denk 2010-11-14 20:28:13 UTC
Created attachment 460406 [details]
Ramdisk image built under Fedora 14, fails

Comment 20 Wolfgang Denk 2010-11-14 20:31:25 UTC
Created attachment 460407 [details]
Ramdisk image build under Fedora 13, works

Comment 21 Harald Hoyer 2010-11-15 11:13:56 UTC
Can you boot without "rhgb", or even with "rd_NO_PLYMOUTH"?

Comment 22 Harald Hoyer 2010-11-15 11:14:59 UTC
(In reply to comment #18)
> Created attachment 460405 [details]
> Unfiltered console log of error case,showing the ESC-%-G sequences.

This is most likely the plymouth bootsplash showing the progress bar.

Comment 23 Wolfgang Denk 2010-11-15 15:36:10 UTC
I have not been using "rhgb" all the time.

And indeed, using "rd_NO_PLYMOUTH" makes all problems go away.

So it's recent versions of plymouth that cause all that trouble with serial consoles!?!

Thanks!
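
To make the workaround persistent, the parameter has to be appended to the kernel line(s) in grub.conf. A sketch assuming GRUB legacy syntax, where boot entries use lines starting with "kernel"; the function name is hypothetical.

```shell
# Hypothetical filter: append a kernel argument to every "kernel" line of a
# GRUB legacy grub.conf read from stdin, unless that line already has it.
# $1 = argument to add, e.g. rd_NO_PLYMOUTH
add_kernel_arg() {
    awk -v arg="$1" '
        $1 == "kernel" {
            present = 0
            for (i = 2; i <= NF; i++) if ($i == arg) present = 1
            if (!present) $0 = $0 " " arg
        }
        { print }
    '
}

# Usage:
#   add_kernel_arg rd_NO_PLYMOUTH < /boot/grub/grub.conf > /tmp/grub.conf.new
```

The filter writes to a new file on purpose, so the result can be diffed against the original before it is copied into place.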

Comment 26 Bill Nottingham 2010-11-16 02:40:45 UTC
CC'ing someone who I recall dealing with a serial hang in the recent past, and the plymouth maintainer.

Comment 27 Steve Dickson 2010-12-02 14:36:59 UTC
Here is a movie of the serial console going wild during a
reboot (Note: this is on a bare metal machine):
http://duffy.fedorapeople.org/temp/consoleissue/102_0114.MOV

Then once the machine is up, the console is useless because
characters are not being echoed, as this movie shows:

http://duffy.fedorapeople.org/temp/consoleissue/102_0115.MOV

Comment 28 Eric Biederman 2010-12-12 03:46:47 UTC
I am seeing this problem as well, and rd_NO_PLYMOUTH also solves it for me.

Comment 29 Steve Dickson 2010-12-13 18:36:42 UTC
I too can confirm that adding rd_NO_PLYMOUTH to the boot line solves the problem.

Comment 30 Gary Miller 2011-01-09 23:08:34 UTC
I am seeing symptoms similar to the initial error, but after adding the appropriate debug I found that the BusLogic module was not being loaded.  With the debug console configured I can do a "modprobe BusLogic" and then exit, which allows LVM to find my disks and finish the boot.  This started happening at some point while I was running FC13, and even after upgrading to FC14 it is still broken.  I have tried many things at this point with no easy fix.  Right now, if I boot FC14 with the debug console for dracut, I can finish the boot.

The only kernel that boots with no intervention is 2.6.34.7-56 from FC13; the 2.6.35.10-74 kernel for FC14 will only boot if I do the manual modprobe.  How it got this way is an open question, and with all the "fixes" I have tried at this point I am not sure where I discovered the issue.  My initial issue was that it could not find the root for LVM to work, and then I discovered that the driver for the disk was not there.

This configuration is now in a VM with VMWare fusion so I can back it up and retry different things.  I was attempting to force the kernel module BusLogic.ko to load but I do not know why it stopped or if doing so will kill something later.

Comment 31 Gary Miller 2011-01-22 17:36:36 UTC
Upgraded to latest dracut:

dracut-generic-006-6.fc14.noarch
dracut-tools-006-6.fc14.noarch
dracut-network-006-6.fc14.noarch
dracut-006-6.fc14.noarch
dracut-fips-006-6.fc14.noarch

rebuilt initramfs:


[root@gmillerLinux recover]# cat rebuild_initramfs.sh 
echo "backing up current /boot/initramfs-$(uname -r).img starting"
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-backup.img
echo "backing up current /boot/initramfs-$(uname -r).img complete"
echo "rebuild of current /boot/initramfs-$(uname -r).img starting"
dracut /boot/initramfs-$(uname -r).img $(uname -r)
echo "rebuild of initramfs /boot/initramfs-$(uname -r).img complete"
[root@gmillerLinux recover]# 

booted machine and the system fails to find the device root
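
For comparison, a more defensive variant of the script above is sketched here. It is not the script that was actually run: the function name, its arguments, and the DRACUT override variable are illustrative assumptions. It copies the current image instead of moving it and quotes all paths.

```shell
# Hypothetical variant of rebuild_initramfs.sh.
# $1 = kernel version (default: running kernel)
# $2 = boot directory (default: /boot)
# DRACUT may be set to override the command that builds the image.
rebuild_initramfs() {
    kver=${1:-$(uname -r)}
    bootdir=${2:-/boot}
    img="$bootdir/initramfs-$kver.img"

    [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }

    # Copy rather than move, so an image is still present at the expected
    # path if the script is interrupted before dracut runs.
    cp -p "$img" "$bootdir/initramfs-$kver-backup.img" || return 1

    # --force lets dracut overwrite the existing image.
    "${DRACUT:-dracut}" --force "$img" "$kver"
}

# Usage (as root): rebuild_initramfs
```

Like the original, this only rebuilds the image; it does not change which drivers dracut decides to include.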

Comment 32 Gary Miller 2011-01-22 17:41:22 UTC
Created attachment 474747 [details]
dmesg output during boot

Here are the contents of dmesg that I see during the boot failure that drops me to the debug command prompt.  Also included are the output of the "modprobe BusLogic" needed to get my drives recognized, and the output after "exiting" the debug command prompt and the completion of the boot.

Comment 33 Harald Hoyer 2011-01-25 14:58:49 UTC
(In reply to comment #32)
> Created attachment 474747 [details]
> dmesg output during boot
> 
> Here are the contents of dmesg that I see during the boot failure that drops me
> to the debug command prompt.  Also the output of the "modprobe BusLogic" needed
> to get my drives recognized, and then the output after "exiting" the debug
> command prompt and the completion of the boot.

seems like the BusLogic driver does not support module autoloading, and has to be fixed... you might want to add "rdloaddriver=BusLogic" to the kernel command line.
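
When trying this, it is worth confirming that the parameter actually reached the kernel, since a typo or an edit to the wrong boot entry is easy to miss. A minimal sketch; the function name is hypothetical.

```shell
# Hypothetical helper: succeed if the given argument appears as a whole word
# in a kernel command line read from stdin.
cmdline_has_arg() {
    # Put each argument on its own line, then look for an exact match.
    tr -s ' \t' '\n\n' | grep -Fqx -- "$1"
}

# Usage on the booted system or in the dracut debug shell:
#   cmdline_has_arg rdloaddriver=BusLogic < /proc/cmdline && echo "parameter present"
```

The exact-match comparison is deliberate: the match is case-sensitive, so a misspelling such as "buslogic" is reported as missing.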

Comment 34 Gary Miller 2011-01-25 17:00:10 UTC
This stopped working after kernel 2.6.34.7-56.fc13, so I thought it was something other than a driver change, but I will try what you suggest.

Comment 35 Gary Miller 2011-02-22 20:48:18 UTC
The "rdloaddriver=BusLogic" makes no difference so I am open to other suggestions. :-(  Is there a special way to reference the driver, "BusLogic.ko" or "scsi/BusLogic.ko"?

Comment 36 Harald Hoyer 2011-02-23 07:48:36 UTC
(In reply to comment #35)
> The "rdloaddriver=BusLogic" makes no difference so I am open to other
> suggestions. :-(  Is there a special way to reference the driver "BusLogic.ko"
> or "scsi/BusLogic.ko" ??

Is the driver included in the initramfs image at all?

# lsinitrd /boot/initramfs-<kernel version>.img | grep BusLogic
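
A plain grep also matches any path that merely contains the string. A slightly stricter check is sketched below, assuming the listing shows module files as .../BusLogic.ko, possibly with a compression suffix; the function name is hypothetical, and the module name is used as part of a regular expression, so it should contain no special characters.

```shell
# Hypothetical helper: succeed if a kernel module file for $1 appears in an
# lsinitrd listing read from stdin (matches NAME.ko, optionally compressed).
module_in_listing() {
    grep -Eq "/$1\.ko(\.(gz|xz|bz2))?\$"
}

# Usage:
#   lsinitrd /boot/initramfs-<kernel version>.img | module_in_listing BusLogic \
#       && echo "BusLogic is in the image"
```

Note that this only shows the module is present in the image, not that it gets loaded during boot.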

Comment 37 Gary Miller 2011-02-23 12:31:52 UTC
yes, if I do the "modprobe BusLogic" I can load it by hand and then continue the boot. Is the syntax of the driver name something other than "BusLogic" ?

Comment 38 Harald Hoyer 2011-02-23 12:50:23 UTC
"rdloaddriver=BusLogic" on the kernel command line should really work!

We should really open a new bug for this...

Comment 39 Harald Hoyer 2011-02-23 12:53:29 UTC
Let's move this problem to bug 679758

Comment 40 Fedora End Of Life 2012-08-16 20:18:12 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

