Created attachment 459240: Console log of endlessly looping dracut
Description of problem:
I have problems booting the stock Fedora 14 kernel on some of my systems that were recently updated from F13 to F14. Dracut seems to enter some endless (?) loop...
Version-Release number of selected component (if applicable):
dracut-006-3.fc14.noarch (but I'm not sure dracut is really to blame)
How reproducible:
Always :-(
Steps to Reproduce:
1. Try to boot a stock Fedora 14 kernel.
Actual results:
See boot log below.
Expected results:
Normal boot-up.
Additional info:
Running a vanilla mainline kernel with an initramfs image created using mkinitrd works fine. See console log below.
Created attachment 459271: Console log of another system with a similar issue, i386 this time.
Eventually the boot completes, at least on another, simpler system I have here. Differences from the previous example:
- old i386 [Pentium II (Deschutes)] instead of x86_64
- much simpler setup: no RAID, no LVM, just a plain HDD
See the new console log (console-2.log).
Wow... I have never seen anything like this!
"Running a vanilla mainline kernel with an initramfs image created using mkinitrd works fine."
mkinitrd is just a wrapper that calls dracut, unless you really used the Fedora 11 mkinitrd.
(In reply to comment #0)
> Dracut seems to enter some endless (?) loop...
Why dracut? The loop happens after
[ 15.136116] dracut: Switching root
The working image (based on Linux 2.6.35.5) was built before the update to Fedora 14, i.e. while still running Fedora 13 (with all updates installed):
# lsinitrd initramfs-2.6.35.img | grep dracut
dracut-005-3.fc13
-rw-r--r-- 1 root root 5167 Apr 20 2010 lib/dracut-lib.sh
-rw-r--r-- 1 root root 18 Apr 20 2010 dracut-005-3.fc13
If you do 'sysrq -p' or 'sysrq -t', what's it doing?
Please excuse the probably stupid question, but how would I do that? When using the normal console and keyboard, I can press the SysRq key, but all the output scrolls by so quickly that I cannot read or log it. And when using the serial console I'm connected through a terminal server, and I have no idea how to send a break... Your notation suggests that 'sysrq -p' or 'sysrq -t' might be commands I could enter somehow?
Ugh. OK, that may not be practical. Does booting with 'init=/bin/bash' work?
Yes, this works nicely:
...
[ 11.276497] dracut: Found volume group "triton" using metadata type lvm2
[ 12.224845] dracut: 8 logical volume(s) in volume group "triton" now active
[ 12.270732] dracut: Autoassembling MD Raid
[ 12.342339] SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
[ 12.397166] SGI XFS Quota Management subsystem
[ 12.493997] XFS mounting filesystem dm-0
[ 12.698150] dracut: Remounting /dev/disk/by-uuid/774674dc-7af1-4cde-ac60-df73445e8adb with -o noatime,ro
[ 12.775213] XFS mounting filesystem dm-0
[ 12.843524] dracut: Mounted root filesystem /dev/mapper/triton-root
[ 12.961893] dracut: Loading SELinux policy
[ 13.268275] SELinux: Disabled at runtime.
[ 13.294079] type=1404 audit(1289456481.319:2): selinux=0 auid=4294967295 ses=4294967295
[ 13.431188] dracut: /sbin/load_policy: Can't load policy file /etc/selinux/targeted/policy/policy.15: No such file or directory
[ 13.565372] dracut: Switching root
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-4.1#
bash-4.1#
bash-4.1#
OK. If you edit /etc/rc.sysinit and add '-x' to the bash invocation, where does it hang?
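Adding '-x' means changing the interpreter line of /etc/rc.sysinit so that bash traces each command it runs (bash writes the trace to standard error). A minimal sketch of doing that in an easily reversible way, assuming the script's first line is exactly `#!/bin/bash`; `enable_xtrace` is a hypothetical helper, not an existing tool:

```shell
#!/bin/sh
# Sketch: turn on bash tracing for a script by adding -x to its shebang.
# enable_xtrace is a hypothetical helper name. It assumes line 1 is exactly
# "#!/bin/bash" and leaves every other line untouched.
enable_xtrace() {
    script=$1
    # Save the original so it can be restored after debugging.
    cp -p "$script" "$script.orig" || return 1
    # Rewrite only line 1, and only if it is the plain bash shebang.
    # Redirecting into the existing file keeps its permissions.
    sed '1s|^#!/bin/bash$|#!/bin/bash -x|' "$script.orig" > "$script"
}
```

For example, run `enable_xtrace /etc/rc.sysinit` before rebooting, and `mv /etc/rc.sysinit.orig /etc/rc.sysinit` afterwards to undo the change.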
Created attachment 459820: Console log with bash -x in /etc/rc.sysinit (x86_64)
Created attachment 459821: Console log with bash -x in /etc/rc.sysinit (i386)
When I add '-x' to /etc/rc.sysinit the behaviour changes: the systems (both x86_64 and i386) no longer come up at all; instead they hang hard. See the new console logs.
Moving to kernel... it's locking up when starting udev, so that usually means some driver.
Good point, especially since the problem goes away immediately and completely when I boot either a Fedora 13 kernel or a plain vanilla mainline kernel.
Hm... your comment got me thinking again. I have another issue on one of these systems (the i386 one): that's bug 652005. So I removed support for the serial console from the grub.conf file, i.e. both the entries for grub:
< serial --port=0xBC00 --speed=19200 --word=8 --parity=no --stop=1
< terminal --timeout=10 serial console
and the "console=ttyS0,19200" boot argument, and ...
...voila: both systems boot fine.
Is there an issue with the serial driver? I'm using an old terminal server, which does not support baud rates higher than 19200 bps. Maybe the driver has not been sufficiently tested with a serial console at low baud rates?
How can we proceed from here? Normally I would now run "git bisect", but the problem occurs only with the Fedora kernels; it does not happen with vanilla mainline kernels (I tried 2.6.35, 2.6.36, and 2.6.38). Or is there a git repository somewhere for the Fedora kernel with all the patches that get applied during the build? (I can't imagine there is, given the number of conditionals in the spec file, but I might be wrong.)
Things have become more complicated. I noticed that even mainline kernels don't work any more on these boards; I could not reproduce my earlier results. Then I realized that I'm now building the ramdisk images in the new environment. The problem is not with the kernel: the very same kernel image works fine with the old ramdisk image (built under Fedora 13), but fails with a ramdisk image built under Fedora 14. I downgraded dracut to the same version as was used before on the Fedora 13 box ( Also, I confirm that the issue reported in bug 652005 behaves the same.
Finally, I have a detail to add. The console logs in failure mode contain a large number of 'ESC % G' sequences. Originally I thought this was noise caused by my terminal server, but now I realize it might be an important detail. I'll attach 3 new files:
1) the unfiltered console log of a vanilla 2.6.35.5 kernel booting with a ramdisk image built under Fedora 14;
2) the ramdisk image that causes the problems, built under Fedora 14; and
3) the old ramdisk image that works fine, built under Fedora 13.
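ESC % G is the Linux console control sequence that selects UTF-8 mode (see console_codes(4)), so a raw capture of the serial line can contain it as stray characters. A minimal sketch for counting these sequences in a captured log and producing a filtered copy, assuming the log is a plain file and a grep that supports -o (GNU and BSD grep both do); `count_esc_g` and `strip_esc_g` are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: count and remove "ESC % G" sequences in a captured console log.
# count_esc_g and strip_esc_g are hypothetical helper names.

# The three bytes ESC (0x1b), '%', 'G'; "%%" is printf's escape for '%'.
ESC_G=$(printf '\033%%G')

count_esc_g() {
    # grep -o prints each match on its own line; wc -l counts them.
    grep -o -- "$ESC_G" "$1" | wc -l
}

strip_esc_g() {
    # Delete every occurrence and write the cleaned log to stdout.
    sed "s/$ESC_G//g" "$1"
}
```

For example, `count_esc_g console.log` shows how many sequences a log contains, and `strip_esc_g console.log > console-filtered.log` writes a readable copy.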
Created attachment 460405: Unfiltered console log of the error case, showing the ESC-%-G sequences.
Created attachment 460406: Ramdisk image built under Fedora 14, fails
Created attachment 460407: Ramdisk image built under Fedora 13, works
Can you boot without "rhgb", or even with "rd_NO_PLYMOUTH"?
(In reply to comment #18)
> Created attachment 460405
> Unfiltered console log of error case,showing the ESC-%-G sequences.
This is most likely the plymouth bootsplash showing the progress bar.
I have not been using "rhgb" all the time. And indeed, using "rd_NO_PLYMOUTH" makes all problems go away. So it's recent versions of plymouth that cause all that trouble with serial consoles!?! Thanks!
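Typing rd_NO_PLYMOUTH at the GRUB prompt only affects a single boot; to keep the workaround, it has to go on the kernel lines in grub.conf (the file edited earlier in this bug). A minimal sketch, assuming a GRUB legacy config whose entries use `kernel` lines; `add_cmdline_arg` is a hypothetical helper, and the argument must not contain characters that are special to sed:

```shell
#!/bin/sh
# Sketch: append a kernel argument to every "kernel" line of a GRUB legacy
# config. add_cmdline_arg is a hypothetical helper name. ARG must not
# contain characters special to sed ("/", "&", "\").
add_cmdline_arg() {
    arg=$1
    conf=$2
    # Keep a backup of the current config next to it.
    cp -p "$conf" "$conf.bak" || return 1
    # On lines starting with "kernel", append " ARG" unless " ARG" is
    # already present (a simple substring check, not a full word match).
    sed "/^[[:space:]]*kernel[[:space:]]/{
/ $arg/!s/\$/ $arg/
}" "$conf.bak" > "$conf"
}
```

For example, `add_cmdline_arg rd_NO_PLYMOUTH /boot/grub/grub.conf` (as root). Running it twice leaves a single copy of the argument on each kernel line.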
CC'ing the plymouth maintainer and someone who, as I recall, dealt with a serial hang in the recent past.
Here is a movie of the serial console going wild during a reboot (note: this is on a bare-metal machine): http://duffy.fedorapeople.org/temp/consoleissue/102_0114.MOV
Then, once the machine is up, the console is useless because characters are not being echoed, as this movie shows: http://duffy.fedorapeople.org/temp/consoleissue/102_0115.MOV
I am seeing this problem as well, and rd_NO_PLYMOUTH also solves it for me.
I too can confirm that adding rd_NO_PLYMOUTH to the boot line solves the problem.
I am seeing symptoms similar to the initial error, but after adding the appropriate debug options I found that the BusLogic module was not being loaded. With the debug console configured I can do a "modprobe BusLogic" and then exit, which allows LVM to find my disks and finish the boot. This started at some point while I was running FC13, and even after upgrading to FC14 it is still broken. I have tried many things at this point with no easy fix. Right now, if I boot FC14 with the dracut debug console, I can finish the boot. The only kernel that boots with no intervention is 2.6.34.7-56 from FC13; the FC14 kernel 2.6.35.10-74 boots only if I do the manual modprobe. How it got this way is a question, and with all the "fixes" I have tried at this point I am not sure at what point I discovered the issue. My initial issue was that it could not find the root for LVM to work, and then I discovered that the driver for the disk was not there. This configuration is now in a VM under VMware Fusion, so I can back it up and retry different things. I was attempting to force the kernel module BusLogic.ko to load, but I do not know why it stopped loading or whether forcing it will break something later.
Upgraded to the latest dracut:
dracut-generic-006-6.fc14.noarch
dracut-tools-006-6.fc14.noarch
dracut-network-006-6.fc14.noarch
dracut-006-6.fc14.noarch
dracut-fips-006-6.fc14.noarch
Rebuilt the initramfs:
[root@gmillerLinux recover]# cat rebuild_initramfs.sh
echo "backing up current /boot/initramfs-$(uname -r).img starting"
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-backup.img
echo "backing up current /boot/initramfs-$(uname -r).img complete"
echo "rebuild of current /boot/initramfs-$(uname -r).img starting"
dracut /boot/initramfs-$(uname -r).img $(uname -r)
echo "rebuild of initramfs /boot/initramfs-$(uname -r).img complete"
[root@gmillerLinux recover]#
Booted the machine, and the system fails to find the root device.
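The script above moves the current image away before calling dracut, so if dracut fails there is no initramfs left at the expected path for the running kernel. A more defensive variant, offered only as a sketch: it copies the image instead of moving it, passes dracut's -f (--force) option so the existing file can be overwritten, and restores the backup on failure. `rebuild_initramfs` and the `DRACUT` override (used to substitute a stand-in command when testing) are hypothetical additions, not part of dracut:

```shell
#!/bin/sh
# Sketch of a more defensive rebuild_initramfs.sh. rebuild_initramfs and
# the DRACUT override variable are hypothetical, not part of dracut.
rebuild_initramfs() {
    kver=${1:-$(uname -r)}
    img=${2:-/boot/initramfs-$kver.img}
    backup=${img%.img}-backup.img
    have_backup=no

    if [ -f "$img" ]; then
        echo "backing up $img to $backup"
        # Copy, not move, so a working image stays in place.
        cp -p "$img" "$backup" || return 1
        have_backup=yes
    fi

    echo "rebuilding $img for kernel $kver"
    # -f (--force) lets dracut overwrite the existing image.
    if ${DRACUT:-dracut} -f "$img" "$kver"; then
        echo "rebuild of $img complete"
    else
        echo "dracut failed" >&2
        if [ "$have_backup" = yes ]; then
            echo "restoring $backup" >&2
            cp -p "$backup" "$img"
        fi
        return 1
    fi
}
```

Run it as root: `rebuild_initramfs` rebuilds the image for the running kernel, and `rebuild_initramfs <kernel version>` rebuilds it for another installed kernel.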
Created attachment 474747: dmesg output during boot
Here are the contents of dmesg that I see during the boot failure that drops me to the debug command prompt. Also included are the output of the "modprobe BusLogic" needed to get my drives recognized, and the output after exiting the debug command prompt, through the completion of the boot.
(In reply to comment #32)
> Created attachment 474747
> dmesg output during boot
>
> Here is the contents of dmesg that I see during the boot failure that drops me
> to the debug command prompt. Also the output of the "modprobe BusLogic" needed
> to get my drives recognized, and then the output after "exiting" the debug
> command prompt and the completion of the boot.
It seems the BusLogic driver does not support module autoloading and has to be fixed... You might want to add "rdloaddriver=BusLogic" to the kernel command line.
This stopped working after kernel 2.6.34.7-56.fc13, so I thought it was something other than a driver change, but I will try what you suggest.
The "rdloaddriver=BusLogic" option makes no difference, so I am open to other suggestions. :-( Is there a special way to reference the driver, e.g. "BusLogic.ko" or "scsi/BusLogic.ko"?
(In reply to comment #35)
> The "rdloaddriver=BusLogic" makes not difference so I am open to other
> suggestions. :-( Is there a special way to reference the driver "BusLogic.ko"
> or "scsi/BusLogic.ko" ??
Is the driver included in the initramfs image at all?
# lsinitrd /boot/initramfs-<kernel version>.img | grep BusLogic
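The same check can be scripted. A minimal sketch that reads an lsinitrd listing on standard input and succeeds if NAME.ko (optionally compressed) appears at the end of a file path, assuming the module's file name is the last field of each listing line; `initramfs_has_module` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: succeed if an lsinitrd listing (read on stdin) contains NAME.ko.
# initramfs_has_module is a hypothetical helper name.
initramfs_has_module() {
    # Match ".../NAME.ko" at end of line, optionally .gz/.xz/.zst compressed.
    grep -q -E "/$1\.ko(\.(gz|xz|zst))?\$"
}
```

For example: `lsinitrd "/boot/initramfs-$(uname -r).img" | initramfs_has_module BusLogic && echo "BusLogic is in the image"`.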
Yes, if I do "modprobe BusLogic" I can load it by hand and then continue the boot. Is the syntax of the driver name something other than "BusLogic"?
"rdloaddriver=BusLogic" on the kernel command line should really work! We should really open a new bug for this...
Let's move this problem to bug 679758
This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX. (Please note: Our normal process is to give advance warning of this occurring, but we forgot to do that. A thousand apologies.) Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version. Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 14 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping