Bug 552434 - Boot cycle Fedora 12 indeterminate boot loop, new install with old fs mounted in /etc/fstab, fails to load libfreebl3.so
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: plymouth
Version: 14
Hardware: i386
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Fedora Extras Quality Assurance
URL: https://www.redhat.com/archives/fedor...
Whiteboard:
Duplicates: 560448
Depends On: 561544
Blocks:
Reported: 2010-01-05 02:51 UTC by Joel Rees
Modified: 2013-01-13 11:52 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Fedora 12 on 32-bit AMD Sempron
Last Closed: 2012-08-16 20:06:59 UTC
Type: ---
Embargoed:



Links
System: Red Hat Bugzilla
ID: 561544
Private: 0
Priority: high
Status: CLOSED
Summary: nss-softokn-freebl library needs to be in /lib{64}
Last Updated: 2021-02-22 00:41:40 UTC

Description Joel Rees 2010-01-05 02:51:47 UTC
Description of problem:
When I add pieces of the old file system from my previous install to /etc/fstab and try to reboot, the system goes into an indeterminate reboot loop.

Version-Release number of selected component (if applicable):


How reproducible:
Very, at least for this install. (Haven't tried other installs, non-LVM, etc.)

Context: Two pATA hard disks, 80G and 160G, multiboot, currently only Fedora.

80G disk set as the primary boot disk, contains 
sdb1: (set as primary boot) 20G ext3 partition with F10, updated from F9, IIRC.
sdb2: 1.5 G partition flagged as fat16, used in the past as Solaris swap
sdb3: 42G ext3 partition currently containing backup of old system user data
sdb4: 17G MSDOS extended
sdb5: 6G FAT32 used for passing files between different OSes
sdb6: 1.5G linux swap

160G disk set as secondary boot, but booting through grub on sdb1, contains
sda1: (set as secondary boot) 4G ext3 root of current F12 system
sda2: 48G lvm containing /bin, /usr, et al. from old F10 (all ext3)
sda3: 52G (lvm) containing /bin, /usr, et al. of current F12 (all ext4)
sda4: 59G ext3 miscellaneous data

** I should note that parted (command line) reports sda2 as lvm flagged, but does not report sda3 as lvm flagged. (I haven't hexdumped the map, so I haven't checked what the actual flag values are at this point.)

Steps to Reproduce:

1. I formatted sdb3 ext3 and backed up my /home partition and /etc there.

2. When installing F12 with Disk Druid, I specified sda1 to be formatted ext3 for root, and sda3 to be formatted as LVM, cut into /bin et al., all formatted ext4.

3. Reboot works okay.

4. Added the partitions of the old system to /etc/fstab and rebooted.

Actual results:
Reboot cycles in an indeterminate loop. Sometimes it will "catch" and boot, but it usually takes at least three tries. Any old file system I attempt to mount in fstab will cause the cycle; with certain old file systems it reboots more times than I have patience to watch.

If I can successfully boot, I can mount the old file systems by hand without problems.

This is what I get on screen before the reboot is forced:
--------------------------------
/dev/mapper/fc7-7{various}: clean {long list of partitions from the old system, ending with}
/dev/mapper/fc7-7varwww: clean, 644/516896 files, 27268/516896 blocks
                                                                [failed]

*** An error occured during the file system check,
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
*** Warning -- SELinux is active
*** Disabling security enforcement for system recovery.
*** Run 'setenforce 1' to reenable.

sulogin: error while loading shared libraries: libfreebl3.so: cannot open shared
 object file: No such file or directory
Unmounting file system
Automatic reboot in progress.
-------------------------------

I tried commenting out the last old lvm volume in /etc/fstab, with no change except which file system was last reported clean. Then I commented out all the old lvm volumes and the error message changed:
------------------------------
fsck.ext3: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb4

Could this be a zero-length partition?
------------------------------

With both sdb4 and sdb5 commented out, it changed to this:
------------------------------
fsck.ext3: Device or resource busy while trying to open /dev/sda3
Filesystem mounted or opened exclusively by another program?
------------------------------

and then the notice about attempting to drop me to a shell, followed by the forced reboot when libfreebl3.so fails to load. This is the one where the cycle appears to repeat as long as I have patience to watch it repeat.

I have not tried commenting out sdb3 and leaving the rest in. I'll try that and report back.

Expected results:
Basic booting with the old file system accessible under /used.
If that fails, I would expect it to successfully drop me to a shell so I can edit /etc/fstab without having to either wait for a successful boot or boot my rescue system on the other drive. (I suppose that, with end-user systems, just casually dropping to a shell would not be good either, but this is Fedora, and endlessly rebooting is not particularly correct for end-user systems either.)

Additional info:
/etc/fstab looks like this:
#
# /etc/fstab
# Created by anaconda on Sat Nov 28 21:16:03 2009
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or vol_id(8) for more info
#
UUID=[elided for security purposes] /                       ext3    defaults        1 1
/dev/mapper/vg_f11-f11tmp /tmp                    ext4    defaults        1 2
/dev/mapper/vg_f11-f11var /var                    ext4    defaults        1 2
/dev/mapper/vg_f11-f11usr /usr                    ext4    defaults        1 2
/dev/mapper/vg_f11-f11home /home                   ext4    defaults        1 2
/dev/mapper/vg_f11-f11varlog /var/log                ext4    defaults        1 2
/dev/mapper/vg_f11-f11vartmp /var/tmp                ext4    defaults        1 2
/dev/mapper/vg_f11-f11usrlocal /usr/local              ext4    defaults        1 2
/dev/mapper/vg_f11-f11varftp /var/ftp                ext4    defaults        1 2
/dev/mapper/vg_f11-f11varwww /var/www                ext4    defaults        1 2
/dev/mapper/fc7-7swap   swap                    swap    defaults        0 0
/dev/mapper/vg_f11-f11swap swap                    swap    defaults        0 0
/dev/sdb6               swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  defaults        0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0

# extras
#/dev/sdb4		/pig			ext3	defaults	1 2
#/dev/sda5		/fat			vfat	defaults	0 0

#backups of the old system
#/dev/sda3		/used/bk		ext3	noauto,defaults	1 2
#/dev/mapper/fc7-7home	/used/home		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7tmp	/used/tmp		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7usr	/used/usr		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7usrlocal	/used/usr/local		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7var	/used/var		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7varftp	/used/var/ftp		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7varlog	/used/var/log		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7vartmp	/used/var/tmp		ext3	noauto,debug,errors=continue,defaults	1 0
#/dev/mapper/fc7-7varwww	/used/var/www		ext3	noauto,debug,errors=continue,defaults	1 0
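
For anyone trying to work out which entries of an fstab like the one above get checked at boot: per fsck(8), `fsck -A` walks /etc/fstab and skips entries whose sixth field (fs_passno) is 0. The following is a minimal Python sketch under that assumption only. It ignores any extra filtering the distribution's boot scripts may apply, and the function name, the sample text, and the `UUID=example-root` value are made up for illustration.

```python
def fsck_candidates(fstab_text):
    """Return (passno, device, mountpoint) for entries with a non-zero fs_passno."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out entry
        fields = line.split()
        if len(fields) < 6:
            continue  # fs_passno omitted, which fstab(5) treats as 0
        device, mountpoint, passno = fields[0], fields[1], int(fields[5])
        if passno != 0:
            found.append((passno, device, mountpoint))
    # sorted() is stable, so entries within the same pass keep their file order
    return sorted(found, key=lambda entry: entry[0])


# Hypothetical sample loosely modelled on the entries in this report.
SAMPLE = """\
UUID=example-root          /           ext3  defaults         1 1
/dev/mapper/vg_f11-f11usr  /usr        ext4  defaults         1 2
/dev/sdb4                  /pig        ext3  defaults         1 2
/dev/mapper/fc7-7home      /used/home  ext3  noauto,defaults  1 0
/dev/sdb6                  swap        swap  defaults         0 0
"""

for passno, device, mountpoint in fsck_candidates(SAMPLE):
    print(passno, device, mountpoint)
```

In this sample the /dev/sdb4 line is still a candidate because its fs_passno is 2. The partition list earlier in the report gives sdb4 as the MSDOS extended partition, which would be consistent with the "zero-length partition" message quoted above.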

Comment 1 Joel Rees 2010-01-05 03:43:22 UTC
Okay, pardon me while I pull my handkerchief out and wipe the egg from my face.

I had sda and sdb partially switched around in /etc/fstab. Correcting those lines allows the boot process to proceed as it should.

However, it still should not drop me into an endless reboot cycle when it can't load libfreebl3.so. So I will not close this bug myself.

And I'm wondering how things could be fixed so that these boot errors could be logged to /var/log/messages. I'm assuming the reason they weren't is that I have /var/log on its own partition. I suppose I should try unmounting /var/log in single-user mode to check whether messages got written in /var/log before the partition for /var/log was mounted.

Comment 2 Frank Crawford 2010-01-07 09:43:52 UTC
Okay, I just got bit by this error as well.  Different initial problem, in that I had a disk that needed manual checking.  Every time it got to the point where it needed to start sulogin, it failed, the system rebooted, and dropped back to the same point.

After cleaning the disk error with a rescue disk, further examination of the issue shows that /sbin/sulogin dynamically links /usr/lib/libfreebl3.so, which is on a separate, unmounted partition (if you want to argue about /usr being separate from / I'd take that as a separate issue).

Worse still, libfreebl3 is designed to manage Network Security Services softokens, yet most of the times sulogin is invoked there is no networking, so this is a useless library to add.

Comment 3 Elio Maldonado Batiz 2010-01-07 15:41:23 UTC
nss-softokn's shared libraries, libfreebl3.so, libsoftokn3.so and libnssdbm3.so and their respective .chk files, are installed to /usr/lib{,64}. Installing them on /lib{,64} is an option.

Comment 4 Jason McBrayer 2010-01-17 19:35:16 UTC
I have this problem (different initial problem, just a failed fsck that needs manual checking).  /sbin/sulogin fails because it requires /usr/lib/libfreebl3.so, and /usr is not mounted (because of the fsck failure).  Nothing in /sbin should depend on anything in /usr/lib -- any libraries /sbin/sulogin needs need to be in /lib.
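
A quick way to check a binary for this kind of dependency is to look at its `ldd` output and flag anything that resolves under /usr. The Python sketch below does that on a text sample. The sample is hand-written for illustration (addresses and the exact library list are invented, not captured from an affected system), and the function name is an assumption of this sketch.

```python
def libs_under_usr(ldd_output):
    """Return resolved library paths from ldd-style output that live under /usr."""
    offenders = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "=>"
        _, _, target = line.partition("=>")
        parts = target.split()
        if not parts:
            continue
        path = parts[0]  # for "not found" entries this is just the word "not"
        if path.startswith("/usr/"):
            offenders.append(path)
    return offenders


# Illustrative ldd-style text; not real output from the reporter's machine.
SAMPLE_LDD = """\
\tlinux-gate.so.1 =>  (0x00110000)
\tlibcrypt.so.1 => /lib/libcrypt.so.1 (0x00111000)
\tlibfreebl3.so => /usr/lib/libfreebl3.so (0x00222000)
\tlibc.so.6 => /lib/libc.so.6 (0x00333000)
\t/lib/ld-linux.so.2 (0x00444000)
"""

print(libs_under_usr(SAMPLE_LDD))
```

On a live system the same function could be fed the output of `subprocess.run(["ldd", "/sbin/sulogin"], capture_output=True, text=True).stdout`. An empty result is what you would want for anything that has to run before /usr is mounted.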

Comment 5 Petr Lautrbach 2010-02-01 08:57:43 UTC
*** Bug 560448 has been marked as a duplicate of this bug. ***

Comment 6 Elio Maldonado Batiz 2010-02-03 23:30:40 UTC
libfreebl3.so (.chk) should be moved to /lib{64} as proposed in Bug 561544.

Comment 7 Kai Engert (:kaie) (inactive account) 2010-03-23 19:32:51 UTC
Elio, are you suggesting that this bug be closed when bug 561544 is closed?

Joel, is this bug fixed for you now that bug 561544 is fixed?

Comment 8 Joel Rees 2010-04-01 23:08:45 UTC
Sorry about taking so long to respond. (Same old same old, wish I had a job that allowed me more time for this.)

Here's what happened today.

First boot had some issues, splash screen froze. Maybe my disks are having problems. Arrow keys or something got me to the console, where all that was visible on the screen was 

-------------
Give root password for maintenance or press ctl-d to continue.
-------------

Any key pressed repeated the prompt, including ctl-d.

Ctrl-alt-del to re-boot and it boots okay. Great. I actually had other plans for today, but, ...

I edited /boot/grub/device.map, inverted the device (sda to sdb), wrote, and rebooted.

As expected, this triggers the problem. 

No indeterminate automatic reboot cycle, that's nice. But if I leave it to boot and don't catch it during about the middle of the splash screen, the progress bar goes to the end and the boot process freezes. Sometimes.

If I press a key before the progress bar gets to the end, I get the boot console, of course. Eventually, I get to the expected messages, can't mount for some reason, maybe device is busy, could it be a zero length partition, etc. Then,

-------------------------------
Failed, dropping you to a shell.
Give root password for maintenance or press ctl-d to continue.
-------------------------------

And, as above, you can't enter the password because it prompts you on every keystroke, and it doesn't recognize ctrl-d, either. Ctrl-alt-del and it does the same. After a while, though (several reboots; I couldn't say whether it was elapsed time or the number of reboots that mattered), it seems to find the partition anyway and proceeds to boot.

(So I fixed /boot/grub/device.map and re-booted and now I'll go after the updates. Wish the PPC hadn't been dropped from Fedora 13, but that's my work notebook, not this machine, different processor, etc. But it bugs me that Fedora abandoned it after Apple did. Guess I'll start loading OpenBSD on that one. Whoops, that has nothing to do with this bug at all.)

Anyway, the auto-reboot cycle is fixed. Sort of.

The fix is not particularly satisfactory. 

I'd suggest that when dropping to a shell, or trying to, the screen should be toggled automatically to the console log. 

I'd also suggest that, when attempting to drop to a shell, a message about using ctrl-alt-del to reboot if you can't enter a password, and about using an alternative boot device (live CD?) to edit the grub device.map or fstab, or something to that effect, could be more meaningful than the zero length device or busy message.

Can't think of anything to add at this point.

Comment 9 Dirk Hoffmann 2010-08-30 11:09:22 UTC
I have the same problem with the looping "Give root password / press Ctrl-D" prompt: Any key pressed yields the prompt again.

This is more annoying for me, as I need to run fsck. (I know how to use a rescue or live CD for that, but that is not the way it should work, I suppose.)

Hence without a rescue CD in the pocket or a carefully foreseen rescue partition, the PC is unrecoverable?

I am afraid this is another bug now. I have no idea how to handle this shift of focus inside bugzilla. But it is urgent!!!

Comment 10 Joel Rees 2010-08-30 12:29:59 UTC
Okay, it looks like booting single user (catching the grub menu and editing the kernel line to put a "1" or a "single" at the end) won't help. It looks like it will try to mount everything in /etc/fstab before it turns the machine over to you.

If it's just a timing issue, ctrl-alt-del several times from the stuck point may finally bring it to a point where the drive naming works the way you intended it to and it boots. Then you can get into /etc/fstab and comment out the line(s) that don't always match the naming algorithm in your BIOS.  

I ended up using the dev-mapper names to get around this for the LVM-managed volumes. For the others, you can find the UUID and use that in /etc/fstab.
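
To find the lines worth converting, a small Python sketch like the one below can list fstab entries that still use kernel-enumerated names such as /dev/sda3, which depend on probe order, as opposed to /dev/mapper names or UUID= references. The regular expression, the function name, and the sample text (including `UUID=example-root`) are assumptions made for illustration.

```python
import re

# Names like /dev/sda3 or /dev/hdb1 are handed out in probe order and can swap.
UNSTABLE_NAME = re.compile(r"^/dev/(sd|hd)[a-z]+\d*$")


def unstable_entries(fstab_text):
    """Return (device, mountpoint) pairs whose device field is a probe-order name."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out entries
        fields = line.split()
        if len(fields) >= 2 and UNSTABLE_NAME.match(fields[0]):
            flagged.append((fields[0], fields[1]))
    return flagged


# Hypothetical sample; only the naming styles matter here.
SAMPLE = """\
UUID=example-root          /     ext3  defaults  1 1
/dev/mapper/vg_f11-f11usr  /usr  ext4  defaults  1 2
/dev/sda4                  /pig  ext3  defaults  1 2
/dev/sdb6                  swap  swap  defaults  0 0
"""

for device, mountpoint in unstable_entries(SAMPLE):
    print(device, "->", mountpoint)
```

Each flagged device can then be looked up with `blkid` and replaced by its UUID= form.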

But, yeah, otherwise, you probably want a rescue CD or a live CD or a rescue partition (single partition install). Old systems usually work.

I'm too tired to make any sense, I'm afraid.

Comment 11 Bug Zapper 2010-11-04 01:50:38 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Frank Crawford 2010-11-21 10:25:49 UTC
Folks, this problem still occurs in Fedora 14, although as mentioned above, it doesn't seem to be obviously related to nss-softokn any more.

Can we get this updated to Fedora 14?

Comment 13 Dirk Hoffmann 2010-11-21 20:41:42 UTC
A natural way to work around this problem, and probably the reason why only a few people stumble over it nowadays, is to use ext4 filesystems for formatting. (Did I mention I used ext2 for compatibility/nostalgia reasons?) A journalled filesystem is less likely to need user interaction in case of corruption.

Nevertheless, when user interaction is needed, and the mechanism is foreseen to go to SU boot mode, it should work correctly. Thanks for updating!

Comment 14 Scott Mcdermott 2010-11-22 00:00:45 UTC
ext3 has journalling.

Anyways, there's 100 ways to mess up the boot so you
have to drop to the shell.  Maybe you edited fstab and
made a typo on UUID.  So can't find, drop to shell.

The reason this bug did not affect a lot of people is
because '/' and '/usr' are the same filesystem for
them.  So they get '/usr/lib' mounted during emergency.
It was moved to '/lib' so no longer requires '/usr'
which fixes this bug.

However the login problem does still exist where it
just loops and any key yields the prompt again.  The
login for rescue is broken...

Comment 15 Frank Crawford 2010-11-22 05:40:40 UTC
Dirk, actually, I got bitten by this while trying to change to ext4 filesystems for / just last week.  Because / was modified I needed to run an fsck, but couldn't.  That wasn't itself a big deal, I just needed to pull out the rescue CD, etc.

Still as Scott mentioned, there are a number of ways to hit it.  I also once was bitten by it when doing an LVM copy and lost power in the middle, or when I had snapshots that would not remove automatically.  All these things can be worked around or have had their buggy code fixed, but they aren't just corrupt journals.

Comment 16 Dirk Hoffmann 2010-11-22 08:50:32 UTC
(In reply to comment #14 / comment #15)
> Anyways, there's 100 ways to mess up the boot ...

Please do not interpret my hint to a workaround as an argument to disregard this bug! The opposite was implied with my humble contribution: This is a very old bug (jumping from 12 to 14 today, but probably longer standing), therefore many users seeking help may stumble over this report and be happy to find _at least_ a workaround.  The situation should be embarrassing for developers/debuggers.

The classification Priority=low/Severity=medium is unsatisfactory. A longstanding bug should get higher priority in my opinion. And as this bug can completely ruin your day or week after a system crash on your laptop abroad, unless you always carry your recovery CD with you, it should be considered very serious.

Comment 17 Frank Crawford 2010-11-22 12:40:16 UTC
Okay, now for something different, I've just been doing some tests of /sbin/sulogin under Fedora 14, and the failure of sulogin is no longer related to the mounting of /usr.

I've run up two different VMs, one with the standard filesystem layout (i.e. just /boot and /) and the other with a separate /usr, and modified the kernel arguments in grub to include "init=/sbin/sulogin". Both fail in the same way, i.e. failing the read on every character.

So, should this issue stay open or be moved to a new report or do I have a totally different issue?

Comment 18 Joel Rees 2010-11-22 14:01:45 UTC
This behavior (not being able to enter a password to clear the loop and drop into rescue mode) has been present from my original report.

I'm thinking it's best to continue the bug, unless we want to make this thread be for the problems of trying to load a library before the file system it resides on is mounted and make a separate thread to handle whatever the real reason is for being unable to actually enter the login process when dropping to rescue mode.

I'm inclined to believe that the problem is still based in not being able to get all of the login code into memory when the bug occurs.

I don't think this bug can be fixed until the programmers are able to determine some sort of separate login process for protecting the rescue mode. You can't use the system to enforce privilege before the system is fully running.

Comment 19 Frank Crawford 2010-11-24 12:27:00 UTC
Tonight I've been doing a lot of experimentation with this issue and found out a number of things, and proved that the problem in F14 is different to that previously in F12.

To cut to the chase, the issue is now that sulogin is not able to set the tty mode, and so is in "raw" mode, reading each character as it is typed and attempting to use that as the password.  You can easily test this by making root's password 1 character, and it kind of works (echo is off, can't be turned on, etc, but it does what you ask).

Secondly, sulogin generally works okay if you can get past the tty issue, in that if you make it the default for bootlevel S, then it works fine.  But if you try to use it with the console set up during the initial kernel boot, e.g. by adding "init=/sbin/sulogin" to the grub entry, or before running Upstart's rcS.conf, the tty settings are screwed.

Finally, somehow if you give "init=/bin/bash" to grub it gets the tty settings correct and it works okay.

As a postscript, the sulogin source has additional options to initialise tty settings, but they don't seem to be configured in F14, and I haven't yet had a chance to see if that will fix the problem.  I assume that bash is doing something like that.
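
For anyone who wants to look at the tty settings being described: a password prompt normally turns echo off but leaves the terminal in canonical (line-at-a-time) mode, so a read returns only at Enter. With canonical mode off, a read can return after a single character, which matches the symptom above. The Python sketch below just decodes those two bits using the standard `termios` module. It does not reproduce the boot-time condition, the helper names are made up for this sketch, and whether this is the actual cause remains the hypothesis of this comment.

```python
import termios


def describe_lflags(lflag):
    """Summarise the two local-mode bits that matter for a password prompt."""
    return {
        "canonical": bool(lflag & termios.ICANON),  # input delivered line by line
        "echo": bool(lflag & termios.ECHO),         # typed characters echoed back
    }


def terminal_modes(fd):
    """Decode the local flags of an open terminal; raises termios.error if fd is not a tty."""
    lflag = termios.tcgetattr(fd)[3]  # index 3 of the attribute list is c_lflag
    return describe_lflags(lflag)


# A normal interactive terminal has both bits set.
print(describe_lflags(termios.ICANON | termios.ECHO))
# What a prompt would see if neither bit were set: per-character reads, no echo.
print(describe_lflags(0))
```

From a working shell, `terminal_modes(0)` (or `stty -a`) shows the current state of standard input for comparison.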

Comment 20 Dirk Hoffmann 2010-11-24 16:16:35 UTC
(In reply to comment #19)
> Tonight I've been doing a lot of experimentation with this issue and found out
> a number of things, and proved that the problem in F14 is different to that
> previously in F12.

After some private communication following the submission of my report, I had indeed come to a similar conclusion: The mounting of all necessary resources and the tty mode for input in S mode are two sources of the problem "cannot log on to repair system in S mode".

I do not remember, why a single ticket was kept. But the problem you just described (tty mode) was for sure present in F12 already!

Thanks for the hint with init=/bin/bash, works like a charm!!! It says then:
 bash: cannot set terminal process group (-1): Inappropriate ioctl for device
In case that helps ...

Comment 21 Frank Crawford 2011-02-19 01:40:34 UTC
I found some time to look at this further, and compiled up the current upstream version of sulogin including the tty initialisation code, and still no luck.

I'm beginning to think that this is not really an sulogin bug, but really either a kernel or plymouth issue.  I had a look at what bash does and it pretty much replaces anything to do with the tty with its own buffering and other code, so it doesn't really care about the tty settings.

What I think is the issue here is that the /dev/console we are attached to isn't really the right one or a real tty device.  Hence it could be a kernel bug, but it may be that plymouth hasn't set up the devices properly, or has passed on a console it had initially but which has since been replaced by a new bunch of system devices.

I also considered if it was upstart, but running with init=/sbin/sulogin has the same problem and upstart isn't invoked.  However, upstart does somehow fix the issue either in /etc/rc.d/rc.sysinit or later through some manipulation.

Anyway, I think it may be time to close this bug and open a new one against plymouth.

Comment 22 Dirk Hoffmann 2011-02-19 22:41:57 UTC
(In reply to comment #21)

> Anyway, I think it may be time to close this bug and open a new one against
> plymouth.

I do not know how this can be handled in the most efficient way. Open a new bug or just reassign this ticket to plymouth?

This bug has generated a lot of discussion and tests, some of them probably still useful for further investigation. How can we make sure the bug report finally goes to the right person?

Comment 23 Elio Maldonado Batiz 2011-02-19 23:49:30 UTC
The simplest solution may also be the most efficient: reassign it to preserve the thread and, with it, useful information.

Comment 24 Fedora End Of Life 2012-08-16 20:07:02 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.


