Red Hat Bugzilla – Bug 470174
anaconda loads (& updates) w/o error but Reboot fails
Last modified: 2008-12-11 21:08:38 EST
Description of problem: Upon completion of installing OS without error, the called-for REBOOT fails with the screen message (abbreviated below):
Session finished... exiting logger
***An error occurred during the filesystem check.
***Dropping you to a shell...
Give password for maintenance...
While in maintenance mode after the crash, fsck reports all filesystems as OK. There are ten total, 4 on IDE sda, 6 on IDE sdb, 1 /home on SCSI sdc. The platform is dual-boot, F9 on sda2, F10 on sda3.
Because the above is preceded by a "clear screen", the script output leading up to the failure is lost. I also did not find one byte of logged status (did I miss the instruction to turn it ON?). Isolating the error took many retries with experimental changes...:-(
After commenting out the only SCSI drive from /etc/fstab the reboot completed. This particular drive has been installed for years successfully up thru FC-9. The failure began with FC-10Beta & FC-10Preview. NOTE: The SCSI bios flags a potential problem with the default format [255 heads, 64 sectors] that has been present and *not* caused problems in previous years on this machine.
Version-Release number of selected component (if applicable):
Motherboard: Tyan Thunder K7X w/ dual AMD
SCSI adapter: Tekram DC-390F [seen as: RIVA: TNT2]
Drive: IBM Ultra-160 DDYS-T18350 (18GB)
How reproducible: Seen after 3 installs / hard failure every attempt.
Steps to Reproduce:
1. Install fresh OS from net (also applies to upgrade; Beta-to-Preview)
Additional info: I don't know this (yet) but I suspect a driver software change for F10 is less tolerant of the SCSI layout flaw on this machine. I will test this. If I am correct, the underlying problem is mine. I ask for a display improvement to help locate the issue.
I'm sorry -- the failure to mount my SCSI drive is not related to the faulted layout. After adjusting the drive format, the SCSI bios no longer flags the error, but F10 still fails to mount the sdc1 filesystem. Further, the "noauto" parameter does not circumvent the failure, so mounting is not the root cause. The only workaround I have is to not place any reference to sdc in /etc/fstab. After booting, the mount command does work and the filesystem functions normally.
Therefore I must revise this report from one to two failures:
1) The booting process aborts when a SCSI drive defined as sdc1 is in fstab.
2) The troubleshooting process is hampered by hiding boot progress output.
The good news is that logs are now showing up for successful boots, the bad news is that my approach to this issue *may have* severely messed up the "first boot" completion for the OS build. I believe a fresh build without SCSI is next just to get a clean one.
Further experience [I hope my last post?]: This issue has also shown up in F9 when booting with kernel parameter "single". The work-around I use for both F9 and F10 is to "#" out any filesystem on SCSI in fstab and also add a "mount ..." statement in /etc/rc.d/rc.local and this works without known problem. I confirm my experience is only related to my SCSI hardware, and not related to IDE interface drives [I have no SATA or remote drives].
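For anyone wanting to script the fstab half of this work-around, here is a minimal Python sketch. It only prefixes matching entries with "# " and does not touch rc.local. The function name and the sample fstab lines are hypothetical; only the label homeF10 and the /home mount point come from this report.

```python
def comment_out_fstab_entries(fstab_text, specs):
    """Return fstab_text with every entry whose first field is in specs prefixed by '# '."""
    out = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        # Blank lines and lines that are already comments are passed through unchanged.
        if fields and not fields[0].startswith("#") and fields[0] in specs:
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)


# Hypothetical two-line fstab, for illustration only.
sample = (
    "LABEL=/        /      ext3  defaults  1 1\n"
    "LABEL=homeF10  /home  ext3  defaults  1 2\n"
)
print(comment_out_fstab_entries(sample, {"LABEL=homeF10", "/dev/sdc1"}))
```

Matching is on the first field only, so the same call covers the LABEL, UUID, and /dev/sdc1 forms as long as each spelling is listed in specs.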
Upon again scanning bugzilla for similarities, I have interest in bugs 388901, 431778, 464636, 429937. For me, the connection is perhaps a timer-race in the early boot process when root filesystems are read-only and unmounted where fsck does not wait long enough for a reply from SCSI(+) hardware? Yes -- this is a WAG. I am confident that anaconda is NOT the component presenting the problem.
I have also attempted to revise the severity from low to moderate in view of finding similar problems on bugzilla.
When you get to the error prompt, does the device node exist? Can you attach your fstab?
Created attachment 323392 [details]
/etc/fstab for F10 modified for work-around
The work-around includes a "mount ..." statement in /etc/rc.d/rc.local to accomplish what the "# " removes...
Can you also attach your /etc/blkid/blkid.tab file? Which label in fstab corresponds to your /dev/sdc1?
fstab is now attached, but I don't have enough information to answer the device node question at the moment of error. Certainly /dev/sdc1 exists at successful boot completion.
The error clears the screen of error message text, however IF the simple shell the boot process drops into after the error can answer your question, then I need to reproduce the failure and look specifically. Remember - in this shell when I run "... fsck" it successfully completes checking all filesystems including sdc1 therefore it seems your answer should be "yes".
Created attachment 323393 [details]
blkid for F10
The LABEL=homeF10 consumes the total space on /dev/sdc which is the SCSI drive.
Removing 'rhgb quiet' from the boot commandline should give a better idea of what the error is.
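As a rough illustration of that edit, the Python sketch below drops the two options from a kernel line. The helper name and the sample line are assumptions made for the example, not taken from the reporter's boot configuration.

```python
def strip_boot_options(kernel_line, options=("rhgb", "quiet")):
    """Return kernel_line with the given bare options removed."""
    # Split on whitespace, keep every word that is not one of the options.
    kept = [word for word in kernel_line.split() if word not in options]
    return " ".join(kept)


# Hypothetical kernel line, for illustration only.
line = "kernel /vmlinuz ro root=LABEL=/ rhgb quiet"
print(strip_boot_options(line))
```

Note that the sketch normalizes whitespace, so any leading indentation on the original line is lost.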
OK -- I booted three times...
After removing the work-around I booted as per the default [original description] and the result was just as discussed above. It's repeatable.
The second boot, I removed "quiet" from the kernel and surely a lot of message text whizzed by more quickly than I could comprehend. I did see a "waiting 10 second" line, but I don't know what it was waiting for. The process ended exactly as above.
The third boot, I removed "rhgb quiet" and again much happened more quickly than I can relate to you. This time however, it cleared the screen and placed the debug message on top saying:
plymouthd: ply-boot-splash.c:283: ply_boot_splash_root_mounted: Assertion 'splash !=((void *)0)' failed
This does not sound helpful...:-( This sounds like the routine used to report the error detail itself failed?
It would be real "handy" if the screen did not refresh when reporting this error so I could see what led up to the failure. There is no record of the boot progress in /var/log/messages up to and including the failure I'm seeing. If I can activate break points it might be handy? What next?
What next is getting simpler. I have just completed two "yum update" cycles, the first after installation of F10P so the first was rather long. NOTE: I got one signature warning during update, I did choose "yes" rather than the safe "no" answer... The booting process now reports "Fedora 10" rather than 9.93 and the good news is that the first problem is solved -- I now see the text leading up to the failure.
Two messages stand out at boot time prior to failure:
Could not detect Stabilization, waiting 10 seconds.
This is the top line of the screen, and immediately precedes "Welcome to Fedora". I suspect it *may* be significant.
fsck.ext3: Unable to resolve 'LABEL=homeF10'
This confirms the SCSI drive fails to pass the "Check filesystems" test prior to mounting /root. I previously tested three approaches to defining this filesystem in fstab: LABEL..., UUID..., and /dev/sdc1, but all failed in the same way. Therefore it appears that the second problem remains in the most recent distribution. It also suggests to me the root cause lies within plymouth or fsck.ext3. I can report that the workaround above still works well to mount homeF10 during boot F10 time.
This failure has changed today using the latest F10 [22.214.171.124-117.fc10.i686]. There are no updates available from rawhide at this moment. This failure still happens as before, but now there are additional start-up lines below the failure text as reported in comment #10 and the original posting. There are 8 new lines all beginning with:
sd 2:0:0:0 [sdc] ...
showing response from the initialization process and the last line reports:
sd 2:0:0:0 [sdc] Attached scsi generic sg3 type 0
Because all this is AFTER the [FAILED] line, it seems the hardware process is just too slow during the initial fsck check of file systems while /root is read-only.
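If the cause really is timing, the idea of waiting for the device before checking it can be sketched in Python as below. This is only an illustration of the suspected race, not how the boot scripts actually work; the function name, the timeout, and the polling interval are all assumptions.

```python
import os
import time


def wait_for_device(path, timeout=30.0, interval=0.5,
                    exists=os.path.exists, sleep=time.sleep):
    """Poll until path exists or roughly timeout seconds pass; return True if it appeared."""
    waited = 0.0
    while True:
        if exists(path):
            return True
        if waited >= timeout:
            return False
        # Time is tracked by summing the intervals slept, not by the wall clock.
        sleep(interval)
        waited += interval
```

A caller would run something like wait_for_device("/dev/sdc1") and start the filesystem check only when it returns True. The exists and sleep parameters are there so the loop can be exercised without real hardware.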
I have in fact several similar boot-time failures:
1) the SCSI of this report
2) network fails to init eth1
3) httpd fails to start
In these cases I have added "mount ..." and "service ... restart" lines to rc.local as a work-around for the failures, and they are successful. Because of the similarity in all these failures to initialize, I suspect plymouth may be the root cause, and the assignment to anaconda team is no longer appropriate?
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
More information and reason for this action is here:
For my host, this bug was retested today and shows as fixed. The repair happened after 11/21 and before 12/11 with kernel 126.96.36.199-134 or before.
I'll put this on CLOSED status since I seem to be the only poster...
My thanks to somebody...