Created attachment 522366
serial console output with systemd.log_level=debug
Description of problem:
If firstboot is set to run when booting into "runlevel 3", the boot process hangs without much in the way of output.
While firstboot isn't installed for a minimal install, this would affect users that do a graphical install but end up booting into text-mode before completing firstboot.
Version-Release number of selected component (if applicable):
How reproducible:
I am able to reproduce this every time I boot into multi-user mode with firstboot set to run.
Steps to Reproduce:
1. Do a fresh install that pulls in firstboot (default graphical works)
2. On the first reboot after install, add "3" to the end of the kernel command line
Actual results:
Boot process hangs
Expected results:
Go into text-mode firstboot or, at least, complete the boot process.
I've attached output from serial console with the following changes to the default kernel command line
- removed rhgb, quiet
- added systemd.log_level=debug systemd.log_target=kmsg \
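The command-line changes listed above, plus the "3" from the reproduction steps, can be sketched as a small helper. This is only an illustration: the function name `debug_cmdline` and the sample command line are hypothetical, and the real edit was made by hand at the boot loader.

```python
def debug_cmdline(cmdline, runlevel="3"):
    """Return a kernel command line adjusted as described in this report.

    Drops 'rhgb' and 'quiet', then appends the systemd debug options
    and the requested runlevel if they are not already present.
    """
    drop = {"rhgb", "quiet"}
    add = ["systemd.log_level=debug", "systemd.log_target=kmsg", runlevel]
    args = [arg for arg in cmdline.split() if arg not in drop]
    for extra in add:
        if extra not in args:
            args.append(extra)
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical default command line, for illustration only.
    sample = "ro root=/dev/mapper/vg-root rhgb quiet"
    print(debug_cmdline(sample))
```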
This is a little bit of a stretch, but I'm proposing this as a Fedora 16 Beta blocker under the following alpha release criterion:
When booting a system installed without a graphical environment, or when using a correct configuration setting to cause an installed system to boot in non-graphical mode, the system should boot to a state where it is possible to log in through at least one of the default virtual consoles
Yes, that criterion does specify that the system doesn't have a graphical environment installed but I don't think that booting into multi-user mode before firstboot is run should prevent the system from booting.
Discussed at the 2011-09-09 blocker review meeting, accepted as a blocker under the above criterion. Note that it's not a stretch: you can install without a graphical environment but with firstboot. Tim needs to re-test with TC2 and confirm, though.
Tim: any news on the re-test? I'll check this too.
Lennart, Chris, Martin: any thoughts on a fix for this? Only two days till Beta RC1 time.
seems to be valid in TC2. I did a 'minimal' install with the sole change of adding the 'firstboot' package to the loadout, and it's sitting here at the boot progress bar, not doin' anything.
seems like if you drop 'rhgb quiet' from the boot parameters, you can see the firstboot screen, but as soon as you hit a key, you get the 'progress bar' screen again, and it just gets stuck there. definitely irritating, and definitely looks like systemd.
ccing Johann in case he has ideas.
16.3-1, which I found in https://bugzilla.redhat.com/show_bug.cgi?id=734306 , appears to fix this. So we just need to pull that in for RC1.
I tested this yesterday and got different results.
The /usr/bin/setup screen appeared, and I could move the cursor around the items, but pressing Enter on anything didn't work, not even on Quit, so I got stuck too.
If you're right and 16.3-1 fixes this, then it was plymouth's fault. I had the same problem with the graphical firstboot before. In 16.3-1 I quit plymouth manually before running firstboot-text.
I tested by booting to rescue mode, disabling the firstboot-text service, booting back to normal mode, doing 'yum --enablerepo=updates-testing update firstboot', enabling the firstboot-text service again, and rebooting. On the reboot it performed perfectly.
I tried a new graphical install (beta TC2 media, skipped evolution to work around dependency issues) and made sure that firstboot-16.3-1 was installed.
I'm still seeing the same issue, though, when booting without rhgb and quiet and with systemd.log_level=debug systemd.log_target=kmsg plymouth:debug 3 added to the kernel command line.
When I disable firstboot-text.service, I am able to boot without any problems - same as before.
Huh. I will try some more tests, I guess. mgracik, can you look at it again too, and see if 16.3-1 fixes it for you?
so, I've been testing with these two kickstarts:
both do a fully automated installation from TC2 DVD, with the 'minimal' package set plus firstboot. -old just uses the TC2 DVD repo, and gets firstboot-16.1-2.fc16 . -test adds a special repo I set up which contains nothing but firstboot-16.3-1.fc16 , and makes sure to pull firstboot from that repo.
I've verified the only diff between the package sets of the two installs is:
--- /tmp/pkgs-with-repo 2011-09-14 14:44:08.351425318 -0700
+++ /tmp/pkgs-without-repo 2011-09-14 15:02:29.377951434 -0700
@@ -98,7 +98,7 @@
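A comparison like the one above can be reproduced with a short script. The following is a minimal sketch, not the command actually used here: `package_diff` is a hypothetical helper, and every package name in the sample lists other than the two firstboot builds is a placeholder.

```python
import difflib


def package_diff(with_repo, without_repo):
    """Return only the changed lines of a unified diff of two package lists."""
    diff = difflib.unified_diff(
        sorted(with_repo),
        sorted(without_repo),
        fromfile="pkgs-with-repo",
        tofile="pkgs-without-repo",
        lineterm="",
        n=0,  # no context lines, only the differences
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    # Placeholder package lists; only the firstboot builds come from this report.
    with_repo = ["bash", "firstboot-16.3-1.fc16", "systemd"]
    without_repo = ["bash", "firstboot-16.1-2.fc16", "systemd"]
    for line in package_diff(with_repo, without_repo):
        print(line)
```

If the only lines returned are the two firstboot builds, the installs differ in nothing else.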
for me, installing with -old.ks reproduces the bug and I cannot work around it in any way other than to boot to 'single' mode and disable the service. Dropping 'rhgb' from the kernel parameters causes what Martin saw - you get to see the firstboot tool, but plymouth spews startup messages over it, and as soon as you press a key, you switch to the progress bar screen, where it gets stuck.
Installing with -test.ks, at my last test, made it work right out of the box.
Next stop: similar test but with a 'full' package set, as Tim's been testing.
welp. definitely ain't fixed in rc1 :/ don't know what's going on here, but I can't actually get an install of rc1 with firstboot-old.ks to boot *at all* - not by taking out rhgb quiet, not even into 'single' mode. so I can't boot the install at all. which is...bad.
okay, i had to take out rhgb and quiet and add single and enforcing=0, and it booted, i could disable firstboot-text.service, reboot, and got a normal boot. definitely busted.
So I tried this myself, I installed Fedora 16 Beta TC2. Didn't work. I added the ExecStartPre=-/bin/plymouth quit to the firstboot-text.service (this is what the new version got) and restarted, and firstboot worked.
I still got 2-3 lines of some strange output over the firstboot screen, but pressing Ctrl+L cleared the screen and firstboot worked as it should.
I don't know what service writes over the firstboot screen, so I don't know how to fix that thing.
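For anyone who wants to try the same workaround, the unit file edit described above can be sketched as follows. This is an illustration under assumptions: `add_plymouth_quit` is a hypothetical helper, and the sample unit text is a placeholder rather than the real contents of firstboot-text.service. The leading "-" in the directive tells systemd to ignore a failure of that command.

```python
def add_plymouth_quit(unit_text):
    """Insert 'ExecStartPre=-/bin/plymouth quit' before the first ExecStart= line.

    Returns the text unchanged if the directive is already present.
    """
    directive = "ExecStartPre=-/bin/plymouth quit"
    lines = unit_text.splitlines()
    if directive in lines:
        return unit_text
    out = []
    inserted = False
    for line in lines:
        if not inserted and line.startswith("ExecStart="):
            out.append(directive)
            inserted = True
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Placeholder unit text, not the shipped service file.
    sample_unit = "[Service]\nExecStart=/usr/bin/setup\n"
    print(add_plymouth_quit(sample_unit), end="")
```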
mgrac: best just try it with RC1, the behaviour changed (got worse) for me. this could be down to the faster kernel changing timings, it's certainly a racy thing.
Adding CCs for halfline and lennart, as there are systemd and plymouth interactions to be concerned with here.
Yes. The same service file that works in tc2, doesn't work in rc1 at all. If I remove the ExecStartPre=-/bin/plymouth quit, it gets stuck. If I put it in, the system boots, I get a login screen, but there was no firstboot screen at all, even though I can see the service is starting. And systemctl status says everything went fine.
After the boot, systemctl status firstboot-text.service says that ExecStart=/usr/bin/setup was code=killed, signal=PIPE. That does not sound good.
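As background on that status: signal=PIPE means the process was terminated by SIGPIPE, which is delivered when a process writes to a pipe or socket whose reading end has been closed. The sketch below only demonstrates that general mechanism on a Unix-like system; why /usr/bin/setup received the signal is not established in this report, and `run_writer_with_closed_reader` is a hypothetical name.

```python
import signal
import subprocess
import sys


def run_writer_with_closed_reader():
    """Start a child that writes to a pipe, close the reading end, return its exit status."""
    child_code = (
        "import signal, sys\n"
        # Python ignores SIGPIPE by default; restore the default action
        # so the child behaves like an ordinary C program.
        "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
        "while True:\n"
        "    sys.stdout.write('x' * 65536)\n"
        "    sys.stdout.flush()\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
    proc.stdout.close()  # nobody reads from the pipe any more
    return proc.wait()   # negative value means "killed by that signal"


if __name__ == "__main__":
    status = run_writer_with_closed_reader()
    print("killed by SIGPIPE:", status == -signal.SIGPIPE)
```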
worth noting that tflink had trouble even in tc2, but it does seem to be worse in rc1. I think there may be a timing element to this bug, and rc1 has a non-debug kernel, which changes boot timings quite substantially.
setting back to assigned, as this is clearly not fixed.
Discussed in the 2011-09-16 blocker review meeting. This seems to have gotten worse from TC2 to RC1 and seems to involve systemd/plymouth interactions.
If a fix isn't available for RC2, disable firstboot-text.service for beta and move this to a final blocker.
firstboot-16.4-1.fc16 has been submitted as an update for Fedora 16.
I tested this with fresh F16 minimal + firstboot installs using both firstboot-16.4-1.fc16 and firstboot-16.3-1.
Rebooting after install, the machine with firstboot-16.4-1 boots to a login screen and the one with 16.3-1 doesn't.
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing firstboot-16.4-1.fc16'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
*** Bug 730926 has been marked as a duplicate of this bug. ***
I tested the update with a pre-RC2 image I built here and it worked - first-time boot to runlevel 3 post install did not show any kind of firstboot at all. Will confirm with the official RC2 images.
firstboot-16.4-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
Reproduced with F16 Beta RC3, using a minimal installation with the partitioning options "use all space" and "use free space".
The following message is displayed after the 'localhost login:' prompt:
ALSA sound/pci/hda/hda_intel.c:748 azx_get_response timeout, switching to polling mode: last cmd=0x000f0000
Why do you think this has anything to do with firstboot?
Could you provide some more information about what you're experiencing?
After installing with the minimal package set and partitioning set to "use all space" or "use free space", the following message is displayed on the terminal at the first boot:
localhost login: ALSA sound/pci/hda/hda_intel.c:748 azx_get_response timeout, switching to
polling mode: last cmd=0x000f0000
After I hit the Enter key, it tries to log in as "ALSA sound/pci/hda/hda_intel.c:748 azx_get_response timeout, switching to polling mode: last cmd=0x000f0000".
After that attempt fails, the message disappears and root can log in successfully.
This does not happen with the default partitioning.
This is a bugzilla for the firstboot application. You are experiencing some errors during the "first" booting process of your installed system that have nothing to do with the actual firstboot application.
You need to create a new bugzilla and assign it to some other component. Maybe ALSA? But I'm not really sure.
I just re-verified this for Fedora 16 beta RC3 using an x86_64 netinstall iso and autopart full disk.
Hongqing, I don't think that the issues you're seeing are from firstboot. Maybe from one of the dependencies that it pulls in?
Either way, could you re-file that issue as another bug? I'm re-closing this one.
hongqing: if you have a 'login:' prompt then it's working. you can login with it. the ALSA message is just a warning which is being sent to console when it probably ought to be logged.