Bug 622058 - fails to run firstboot on first boot of an installed system
Summary: fails to run firstboot on first boot of an installed system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: firstboot
Version: 14
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Martin Gracik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 623559
Depends On:
Blocks: F14Alpha, F14AlphaBlocker
Reported: 2010-08-06 22:19 UTC by Adam Williamson
Modified: 2013-07-04 12:51 UTC
CC List: 11 users

Fixed In Version: firstboot-1.112-3.fc14
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-08-11 00:28:01 UTC
Type: ---
Embargoed:


Attachments
errors from /var/log/messages on boot (29.43 KB, image/png)
2010-08-06 22:47 UTC, Adam Williamson
[PATCH] firstboot: add systemd service (3.71 KB, text/plain)
2010-08-08 10:59 UTC, Michal Schmidt
working firstboot build (.src.rpm) (173.76 KB, application/octet-stream)
2010-08-10 20:02 UTC, Adam Williamson

Description Adam Williamson 2010-08-06 22:19:06 UTC
Testing a local live build which has systemd 6-2 and dbus-1.3.2-0.1.885483.fc14 , I can successfully boot the composed live image and do an install to hard disk. However, on booting the installed system, I wind up straight at gdm; firstboot doesn't show up, so no user creation step occurs and the system is effectively unusable. I can switch to a vt and manually invoke firstboot and run it, but on quit it leaves the system stuck at a black screen. There are some errors from systemd related to firstboot in the logs, I'll try and recreate the issue and provide those soon.

Comment 1 Adam Williamson 2010-08-06 22:19:21 UTC

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 2 Adam Williamson 2010-08-06 22:19:59 UTC
note this is not the same as https://bugzilla.redhat.com/show_bug.cgi?id=614538 , it's a new issue. blocks alpha under criterion "In most cases, the installed system must boot to a functional graphical environment (see Blocker_Bug_FAQ)".




Comment 3 Adam Williamson 2010-08-06 22:46:27 UTC
attaching a screenshot of the errors, from within my test vm.




Comment 4 Adam Williamson 2010-08-06 22:47:10 UTC
Created attachment 437277 [details]
errors from /var/log/messages on boot

Comment 5 Adam Williamson 2010-08-08 01:20:44 UTC
Investigating this a bit further, firstboot uses the result of the 'runlevel' command to decide whether to run in text or graphical mode, and the output of 'runlevel' on the installed system in question is 'unknown', which can't be helping. (I notice when I try and shut down the system in this situation that firstboot shows up, in text mode, behind the shutdown process - though it doesn't seem to run *before* I try and shut down).

Given that with systemd the runlevel concept doesn't really apply any more, firstboot should probably grow code to first check the systemd target or something, before trying to check the runlevel. But even that would be 'wrong', given that we can have any number of arbitrary systemd targets, really. Maybe it should check whether prefdm is enabled in the currently active systemd target. CCing the firstboot maintainers. Lennart, it may be useful for you to suggest the best way forward here.

(I'm going to hack up firstboot a bit to always try and start up in graphical mode, and see if that 'fixes' it).




Comment 6 Adam Williamson 2010-08-08 01:24:58 UTC
so yeah, seems like there's two problems here: somehow, with systemd, firstboot triggers on system *shutdown*, not startup, and with the problem with 'runlevel', it runs in text mode, not graphical mode (the code actually handles the case where the result of runlevel is 'unknown'; it treats it as runlevel 3). I'm not sure why the thing with it kicking in on shutdown happens.
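
The mode selection described above can be sketched as a small shell function. This is an illustration only, not firstboot's actual code: the name `firstboot_mode` is hypothetical, and it assumes `runlevel` prints either a "previous current" pair such as `N 5` or the single word `unknown`.

```shell
# Hypothetical helper, not taken from firstboot itself.
# Takes the output of `runlevel` and prints "graphical" or "text".
firstboot_mode() {
    # The current runlevel is the last space-separated field.
    current=${1##* }
    case "$current" in
        5) echo graphical ;;
        # Anything else, including "unknown", falls back to text mode,
        # mirroring the "treat unknown as runlevel 3" behaviour.
        *) echo text ;;
    esac
}
```

It would be called as `firstboot_mode "$(runlevel)"`; with 'unknown' as input it prints `text`, which matches the text-mode firstboot seen here.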




Comment 7 Adam Williamson 2010-08-08 01:33:01 UTC
after the change to force it to run in graphical mode, as far as I can tell, on the next boot, it seems like the system wound up running firstboot's X on vt6 and the regular desktop on vt7, simultaneously. switching to vt6 gave me a black screen with an X cursor for a while, then X apparently crashed and it went to being a normal tty.

does our current firstboot system rely on having a more-or-less serial init, so once the firstboot service kicks in, the regular desktop won't start up until firstboot is done, and this gets broken by systemd launching things in parallel?




Comment 8 Michal Schmidt 2010-08-08 10:59:28 UTC
Created attachment 437426 [details]
[PATCH] firstboot: add systemd service

Steps to reproduce:
  chkconfig firstboot on
  rm /etc/sysconfig/firstboot
  reboot

firstboot must be treated specially:
 - it interacts with the user, so it must not time out
 - it must delay the start of prefdm until it's finished
=> it needs a native systemd service file.

The attached patch adds it. It works for me.

Comment 9 Michal Schmidt 2010-08-08 11:07:56 UTC
scratch build with the patch applied:
http://koji.fedoraproject.org/koji/taskinfo?taskID=2388036

Comment 10 He Rui 2010-08-09 10:08:27 UTC
I can reproduce it after F14-alpha-rc2-x86_64 virt-install. Firstboot didn't show up automatically. Had to turn to tty2 to start it manually.

Comment 11 Lennart Poettering 2010-08-09 10:42:07 UTC
Patch looks pretty good. one more addition:

You might want to add "Before=prefdm.service" in the [Unit] section to make sure it is started before prefdm.service.

Comment 12 Adam Williamson 2010-08-09 14:50:51 UTC
I haven't tested it yet, but AFAICT, it doesn't address the runlevel issue. If that's not addressed, we'll get the text 'firstboot' (which is a lot less useful than the graphical one), which is not desired. Michal, Lennart - did either of you check that out?




Comment 13 Michal Schmidt 2010-08-09 15:12:15 UTC
The service file has:
  Environment=RUNLEVEL=5
firstboot uses the "runlevel" command to find out the active runlevel and the command accepts the value of this environment variable as an override.
So we'll get firstboot in graphical mode.

Comment 14 Lennart Poettering 2010-08-09 15:29:35 UTC
So, the trick is to have two firstboot services: firstboot-text.service and firstboot-graphical.service, the first being pulled in (via a .wants/ link) by multi-user.target, the other by graphical.target. The first should have RUNLEVEL=3 and the other RUNLEVEL=5 in an env var. Then firstboot-graphical.service should "conflict" with firstboot-text.service, to make sure that only one of them is enabled. In the systemd version in git, in such a case the unit that has the "conflicts" will win over the one that is "conflicted by", and everything should work as intended.

I will upload a new version of systemd shortly that has the mentioned "Who wins a conflict?" tweak added. The current version in f14 would arbitrarily remove one of the two conflicting services, giving the admin little control which one is removed in case of conflict.
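
As a rough sketch of the two-service arrangement described above, the shell snippet below writes both unit files into a scratch directory so they can be inspected. Treat every detail beyond the unit names, the RUNLEVEL values, the Conflicts= relationship and the two targets as an assumption: the `/usr/sbin/firstboot` path, `Type=oneshot`, `TimeoutSec=0`, the `Before=prefdm.service` placement and the `[Install]` sections are illustrative and are not taken from the attached patch. TTY handling for the text-mode service is left out.

```shell
# Illustrative only: generate the two units in a temporary directory,
# not in a real systemd unit path. Option choices are assumptions.
unit_dir=$(mktemp -d)

cat > "$unit_dir/firstboot-text.service" <<'EOF'
[Unit]
Description=firstboot configuration program (text mode)

[Service]
Type=oneshot
Environment=RUNLEVEL=3
ExecStart=/usr/sbin/firstboot
TimeoutSec=0

[Install]
WantedBy=multi-user.target
EOF

cat > "$unit_dir/firstboot-graphical.service" <<'EOF'
[Unit]
Description=firstboot configuration program (graphical mode)
Conflicts=firstboot-text.service
Before=prefdm.service

[Service]
Type=oneshot
Environment=RUNLEVEL=5
ExecStart=/usr/sbin/firstboot
TimeoutSec=0

[Install]
WantedBy=graphical.target
EOF

echo "units written to $unit_dir"
```

The Conflicts= line sits in the graphical unit on purpose: with the "who wins" tweak, the unit that declares the conflict is the one that is kept.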

Comment 15 Adam Williamson 2010-08-09 15:36:58 UTC
that sounds like practically speaking it ought to work in all cases I can think of, yes.

lennart, when you submit this new version of systemd as an update for f14, can you include the updated dbus from the systemd 5-2 submission too? the 6-1 and 6-2 submissions didn't have that dbus package, so we might have pulled the fixed systemd but not the fixed dbus...thanks!




Comment 16 Lennart Poettering 2010-08-09 16:02:16 UTC
Adam, I already posted a new ticket in bodhi for this 1h ago or so.

Comment 17 Adam Williamson 2010-08-09 16:32:48 UTC
really? when I search systemd in bodhi, it just takes me straight to https://admin.fedoraproject.org/updates/systemd-6-2.fc14 ...




Comment 18 Adam Williamson 2010-08-09 16:52:56 UTC
just tested a live image with the above firstboot scratch build, it still ends up with firstboot and gdm running together (gdm on vt1, firstboot on vt6). I will try doing my own build with Lennart's "Before=prefdm.service" suggestion.




Comment 19 Adam Williamson 2010-08-09 18:02:43 UTC
Heh. If I add Lennart's suggested fix, then firstboot gets run when booting to the *live* environment...:)

there's a command in the %post of the live CD generation kickstart to disable firstboot:

# turn off firstboot for livecd boots
chkconfig --level 345 firstboot off 2>/dev/null

I guess we need a systemd equivalent there, now we're using a systemd unit file. I think just adding 'systemctl disable firstboot.service' should do the job (and for Lennart's proposed modification, make it two commands to cover the two services, obviously). Note there's several others in a similar vein here, so if those get converted into systemd native format we'd have to add commands for those too. We should probably stick a comment in the kickstart file to that effect.

I'll test another live compose with this proposed fix...




Comment 20 James Laska 2010-08-09 18:09:38 UTC
mgracik - looks like we need your input here.  Firstboot and systemd need to play nice together for F-14-Alpha.  We have very little time remaining to fix, build and release on time.  

Michal Schmidt has posted a first draft patch for review (see attachment#437426 [details]).  Any input you have is appreciated!

Comment 21 Adam Williamson 2010-08-09 18:31:11 UTC
Note the first draft patch is not sufficient. It needs to be adjusted as per Lennart's comments #11 and #14. I can try to provide a patch that implements those.




Comment 22 Adam Williamson 2010-08-09 19:51:40 UTC
Okay, so I'm still having problems.

Following on from comment #19, I tried my proposal there - this change to fedora-live-base.ks from spin-kickstarts:

------------------------

@@ -180,8 +180,13 @@ fi
 action "Adding live user" useradd \$USERADDARGS -c "Live System User" liveuser
 passwd -d liveuser > /dev/null
 
+# For all the below, please be aware that if these services are made systemd
+# native, a systemctl disable command should be added to / replace the
+# chkconfig command
+
 # turn off firstboot for livecd boots
 chkconfig --level 345 firstboot off 2>/dev/null
+systemctl disable firstboot.service 2>/dev/null
 
 # don't start yum-updatesd for livecd boots
 chkconfig --level 345 yum-updatesd off 2>/dev/null

------------------------

However, it doesn't exactly work. Here's what happens. When I build a live image with this change, and with the proposed new systemd firstboot service, the boot dumps me to a text console. No firstboot, no gdm.

Here's my reasoning as to *why* this happens. The %post section from live spin kickstarts actually gets run *during boot of the built live image* - not at any point during creation of the live image. So we have, to some degree, a race going on: 'systemctl disable firstboot.service' is being run early in boot, right around the time systemd is starting stuff up.

If I log in to a VT in the affected live image and run 'systemctl status firstboot.service', the output includes:

Active: activating (start)
CGroup: name=systemd:/systemd-1/firstboot.service
         1287 sd:exec

this is far fewer processes than you see for the same command when the firstboot service is actually 'working', and running firstboot. I reckon I'm hitting a race condition here: the "systemctl disable firstboot.service" operation is getting run right in the middle of systemd trying to start the firstboot service. What 'systemctl disable foobar.service' actually does, according to the manpage, is remove all the symlinks which systemd uses to run the service. So I think systemd gets stuck, with the service forever - as far as it knows - in the middle of initialization, but with the symlinks pulled out from underneath it so it can't actually complete starting it up. Since we need to set firstboot.service to block the prefdm service running until it's complete, and firstboot.service is forever stuck, the prefdm service never gets started, so gdm never starts up.

Presumably this worked with upstart because it's more serial; by the time upstart had got to the point where it would have started the firstboot service, the command to disable it had already been run. The race existed but was always won, easily, by the 'chkconfig' command used to disable the service, so everything worked.

I'm not sure there's an easy way to fix this. I think it needs a re-think of how we accomplish the ultimate goal - have firstboot not run when you boot to the live environment, but run when you first boot into a system installed from the live environment. I'm not entirely sure what would be the best way to achieve this, though. Have the firstboot systemd service be able to detect whether it's running in a live environment or not?




Comment 23 Adam Williamson 2010-08-09 19:53:41 UTC
the thought occurs that if the %post operations get run as part of a service on boot, we can just adjust the firstboot service to not run until that's complete. I'm not sure they do get run that way, though. I'll investigate.




Comment 24 Adam Williamson 2010-08-09 20:04:42 UTC
okay, update: the %post stuff is actually run during live image creation, but what I missed is that the commands I saw aren't actually *run* at %post, but are stuck into a /etc/init.d/livesys file on the live image - i.e. it's creating a service, owned by no package, that gets run at boot. ew. still, that's good for us: we can make firstboot not run until this livesys service has run, and that should solve the race. will test that.




Comment 25 Adam Williamson 2010-08-09 21:01:36 UTC
damn. still doesn't work, but I can't figure out why.

so, here's what I've done in total:

to spin-kickstarts fedora-live-base.ks:

-----------

@@ -180,8 +180,13 @@ fi
 action "Adding live user" useradd \$USERADDARGS -c "Live System User" liveuser
 passwd -d liveuser > /dev/null
 
+# For all the below, please be aware that if these services are made systemd
+# native, a systemctl disable command should be added to / replace the
+# chkconfig command
+
 # turn off firstboot for livecd boots
 chkconfig --level 345 firstboot off 2>/dev/null
+systemctl disable firstboot.service 2>/dev/null
 
 # don't start yum-updatesd for livecd boots
 chkconfig --level 345 yum-updatesd off 2>/dev/null

------------

to firstboot.service from michal's scratch build:

------------

@@ -1,5 +1,7 @@
 [Unit]
 Description=firstboot configuration program
+After=livesys.service
+Before=prefdm.service
 
 [Service]
 Environment=RUNLEVEL=5

------------

somehow, building a live image with these changes, firstboot gets run at startup (I don't get the weird race condition). I've no idea why this is. It should be disabled, but it gets run - by systemd - just as if the systemctl disable command never happened. The systemctl disable command definitely wound up in /etc/init.d/livesys , and that script definitely ran (as the liveuser account is created, and all the other stuff it does happened).

We could adjust the fedora-live-base.ks a bit to give some feedback so we can tell for sure when it kicks in and if it works. Lennart, did I miss something the command needs in order to work? Does it need a --system or a --global?

sorry, I have to go out golfing now, but I'll poke this more when I get back.




Comment 26 Lennart Poettering 2010-08-10 02:10:51 UTC
A new systemd version (v7) that ensures that if two conflicting jobs are scheduled the one that has the "conflicts" is kept while the one that is "conflicted" is removed is now in bodhi.

Comment 27 Adam Williamson 2010-08-10 05:30:39 UTC
still can't make it work. :/

systemctl disable firstboot.service should have an implied systemctl daemon-reload , but if I add a systemctl daemon-reload command right after the systemctl disable firstboot.service command along with all the other above tweaks, I get back into the race condition with prefdm not starting and firstboot stuck at 'activating'. I really don't know why it don't work, but it don't.




Comment 28 Lennart Poettering 2010-08-10 11:00:20 UTC
Hmm, do you have an image or so I can test this with?

Or alternatively, can you get me the output of "dmesg" when booted with "systemd.log_target=kmsg systemd.log_level=debug" on the kernel cmdline?

Comment 29 Lennart Poettering 2010-08-10 11:12:34 UTC
Adam, if you add "--no-reload" to the "systemctl disable" line, do things work for you then? If so, that might be a good temporary fix and I'll look into fixing this properly later on.

Comment 30 Adam Williamson 2010-08-10 14:55:08 UTC
testing.

I thought of an alternative workaround, too: we could simply have the firstboot service check if it's running live (parse cmdline for 'livesys') and not run the firstboot app if it is.
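
A minimal sketch of that check, under stated assumptions: the function name `is_live_boot` is hypothetical, the marker defaults to the 'livesys' word mentioned above (the real token on a live image's command line may differ), and the file argument exists only so the check can be tried against something other than /proc/cmdline.

```shell
# Hypothetical check, not the shipped firstboot code.
# Succeeds (exit status 0) if the kernel command line contains the marker.
#   $1: marker to look for (default "livesys", an assumption)
#   $2: file to read (default /proc/cmdline; overridable for testing)
is_live_boot() {
    marker=${1:-livesys}
    cmdline_file=${2:-/proc/cmdline}
    # Plain substring match; a missing file counts as "not live".
    grep -q -- "$marker" "$cmdline_file" 2>/dev/null
}
```

The firstboot service's start script could then skip launching the firstboot app whenever `is_live_boot` succeeds, without needing any 'systemctl disable' call during boot.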




Comment 31 Jared Smith 2010-08-10 17:41:24 UTC
I'm starting to be concerned that this bug (and a couple of others on the blockers list) is going to cause us to slip the schedule.  It's very unclear to me whether we've found the root problem in this bug, or if we're just looking for workarounds.  Is there anything I can do to help facilitate things here?

Comment 32 James Laska 2010-08-10 18:08:47 UTC
(In reply to comment #31)
> I'm starting to be concerned that this bug (and a couple of others on the
> blockers list) are going to cause us to slip the schedule.  It's very unclear
> to me whether we've found the root problem in this bug, or if we're just
> looking for workarounds.  Is there anything I can do to help facilitate things
> here?    

Things are very dynamic with this issue at the moment.  Adam, Martin and Lennart have met and developed a strategy on IRC.  However, time is short, and in order to gain confidence that the Alpha has met release criteria, 1) this issue, 2) Additional remaining OPEN F14Alpha blockers, and 3) the test matrices must be completed in time for the go/no-go meeting scheduled for tomorrow.  We can talk more about general readiness on the list (or irc).

Comment 33 Adam Williamson 2010-08-10 18:54:56 UTC
jared: we know what the problem is, and I'm testing a fixed package now. Whether the fix counts as a 'fix' or a 'workaround' is kind of just a semantic argument - the existing implementation is already something of a hack, and it's arguable whether it's less 'hacky' to successfully do the same thing with systemd or to do it in a different, but in some ways more robust, way (which is the current approach).

Bottom line, we should have this behaving as it ought to for Alpha. Whether the way that behaviour is implemented is the best possible way is debatable, but then prior to systemd the same was true.




Comment 34 Adam Williamson 2010-08-10 20:01:37 UTC
okay, I think I've nailed this. Here's a firstboot package which, with systemd 7-1 and the appropriate changes to spin-kickstarts, produces the correct results.

nirik has committed the spin-kickstarts change, and lennart has a pending update request for the necessary systemd 7-1.




Comment 35 Adam Williamson 2010-08-10 20:02:51 UTC
Created attachment 437987 [details]
working firstboot build (.src.rpm)

Comment 36 Adam Williamson 2010-08-10 20:03:43 UTC


Comment 37 Jared Smith 2010-08-10 20:47:12 UTC
Thanks for the update, Adam.  I appreciate it!

Comment 38 Fedora Update System 2010-08-10 22:01:27 UTC
firstboot-1.112-1.fc14 has been submitted as an update for Fedora 14.
http://admin.fedoraproject.org/updates/firstboot-1.112-1.fc14

Comment 39 Fedora Update System 2010-08-11 00:13:31 UTC
firstboot-1.112-2.fc14 has been submitted as an update for Fedora 14.
http://admin.fedoraproject.org/updates/firstboot-1.112-2.fc14

Comment 40 Fedora Update System 2010-08-11 00:27:55 UTC
firstboot-1.112-2.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Fedora Update System 2010-08-11 01:44:24 UTC
firstboot-1.112-3.fc14 has been submitted as an update for Fedora 14.
http://admin.fedoraproject.org/updates/firstboot-1.112-3.fc14

Comment 42 Fedora Update System 2010-08-11 01:57:59 UTC
firstboot-1.112-3.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 43 Adam Williamson 2010-09-08 15:15:10 UTC
*** Bug 623559 has been marked as a duplicate of this bug. ***

Comment 44 Fedora Update System 2010-09-10 13:14:20 UTC
firstboot-1.113-1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/firstboot-1.113-1.fc14

Comment 45 Fedora Update System 2010-09-19 18:05:31 UTC
firstboot-1.113-4.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/firstboot-1.113-4.fc14

