If you pass a long-running script to virt-builder --firstboot, systemd will kill it after a timeout. I think systemd considers the whole run of the script as 'start-up', and will timeout after TimeoutStartSec / DefaultTimeoutStartSec. libguestfs-tools-c-1.28.10-1.fc21.x86_64
We're using Type=oneshot, and systemd.service says: TimeoutStartSec= Configures the time to wait for start-up. If a daemon service does not signal start-up completion within the configured time, the service will be considered failed and will be shut down again. Takes a unit-less value in seconds, or a time span value such as "5min 20s". Pass "0" to disable the timeout logic. Defaults to DefaultTimeoutStartSec= from the manager configuration file, except when Type=oneshot is used, in which case the timeout is disabled by ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ default (see systemd-system.conf(5)). ^^^^^^^ If it's not doing that, then maybe this patch? diff --git a/customize/firstboot.ml b/customize/firstboot.ml index 29ec544..c15a8dd 100644 --- a/customize/firstboot.ml +++ b/customize/firstboot.ml @@ -92,6 +92,7 @@ ExecStart=%s/firstboot.sh start RemainAfterExit=yes StandardOutput=journal+console StandardError=inherit +TimeoutStartSec=0 [Install] WantedBy=default.target
Here is some evidence, for the record. # journalctl -l -u firstboot.service | cat -- Logs begin at Thu 2015-08-06 05:27:07 EDT, end at Thu 2015-08-06 05:42:16 EDT. -- Aug 06 05:27:10 localhost.localdomain systemd[1]: Starting virt-sysprep firstboot service... Aug 06 05:27:10 localhost.localdomain firstboot.sh[405]: /usr/lib/virt-sysprep/firstboot.sh start Aug 06 05:27:10 localhost.localdomain firstboot.sh[405]: Scripts dir: /usr/lib/virt-sysprep/scripts Aug 06 05:27:10 localhost.localdomain firstboot.sh[405]: === Running /usr/lib/virt-sysprep/scripts/0001--tmp-setup-fraw-28058 === [...] Aug 06 05:30:50 m10 firstboot.sh[405]: which x86_64 2.20-12.fc23 rawhide 46 k Aug 06 05:30:50 m10 firstboot.sh[405]: xfsprogs x86_64 3.2.4-1.fc24 rawhide 892 k Aug 06 05:30:50 m10 firstboot.sh[405]: xkeyboard-config noarch 2.15-2.fc23 rawhide 760 k Aug 06 05:30:50 m10 firstboot.sh[405]: xz x86_64 5.2.1-3.fc23 rawhide 147 k Aug 06 05:30:50 m10 firstboot.sh[405]: xz-libs x86_64 5.2.1-3.fc23 rawhide 90 k Aug 06 05:30:50 m10 firstboot.sh[405]: zlib x86_64 1.2.8-8.fc23 rawhide 95 k Aug 06 05:30:50 m10 firstboot.sh[405]: Transaction Summary Aug 06 05:30:50 m10 firstboot.sh[405]: ================================================================================ Aug 06 05:30:50 m10 firstboot.sh[405]: Install 27 Packages Aug 06 05:30:50 m10 firstboot.sh[405]: Upgrade 253 Packages Aug 06 05:30:50 m10 firstboot.sh[405]: Total download size: 204 M Aug 06 05:30:50 m10 firstboot.sh[405]: Downloading Packages: Aug 06 05:30:53 m10 systemd[1]: firstboot.service start operation timed out. Terminating. Aug 06 05:30:53 m10 systemd[1]: Failed to start firstboot.service. Aug 06 05:30:53 m10 systemd[1]: Unit firstboot.service entered failed state. Aug 06 05:30:53 m10 systemd[1]: firstboot.service failed.
Adding TimeoutStartSec=0 did not help (or I have done it wrong). I attach the script to reproduce this. My hunch right now is this: - firstboot.service gets removed early on somehow (I didn't find the code that does this.) - Some time later someone executes "systemctl daemon-reload" (as might happen during package installation). - Systemd gets confused and times out the running firstboot.service despite Type=oneshot.
Created attachment 1060248 [details] Reproducer
(In reply to Marius Vollmer from comment #4) > Created attachment 1060248 [details] > Reproducer You probably need to adjust the qemu-kvm command. The network config works for me, but it's magic as far as I am concerned.
Putting this at the top of the --firstboot script helps: ln -s /usr/lib/virt-sysprep/firstboot.service /etc/systemd/system/ The timout does not happen and the script finishes normally. (Or actually, dnf hangs, which is the thing that I wanted to provoke.)
Do you have a simple reproducer for the bug?
Ignore that - the reproducer is in comment 4.
This was my attempt at a minimal reproducer (not successfully): virt-builder fedora-22 --firstboot-command 'sleep 60; echo reloading; systemctl daemon-reload; echo reloaded; sleep 60' qemu-kvm -nodefaults -display none -hda fedora-22.img -m 1024 -serial stdio The output (lightly edited) is: [ 8.293979] firstboot.sh[397]: /usr/lib/virt-sysprep/firstboot.sh start [ 8.301537] firstboot.sh[397]: Scripts dir: /usr/lib/virt-sysprep/scripts [ 8.338522] firstboot.sh[397]: === Running /usr/lib/virt-sysprep/scripts/0001-sleep-60--echo-reloading--systemctl-daemon-reload--echo-relo === [ 68.354411] firstboot.sh[397]: reloading [ 68.420697] systemd-sysv-generator[724]: Overwriting existing symlink /run/systemd/generator.late/network.service with real service [ 68.423146] systemd-sysv-generator[724]: Overwriting existing symlink /run/systemd/generator.late/netconsole.service with real service [ 68.466432] firstboot.sh[397]: reloaded I suspect it's not daemon-reload that's causing the problem.
(In reply to Richard W.M. Jones from comment #9) > This was my attempt at a minimal reproducer (not successfully): > > virt-builder fedora-22 --firstboot-command 'sleep 60; echo reloading; > systemctl daemon-reload; echo reloaded; sleep 60' Try this: --firstboot-command 'sleep 10; echo reloading; systemctl daemon-reload; echo reloaded; sleep 300' With that I get the timeout: # systemctl status firstboot -l ● firstboot.service Loaded: not-found (Reason: No such file or directory) Active: failed (Result: timeout) since Fri 2015-08-07 05:14:45 EDT; 12s ago Main PID: 394 (code=killed, signal=TERM) Aug 07 05:13:05 localhost.localdomain systemd[1]: Starting virt-sysprep firstboot service... Aug 07 05:13:05 localhost.localdomain firstboot.sh[394]: /usr/lib/virt-sysprep/firstboot.sh start Aug 07 05:13:05 localhost.localdomain firstboot.sh[394]: Scripts dir: /usr/lib/virt-sysprep/scripts Aug 07 05:13:05 localhost.localdomain firstboot.sh[394]: === Running /usr/lib/virt-sysprep/scripts/0001-sleep-10--echo-reloading--systemctl-daemon-reload--echo-reloaded--sleep-300 === Aug 07 05:13:15 localhost.localdomain firstboot.sh[394]: reloading Aug 07 05:13:15 localhost.localdomain firstboot.sh[394]: reloaded Aug 07 05:14:45 localhost.localdomain systemd[1]: firstboot.service start operation timed out. Terminating. Aug 07 05:14:45 localhost.localdomain systemd[1]: Failed to start firstboot.service. Aug 07 05:14:45 localhost.localdomain systemd[1]: Unit firstboot.service entered failed state. Aug 07 05:14:45 localhost.localdomain systemd[1]: firstboot.service failed. The default timeout is 90 seconds and starts counting from the daemon-reload, so your "sleep 60" was too short.
I am debating with myself whether this could possibly count as a bug in systemd. Wdyt?
I get slightly different behaviour. The firstboot process seems to be killed by the reload, or else cannot produce any output. [ 8.598598] firstboot.sh[394]: === Running /usr/lib/virt-sysprep/scripts/0001-sleep-10--echo-reloading--systemctl-daemon-reload--echo-relo === [ 18.616499] firstboot.sh[394]: reloading [ 18.682547] systemd-sysv-generator[723]: Overwriting existing symlink /run/systemd/generator.late/netconsole.service with real service [ 18.688280] systemd-sysv-generator[723]: Overwriting existing symlink /run/systemd/generator.late/network.service with real service [ 18.732751] firstboot.sh[394]: reloaded (and no further output even after about 10 minutes) With: $ virt-builder fedora-22 \ --firstboot-command 'sleep 10; echo reloading; systemctl daemon-reload; echo reloaded; sleep 120; echo finished' I get: [ 8.666625] firstboot.sh[397]: === Running /usr/lib/virt-sysprep/scripts/0001-sleep-10--echo-reloading--systemctl-daemon-reload--echo-relo === [ 18.682183] firstboot.sh[397]: reloading [ 18.747675] systemd-sysv-generator[724]: Overwriting existing symlink /run/systemd/generator.late/netconsole.service with real service [ 18.752127] systemd-sysv-generator[724]: Overwriting existing symlink /run/systemd/generator.late/network.service with real service [ 18.798305] firstboot.sh[397]: reloaded (and no further output even after about 5 minutes) Since we should be using identical versions of systemd (in the image) I don't think we have a clear, reliable reproducer yet.
(In reply to Richard W.M. Jones from comment #12) > I get slightly different behaviour. The firstboot process seems to be > killed by the reload, or else cannot produce any output. I think we are seeing the same things, actually. Please log into the VM and execute # systemctl status firstboot -l to monitor the state of firstboot.service. This will show the output of firstboot, plus everything that systemd has to say about it, such as: Aug 07 05:14:45 localhost.localdomain systemd[1]: firstboot.service start operation timed out. Terminating.
(In reply to Marius Vollmer from comment #13) > This will show the output of firstboot, plus everything that systemd has to > say about it, [...] To clarify this further, the messages from systemd about firstboot.service do not show up on the console during boot, but the output of firstboot.service does. I think that explains the difference of what you see vs what I have copy/pasted.
I agree, and I think this is a systemd bug.
The unit file is: ----- [Unit] Description=virt-sysprep firstboot service After=network.target Before=prefdm.service [Service] Type=oneshot ExecStart=/usr/lib/virt-sysprep/firstboot.sh start RemainAfterExit=yes StandardOutput=journal+console StandardError=inherit [Install] WantedBy=default.target ----- The exec'd script calls dnf, and package installed by dnf calls systemd daemon-reload. According to my understanding, the Type=oneshot script should not time out, but it seems to in this case. I observed this with: systemd-219-13.fc22.i686
> The exec'd script calls dnf, and package installed by dnf calls > systemd daemon-reload. systemctl daemon-reload
(In reply to Richard W.M. Jones from comment #15) > I agree, and I think this is a systemd bug. One important piece of the puzzle is that the firstboot.service unit file is removed before "systemctl daemon-reload". As mentioned in comment 6, preventing the removal of the unit file will stop the timeout from triggering, as expected.
As far as I know, the firstboot unit file is *not* removed. Or at least, in the tests I have been conducting it is not removed. $ virt-builder fedora-22 --firstboot-command 'echo hello' --root-password password:123456 $ qemu-kvm -nodefaults -display none -hda fedora-22.img -m 1024 -serial stdio [ guest boots ] ... [ 9.343888] firstboot.sh[395]: /usr/lib/virt-sysprep/firstboot.sh start [ 9.352713] firstboot.sh[395]: Scripts dir: /usr/lib/virt-sysprep/scripts [ 9.390065] firstboot.sh[395]: === Running /usr/lib/virt-sysprep/scripts/0001-echo-hello === [ 9.395240] firstboot.sh[395]: hello Fedora release 22 (Twenty Two) Kernel 4.0.4-301.fc22.x86_64 on an x86_64 (ttyS0) localhost login: root Password: [root@localhost ~]# ls -l /usr/lib/virt-sysprep/ total 8 -rw-r--r--. 1 root root 276 Aug 11 15:25 firstboot.service -rwxr-xr-x. 1 root root 1099 Aug 11 15:25 firstboot.sh drwxr-xr-x. 2 root root 6 Aug 11 15:25 scripts drwxr-xr-x. 2 root root 6 Aug 11 15:25 scripts-done [root@localhost ~]# ls -l /etc/systemd/system/default.target.wants/ total 0 lrwxrwxrwx. 1 root root 39 Aug 11 15:25 firstboot.service -> /usr/lib/virt-sysprep/firstboot.service
(In reply to Richard W.M. Jones from comment #19) > As far as I know, the firstboot unit file is *not* removed. (Aha! I think we are getting closer.) The files and the symlink are indeed still there, but systemd will considers it "not-found" after a daemon-reload. So the core of this issue seems to be that the symlink in /etc/systemd/system/default.target.wants/ does not point into any of the directories that systemd looks into for unit files. Systemd seems to treat this inconsistently: On boot, the unit file is loaded, on daemon-reload, it is not. $ virt-builder fedora-22 --firstboot-command 'echo hello' --root-password password:123456 $ qemu-kvm -nodefaults -display none -hda fedora-22.img -m 1024 -serial stdio [ guest boots ] localhost login: root Password: [root@localhost ~]# systemctl status firstboot ● firstboot.service - virt-sysprep firstboot service Loaded: loaded (/usr/lib/virt-sysprep/firstboot.service) [...] [root@localhost ~]# systemctl daemon-reload [root@localhost ~]# systemctl status firstboot ● firstboot.service Loaded: not-found (Reason: No such file or directory) [...]
That unit file should reside in /usr/lib/systemd/system/ directory not /usr/lib/virt-sysprep/ and simlinks should not be followed outside any of the checked directories for unit files. If does that it's a bug. Any particular reason you are placing that unit file in /usr/lib/virt-sysprep/ ?
(In reply to Jóhann B. Guðmundsson from comment #21) > That unit file should reside in /usr/lib/systemd/system/ directory not > /usr/lib/virt-sysprep/ and simlinks should not be followed outside any of > the checked directories for unit files. If does that it's a bug. It seems like it is (some of the time) following such a symlink. > Any particular reason you are placing that unit file in > /usr/lib/virt-sysprep/ ? Not especially. I'm going to move it to see if that helps.
I pushed this which fixes the problem from the libguestfs p.o.v: https://github.com/libguestfs/libguestfs/commit/f3a7cf994b5c7350c32fd6011c4224d81f7ee86a However there may still be a systemd issue here: it seems to handle the unit file inconsistently, recognizing it at boot but then not recognizing it after systemctl daemon-reload.
Could be the systemd implementation in dracut doing some voodoo or shipping some old fedora firstboot unit or script etc. Plenty of reason why that might work ( it should not read firstboot from /usr/lib/virt-sysprep/ directory ) Now for another question why aren't you using the systemd built in firstboot feature ( man systemd-firstboot ) rather then shipping your own as in is it not better we tried to fix any limitation systemd firsboot might have instead of having a semi functional re-implementation of it somewhere else? What are you doing which it does not do?
"firstboot" in this context really means "next boot" (especially for virt-customize). Firstboot in systemd is really the first boot, which is not what we want. But yes having a "only once at next boot" implementation in systemd would be useful.
Hmm so if systemd can count "boots" and systemd-firstboot could be told to run on spesific boot like boot nr.2 or nr.4 etc ( or after next reboot ) then that would solve that problem? What are you doing in the "second" boot that you cannot accomplish in the first one? ( relates to if that additional functionality belongs natively in machinectl instead )
If there was a way to get the current boot nr from the filesystem (without running some program) then yes. virt-customize and virt-builder allow you to install arbitrary scripts to run on first boot (really next boot in the case of virt-customize). This works on Windows too. http://libguestfs.org/virt-builder.1.html#first-boot-scripts
Should be fixed with current git.
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.