Bug 1135475 - some systemd services fail to start due to the livesys script (race condition)
Summary: some systemd services fail to start due to the livesys script (race condition)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: spin-kickstarts
Version: 21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jeroen van Meeuwen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-08-29 12:19 UTC by Kamil Páral
Modified: 2014-09-03 12:19 UTC (History)
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-02 16:54:12 UTC
Type: Bug
Embargoed:


Attachments
systemd debug journal from F21 Nightly 2014-08-26 - OK (693.45 KB, text/plain)
2014-08-29 12:23 UTC, Kamil Páral
no flags
systemd debug journal from F21 Nightly 2014-08-27 - FAIL (686.62 KB, text/plain)
2014-08-29 12:24 UTC, Kamil Páral
no flags

Description Kamil Páral 2014-08-29 12:19:21 UTC
Description of problem:
TL;DR version: /etc/init.d/livesys script (a part of rel-eng kickstart for creating Fedora LiveCD media) causes problems with services using PrivateTmp=true (and maybe not just them), because it remounts system tmpfs dirs during boot. This creates race conditions. Details below.

Note: This is going to be a quite detailed bug report (I've already spent hours debugging this), and it's a deep rabbit hole, so please have patience with me.

With recent F21 Nightly images I've found that spice-vdagentd sometimes worked and sometimes did not (e.g. it could not transfer clipboard contents from client to host).

When I looked at the service status (when it didn't work), it looked like this:
$ systemctl status spice-vdagentd.service
● spice-vdagentd.service - Agent daemon for Spice guests
   Loaded: loaded (/usr/lib/systemd/system/spice-vdagentd.service; enabled)
   Active: failed (Result: exit-code) since Thu 2014-08-28 08:13:26 EDT; 16min ago
  Process: 792 ExecStart=/usr/sbin/spice-vdagentd $SPICE_VDAGENTD_EXTRA_ARGS (code=exited, status=226/NAMESPACE)
  Process: 727 ExecStartPre=/bin/rm -f /var/run/spice-vdagentd/spice-vdagent-sock (code=exited, status=0/SUCCESS)

Aug 28 08:13:26 localhost systemd[792]: Failed at step NAMESPACE spawning /usr/sbin/spice-vdagentd: Operation not permitted
Aug 28 08:13:26 localhost systemd[1]: spice-vdagentd.service: control process exited, code=exited status=226
Aug 28 08:13:26 localhost systemd[1]: Failed to start Agent daemon for Spice guests.
Aug 28 08:13:26 localhost systemd[1]: Unit spice-vdagentd.service entered failed state.

However, this service failed to start _only during system boot_; when I tried to start it manually, it worked perfectly. Also, if I installed the Live image, the installed system showed no such problem - the service always started correctly during boot. So, I assumed this must be something very specific to the LiveCD environment.

After working with #spice folks, we couldn't find any reason why the service should fail. After consulting #systemd folks, they told me that the error comes from systemd itself:
<zdzichu> kparal: namespaces are used for privateTmp, PrivateNetwork, ProtectHome, ProtectSystem and other mount restrictions

Indeed, spice-vdagentd.service has:
PrivateTmp=true

Here's some explanation to that:
https://fedoraproject.org/wiki/Features/ServicesPrivateTmp
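
For context, a unit opts into this feature with a single directive. The sketch below is a hypothetical example unit (spice-vdagentd.service itself only differs in its ExecStart and description); the comment describes the behavior that matters for this bug:

```ini
# Hypothetical example unit; spice-vdagentd.service uses the same directive.
[Unit]
Description=Example daemon with a private /tmp

[Service]
ExecStart=/usr/sbin/example-daemon
# With PrivateTmp=true, systemd prepares per-service directories under
# the real /tmp and /var/tmp, then bind-mounts them as the service's
# /tmp and /var/tmp inside a private mount namespace. If /var/tmp is
# replaced (e.g. over-mounted by a tmpfs) between those two steps, the
# setup fails with status=226/NAMESPACE, as seen in this bug.
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```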

So I started suspecting some livecd-specific setup which touches tmpfs dirs, but which is not transferred to the installed system. That setup mainly consists of two scripts: /etc/init.d/livesys{,-late}

I tried to remove "rd.live.image" from the boot command line, and the problems immediately disappeared - the spice-vdagentd service always started correctly during livecd boot. Then I tried modifying /etc/init.d/livesys (using the dracut shell, because I can't easily modify the ISO contents) and finally traced the problem to these two lines:

# make it so that we don't do writing to the overlay for things which
# are just tmpdirs/caches
mount -t tmpfs -o mode=0755 varcacheyum /var/cache/yum
mount -t tmpfs vartmp /var/tmp

If I commented them out, the problems stopped happening. Voila.

But that's not all. I wanted to know why it sometimes works and sometimes doesn't. So I enabled systemd debug and compared the outputs from an OK boot and a problematic boot. And consistently, the following was true:
* For an OK boot, spice-vdagentd was started (the main process started and forked) before the two lines above were executed.
* For a problematic boot, the two lines above were executed before the (unsuccessful) attempt to start spice-vdagentd.

So as you see, there is a strong pattern - service ordering affects the result.

There is one more important thing to mention. When I say "the service sometimes failed to start correctly", I don't mean different boots of the same image - I mean different images. For some reason, even though this looks very much like a race condition, I get 100% consistent results with a particular image, but different results between different images. It looks like this (in my case):

F21 Nightly 2014-08-05: always broken
F21 Nightly 2014-08-25: always broken
F21 Nightly 2014-08-26: always OK
F21 Nightly 2014-08-27: always broken
F21 Nightly 2014-08-28: always OK
F21 Alpha TC4: always broken

Maybe it won't be that consistent for other people; it is for me, and I have no idea why. But at least it makes things much easier to debug.

Last but not least, I'm not sure whether only spice-vdagentd.service is affected by this problem, because theoretically any service using PrivateTmp can be affected, depending on boot order and timing. These services could potentially fail:
$ sudo grep -r PrivateTmp /usr/lib/systemd/system
/usr/lib/systemd/system/httpd.service:PrivateTmp=true
/usr/lib/systemd/system/systemd-timesyncd.service:PrivateTmp=yes
/usr/lib/systemd/system/stunnel.service:PrivateTmp=true
/usr/lib/systemd/system/cups.service:PrivateTmp=true
/usr/lib/systemd/system/spice-vdagentd.service:PrivateTmp=true
/usr/lib/systemd/system/systemd-machined.service:PrivateTmp=yes
/usr/lib/systemd/system/systemd-localed.service:PrivateTmp=yes
/usr/lib/systemd/system/systemd-timedated.service:PrivateTmp=yes
/usr/lib/systemd/system/openvpn@.service:PrivateTmp=true
/usr/lib/systemd/system/colord.service:PrivateTmp=yes
/usr/lib/systemd/system/systemd-hostnamed.service:PrivateTmp=yes
/usr/lib/systemd/system/rtkit-daemon.service:PrivateTmp=yes

Also, some other services (using features other than PrivateTmp) might be affected in certain cases, but I can't speak to that; systemd folks probably can.



Version-Release number of selected component (if applicable):
Fedora-Live-Workstation-x86_64-21-20140828.iso
https://kojipkgs.fedoraproject.org//work/tasks/4117/7474117/Fedora-Live-Workstation-x86_64-21-20140828.iso
Fedora-Live-Workstation-x86_64-21-20140827.iso
https://kojipkgs.fedoraproject.org//work/tasks/6453/7466453/Fedora-Live-Workstation-x86_64-21-20140827.iso
Fedora-Live-Workstation-x86_64-21-20140826.iso
https://kojipkgs.fedoraproject.org//work/tasks/2425/7452425/Fedora-Live-Workstation-x86_64-21-20140826.iso
Fedora-Live-Workstation-x86_64-21-Alpha-TC4.iso
https://kojipkgs.fedoraproject.org//work/tasks/8318/7448318/Fedora-Live-Workstation-x86_64-21-Alpha-TC4.iso
Fedora-Live-Workstation-x86_64-21-20140825.iso
https://kojipkgs.fedoraproject.org//work/tasks/7152/7447152/Fedora-Live-Workstation-x86_64-21-20140825.iso
Fedora-Live-Workstation-x86_64-21-20140824.iso
https://kojipkgs.fedoraproject.org//work/tasks/4637/7444637/Fedora-Live-Workstation-x86_64-21-20140824.iso

Most testing got done using the latest broken ISO, which is 20140827, and it has:
systemd-215-11.fc21.x86_64
spice-vdagent-0.15.0-3.fc21.x86_64


How reproducible:
always (for me), if you use the right ISO (e.g. 20140827)


Steps to Reproduce:
1. download the ISO I claim to be always broken, e.g. Fedora-Live-Workstation-x86_64-21-20140827.iso. That will increase your chances of seeing this problem (it's a potential race condition, but I see it consistently per-image).
2. boot the iso the usual way, confirm that spice-vdagentd.service failed to start
$ systemctl status spice-vdagentd.service
3. try to start the service manually, it should work
$ sudo systemctl start spice-vdagentd.service
$ systemctl status spice-vdagentd.service
4. repeat 1-3 several times, it should always work the same

5. boot the iso without rd.live.image on the command line
6. confirm that spice-vdagentd.service is running successfully right on boot
7. repeat 5-6 several times, it should always work the same

8. boot the iso with "rd.break=cleanup systemd.log_level=debug" on the command line
9. in the dracut shell, do this
$ mkdir /mnt
$ mount /dev/root /mnt
$ vi /mnt/etc/init.d/livesys
# put a debug printout before and after the two tmpfs mount commands, e.g.:
> echo 'XXXXX livesys: starting tmpfs mount'
> mount -t tmpfs -o mode=0755 varcacheyum /var/cache/yum
> mount -t tmpfs vartmp /var/tmp
> echo 'XXXXX livesys: finished tmpfs mount'
$ umount /mnt
$ logout
$ logout
10. in the booted system, run 'journalctl -a'
11. if spice-vdagentd.service failed to start, you should see the two debug printouts *before* the spice command gets executed, e.g. first this:
> Aug 29 05:19:00 localhost livesys[816]: XXXXX livesys: starting tmpfs mount
> Aug 29 05:19:00 localhost kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
> Aug 29 05:19:00 localhost kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
> Aug 29 05:19:00 localhost livesys[816]: XXXXX livesys: finished tmpfs mount
and only afterwards this:
> Aug 29 05:19:00 localhost systemd[1]: Forked /usr/sbin/spice-vdagentd as 867
> Aug 29 05:19:00 localhost systemd[1]: spice-vdagentd.service changed start-pre -> start
> Aug 29 05:19:00 localhost systemd[867]: Failed at step NAMESPACE spawning /usr/sbin/spice-vdagentd: Operation not permitted
12. if spice-vdagentd.service started correctly, you should see the spice command executed *before* the two debug printouts, e.g. first this:
> Aug 29 05:14:47 localhost systemd[1]: spice-vdagentd.service changed start -> running
> Aug 29 05:14:47 localhost systemd[1]: Job spice-vdagentd.service/start finished, result=done
> Aug 29 05:14:47 localhost systemd[1]: Started Agent daemon for Spice guests.
and only afterwards this:
> Aug 29 05:14:47 localhost livesys[848]: XXXXX livesys: starting tmpfs mount
> Aug 29 05:14:47 localhost kernel: SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
> Aug 29 05:14:47 localhost systemd[1]: var-cache-yum.mount changed dead -> mounted
> Aug 29 05:14:47 localhost systemd[1]: var-tmp.mount changed dead -> mounted
> Aug 29 05:14:47 localhost livesys[848]: XXXXX livesys: finished tmpfs mount
13. repeat 8-12 several times, it should always work the same

14. try a different image

Comment 1 Kamil Páral 2014-08-29 12:23:38 UTC
Created attachment 932660 [details]
systemd debug journal from F21 Nightly 2014-08-26 - OK

This is an example of a boot where spice-vdagentd.service starts correctly. You can see the ordering of the service and the debugging printout as described in reproducer step 11.

Comment 2 Kamil Páral 2014-08-29 12:24:40 UTC
Created attachment 932661 [details]
systemd debug journal from F21 Nightly 2014-08-27 - FAIL

This is an example of a boot where spice-vdagentd.service doesn't start correctly. You can see the ordering of the service and the debugging printout as described in reproducer step 11-12.

Comment 3 Michal Schmidt 2014-08-29 12:56:40 UTC
(In reply to Kamil Páral from comment #0)
> # make it so that we don't do writing to the overlay for things which
> # are just tmpdirs/caches
> mount -t tmpfs -o mode=0755 varcacheyum /var/cache/yum
> mount -t tmpfs vartmp /var/tmp

The mounts could be moved to separate *.mount units, similar to the existing tmp.mount. They could use "ConditionKernelCommandLine=rd.live.image". They would be ordered before local-fs.target, so they wouldn't race with the startup of services.
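
A sketch of what one such unit might look like, modeled on the existing tmp.mount; the unit name, dependencies, and options below are assumptions illustrating the proposal, not a tested implementation (a second, analogous unit would cover /var/cache/yum):

```ini
# var-tmp.mount - hypothetical sketch of the approach from comment 3.
# A mount unit's name must encode its mount point: /var/tmp -> var-tmp.mount.
[Unit]
Description=Temporary Directory (/var/tmp) for the live image
# Only activate on live boots, mirroring the livesys script's condition.
ConditionKernelCommandLine=rd.live.image
# Ordering before local-fs.target ensures the tmpfs is in place before
# ordinary services (including those using PrivateTmp) start.
DefaultDependencies=no
Before=local-fs.target
Conflicts=umount.target

[Mount]
What=vartmp
Where=/var/tmp
Type=tmpfs

[Install]
WantedBy=local-fs.target
```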

Comment 4 Kamil Páral 2014-08-29 13:02:18 UTC
Just a note for those unfamiliar with fedora releng scripts - you can see the livesys script contents here:
https://git.fedorahosted.org/cgit/spin-kickstarts.git/tree/fedora-live-base.ks

Comment 5 Zbigniew Jędrzejewski-Szmek 2014-08-29 19:16:09 UTC
I can reproduce the bug with Fedora-Live-Workstation-x86_64-21-20140827.iso. Hm, with the recent changes in systemd, this looks quite bad, because 'systemctl status' says the system is 'degraded', a very serious situation :).

I imagine that the race is the following:

- systemd creates /var/tmp/systemd-spice-vdagentd.service-XXXXXX/tmp (or var_tmp),
- livesys script mounts tmpfs on /var/tmp
- systemd tries to use /var/tmp/systemd-spice-vdagentd.service-XXXXXX/tmp and fails

The time window is certainly small, but apparently you can hit it. This would also explain why you see a problem with just one service: it is just the one which coincides with the livesys script.

The solution proposed by Michal in comment #3 is the right one.

--

Alternatively, since /etc/fstab is not copied from the live image to the installed system,
those entries might be added to fstab directly. This would be less work, I imagine.

BTW, /etc/fstab could be pruned a bit.

$ cat /etc/fstab 
/dev/root  /         ext4    defaults,noatime 0 0
devpts     /dev/pts  devpts  gid=5,mode=620   0 0  <--- not needed
tmpfs      /dev/shm  tmpfs   defaults         0 0  <--- not needed
proc       /proc     proc    defaults         0 0  <--- not needed
sysfs      /sys      sysfs   defaults         0 0  <--- not needed

Comment 6 Kamil Páral 2014-09-02 08:59:55 UTC
Thanks Michal and Zbigniew for your help. I tried to apply the following patch to the kickstart:

==============================
diff --git a/fedora-live-base.ks b/fedora-live-base.ks
index 9929fb7..335b4a2 100644
--- a/fedora-live-base.ks
+++ b/fedora-live-base.ks
@@ -156,12 +156,6 @@ if ! strstr "\`cat /proc/cmdline\`" nopersistenthome && [ -n "\$homedev" ] ; the
   action "Mounting persistent /home" mountPersistentHome
 fi
 
-# make it so that we don't do writing to the overlay for things which
-# are just tmpdirs/caches
-mount -t tmpfs -o mode=0755 varcacheyum /var/cache/yum
-mount -t tmpfs vartmp /var/tmp
-[ -x /sbin/restorecon ] && /sbin/restorecon /var/cache/yum /var/tmp >/dev/null 2>&1
-
 if [ -n "\$configdone" ]; then
   exit 0
 fi
@@ -296,6 +290,11 @@ rm -f /core*
 # convince readahead not to collect
 # FIXME: for systemd
 
+cat >> /etc/fstab << EOF
+vartmp   /var/tmp    tmpfs   defaults   0  0
+varcacheyum /var/cache/yum  tmpfs   mode=0755   0   0
+EOF
+
 %end
==============================

and I built a LiveCD. Everything seems to work correctly. Reassigning to spin-kickstarts so they can pick up the patch (and make it saner; I didn't add any comments or move the code around, it was just a quick hack from me) or use a different approach.

Note:
When we're adjusting this, it could be a good idea to use the same approach for dnf cache directory:
> varcachednf /var/cache/dnf  tmpfs   mode=0755   0   0

Note2:
> BTW, /etc/fstab could be pruned a bit.
I think those items are created automatically by livecd-creator. I haven't found it defined in the kickstart anywhere.

Comment 8 Kamil Páral 2014-09-03 12:19:01 UTC
Thanks, Adam.

