Bug 1112908 - systemd-214 starts disabled sysv initscripts that depend on $network
Summary: systemd-214 starts disabled sysv initscripts that depend on $network
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-25 03:48 UTC by Bruno Wolff III
Modified: 2014-06-26 14:56 UTC (History)

Fixed In Version: systemd-214-4.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-26 06:38:55 UTC
Type: Bug
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 80537 0 None None None Never

Description Bruno Wolff III 2014-06-25 03:48:12 UTC
Description of problem:
After a reboot, plague attempts to start even though it has been disabled in systemd. (It ends up failing, but it shouldn't try to start at all.)
I did notice it has both a systemd service and an init.d file, which may or may not be related to the problem, but seems unusual.

Version-Release number of selected component (if applicable):
plague-0.4.5.8-18.fc21.noarch

How reproducible:
Every boot

Steps to Reproduce:
1. systemctl disable plague-server
2. reboot
3. systemctl

Actual results:
systemctl shows a failed attempt to start plague-server

Expected results:
plague-server should not try to start

Comment 1 Adam Williamson 2014-06-25 06:35:58 UTC
Can you provide more detailed output? The actual output of 'systemctl status plague-server.service' would be a start. Is there a generated plague-server.service in /run/systemd?

Comment 2 Bruno Wolff III 2014-06-25 06:44:41 UTC
systemctl -l status plague-server.service
● plague-server.service - Plague server daemon for build-system master machines
   Loaded: loaded (/usr/lib/systemd/system/plague-server.service; disabled)
   Active: failed (Result: exit-code) since Tue 2014-06-24 22:31:53 CDT; 3h 10min ago
  Process: 1077 ExecStart=/usr/bin/plague-server -d -c ${CONFIG} -p ${PIDFILE} $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1529 (code=exited, status=3)

Jun 24 22:31:52 bruno.wolff.to systemd[1]: Started Plague server daemon for build-system master machines.
Jun 24 22:31:53 bruno.wolff.to systemd[1]: plague-server.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jun 24 22:31:53 bruno.wolff.to systemd[1]: Unit plague-server.service entered failed state.

find /run/systemd -name '*plague*'
/run/systemd/generator.late/plague-server.service
/run/systemd/generator.late/network-online.target.wants/plague-server.service

That looks wrong: since there is a native plague service, there shouldn't be a generated one for the init.d file. (This might make it a systemd bug.)

Comment 3 Adam Williamson 2014-06-25 18:33:08 UTC
just to confirm, can you post output of:

ls -l /etc/rc*.d/S*

? thanks.

Comment 4 Bruno Wolff III 2014-06-25 18:45:29 UTC
ls -l /etc/rc*.d/S*
lrwxrwxrwx. 1 root root 17 Jun 10 11:19 /etc/rc2.d/S10network -> ../init.d/network
lrwxrwxrwx. 1 root root 15 Jun 13 11:27 /etc/rc2.d/S11dahdi -> ../init.d/dahdi
lrwxrwxrwx. 1 root root 17 Jun 10 11:19 /etc/rc3.d/S10network -> ../init.d/network
lrwxrwxrwx. 1 root root 15 Jun 13 11:27 /etc/rc3.d/S11dahdi -> ../init.d/dahdi
lrwxrwxrwx. 1 root root 17 Jun 10 11:19 /etc/rc4.d/S10network -> ../init.d/network
lrwxrwxrwx. 1 root root 15 Jun 13 11:27 /etc/rc4.d/S11dahdi -> ../init.d/dahdi
lrwxrwxrwx. 1 root root 17 Jun 10 11:19 /etc/rc5.d/S10network -> ../init.d/network
lrwxrwxrwx. 1 root root 15 Jun 13 11:27 /etc/rc5.d/S11dahdi -> ../init.d/dahdi

Comment 5 Adam Williamson 2014-06-25 18:52:07 UTC
I suspect what's going on here is that the sysv generator's logic for handling LSB deps, which has some kind of special casing for the network-online target, is getting things wrong and adding sysv-converted services to the network-online.target.wants when it should not be. But I'm too dumb to grok the code, so I'm going to just poke at it in a VM with trial and error. The code in question is around:

http://cgit.freedesktop.org/systemd/systemd/tree/src/sysv-generator/sysv-generator.c#n485

Comment 6 Adam Williamson 2014-06-25 18:53:03 UTC
btw, there are arguably two different bugs here:

1) plague ships both sysv and systemd services with the same name, that seems wrong
2) systemd may be erroneously enabling sysv services

Comment 7 Adam Williamson 2014-06-25 19:13:11 UTC
OK, so I installed a completely clean Rawhide VM, and did 'yum install plague'.

I rebooted, and there was a /run/systemd/generator.late/network-online.target.wants/plague-server.service .

I then edited /etc/init.d/plague-server . As stock, it has this at the top:

# Required-Start: $local_fs $named $network $time
# Required-Stop: $local_fs $named $network $time

I edited both lines to remove $network:

# Required-Start: $local_fs $named $time
# Required-Stop: $local_fs $named $time

and rebooted. Now, there is no /run/systemd/generator.late/network-online.target.wants/plague-server.service .
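The header edit above can be checked mechanically. This is only a rough sketch in Python, not how systemd-sysv-generator parses headers: the function name `required_start` is made up for illustration, and it ignores the BEGIN/END INIT INFO markers that a real LSB header block uses.

```python
import re

# Matches LSB header lines such as "# Required-Start: $local_fs $network".
# Simplification: a real parser would only look between the INIT INFO markers.
_REQUIRED_START = re.compile(r"^#\s*Required-Start:\s*(.*)$")


def required_start(script_text):
    """Return the facilities named on Required-Start lines of an init script."""
    facilities = []
    for line in script_text.splitlines():
        match = _REQUIRED_START.match(line)
        if match:
            facilities.extend(match.group(1).split())
    return facilities


# The two header variants quoted in this comment.
stock = (
    "# Required-Start: $local_fs $named $network $time\n"
    "# Required-Stop: $local_fs $named $network $time\n"
)
edited = (
    "# Required-Start: $local_fs $named $time\n"
    "# Required-Stop: $local_fs $named $time\n"
)

print("$network" in required_start(stock))
print("$network" in required_start(edited))
```

Only the stock header lists $network as a start dependency, which matches the observation that the generated symlink disappears after the edit.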

The LSB spec says:

http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/initscrcomconv.html

"Required-Start: boot_facility_1 [boot_facility_2...]

    facilities which must be available during startup of this service. The init-script system should insure init scripts which provide the Required-Start facilities are started before starting this script."

Also http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/facilname.html :

"$network	 	

basic networking support is available. Example: a server program could listen on a socket."

So as I understand it, what that basically means is that if /etc/init.d/foo has:

# Required-Start: $network

that would be equivalent to /lib/systemd/system/foo.service having

Wants=network-online.target

However, systemd-sysv-generator seems to be doing almost the opposite. When it sees Required-Start: $network, it translates it into /run/systemd/generator.late/network-online.target.wants/foo.service - which is not "foo requires network-online", but "network-online requires foo".

Looking at the code linked in comment 5 I can't quite see how this is going wrong, but it sure seems to be.

I note that the generated plague-server.service has:

After=nss-lookup.target network-online.target time-sync.target

but no:

Wants=network-online.target

which I *think* is what the code tries to achieve. It really does look like, somehow, the wants relationship gets inverted.
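To make the direction of the relationship concrete, here is a small Python sketch that reads a unit directory the way I understand the convention: a symlink at 'X.wants/Y' means X wants Y. The helper names `wants_map` and `demo` are hypothetical, and the temporary directory only imitates the /run/systemd/generator.late layout from comment 2.

```python
import os
import tempfile


def wants_map(unit_dir):
    """Map each unit to the entries found in its '<unit>.wants/' directory."""
    result = {}
    for entry in sorted(os.listdir(unit_dir)):
        path = os.path.join(unit_dir, entry)
        if entry.endswith(".wants") and os.path.isdir(path):
            owner = entry[: -len(".wants")]
            result[owner] = sorted(os.listdir(path))
    return result


def demo():
    """Rebuild the layout from comment 2 in a scratch directory and read it back."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for the generated unit file.
        open(os.path.join(root, "plague-server.service"), "w").close()
        wants_dir = os.path.join(root, "network-online.target.wants")
        os.mkdir(wants_dir)
        os.symlink(
            "../plague-server.service",
            os.path.join(wants_dir, "plague-server.service"),
        )
        return wants_map(root)


print(demo())
```

Read this way, the generated layout says that network-online.target wants plague-server.service, not the other way around.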

Comment 8 Adam Williamson 2014-06-25 19:19:26 UTC
Oh, I see sysv-generator.c actually handles the implementation of "wants" itself (it's not using some commonly-used systemd function for that, or anything), so the error could be there too - around line 185:

        STRV_FOREACH(p, s->wants) {
                r = add_symlink(s->name, *p);
                if (r < 0)
                        log_error_unit(s->name, "Failed to create 'Wants' symlink to %s: %s", *p, strerror(-r));
        }

thrill! as the monkey brain desperately tries to figure out what p and s are, and if they're getting crossed up somewhere!

Comment 9 Adam Williamson 2014-06-25 19:42:22 UTC
OK, so I'm about 80% sure this is what's going on:

sysv-generator.c 's "add_symlink" function takes two arguments, the first being "service to create a symlink to" and the second being "where to create the symlink". That function is lines 88-113 in my copy.

sysv-generator.c 's "generate_unit_file" function uses add_symlink when implementing wants relationships, but calls it the wrong way around:

        STRV_FOREACH(p, s->wants) {
                r = add_symlink(s->name, *p);

's->name' is the *depending* service (i.e. the service that Wants another service). '*p' is the *depended-on* service (i.e. the service it Wants). But given how add_symlink works, that winds up creating a symlink to 'depending.service' in 'depended-on.service.wants/' - which is exactly the wrong way around.

I think generate_unit_file should do this:

        STRV_FOREACH(p, s->wants) {
                r = add_symlink(*p, s->name);

I'm gonna test that out.
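Here is a toy Python model of the two call orders, based only on my description of add_symlink above and not on the actual C source. The `dest` parameter and the `layout` helper are assumptions added so the sketch can run in a scratch directory.

```python
import os
import tempfile


def add_symlink(dest, service, where):
    """Model of the description above: link to `service` inside '<where>.wants/'."""
    wants_dir = os.path.join(dest, where + ".wants")
    os.makedirs(wants_dir, exist_ok=True)
    os.symlink(os.path.join(dest, service), os.path.join(wants_dir, service))


def layout(service, where):
    """Return the relative paths of the symlinks one add_symlink call creates."""
    with tempfile.TemporaryDirectory() as dest:
        add_symlink(dest, service, where)
        found = []
        for dirpath, _dirs, files in os.walk(dest):
            for name in files:
                found.append(os.path.relpath(os.path.join(dirpath, name), dest))
        return sorted(found)


# Order used by generate_unit_file: add_symlink(s->name, *p).
print(layout("plague-server.service", "network-online.target"))
# Order proposed above: add_symlink(*p, s->name).
print(layout("network-online.target", "plague-server.service"))
```

The first call reproduces the path seen in comment 2, and the second puts the link under plague-server.service.wants/ instead. This only illustrates the argument order; it says nothing about the other call sites, and comment 10 explains why a plain swap is not the whole fix.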

Comment 10 Adam Williamson 2014-06-25 21:21:26 UTC
So, that's not exactly wrong - I'm pretty sure it really is what's going wrong in this particular case - but it doesn't account for a couple of other cases where the same code does what was actually intended. And if you just reverse the parameters, a subsequent problem appears even in this case.

At this point I'm pushing this one upstream: I've filed https://bugs.freedesktop.org/show_bug.cgi?id=80537 with all the details I've managed to figure out. I've also poked systemd-devel@ , since I *think* this is a fairly serious bug that could lead to rather unfortunate consequences if you're unlucky enough to have, say, a /etc/init.d/telnet or /etc/init.d/hilariously_insecure_server of some other kind lying around that is supposed to be disabled.
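For anyone wanting to check their own machine, a rough audit could compare the S* start links (as in comment 4) against the generated units that some target pulls in. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, it assumes start links are named 'S' plus a two-digit priority plus the script name, and the list of wanted units is passed in by hand rather than read from /run/systemd.

```python
import glob
import os
import tempfile


def sysv_enabled(etc_root):
    """Return names of init scripts with an S* start link in any rc?.d directory."""
    enabled = set()
    for link in glob.glob(os.path.join(etc_root, "rc[0-9].d", "S*")):
        # Assumed naming: S10network -> network (drop 'S' and two digits).
        enabled.add(os.path.basename(link)[3:])
    return enabled


def unit_to_script(unit):
    """Strip a trailing '.service' to get the init script name."""
    suffix = ".service"
    return unit[: -len(suffix)] if unit.endswith(suffix) else unit


def unexpectedly_wanted(etc_root, wanted_units):
    """Return wanted units whose init script has no S* start link."""
    enabled = sysv_enabled(etc_root)
    return sorted(u for u in wanted_units if unit_to_script(u) not in enabled)


def demo():
    """Imitate part of comment 4's rc3.d listing in a scratch directory."""
    with tempfile.TemporaryDirectory() as etc_root:
        rc_dir = os.path.join(etc_root, "rc3.d")
        os.mkdir(rc_dir)
        for name in ("S10network", "S11dahdi"):
            # Empty files stand in for the real symlinks.
            open(os.path.join(rc_dir, name), "w").close()
        # Hypothetical list of units found under some '*.wants/' directory.
        wanted = ["network.service", "plague-server.service"]
        return unexpectedly_wanted(etc_root, wanted)


print(demo())
```

With the rc3.d links from comment 4, only plague-server.service is flagged, since it has no start link yet would still be pulled in.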

Comment 11 Adam Williamson 2014-06-25 23:01:55 UTC
Upstream has a fix for the systemd side of this now, I'm doing a scratch build to test the fix:

http://koji.fedoraproject.org/koji/taskinfo?taskID=7076866

Comment 12 Bruno Wolff III 2014-06-26 00:21:51 UTC
It looks good. Several services that had been attempting to start when they shouldn't have been are no longer doing so.
I rebuilt the initramfs to make sure I tested the update in the early boot as well as after pivot.

Comment 13 Adam Williamson 2014-06-26 06:17:12 UTC
cool, that was my experience too. I've mailed the systemd maintainers to ask them to handle backporting the fix properly, I hope they'll do it tomorrow.

Can you open a separate bug for plague shipping both systemd and sysv services? That is still a bug in its own right, I believe, though as it turns out, it was not a factor in the bug causing the service to be started.

Comment 14 Bruno Wolff III 2014-06-26 14:56:04 UTC
I opened Bug 1113644 for the superfluous init script.

