Bug 1508495 - Services openvswitch and ovsdb-server fail to start
Summary: Services openvswitch and ovsdb-server fail to start
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-01 14:38 UTC by Jan Pazdziora
Modified: 2018-03-23 12:48 UTC (History)

Fixed In Version: systemd-238-1.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-23 08:34:01 UTC
Type: Bug
Embargoed:


Description Jan Pazdziora 2017-11-01 14:38:17 UTC
Description of problem:

Running systemctl start openvswitch or systemctl start ovsdb-server fails.

Version-Release number of selected component (if applicable):

openvswitch-2.8.1-1.fc28.x86_64

How reproducible:

Deterministic.

Steps to Reproduce:
1. systemctl start openvswitch
2. systemctl status openvswitch
3. systemctl status ovsdb-server
4. journalctl _COMM=ovsdb-server | cat

Actual results:

A dependency job for openvswitch.service failed. See 'journalctl -xe' for details.

● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Nov 01 10:24:38 machine.example.com systemd[1]: Dependency failed for Open vSwitch.
Nov 01 10:24:38 machine.example.com systemd[1]: openvswitch.service: Job openvswitch.service/start failed with result 'dependency'.

● ovsdb-server.service - Open vSwitch Database Unit
   Loaded: loaded (/usr/lib/systemd/system/ovsdb-server.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-11-01 14:33:11 EDT; 2min 42s ago
  Process: 5755 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd --no-monitor --system-id=random --ovs-user=${OVS_USER_ID} start $OPTIO
  Process: 5754 ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch (code=exited, status=0/SUCCESS)

Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Control process exited, code=exited status=1
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Failed with result 'exit-code'.
Nov 01 14:33:11 machine.example.com systemd[1]: Failed to start Open vSwitch Database Unit.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Service hold-off time over, scheduling restart.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Scheduled restart job, restart counter is at 5.
Nov 01 14:33:11 machine.example.com systemd[1]: Stopped Open vSwitch Database Unit.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Start request repeated too quickly.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Failed with result 'exit-code'.
Nov 01 14:33:11 machine.example.com systemd[1]: Failed to start Open vSwitch Database Unit.

Nov 01 14:33:09 machine.example.com ovsdb-server[5632]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:09 machine.example.com ovsdb-server[5672]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:10 machine.example.com ovsdb-server[5712]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:10 machine.example.com ovsdb-server[5752]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:11 machine.example.com ovsdb-server[5792]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)

Expected results:

No errors.

Additional info:

Comment 2 Jan Pazdziora 2017-11-27 08:19:07 UTC
This issue blocks testing of OpenShift Origin on Fedora rawhide. Any chance of getting the issue resolved?

Comment 3 Flavio Leitner 2017-11-27 12:36:34 UTC
Could you please check if disabling SELinux helps?
Thanks
fbl

Comment 4 Jan Pazdziora 2017-11-27 14:15:15 UTC
Running in permissive makes no difference.

Comment 6 Timothy Redaelli 2017-11-27 15:36:14 UTC
It seems that with the new systemd (235) you can't use "chown" in ExecStartPre on a directory created by RuntimeDirectory.

    [root@graphite ~]# cat /etc/systemd/system/test.service
    [Service]
    Type=forking
    ExecStartPre=/usr/bin/chown nobody /var/run/test
    ExecStart=/usr/bin/ls -ld /var/run/test
    RuntimeDirectory=test
    RuntimeDirectoryMode=0755
     
    [root@f27 ~]# systemctl start test.service
    [root@f27 ~]# journalctl -u test.service
    Nov 27 16:31:43 f27 systemd[1]: Starting test.service...
    Nov 27 16:31:43 f27 ls[32545]: drwxr-xr-x. 2 nobody root 40 Nov 27 16:31 /var/run/test
    Nov 27 16:31:43 f27 systemd[1]: Started test.service.
     
    [root@rawhide ~]# systemctl start test.service
    [root@rawhide ~]# journalctl -u test.service
    Nov 27 16:29:33 rawhide systemd[1]: Starting test.service...
    Nov 27 16:29:33 rawhide ls[32545]: drwxr-xr-x. 2 root root 40 Nov 27 16:29 /var/run/test
    Nov 27 16:29:33 rawhide systemd[1]: Started test.service.
    [root@rawhide ~]#
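
The ownership comparison in the transcript above can also be scripted instead of read off the ls output. A minimal sketch, assuming GNU stat (its -c '%u' format); dir_owner_uid and check_owner_uid are hypothetical helper names, not part of systemd or openvswitch:

```shell
# Hypothetical helpers for checking who owns a runtime directory.
# Assumes GNU stat, whose -c '%u' format prints the numeric owner uid.

dir_owner_uid() {
    stat -c '%u' "$1"
}

# Usage: check_owner_uid DIR EXPECTED_UID
# Returns 0 on a match, 1 on a mismatch, 2 if DIR cannot be examined.
check_owner_uid() {
    actual=$(dir_owner_uid "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 is owned by uid $2"
    else
        echo "mismatch: $1 is owned by uid $actual, expected uid $2"
        return 1
    fi
}
```

For instance, check_owner_uid /var/run/test "$(id -u nobody)" could stand in for the ls -ld call while the test directory exists.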

Comment 8 Jan Pazdziora 2017-11-27 15:55:34 UTC
Couldn't ovsdb-server.service use User= and Group= ?

Comment 9 Aaron Conole 2017-11-27 16:30:37 UTC
my understanding is that in order to preserve our required networking permissions and capabilities we can't have systemd downgrade us.  perhaps I am mistaken?

Comment 11 Phil Cameron 2018-02-09 13:59:24 UTC
systemd-237-1.fc28.x86_64 is incompatible with openvswitch-2.8.1-1.fc28.x86_64, which causes ovsdb-server to fail to come up.

/usr/lib/systemd/system/ovsdb-server.service
ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch
doesn't work. The chown fails.

Note: systemd-234-8.fc27.x86_64 and openvswitch-2.8.1-1.fc27.x86_64 do work properly together.


This causes openshift to not have networking.

This bug blocks openshift from running. Please fix it.

Comment 12 Dusty Mabe 2018-02-12 16:42:32 UTC
Is the expectation that systemd change to be compatible with the openvswitch systemd unit or is the fix to be done in openvswitch?

If the fix is to be applied in systemd then we should probably move the component to systemd?

Comment 13 Aaron Conole 2018-02-12 16:57:25 UTC
Since the openvswitch service files haven't changed, I would call this a systemd regression, and I will reassign to systemd.

Comment 14 Dusty Mabe 2018-02-12 18:10:58 UTC
(In reply to Aaron Conole from comment #13)
> Since the openvswitch service files haven't changed, I would call this a
> systemd regression, and I will reassign to systemd.

The real question is: was this change inadvertent or by design? I guess the systemd team can answer that.

Comment 15 Zbigniew Jędrzejewski-Szmek 2018-02-14 10:29:15 UTC
Behaviour changed in https://github.com/systemd/systemd/commit/3536f49e8f. I'm looking into this.

Comment 16 Zbigniew Jędrzejewski-Szmek 2018-02-14 12:09:09 UTC
This is a grey area. Nothing in the documentation says that either the old or the new behaviour is guaranteed. The old behaviour was an outcome of the implementation: basically, when the directory already existed, we'd exit early, so the chmod/chown steps were implicitly skipped. Later this was (consciously) changed to always do a recursive chown. When those directories are used by dynamic users, a full chown is necessary, since the uid can change between invocations. At the same time, the change was also done for the "classic" case of normal users. I opened a PR upstream (https://github.com/systemd/systemd/pull/8181) to partially restore the old behaviour. But it's something that requires discussion, so I'm not sure if that PR will be accepted, and if it is, how much it will change before that.
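
The difference between the two behaviours can be illustrated with a simplified model. This is only a sketch: it assumes the runtime-directory setup step runs before each command of the unit (which is what the test output in comment 6 suggests), it tracks ownership in a shell variable instead of touching the filesystem, and the function names are made up for illustration.

```shell
# OWNER is the simulated owner of the runtime directory.
# An empty value means the directory does not exist yet.
OWNER=""

# Usage: setup_runtime_dir MODE SERVICE_USER   (MODE is "old" or "new")
setup_runtime_dir() {
    if [ -z "$OWNER" ]; then
        OWNER=$2            # directory is created, owned by the service user
        return 0
    fi
    if [ "$1" = new ]; then
        OWNER=$2            # new behaviour: always chown the existing directory
    fi
    # old behaviour: directory exists, so the chown step is skipped
}

# Usage: run_service MODE
# Models the test unit: ExecStartPre chowns to nobody, ExecStart prints the owner.
run_service() {
    OWNER=""
    setup_runtime_dir "$1" root   # setup before ExecStartPre
    OWNER=nobody                  # ExecStartPre: chown nobody /var/run/test
    setup_runtime_dir "$1" root   # setup again before ExecStart
    echo "$OWNER"                 # ExecStart: owner as ls -ld would report it
}

run_service old   # prints "nobody"
run_service new   # prints "root"
```

Under these assumptions the model reproduces the owners seen in the test: nobody with the older systemd and root with the newer one.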

(In reply to Aaron Conole from comment #9)
> my understanding is that in order to preserve our required networking
> permissions and capabilities we can't have systemd downgrade us.  perhaps I
> am mistaken?

I don't know anything about openvswitch, so there might be some complications, but in general, it is possible to run a service under a non-root user with capabilities to modify the network configuration. For example, systemd-networkd has:

[Service]
User=systemd-network
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW

I expect something similar should work here.

Comment 17 Fedora End Of Life 2018-02-20 15:29:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle.
Changing version to '28'.

Comment 18 Zbigniew Jędrzejewski-Szmek 2018-03-23 08:34:01 UTC
This was fixed upstream in https://github.com/systemd/systemd/commit/30c81ce2ce which is in systemd-238.

Comment 19 Jan Pazdziora 2018-03-23 11:16:13 UTC
With

systemd-238-5.fc29.x86_64
openvswitch-2.9.0-3.fc29.x86_64

on Fedora rawhide, the start of openvswitch still fails with

systemd[1]: Starting Open vSwitch Database Unit...
audit[11091]: CRED_ACQ pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11091]: pam_unix(runuser:session): session opened for user openvswitch by (uid=0)
audit[11091]: USER_START pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11091]: pam_unix(runuser:session): session closed for user openvswitch
audit[11091]: USER_END pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
audit[11091]: CRED_DISP pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
ovs-ctl[11038]: /etc/openvswitch/conf.db does not exist ... (warning).
audit[11102]: CRED_ACQ pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11102]: pam_unix(runuser:session): session opened for user openvswitch by (uid=0)
audit[11102]: USER_START pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11102]: pam_unix(runuser:session): session closed for user openvswitch
audit[11102]: USER_END pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
audit[11102]: CRED_DISP pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
ovs-ctl[11038]: Creating empty database /etc/openvswitch/conf.db [  OK  ]
ovs-ctl[11038]: Starting ovsdb-server [  OK  ]
ovs-vsctl[11122]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.15.1
ovs-vsctl[11127]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.9.0 "external-ids:system-id=\"c4c943ff-26bc-4488-85e5-438b73a36650\"" "external-ids:rundir=\"/var/run/openvswitch\"" "system-type=\"fedora\"" "system-version=\"29\""
ovs-ctl[11038]: Configuring Open vSwitch system IDs [  OK  ]
systemd[1]: Started Open vSwitch Database Unit.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ovsdb-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd[1]: Starting Open vSwitch Delete Transient Ports...
systemd[1]: Started Open vSwitch Delete Transient Ports.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ovs-delete-transient-ports comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd[1]: Starting Open vSwitch Forwarding Unit...
audit[11203]: AVC avc:  denied  { map } for  pid=11203 comm="modprobe" path="/usr/lib/modules/4.16.0-0.rc6.git2.1.fc29.x86_64/modules.dep.bin" dev="dm-0" ino=8742974 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:modules_object_t:s0 tclass=file permissive=0
ovs-ctl[11151]: Inserting openvswitch module modprobe: ERROR: mmap(NULL, 509470, PROT_READ, 3, MAP_PRIVATE, 0): Permission denied
audit[11209]: AVC avc:  denied  { map } for  pid=11209 comm="modprobe" path="/usr/lib/modules/4.16.0-0.rc6.git2.1.fc29.x86_64/modules.dep.bin" dev="dm-0" ino=8742974 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:modules_object_t:s0 tclass=file permissive=0
ovs-ctl[11151]: modprobe: ERROR: mmap(NULL, 509470, PROT_READ, 3, MAP_PRIVATE, 0): Permission denied
audit[11209]: AVC avc:  denied  { module_load } for  pid=11209 comm="modprobe" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=system permissive=0
ovs-ctl[11151]: modprobe: ERROR: could not insert 'nf_conntrack': Permission denied
ovs-ctl[11151]: modprobe: ERROR: Error running install command for nf_conntrack
ovs-ctl[11151]: modprobe: ERROR: could not insert 'openvswitch': Operation not permitted
ovs-ctl[11151]: [FAILED]
ovs-vsctl[11210]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=test.example.com
systemd[1]: ovs-vswitchd.service: Control process exited, code=exited status=1
systemd[1]: ovs-vswitchd.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Open vSwitch Forwarding Unit.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ovs-vswitchd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
systemd[1]: Dependency failed for Open vSwitch.
systemd[1]: openvswitch.service: Job openvswitch.service/start failed with result 'dependency'.

Would you like a separate bugzilla for that, or should we reopen and reassign this one back to the original component? While there are AVC denials logged, this might or might not be the same as bug 1508336 and bug 1508337, since in those cases the service actually seems to start. And possibly the AVC denials do not cause the service failure at all -- that would likely need to be investigated.

Comment 20 Zbigniew Jędrzejewski-Szmek 2018-03-23 11:47:42 UTC
I think this is a different bug:
> ovs-ctl[11151]: modprobe: ERROR: could not insert 'openvswitch': Operation not permitted
> ovs-ctl[11151]: [FAILED]
I don't know anything about openvswitch, but it seems that this error is fatal.

I guess it's better to open a new bug against the selinux policy with just the last comment. This one has a lot of history that is not relevant anymore.

Comment 21 Aaron Conole 2018-03-23 12:48:30 UTC
Shouldn't need to open a new bug for the selinux violation.  It is fatal, but we are pushing a fix upstream for it and have 3 bugs open for it already.

