Description of problem:
Running systemctl start openvswitch or systemctl start ovsdb-server fails.

Version-Release number of selected component (if applicable):
openvswitch-2.8.1-1.fc28.x86_64

How reproducible:
Deterministic.

Steps to Reproduce:
1. systemctl start openvswitch
2. systemctl status openvswitch
3. systemctl status ovsdb-server
4. journalctl _COMM=ovsdb-server | cat

Actual results:
A dependency job for openvswitch.service failed. See 'journalctl -xe' for details.

● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Nov 01 10:24:38 machine.example.com systemd[1]: Dependency failed for Open vSwitch.
Nov 01 10:24:38 machine.example.com systemd[1]: openvswitch.service: Job openvswitch.service/start failed with result 'dependency'.

● ovsdb-server.service - Open vSwitch Database Unit
   Loaded: loaded (/usr/lib/systemd/system/ovsdb-server.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-11-01 14:33:11 EDT; 2min 42s ago
  Process: 5755 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd --no-monitor --system-id=random --ovs-user=${OVS_USER_ID} start $OPTIO
  Process: 5754 ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch (code=exited, status=0/SUCCESS)

Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Control process exited, code=exited status=1
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Failed with result 'exit-code'.
Nov 01 14:33:11 machine.example.com systemd[1]: Failed to start Open vSwitch Database Unit.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Service hold-off time over, scheduling restart.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Scheduled restart job, restart counter is at 5.
Nov 01 14:33:11 machine.example.com systemd[1]: Stopped Open vSwitch Database Unit.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Start request repeated too quickly.
Nov 01 14:33:11 machine.example.com systemd[1]: ovsdb-server.service: Failed with result 'exit-code'.
Nov 01 14:33:11 machine.example.com systemd[1]: Failed to start Open vSwitch Database Unit.

Nov 01 14:33:09 machine.example.com ovsdb-server[5632]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:09 machine.example.com ovsdb-server[5672]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:10 machine.example.com ovsdb-server[5712]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:10 machine.example.com ovsdb-server[5752]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Nov 01 14:33:11 machine.example.com ovsdb-server[5792]: ovs|00002|daemon_unix|EMER|/var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)

Expected results:
No errors.

Additional info:
This issue blocks testing of OpenShift Origin on Fedora rawhide. Any chance of getting the issue resolved?
Could you please check if disabling SELinux helps? Thanks fbl
Running SELinux in permissive mode makes no difference.
It seems that with the new systemd (235) you can't use "chown" in ExecStartPre on a directory created by RuntimeDirectory.

[root@graphite ~]# cat /etc/systemd/system/test.service
[Service]
Type=forking
ExecStartPre=/usr/bin/chown nobody /var/run/test
ExecStart=/usr/bin/ls -ld /var/run/test
RuntimeDirectory=test
RuntimeDirectoryMode=0755

[root@f27 ~]# systemctl start test.service
[root@f27 ~]# journalctl -u test.service
Nov 27 16:31:43 f27 systemd[1]: Starting test.service...
Nov 27 16:31:43 f27 ls[32545]: drwxr-xr-x. 2 nobody root 40 Nov 27 16:31 /var/run/test
Nov 27 16:31:43 f27 systemd[1]: Started test.service.

[root@rawhide ~]# systemctl start test.service
[root@rawhide ~]# journalctl -u test.service
Nov 27 16:29:33 rawhide systemd[1]: Starting test.service...
Nov 27 16:29:33 rawhide ls[32545]: drwxr-xr-x. 2 root root 40 Nov 27 16:29 /var/run/test
Nov 27 16:29:33 rawhide systemd[1]: Started test.service.
[root@rawhide ~]#
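The difference between the two runs is only the owner column of the ls output (nobody on f27, root on rawhide). A minimal sketch of a helper that makes that check explicit, assuming GNU stat is available; check_owner is a hypothetical name used for illustration and is not part of systemd or Open vSwitch:

```shell
#!/bin/sh
# check_owner DIR USER
# Succeeds (exit 0) if DIR is owned by USER, returns 1 on a mismatch,
# and returns 2 if DIR cannot be inspected.
check_owner() {
    dir=$1
    expected=$2
    # GNU stat: %U prints the name of the owning user.
    actual=$(stat -c '%U' "$dir") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok: $dir is owned by $actual"
    else
        echo "mismatch: $dir is owned by $actual, expected $expected"
        return 1
    fi
}
```

With the test unit above, running `check_owner /var/run/test nobody` while the service is active would be expected to report a mismatch on the rawhide machine and ok on the f27 machine.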
Couldn't ovsdb-server.service use User= and Group= ?
My understanding is that, in order to preserve our required networking permissions and capabilities, we can't have systemd downgrade us. Perhaps I am mistaken?
systemd-237-1.fc28.x86_64 is incompatible with openvswitch-2.8.1-1.fc28.x86_64, which causes ovsdb-server to fail to come up.

In /usr/lib/systemd/system/ovsdb-server.service, the line

ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch

doesn't work. The chown fails.

Note: systemd-234-8.fc27.x86_64 and openvswitch-2.8.1-1.fc27.x86_64 do work properly together.

This leaves openshift without networking, so this bug blocks openshift from running. Please fix it.
Is the expectation that systemd changes to be compatible with the openvswitch systemd unit, or is the fix to be done in openvswitch? If the fix is to be applied in systemd, then we should probably move the component to systemd?
Since the openvswitch service files haven't changed, I would call this a systemd regression, and I will reassign to systemd.
(In reply to Aaron Conole from comment #13)
> Since the openvswitch service files haven't changed, I would call this a
> systemd regression, and I will reassign to systemd.

The real question is: was this change inadvertent or by design? I guess the systemd team can answer that.
Behaviour changed in https://github.com/systemd/systemd/commit/3536f49e8f. I'm looking into this.
This is a grey area. Nothing in the documentation says that either the old or the new behaviour is guaranteed. The old behaviour was an outcome of the implementation: basically, when the directory existed, we'd exit, so the chmod/chown steps were implicitly skipped. Later this was (consciously) changed to always do a recursive chown. When those directories are used by dynamic users, a full chown is necessary, since the uid can change between invocations. At the same time, the change was also done for the "classic" case of normal users.

I opened a PR upstream (https://github.com/systemd/systemd/pull/8181) to partially restore the old behaviour. But it's something that requires discussion, so I'm not sure if that PR will be accepted, and if it is, how much it will change before that.

(In reply to Aaron Conole from comment #9)
> my understanding is that in order to preserve our required networking
> permissions and capabilities we can't have systemd downgrade us. perhaps I
> am mistaken?

I don't know anything about openvswitch, so there might be some complications, but in general, it is possible to run a service under a non-root user with capabilities to modify the network configuration. For example, systemd-networkd has:

[Service]
User=systemd-network
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW

I expect something similar should work here.
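One way to confirm that a service started under a non-root user really kept a capability such as CAP_NET_ADMIN is to inspect the CapEff mask in /proc/PID/status. The following is only a sketch: has_cap and cap_eff are hypothetical helper names, and the bit numbers come from linux/capability.h (CAP_NET_BIND_SERVICE is 10, CAP_NET_BROADCAST is 11, CAP_NET_ADMIN is 12, CAP_NET_RAW is 13).

```shell
#!/bin/sh
# has_cap MASK_HEX BIT
# Succeeds if capability number BIT is set in the hexadecimal mask
# (the format used by the CapEff line of /proc/PID/status, without "0x").
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# cap_eff PID
# Prints the effective capability mask of the given process.
cap_eff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}
```

For example, with pid set to the main PID of the service in question, `has_cap "$(cap_eff "$pid")" 12` would succeed only if that process holds CAP_NET_ADMIN in its effective set.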
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle. Changing version to '28'.
This was fixed upstream in https://github.com/systemd/systemd/commit/30c81ce2ce which is in systemd-238.
With systemd-238-5.fc29.x86_64 and openvswitch-2.9.0-3.fc29.x86_64 on Fedora rawhide, the start of openvswitch still fails with:

systemd[1]: Starting Open vSwitch Database Unit...
audit[11091]: CRED_ACQ pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11091]: pam_unix(runuser:session): session opened for user openvswitch by (uid=0)
audit[11091]: USER_START pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11091]: pam_unix(runuser:session): session closed for user openvswitch
audit[11091]: USER_END pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
audit[11091]: CRED_DISP pid=11091 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
ovs-ctl[11038]: /etc/openvswitch/conf.db does not exist ... (warning).
audit[11102]: CRED_ACQ pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11102]: pam_unix(runuser:session): session opened for user openvswitch by (uid=0)
audit[11102]: USER_START pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
runuser[11102]: pam_unix(runuser:session): session closed for user openvswitch
audit[11102]: USER_END pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_unix acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
audit[11102]: CRED_DISP pid=11102 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:openvswitch_t:s0 msg='op=PAM:setcred grantors=pam_rootok acct="openvswitch" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
ovs-ctl[11038]: Creating empty database /etc/openvswitch/conf.db [ OK ]
ovs-ctl[11038]: Starting ovsdb-server [ OK ]
ovs-vsctl[11122]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.15.1
ovs-vsctl[11127]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.9.0 "external-ids:system-id=\"c4c943ff-26bc-4488-85e5-438b73a36650\"" "external-ids:rundir=\"/var/run/openvswitch\"" "system-type=\"fedora\"" "system-version=\"29\""
ovs-ctl[11038]: Configuring Open vSwitch system IDs [ OK ]
systemd[1]: Started Open vSwitch Database Unit.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ovsdb-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd[1]: Starting Open vSwitch Delete Transient Ports...
systemd[1]: Started Open vSwitch Delete Transient Ports.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ovs-delete-transient-ports comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd[1]: Starting Open vSwitch Forwarding Unit...
audit[11203]: AVC avc: denied { map } for pid=11203 comm="modprobe" path="/usr/lib/modules/4.16.0-0.rc6.git2.1.fc29.x86_64/modules.dep.bin" dev="dm-0" ino=8742974 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:modules_object_t:s0 tclass=file permissive=0
ovs-ctl[11151]: Inserting openvswitch module modprobe: ERROR: mmap(NULL, 509470, PROT_READ, 3, MAP_PRIVATE, 0): Permission denied
audit[11209]: AVC avc: denied { map } for pid=11209 comm="modprobe" path="/usr/lib/modules/4.16.0-0.rc6.git2.1.fc29.x86_64/modules.dep.bin" dev="dm-0" ino=8742974 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:modules_object_t:s0 tclass=file permissive=0
ovs-ctl[11151]: modprobe: ERROR: mmap(NULL, 509470, PROT_READ, 3, MAP_PRIVATE, 0): Permission denied
audit[11209]: AVC avc: denied { module_load } for pid=11209 comm="modprobe" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=system permissive=0
ovs-ctl[11151]: modprobe: ERROR: could not insert 'nf_conntrack': Permission denied
ovs-ctl[11151]: modprobe: ERROR: Error running install command for nf_conntrack
ovs-ctl[11151]: modprobe: ERROR: could not insert 'openvswitch': Operation not permitted
ovs-ctl[11151]: [FAILED]
ovs-vsctl[11210]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=test.example.com
systemd[1]: ovs-vswitchd.service: Control process exited, code=exited status=1
systemd[1]: ovs-vswitchd.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Open vSwitch Forwarding Unit.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ovs-vswitchd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
systemd[1]: Dependency failed for Open vSwitch.
systemd[1]: openvswitch.service: Job openvswitch.service/start failed with result 'dependency'.
Would you like a separate bugzilla for that, or should we reopen and reassign this one back to the original component? While there are AVC denials logged, this might or might not be the same as bug 1508336 and bug 1508337, because in those cases the service actually seems to start. It is also possible that the AVC denials do not cause the service failure at all -- that would likely need to be investigated.
I think this is a different bug:

> ovs-ctl[11151]: modprobe: ERROR: could not insert 'openvswitch': Operation not permitted
> ovs-ctl[11151]: [FAILED]

I don't know anything about openvswitch, but it seems that this error is fatal. I guess it's better to open a new bug against the selinux policy with just the last comment. This one has a lot of history that is not relevant anymore.
Shouldn't need to open a new bug for the selinux violation. It is fatal, but we are pushing a fix upstream for it and have 3 bugs open for it already.