Bug 1412750

Summary: Multiple services in recent Rawhide fail to start with: Failed at step USER spawning (executable): Permission denied
Product: [Fedora] Fedora Reporter: Paul Whalen <pwhalen>
Component: selinux-policyAssignee: Lukas Vrabec <lvrabec>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 26CC: awilliam, dominick.grift, dominik.mierzejewski, dwalsh, gmarr, jpokorny, lslebodn, lvrabec, mgrepl, msekleta, normand, plautrba, pmoore, pwhalen, robatino, ssekidde, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedBlocker
Fixed In Version: selinux-policy-3.13.1-236.fc26 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-02 23:46:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1349184    

Description Paul Whalen 2017-01-12 17:06:15 UTC
Description of problem:
Cockpit fails to start on default rawhide(20170110 compose) server installation. 

Version-Release number of selected component (if applicable):
cockpit-125-1.fc26
selinux-policy-3.13.1-233.fc26

How reproducible:
everytime

Steps to Reproduce:
1. Install server product with defaults
2. Reboot installed system


Actual results:

systemctl status cockpit.service 
● cockpit.service - Cockpit Web Service
   Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor prese
   Active: inactive (dead)
     Docs: man:cockpit-ws(8)

After attempting to restart:

systemctl status cockpit.service
● cockpit.service - Cockpit Web Service
   Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2017-01-12 11:58:58 EST; 3min 2s ago
     Docs: man:cockpit-ws(8)
  Process: 1117 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=217/USER)

Jan 12 11:58:58 rainier-2-10 systemd[1]: Starting Cockpit Web Service...
Jan 12 11:58:58 rainier-2-10 systemd[1117]: cockpit.service: Failed at step USER spawning /usr/sbin/remotectl: Permission denied
Jan 12 11:58:58 rainier-2-10 systemd[1]: cockpit.service: Control process exited, code=exited status=217
Jan 12 11:58:58 rainier-2-10 systemd[1]: Failed to start Cockpit Web Service.
Jan 12 11:58:58 rainier-2-10 systemd[1]: cockpit.service: Unit entered failed state.
Jan 12 11:58:58 rainier-2-10 systemd[1]: cockpit.service: Failed with result 'exit-code'.

Starts when selinux is disabled. 

setenforce 0
[root@rainier-2-10 ~]# systemctl restart cockpit.service
[root@rainier-2-10 ~]# systemctl status cockpit.service
● cockpit.service - Cockpit Web Service
   Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2017-01-12 12:03:38 EST; 5s ago
     Docs: man:cockpit-ws(8)
  Process: 1177 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
 Main PID: 1179 (cockpit-ws)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/cockpit.service
           └─1179 /usr/libexec/cockpit-ws

Jan 12 12:03:37 rainier-2-10 systemd[1]: Starting Cockpit Web Service...
Jan 12 12:03:38 rainier-2-10 systemd[1]: Started Cockpit Web Service.
Jan 12 12:03:38 rainier-2-10 cockpit-ws[1179]: Using certificate: /etc/cockpit/ws-certs.d/0-self-signed.cert

Expected results:
Service started on boot.

Additional info:

type=SERVICE_START msg=audit(1484239318.870:200): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
type=SERVICE_START msg=audit(1484239318.883:201): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
type=SERVICE_START msg=audit(1484239318.894:202): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
type=SERVICE_START msg=audit(1484239318.906:203): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
type=SERVICE_START msg=audit(1484239318.918:204): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
type=SERVICE_START msg=audit(1484240013.844:227): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
type=SERVICE_START msg=audit(1484240075.864:233): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=AVC msg=audit(1484240075.918:234): avc:  denied  { read } for  pid=1240 comm="cockpit-ws" name="cpuinfo" dev="proc" ino=4026531910 scontext=system_u:system_r:cockpit_ws_t:s0 tcontext=system_u:object_r:proc_t:s0 tclass=file permissive=1
type=AVC msg=audit(1484240075.918:235): avc:  denied  { open } for  pid=1240 comm="cockpit-ws" path="/proc/cpuinfo" dev="proc" ino=4026531910 scontext=system_u:system_r:cockpit_ws_t:s0 tcontext=system_u:object_r:proc_t:s0 tclass=file permissive=1
type=AVC msg=audit(1484240075.918:236): avc:  denied  { getattr } for  pid=1240 comm="cockpit-ws" path="/proc/cpuinfo" dev="proc" ino=4026531910 scontext=system_u:system_r:cockpit_ws_t:s0 tcontext=system_u:object_r:proc_t:s0 tclass=file permissive=1
type=SERVICE_STOP msg=audit(1484240166.589:237): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=SERVICE_START msg=audit(1484240338.856:189): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=cockpit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

Comment 1 Adam Williamson 2017-01-16 17:41:17 UTC
openQA has been running into this since 20170114.n.1. It worked in 20170107.n.0. There've been two cockpit package builds between those dates.

Nominating as an Alpha blocker, per criterion "Unless explicitly specified otherwise, after system installation the Cockpit web management interface must be running and accessible on its default port (9090)" - https://fedoraproject.org/wiki/Fedora_26_Alpha_Release_Criteria#Cockpit_management_interface .

Comment 2 Adam Williamson 2017-01-17 00:50:40 UTC
In fact, failures to start several other services seem to have the same cause: colord, packagekit, and geoclue. They're all 'Failed at step USER spawning (executable): Permission denied", and they all start with SELinux in permissive mode.

Comment 3 Adam Williamson 2017-01-17 00:51:14 UTC
Adding zbyszek as there may be a systemd component to this too.

Comment 4 Zbigniew Jędrzejewski-Szmek 2017-01-17 05:17:01 UTC
Dunno, #1413047 is about F25, and it seems to be kernel-version dependent. And AVC seem to be different. There are no relevant AVCs here, maybe some dontaudit rules are in effect?

Comment 5 Zbigniew Jędrzejewski-Szmek 2017-01-17 06:02:25 UTC
Yeah, it's definitely not the same thing. In rawhide, it fails in send_user_lookup, on writev() back to the mother ship. This code is not in F25.

(gdb) bt
#0  writev () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000561cfcc53cd2 in send_user_lookup (gid=<optimized out>, uid=<optimized out>, user_lookup_fd=29, unit=0x561cfced7e80) at src/core/execute.c:2210
#2  exec_child (unit=unit@entry=0x561cfced7e80, command=command@entry=0x561cfced9a20, context=context@entry=0x561cfced82a8, params=params@entry=0x7ffeadabc3f0, runtime=runtime@entry=0x0, dcreds=dcreds@entry=0x561cfced85d0, argv=0x561cfcf576d0, socket_fd=-1, named_iofds=0x7ffeadabc2a0, fds=0x0, n_fds=0, files_env=0x0, user_lookup_fd=29, exit_status=0x7ffeadabc250) at src/core/execute.c:2358
#3  0x0000561cfcc56f78 in exec_spawn (unit=0x561cfced7e80, command=0x561cfced9a20, context=0x561cfced82a8, params=0x7ffeadabc3f0, runtime=0x0, dcreds=0x561cfced85d0, ret=0x7ffeadabc3e0) at src/core/execute.c:2893
#4  0x0000561cfcc60e31 in service_spawn.lto_priv.540 (s=s@entry=0x561cfced7e80, c=<optimized out>, timeout=<optimized out>, flags=flags@entry=(EXEC_APPLY_PERMISSIONS | EXEC_APPLY_CHROOT | EXEC_APPLY_TTY_STDIN | EXEC_IS_CONTROL), _pid=_pid@entry=0x561cfced85e4) at src/core/service.c:1351
#5  0x0000561cfcc62567 in service_enter_start_pre (s=0x561cfced7e80) at src/core/service.c:1851
#6  service_start.lto_priv.74 (u=0x561cfced7e80) at src/core/service.c:2076
#7  0x0000561cfcc32eea in unit_start (u=0x561cfced7e80) at src/core/unit.c:1590
#8  job_perform_on_unit.lto_priv.579 (j=0x7ffeadabc590) at src/core/job.c:531
#9  0x0000561cfcc7e698 in job_run_and_invalidate (j=<optimized out>) at src/core/job.c:595
#10 manager_dispatch_run_queue.lto_priv.596 (source=<optimized out>, userdata=<optimized out>, userdata=<optimized out>) at src/core/manager.c:1588
#11 0x00007f92a3905d2d in source_dispatch (s=s@entry=0x561cfce0dc80) at src/libsystemd/sd-event/sd-event.c:2307
#12 0x00007f92a3905fd4 in sd_event_dispatch (e=e@entry=0x561cfce0d9e0) at src/libsystemd/sd-event/sd-event.c:2626
#13 0x00007f92a3907507 in sd_event_run (e=0x561cfce0d9e0, timeout=18446744073709551615) at src/libsystemd/sd-event/sd-event.c:2685
#14 0x0000561cfcc0b9a5 in manager_loop (m=<optimized out>) at src/core/manager.c:2251
#15 main (argc=<optimized out>, argv=0x7ffeadabca18) at src/core/main.c:1899

So, the socket it's trying to write to comes from
   socketpair(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0, m->user_lookup_fds)
in manager_setup_user_look_up.
Can we amend the selinux policy to allow writev() on that?

Comment 6 Lukas Slebodnik 2017-01-17 10:47:14 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #4)
> Dunno, #1413047 is about F25, and it seems to be kernel-version dependent.
> And AVC seem to be different. There are no relevant AVCs here, maybe some
> dontaudit rules are in effect?

That is not true. BZ1413047 is not about F25. The systemd is from rawhide and the same problem is with latest rawhidekernel.
.
It is just my workaround to use the older f25 kernel.

BTW there is a slightly related BZ1413075 which I thin in permissive mode.

Comment 7 Lukas Slebodnik 2017-01-17 10:49:09 UTC
(In reply to Lukas Slebodnik from comment #6)
> (In reply to Zbigniew Jędrzejewski-Szmek from comment #4)
> > Dunno, #1413047 is about F25, and it seems to be kernel-version dependent.
> > And AVC seem to be different. There are no relevant AVCs here, maybe some
> > dontaudit rules are in effect?
> 
> That is not true. BZ1413047 is not about F25. The systemd is from rawhide
> and the same problem is with latest rawhidekernel.
> .
> It is just my workaround to use the older f25 kernel.
> 
>BTW there is a slightly related BZ1413075 which I thin in permissive mode.
ehm, too many typos.
BTW there is a slightly related BZ1413075 hit I thin in permissive mode.

Comment 8 Zbigniew Jędrzejewski-Szmek 2017-01-17 20:57:30 UTC
(In reply to Lukas Slebodnik from comment #6)
> That is not true. BZ1413047 is not about F25. The systemd is from rawhide
> and the same problem is with latest rawhidekernel.
You're right, this does seem to be the same problem.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1413047#c2:

#============= init_t ==============
#!!!! This avc has a dontaudit rule in the current policy
allow init_t kernel_t:unix_stream_socket { read write };
#!!!! This avc has a dontaudit rule in the current policy
allow init_t kernel_t:unix_dgram_socket write;

Comment 9 Zbigniew Jędrzejewski-Szmek 2017-01-17 20:57:50 UTC
*** Bug 1413047 has been marked as a duplicate of this bug. ***

Comment 10 Lukas Slebodnik 2017-01-20 08:33:32 UTC
Michal, Zbigniew is it a duplicate of BZ1392161?

Comment 11 Adam Williamson 2017-01-20 08:37:25 UTC
I'm almost certain it isn't, that report is from long before this problem started happening (see #c1).

Comment 12 Lukas Slebodnik 2017-01-20 09:35:47 UTC
(In reply to Adam Williamson from comment #11)
> I'm almost certain it isn't, that report is from long before this problem
> started happening (see #c1).

You never know; it could be side effect of some change in kernel. The same as other systemd/kernel bug BZ1413075.

And systemd guys should try to explain what's going on here and why it should be allowed in selinux-policy.

Comment 13 Adam Williamson 2017-01-21 02:38:19 UTC
This also affects postgresql (another openQA test, the Server 'database server' role).

Comment 14 Adam Williamson 2017-01-31 08:01:32 UTC
Can we please get this fixed, guys? Rawhide has been broken for 19 days now, and that's really not how we want to do things these days.

Comment 15 Lukas Slebodnik 2017-01-31 09:09:59 UTC
(In reply to Adam Williamson from comment #14)
> Can we please get this fixed, guys? Rawhide has been broken for 19 days now,
> and that's really not how we want to do things these days.

That's just a SELinux policy; which you can workaround with permissive mode (which is not ideal)

But even with this fix/custom SELinux policy it will not solve everything.
systemd is broken anyway BZ1413075.

I think there is a time to change workflow of pushing critical components to rawhide. The same as openSUSE does with tumbleweed. And you as a fedora QE should be the right person for pushing such change/proposal.

IMHO it will force maintainers of critical packages to package their own SELinux policy if they want to release more often without breaking the rawhide.

Comment 16 Adam Williamson 2017-01-31 10:01:56 UTC
I know there are several other bugs in systemd, I am trying to get them *all* fixed :) And yes, we are actively working on Rawhide gating, but it's not going to happen overnight, so for now we at least want to fix the things that break quickly. That's what the nightly openQA testing is intended to let us achieve, instead of just leaving it broken for weeks at a time.

Comment 17 Zbigniew Jędrzejewski-Szmek 2017-02-01 16:35:23 UTC
#1392161 was fixed by allowing mounton (https://github.com/selinux-policy/selinux-policy/commit/08645ba5ed6467bee8218bec32015ef4f1c70fa0). Clearly the syscalls here are different (read, write), so it's not the same.

Comment 18 Lukas Vrabec 2017-02-01 16:47:44 UTC
Hi guys, 

Could you give me all AVC msgs related to this BZ? 

THanks.

Comment 20 Lukas Vrabec 2017-02-02 11:37:38 UTC
PR merged.
Thanks!

Provide builds asap.

Comment 21 Lukas Slebodnik 2017-02-02 14:12:11 UTC
Thank you it works for me as expected.

Comment 22 Jan Pokorný [poki] 2017-02-07 17:23:40 UTC
*** Bug 1412165 has been marked as a duplicate of this bug. ***

Comment 23 Michel Normand 2017-02-09 13:57:55 UTC
I addressed a similar problem in a specific bug https://bugzilla.redhat.com/show_bug.cgi?id=1420753 because I am not sure this is covered by this bug.

Comment 24 Geoffrey Marr 2017-02-13 19:54:39 UTC
Discussed during the 2017-02-13 blocker review meeting: [1]

The decision was made to classify this bug as an AcceptedBlocker as it violates the following Alpha blocker criteria:

""Unless explicitly specified otherwise, after system installation the Cockpit web management interface must be running and accessible on its default port (9090)"

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2017-02-13/f26-blocker-review.2017-02-13-18.01.txt

Comment 25 Lukas Slebodnik 2017-02-13 22:55:06 UTC
(In reply to Geoffrey Marr from comment #24)
> Discussed during the 2017-02-13 blocker review meeting: [1]
> 
> The decision was made to classify this bug as an AcceptedBlocker as it
> violates the following Alpha blocker criteria:
> 
> ""Unless explicitly specified otherwise, after system installation the
> Cockpit web management interface must be running and accessible on its
> default port (9090)"
> 
> [1]
> https://meetbot.fedoraproject.org/fedora-blocker-review/2017-02-13/f26-
> blocker-review.2017-02-13-18.01.txt

I think this bug can be closed because it is fixed in selinux-policy-3.13.1-236.fc26.
And koji says the lates version for rawhide is selinux-policy-3.13.1-239.fc26

Comment 26 Adam Williamson 2017-02-14 02:36:57 UTC
We are waiting for a successful Rawhide compose that doesn't run into other major bugs to confirm that this is in fact fully fixed.

Comment 27 Fedora End Of Life 2017-02-28 10:56:41 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 28 Adam Williamson 2017-03-02 23:46:28 UTC
So yeah, this definitely got fixed, we have not encountered it on recent Rawhide tests.