Bug 1412750
Summary: | Multiple services in recent Rawhide fail to start with: Failed at step USER spawning (executable): Permission denied | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Paul Whalen <pwhalen> |
Component: | selinux-policy | Assignee: | Lukas Vrabec <lvrabec> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 26 | CC: | awilliam, dominick.grift, dominik.mierzejewski, dwalsh, gmarr, jpokorny, lslebodn, lvrabec, mgrepl, msekleta, normand, plautrba, pmoore, pwhalen, robatino, ssekidde, zbyszek |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | AcceptedBlocker | ||
Fixed In Version: | selinux-policy-3.13.1-236.fc26 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-03-02 23:46:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1349184 |
Description
Paul Whalen
2017-01-12 17:06:15 UTC
openQA has been running into this since 20170114.n.1. It worked in 20170107.n.0. There've been two cockpit package builds between those dates. Nominating as an Alpha blocker, per criterion "Unless explicitly specified otherwise, after system installation the Cockpit web management interface must be running and accessible on its default port (9090)" - https://fedoraproject.org/wiki/Fedora_26_Alpha_Release_Criteria#Cockpit_management_interface . In fact, failures to start several other services seem to have the same cause: colord, packagekit, and geoclue. They're all 'Failed at step USER spawning (executable): Permission denied", and they all start with SELinux in permissive mode. Adding zbyszek as there may be a systemd component to this too. Dunno, #1413047 is about F25, and it seems to be kernel-version dependent. And AVC seem to be different. There are no relevant AVCs here, maybe some dontaudit rules are in effect? Yeah, it's definitely not the same thing. In rawhide, it fails in send_user_lookup, on writev() back to the mother ship. This code is not in F25. (gdb) bt #0 writev () at ../sysdeps/unix/syscall-template.S:84 #1 0x0000561cfcc53cd2 in send_user_lookup (gid=<optimized out>, uid=<optimized out>, user_lookup_fd=29, unit=0x561cfced7e80) at src/core/execute.c:2210 #2 exec_child (unit=unit@entry=0x561cfced7e80, command=command@entry=0x561cfced9a20, context=context@entry=0x561cfced82a8, params=params@entry=0x7ffeadabc3f0, runtime=runtime@entry=0x0, dcreds=dcreds@entry=0x561cfced85d0, argv=0x561cfcf576d0, socket_fd=-1, named_iofds=0x7ffeadabc2a0, fds=0x0, n_fds=0, files_env=0x0, user_lookup_fd=29, exit_status=0x7ffeadabc250) at src/core/execute.c:2358 #3 0x0000561cfcc56f78 in exec_spawn (unit=0x561cfced7e80, command=0x561cfced9a20, context=0x561cfced82a8, params=0x7ffeadabc3f0, runtime=0x0, dcreds=0x561cfced85d0, ret=0x7ffeadabc3e0) at src/core/execute.c:2893 #4 0x0000561cfcc60e31 in service_spawn.lto_priv.540 (s=s@entry=0x561cfced7e80, c=<optimized out>, timeout=<optimized out>, flags=flags@entry=(EXEC_APPLY_PERMISSIONS | EXEC_APPLY_CHROOT | EXEC_APPLY_TTY_STDIN | EXEC_IS_CONTROL), _pid=_pid@entry=0x561cfced85e4) at src/core/service.c:1351 #5 0x0000561cfcc62567 in service_enter_start_pre (s=0x561cfced7e80) at src/core/service.c:1851 #6 service_start.lto_priv.74 (u=0x561cfced7e80) at src/core/service.c:2076 #7 0x0000561cfcc32eea in unit_start (u=0x561cfced7e80) at src/core/unit.c:1590 #8 job_perform_on_unit.lto_priv.579 (j=0x7ffeadabc590) at src/core/job.c:531 #9 0x0000561cfcc7e698 in job_run_and_invalidate (j=<optimized out>) at src/core/job.c:595 #10 manager_dispatch_run_queue.lto_priv.596 (source=<optimized out>, userdata=<optimized out>, userdata=<optimized out>) at src/core/manager.c:1588 #11 0x00007f92a3905d2d in source_dispatch (s=s@entry=0x561cfce0dc80) at src/libsystemd/sd-event/sd-event.c:2307 #12 0x00007f92a3905fd4 in sd_event_dispatch (e=e@entry=0x561cfce0d9e0) at src/libsystemd/sd-event/sd-event.c:2626 #13 0x00007f92a3907507 in sd_event_run (e=0x561cfce0d9e0, timeout=18446744073709551615) at src/libsystemd/sd-event/sd-event.c:2685 #14 0x0000561cfcc0b9a5 in manager_loop (m=<optimized out>) at src/core/manager.c:2251 #15 main (argc=<optimized out>, argv=0x7ffeadabca18) at src/core/main.c:1899 So, the socket it's trying to write to comes from socketpair(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0, m->user_lookup_fds) in manager_setup_user_look_up. Can we amend the selinux policy to allow writev() on that? (In reply to Zbigniew Jędrzejewski-Szmek from comment #4) > Dunno, #1413047 is about F25, and it seems to be kernel-version dependent. > And AVC seem to be different. There are no relevant AVCs here, maybe some > dontaudit rules are in effect? That is not true. BZ1413047 is not about F25. The systemd is from rawhide and the same problem is with latest rawhidekernel. . It is just my workaround to use the older f25 kernel. BTW there is a slightly related BZ1413075 which I thin in permissive mode. (In reply to Lukas Slebodnik from comment #6) > (In reply to Zbigniew Jędrzejewski-Szmek from comment #4) > > Dunno, #1413047 is about F25, and it seems to be kernel-version dependent. > > And AVC seem to be different. There are no relevant AVCs here, maybe some > > dontaudit rules are in effect? > > That is not true. BZ1413047 is not about F25. The systemd is from rawhide > and the same problem is with latest rawhidekernel. > . > It is just my workaround to use the older f25 kernel. > >BTW there is a slightly related BZ1413075 which I thin in permissive mode. ehm, too many typos. BTW there is a slightly related BZ1413075 hit I thin in permissive mode. (In reply to Lukas Slebodnik from comment #6) > That is not true. BZ1413047 is not about F25. The systemd is from rawhide > and the same problem is with latest rawhidekernel. You're right, this does seem to be the same problem. Based on https://bugzilla.redhat.com/show_bug.cgi?id=1413047#c2: #============= init_t ============== #!!!! This avc has a dontaudit rule in the current policy allow init_t kernel_t:unix_stream_socket { read write }; #!!!! This avc has a dontaudit rule in the current policy allow init_t kernel_t:unix_dgram_socket write; *** Bug 1413047 has been marked as a duplicate of this bug. *** Michal, Zbigniew is it a duplicate of BZ1392161? I'm almost certain it isn't, that report is from long before this problem started happening (see #c1). (In reply to Adam Williamson from comment #11) > I'm almost certain it isn't, that report is from long before this problem > started happening (see #c1). You never know; it could be side effect of some change in kernel. The same as other systemd/kernel bug BZ1413075. And systemd guys should try to explain what's going on here and why it should be allowed in selinux-policy. This also affects postgresql (another openQA test, the Server 'database server' role). Can we please get this fixed, guys? Rawhide has been broken for 19 days now, and that's really not how we want to do things these days. (In reply to Adam Williamson from comment #14) > Can we please get this fixed, guys? Rawhide has been broken for 19 days now, > and that's really not how we want to do things these days. That's just a SELinux policy; which you can workaround with permissive mode (which is not ideal) But even with this fix/custom SELinux policy it will not solve everything. systemd is broken anyway BZ1413075. I think there is a time to change workflow of pushing critical components to rawhide. The same as openSUSE does with tumbleweed. And you as a fedora QE should be the right person for pushing such change/proposal. IMHO it will force maintainers of critical packages to package their own SELinux policy if they want to release more often without breaking the rawhide. I know there are several other bugs in systemd, I am trying to get them *all* fixed :) And yes, we are actively working on Rawhide gating, but it's not going to happen overnight, so for now we at least want to fix the things that break quickly. That's what the nightly openQA testing is intended to let us achieve, instead of just leaving it broken for weeks at a time. #1392161 was fixed by allowing mounton (https://github.com/selinux-policy/selinux-policy/commit/08645ba5ed6467bee8218bec32015ef4f1c70fa0). Clearly the syscalls here are different (read, write), so it's not the same. Hi guys, Could you give me all AVC msgs related to this BZ? THanks. See https://bugzilla.redhat.com/show_bug.cgi?id=1412750#c8, https://bugzilla.redhat.com/show_bug.cgi?id=1413047#c2. https://github.com/fedora-selinux/selinux-policy/pull/184 PR merged. Thanks! Provide builds asap. Thank you it works for me as expected. *** Bug 1412165 has been marked as a duplicate of this bug. *** I addressed a similar problem in a specific bug https://bugzilla.redhat.com/show_bug.cgi?id=1420753 because I am not sure this is covered by this bug. Discussed during the 2017-02-13 blocker review meeting: [1] The decision was made to classify this bug as an AcceptedBlocker as it violates the following Alpha blocker criteria: ""Unless explicitly specified otherwise, after system installation the Cockpit web management interface must be running and accessible on its default port (9090)" [1] https://meetbot.fedoraproject.org/fedora-blocker-review/2017-02-13/f26-blocker-review.2017-02-13-18.01.txt (In reply to Geoffrey Marr from comment #24) > Discussed during the 2017-02-13 blocker review meeting: [1] > > The decision was made to classify this bug as an AcceptedBlocker as it > violates the following Alpha blocker criteria: > > ""Unless explicitly specified otherwise, after system installation the > Cockpit web management interface must be running and accessible on its > default port (9090)" > > [1] > https://meetbot.fedoraproject.org/fedora-blocker-review/2017-02-13/f26- > blocker-review.2017-02-13-18.01.txt I think this bug can be closed because it is fixed in selinux-policy-3.13.1-236.fc26. And koji says the lates version for rawhide is selinux-policy-3.13.1-239.fc26 We are waiting for a successful Rawhide compose that doesn't run into other major bugs to confirm that this is in fact fully fixed. This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle. Changing version to '26'. So yeah, this definitely got fixed, we have not encountered it on recent Rawhide tests. |