Bug 1278602 - dbus can't send audit events
Summary: dbus can't send audit events
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dbus
Version: 23
Hardware: Unspecified
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Colin Walters
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-11-05 23:01 UTC by Steve Grubb
Modified: 2016-12-20 15:26 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-20 15:26:09 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
0001-Initialize-SELinux-and-Apparmor-after-capabilities-a.patch (2.27 KB, patch)
2015-11-08 15:22 UTC, Laurent Bigonville
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
FreeDesktop.org 92832 0 None None None Never

Description Steve Grubb 2015-11-05 23:01:12 UTC
Description of problem:
Dbus in F23 has refactored code around selinux and audit. When someone reloads policy by doing, 'semodule -B', it leads to a message going into syslog:

dbus[657]: Can't send to audit system: USER_AVC
>> avc:  received policyload notice (seqno=3) exe="/usr/bin/dbus-daemon"
>> sauid=102 hostname=? addr=? terminal=?

Looking at dbus, there are 2 threads: 657 and 666. Doing a strace shows this:

[pid   666] sendto(7, "\200\0\0\0S\4\5\0\1\0\0\0\0\0\0\0avc:  received p"..., 128, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 128
[pid   666] poll([{fd=7, events=POLLIN}], 1, 500) = 1 ([{fd=7, revents=POLLIN}])
[pid   666] recvfrom(7, "\224\0\0\0\2\0\0\0\1\0\0\0\221\2\0\0\377\377\377\377\200\0\0\0S\4\5\0\1\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 148
[pid   666] recvfrom(7, "\224\0\0\0\2\0\0\0\1\0\0\0\221\2\0\0\377\377\377\377\200\0\0\0S\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 148

Checking thread 666

# cat /proc/657/task/666/status | grep Cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff

No capabilities. CAP_AUDIT_WRITE is needed. Checking the other thread:

# cat /proc/657/task/657/status | grep Cap
CapInh:	0000000000000000
CapPrm:	0000000020000000
CapEff:	0000000020000000
CapBnd:	0000003fffffffff

We find the capabilities on the other thread.

Not sure how they wound up on the wrong thread. I seem to remember this happening a couple years ago and it got fixed. With the refactor I guess someone didn't know this can happen.

Version-Release number of selected component (if applicable):
dbus-1.10.0-1.fc23.x86_64

Comment 1 Laurent Bigonville 2015-11-08 15:22:20 UTC
Created attachment 1091279 [details]
0001-Initialize-SELinux-and-Apparmor-after-capabilities-a.patch

I tested this on debian an it seems to work

Comment 2 Simon McVittie 2015-11-17 14:28:54 UTC
(In reply to Steve Grubb from comment #0)
> Dbus in F23 has refactored code around selinux and audit.

It does, but the refactoring was to merge duplicated code from selinux.c and apparmor.c into one file. It shouldn't change the order of events at startup; I've read back through the git diffs, and can't find anything post-1.8 that looks like a cause for this. Can anyone from Red Hat spot something that I've missed?

> Not sure how they wound up on the wrong thread.

This is the first time I've been aware of dbus-daemon having a second thread - some of the D-Bus functions it uses are explicitly not thread-safe, so we should be very careful about how much is threaded at all. It's entirely single-threaded when SELinux is not in use.

Is there any way to receive the same information as struct avc_log_callback.func_log and struct avc_log_callback.func_audit without using a thread for it, perhaps by poll()ing a file descriptor?

As noted on <https://bugs.freedesktop.org/show_bug.cgi?id=92832#c7>, Laurent's proposed patch is essentially a revert of a commit from 2008, which makes me reluctant to apply it: presumably there was a good reason for that commit.

Comment 3 Simon McVittie 2015-11-17 14:33:03 UTC
Is this definitely a regression between 1.8 and 1.10 (F23), or could it potentially be an older bug?

Comment 4 Steve Grubb 2015-11-17 17:08:57 UTC
This bug has occurred several times that I am aware of. For example, we have this rpm commit message for 1.1.2-5,

"Reverse. We_were_root check to setpcap if we were root.  Also only init audit if we were root.  So error dbus message will not show up when policy reload happens.  dbus -session will no longer try to send audit message, only system will."

Also caused problems in 0.92-2, "Only audit on the system bus". I am also pretty sure it was fixed in 0.61-4.

The issue could be older than 1.8. I have a system that has logs back to 2010. The last user_avc that I see dbus issuing is from July 2013. That seems to correlate to dbus 1.4.10.

Comment 5 Simon McVittie 2015-11-17 18:40:59 UTC
(In reply to Steve Grubb from comment #4)
> This bug has occurred several times that I am aware of. For example, we have
> this rpm commit message for 1.1.2-5,
> 
> "Reverse. We_were_root check to setpcap if we were root.  Also only init
> audit if we were root.  So error dbus message will not show up when policy
> reload happens.  dbus -session will no longer try to send audit message,
> only system will."

That appears to be <https://bugs.freedesktop.org/show_bug.cgi?id=12429>.

> Also caused problems in 0.92-2, "Only audit on the system bus".

That appears to be a local patch in Fedora that never went upstream.

> I am also pretty sure it was fixed in 0.61-4.

That appears to be <https://bugzilla.redhat.com/show_bug.cgi?id=518541> which didn't have an upstream bug.

Comment 6 Simon McVittie 2015-11-17 19:15:22 UTC
Since the commit in 2008 that Laurent is basically proposing to revert, the process goes like this:

* dbus-daemon double-forks, if it is going to do so. Execution continues
  in the grandchild process.

* LSM initialization (originally only SELinux, now AppArmor too).
  In the SELinux case, avc_init() is given callbacks to start a thread.
  I don't know what part of libselinux actually starts the thread, or when.

* If we are the system dbus-daemon, change uid/gid to messagebus or whatever
  the distribution calls it:

      have_audit_write = capng_have_capability (CAPNG_PERMITTED,
                                                CAP_AUDIT_WRITE);
      capng_clear (CAPNG_SELECT_BOTH);
      /* Only attempt to retain CAP_AUDIT_WRITE if we had it when
       * starting.  See:
       * https://bugs.freedesktop.org/show_bug.cgi?id=49062#c9
       */
      if (have_audit_write)
        capng_update (CAPNG_ADD, CAPNG_EFFECTIVE | CAPNG_PERMITTED,
                      CAP_AUDIT_WRITE);
      rc = capng_change_id (uid, gid, CAPNG_DROP_SUPP_GRP);

* bus_audit_init():
  - if we do not have CAP_AUDIT_WRITE effective, ignore
  - audit_open()

What Laurent has proposed is to move the LSM-specific initialization after bus_audit_init(), i.e. after audit_open(). This would mean that the SELinux thread inherits the audit fd and the AUDIT_WRITE capability. However, it might re-introduce whatever issue was involved in the thread starting at <http://lists.freedesktop.org/archives/dbus/2008-October/010493.html>. Unfortunately, that just says "Fix problem" without really explaining what happened.

Other audit-relevant changes that are vaguely recent:

https://bugs.freedesktop.org/show_bug.cgi?id=83856
https://bugs.freedesktop.org/show_bug.cgi?id=49062
https://bugs.freedesktop.org/show_bug.cgi?id=71187

Comment 7 Steve Grubb 2015-11-17 19:33:24 UTC
Auditing and selinux need to be on the same thread and that thread needs CAP_AUDIT_WRITE. The audit fd needs to be opened before selinux get initialized because when selinux is active, it may need to write to the audit system if there is an access denial.

Comment 8 Simon McVittie 2015-11-17 19:47:27 UTC
Moving bus_audit_init() to before we drop privileges might be an alternative? There are basically two cases. The system dbus-daemon starts as root, and drops privileges. The session dbus-daemon starts as an ordinary user, but could potentially have CAP_AUDIT_WRITE from setcap(8) (and there seems to be interest in having that work).

I'm open to suggestions on this stuff, and I'd be happy to review patches from the dbus perspective, but I'm reluctant to merge any changes to the startup sequence until an expert tells us what the sequence ought to be.

It would be great if we could move this discussion to <https://bugs.freedesktop.org/show_bug.cgi?id=92832>, where I've cc'd the author of dbus-daemon's AppArmor support (which uses the same audit machinery).

Comment 9 Simon McVittie 2015-11-17 19:59:27 UTC
(In reply to Steve Grubb from comment #7)
> Auditing and selinux need to be on the same thread

What does "selinux" mean in terms of function calls?

My understanding is that the implementation in dbus-daemon has always looked like this:

* main thread:
  - does initialization
  - calls avc_init() which indirectly starts the notify thread
  - does the main dbus-daemon work:
    - (lots of non-SELinux things)
    - calls avc_sid_to_context(), avc_context_to_sid(), avc_has_perm()

* notify thread:
  - runs the function that we were given as an argument to
    (struct avc_thread_callback).func_create_thread
    - reads events from netlink in an infinite loop, blocking on the
      netlink socket

* one of those threads (which one? it isn't clear to me)
  - receives calls to (struct avc_log_callback).func_log
    - logs them via audit_log_user_avc_message()
  - receives calls to (struct avc_log_callback).func_audit
    - copies aux data into a buffer as requested
  - receives calls to AVC_CALLBACK_RESET callback
    - kills main thread with SIGHUP

Comment 10 Steve Grubb 2015-11-17 20:12:16 UTC
In reply to comment #9

"Selinux" means the thread that calls avc_has_perm(). This is the function call that makes a decision that might need to get logged to the audit system. If its called by both threads, then both threads need CAP_AUDIT_WRITE. I presume apparmor may also need to log access decisions and would want to be on the same thread.

We have never consider the user session bus to be security relevant because it runs without privileges. We would probably not want to spread privileges around so that it could log events because there is a chance someone could abuse the privs somehow.

Comment 11 Simon McVittie 2015-11-20 19:20:21 UTC
(In reply to Steve Grubb from comment #10)
> "Selinux" means the thread that calls avc_has_perm(). This is the function
> call that makes a decision that might need to get logged to the audit
> system. If its called by both threads, then both threads need
> CAP_AUDIT_WRITE. I presume apparmor may also need to log access decisions
> and would want to be on the same thread.

The AppArmor integration is not multi-threaded: it writes to the audit system if its equivalent of avc_has_perm() says it should, but it does not need to receive asynchronous events from the outside world.

The SELinux integration, in addition to doing that on the main thread, has a separate thread (I'm calling it the "notify thread" here) which receives asynchronous messages from netlink and writes *those* to the audit log.

Am I right in thinking that the failure mode represented by this bug report is that dbus-daemon cannot send audit events resulting from netlink messages in the notify thread; but if a D-Bus method call is forbidden by SELinux, even in 1.10, that is successfully sent to the audit subsystem?

(In reply to Steve Grubb from comment #7)
> Auditing and selinux need to be on the same thread

Are saying that auditing is not thread-safe, and we can only safely write to the audit subsystem from one thread?

Or is it valid to log audit events from whichever thread receives them, as long as the thread that wants to log the event has CAP_AUDIT_WRITE?

(In reply to Simon McVittie from comment #8)
> Moving bus_audit_init() to before we drop privileges might be an
> alternative?

Laurent has proposed <https://bugs.freedesktop.org/show_bug.cgi?id=92832#c8> which would change the startup sequence to this. Is this valid?

* dbus-daemon double-forks, if it is going to do so. Execution continues
  in the grandchild process.

* If we are the system dbus-daemon, change uid/gid to messagebus or whatever
  the distribution calls it:

      have_audit_write = capng_have_capability (CAPNG_PERMITTED,
                                                CAP_AUDIT_WRITE);
      capng_clear (CAPNG_SELECT_BOTH);
      /* Only attempt to retain CAP_AUDIT_WRITE if we had it when
       * starting.  See:
       * https://bugs.freedesktop.org/show_bug.cgi?id=49062#c9
       */
      if (have_audit_write)
        capng_update (CAPNG_ADD, CAPNG_EFFECTIVE | CAPNG_PERMITTED,
                      CAP_AUDIT_WRITE);
      rc = capng_change_id (uid, gid, CAPNG_DROP_SUPP_GRP);

* bus_audit_init():
  - if we do not have CAP_AUDIT_WRITE effective, ignore
  - audit_open()

* LSM initialization (originally only SELinux, now AppArmor too).
  In the SELinux case, avc_init() is given callbacks to start a thread.

Comment 12 Simon McVittie 2015-11-20 19:25:15 UTC
Does anyone who knows about SELinux and/or audit have any idea why <http://lists.freedesktop.org/archives/dbus/2008-October/010493.html> was applied, and whether changing the sequence in this way will cause us to regress?

Comment 13 Steve Grubb 2015-11-20 19:44:15 UTC
Regarding the question about selinux being on the same thread as audit, its not a thread safety issue. Its that when selinux makes a decision, it potentially has to send an audit event recording its decision. This could of course be done through IPC mechanisms between threads. But I think the callback function for selinux would be on the same thread for simplicity.

I don't know enough about dbus design to know what the thread watching netlink is doing. If its watching for a selinux policy reload notification, then it would also need CAP_AUDIT_WRITE. If its not doing anything selinux related, then it would not need the capability. That said, in the original problem description I can see that the thread trying to record the policy reload is the second one started and doesn't have the permissions.

If selinux is doing things on both threads, then I guess both threads need CAP_AUDIT_WRITE.

Comment 14 Simon McVittie 2015-11-20 20:02:55 UTC
(In reply to Simon McVittie from comment #12)
> Does anyone who knows about SELinux and/or audit have any idea why
> <http://lists.freedesktop.org/archives/dbus/2008-October/010493.html> was
> applied, and whether changing the sequence in this way will cause us to
> regress?

It looks as though the original bug report may have been <http://lists.freedesktop.org/archives/dbus/2008-October/010491.html>. process_config_postinit() indirectly calls bus_selinux_id_table_insert() which requires that the AVC has been initialized.

So the constraints are:

* on the system bus we must switch uid before bus_selinux_full_init(), because
  if we switch uid after the notify thread has started, the kernel will
  remove CAP_AUDIT_WRITE from *both* threads for the setuid(), and libcap-ng
  will only re-apply it to the main thread (Bug #221168, Fedora dbus 1.0.2-3,
  applied upstream as e3d30a0 with a uselessly vague commit message)

* process_config_postinit() must be after bus_selinux_full_init() to avoid
  <http://lists.freedesktop.org/archives/dbus/2008-October/010491.html>

* ideally we would set up auditing before bus_selinux_full_init()
  because otherwise we might see events that can't go to the audit log
  because we have no audit fd yet

Any more?

This means https://bugs.freedesktop.org/show_bug.cgi?id=92832#c5 is wrong (it fails the second constraint) but https://bugs.freedesktop.org/show_bug.cgi?id=92832#c8 should be OK.

Also, we need a better audit trail, no pun intended, for why changes like this were made in poorly-understood parts of dbus - some of the commit messages from a few years ago are very unclear. I'm going to look into whether attaching git notes to old commits would help to solve this.

Comment 15 Simon McVittie 2015-11-20 20:12:21 UTC
(In reply to Steve Grubb from comment #13)
> I don't know enough about dbus design to know what the thread watching
> netlink is doing.

I'm afraid this part is not our design, so I can't tell you with any certainty. We have a function that starts a thread to run arbitrary code, and we pass it to avc_init(), for libselinux to do whatever libselinux does:

http://cgit.freedesktop.org/dbus/dbus/tree/bus/selinux.c#n198
http://cgit.freedesktop.org/dbus/dbus/tree/bus/selinux.c#n94
http://cgit.freedesktop.org/dbus/dbus/tree/bus/selinux.c#n346

Something in libselinux schedules something to be run in that thread. I *think* it's basically this:

https://github.com/SELinuxProject/selinux/blob/master/libselinux/src/avc_internal.c#L249

which I think ends up calling back into libdbus:

http://cgit.freedesktop.org/dbus/dbus/tree/bus/selinux.c#n118
http://cgit.freedesktop.org/dbus/dbus/tree/bus/selinux.c#n153

which would explain why you see policy-load notices on the secondary thread?

If there's a way we can get this information by having libselinux give us a fd to poll(), and calling back into libselinux when the fd becomes readable, then I'd be very happy to merge a patch to do that and lose the second thread altogether. (See also https://bugs.freedesktop.org/show_bug.cgi?id=92831 which points out that avc_init() is deprecated - but I have no idea what the replacement is, so someone who knows libselinux would have to contribute that.)

Comment 16 Stephen Smalley 2015-11-20 21:06:39 UTC
Why don't people ask questions about SELinux on selinux.gov?

In any event, just pass NULL thread_callbacks argument to avc_init() and be done with it.

Comment 17 Simon McVittie 2015-11-23 09:49:15 UTC
(In reply to Stephen Smalley from comment #16)
> Why don't people ask questions about SELinux on selinux.gov?

Sorry, this is the first time I've heard of that list (if it is indeed a list). I'm an upstream maintainer who has inherited code with optional SELinux interactions, which don't seem to have been well-understood by the former maintainer who merged them either. I don't use Fedora or SELinux myself.

> In any event, just pass NULL thread_callbacks argument to avc_init() and be
> done with it.

If I do that, will I still receive an AVC_CALLBACK_RESET callback when the policy changes, a (struct avc_log_callback).func_audit callback when my process call avc_has_perm() and gets a denial, and a (struct avc_log_callback).func_log callback when appropriate (which I think is both those situations)?

Comment 18 Stephen Smalley 2015-11-23 13:38:58 UTC
It is the upstream selinux mailing list, existing since the public release of SELinux in 2000.  Subscribe via selinux-join AT tycho.nsa.gov.

Yes, passing NULL thread_callbacks just causes libselinux to poll the netlink socket on subsequent avc_has_perm() calls rather than starting a separate thread for it.  No loss in functionality.  There are other alternatives as well, but that's likely the simplest.

Comment 19 Simon McVittie 2015-11-24 12:07:13 UTC
(In reply to Stephen Smalley from comment #18)
> Yes, passing NULL thread_callbacks just causes libselinux to poll the
> netlink socket on subsequent avc_has_perm() calls rather than starting a
> separate thread for it.  No loss in functionality.  There are other
> alternatives as well, but that's likely the simplest.

We reload dbus-daemon's configuration as if via SIGHUP when we are notified about AVC_CALLBACK_RESET, and I suspect the point at which we call avc_has_perm() may be too late for the effect of that reload to be used in that transaction, so I don't think that's a suitable solution for dbus-daemon.

If we call avc_netlink_acquire_fd, watch that fd for POLLIN events, call avc_netlink_check_nb() whenever we see POLLIN on that fd, and call avc_netlink_release_fd() before avc_destroy(), is that valid? That seems to be what Xorg does.

Comment 20 Stephen Smalley 2015-11-24 14:33:20 UTC
Yes, that should work.  You should then be able to get rid of the thread and lock callbacks for avc_init (the latter assuming that dbus is otherwise single-threaded).

Comment 21 Simon McVittie 2015-12-02 11:32:25 UTC
Please test this change (review also welcome):

http://cgit.freedesktop.org/dbus/dbus/commit/?id=a3a5935a0a038c3b44c61ce5719f0f7e647b96c6

For context for people who know SELinux but not D-Bus:

* this is all after we do the standard double-fork technique for daemonizing,
  if we're doing that
* _dbus_change_to_daemon_user() drops privileges from root to a daemon user
  (messagebus or dbus or whatever), making CAP_AUDIT_WRITE effective
  if permitted
* bus_audit_init() calls audit_open() if CAP_AUDIT_WRITE is effective
* bus_selinux_full_init() calls avc_init()

It's in the master branch and will be in a 1.11.0 development release shortly, assuming nobody tells me it's hopelessly broken.

I'll backport this to the 1.10 stable branch if I get positive test results. However, if nobody tests this, I'm not going to backport it and risk making matters worse.

(In reply to Simon McVittie from comment #19)
> If we call avc_netlink_acquire_fd, watch that fd for POLLIN events, call
> avc_netlink_check_nb() whenever we see POLLIN on that fd, and call
> avc_netlink_release_fd() before avc_destroy(), is that valid? That seems to
> be what Xorg does.

A tested patch for this would be welcome (ideally please attach a git-format-patch to <https://bugs.freedesktop.org/show_bug.cgi?id=92831>). I might write a patch myself if I have time, but I don't use SELinux, so I would be relying the people it affects to test it before merging.

Comment 22 David King 2015-12-07 12:22:44 UTC
There is now a 1.11.0 version of dbus in Rawhide (as dbus-1.11.0-1.fc24) for testing.

Comment 23 Fedora End Of Life 2016-11-24 13:07:14 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Fedora End Of Life 2016-12-20 15:26:09 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.