Bug 1767159 - SSSD is crashing: dbus_watch_handle() is invoked with corrupted 'watch' value
Summary: SSSD is crashing: dbus_watch_handle() is invoked with corrupted 'watch' value
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.9
Assignee: Alexey Tikhonov
QA Contact: sssd-qe
URL:
Whiteboard: sync-to-jira
Duplicates: 1731577 1797159 (view as bug list)
Depends On:
Blocks: 1816031
 
Reported: 2019-10-30 19:48 UTC by Vinay Mishra
Modified: 2024-03-25 15:29 UTC
CC: 13 users

Fixed In Version: sssd-1.16.5-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1816031 (view as bug list)
Environment:
Last Closed: 2020-09-29 19:49:11 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
Github SSSD sssd issue 3701 (closed): SSSD service is crashing: dbus_watch_handle() is invoked with corrupted 'watch' value (last updated 2020-10-22 06:13:41 UTC)
Red Hat Issue Tracker SSSD-1746 (last updated 2023-09-07 20:57:51 UTC)
Red Hat Issue Tracker SSSD-2304 (last updated 2023-09-07 20:58:00 UTC)
Red Hat Product Errata RHBA-2020:3904 (last updated 2020-09-29 19:50:15 UTC)

Comment 3 Alexey Tikhonov 2019-11-04 15:48:19 UTC
(In reply to Vinay Mishra from comment #2)
> (gdb) bt
> #0  dbus_watch_handle (watch=0xa0, flags=2) at ../../dbus/dbus-watch.c:740


This looks similar to:
1) https://bugzilla.redhat.com/show_bug.cgi?id=1144765#c12 (I do not think the resolution was correct)
2) https://pagure.io/SSSD/sssd/issue/2660 (no resolution)
3) https://bugzilla.redhat.com/show_bug.cgi?id=1731577 (no resolution)


Crashing process:
1) monitor
2) sssd_nss
3) sssd_be, sssd_ssh
And in this case, `cmdline` is: /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
 => so the crash seems to be service-unspecific.


Package versions:
1) sssd - 1.12.0, dbus - 1.8.6 (released 2014-09-15)
2) sssd - 1.12.4, dbus - ?
3) sssd - 1.16.2, dbus - 1.10.24-12 (released 2017-09-27)
And in this case: the same versions as in (3)


I was not able to find any similar issues with other apps using dbus.
I was not able to find any similar issues with the sssd 2.x release; the `sbus` code was refactored in the 2.x branch.
There is probably a bug in the sssd 1.x sbus code...

What really surprises me is that the corrupted pointer is "0xa0" in all 4 cases I was able to find. That doesn't look like a coincidence.

Comment 4 Alexey Tikhonov 2019-11-04 15:56:47 UTC
(In reply to Alexey Tikhonov from comment #3)
> I was not able to find any similar issues with other apps using dbus.

I mean I was not able to find "dbus_watch_handle (watch=0xa0)" with other apps.
Of course, a lot of generic SIGSEGVs in dbus_watch_handle() can be found.

Comment 5 Alexey Tikhonov 2019-11-04 18:31:22 UTC
My guess is the following is happening:

 - libdbus removes the dbus watch by calling the `sbus_remove_watch()` callback
 - the watch is freed, but `watch->fde` (the tevent fd monitor set up in `sbus_add_watch()`) is not removed
( https://tevent.samba.org/group__tevent.html#gadc52787f3daf49e589066d37a5cdb18c : "To cancel the monitoring of a file descriptor, call talloc_free() on the object returned by this function.")
 - tevent calls `sbus_watch_handler()` to report the fd became writable, but the watch context is already freed => use-after-free

Indirect confirmation is that sssd 2.x has this fixed: https://github.com/SSSD/sssd/blob/master/src/sbus/connection/sbus_watch.c#L283
(though I would prefer to put this into the watch destructor)


I still need to think about it.
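
For reference, the quoted tevent behavior is easy to demonstrate: once `talloc_free()` is called on a `tevent_fd`, the handler no longer fires. A minimal runnable sketch (the pipe trick and all names are illustrative, not from the sssd sources):
```
/* Build: gcc demo.c $(pkg-config --cflags --libs tevent talloc) */
#include <stdio.h>
#include <unistd.h>
#include <talloc.h>
#include <tevent.h>

static void on_writable(struct tevent_context *ev, struct tevent_fd *fde,
                        uint16_t flags, void *data)
{
    int *count = data;
    (*count)++;
    printf("fd ready (%d)\n", *count);
}

int main(void)
{
    struct tevent_context *ev = tevent_context_init(NULL);
    int pipefd[2], count = 0;
    pipe(pipefd);   /* the write end of a fresh pipe is immediately writable */

    struct tevent_fd *fde = tevent_fd_add(ev, ev, pipefd[1],
                                          TEVENT_FD_WRITE, on_writable, &count);
    tevent_loop_once(ev);   /* fires on_writable once */
    talloc_free(fde);       /* cancels the monitoring, per the tevent docs */
    /* A further tevent_loop_once(ev) would now block: no handler is left. */

    printf("handler ran %d time(s)\n", count);
    talloc_free(ev);
    return 0;
}
```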

Comment 7 Alexey Tikhonov 2019-11-05 15:55:09 UTC
(In reply to Alexey Tikhonov from comment #5)
> My guess is the following is happening:
> 
>  - libdbus removes the dbus watch by calling the `sbus_remove_watch()` callback
>  - the watch is freed, but `watch->fde` (the tevent fd monitor set up in
> `sbus_add_watch()`) is not removed
>  - tevent calls `sbus_watch_handler()` to report the fd became writable, but
> the watch context is already freed => use-after-free
> 
> I still need to think about it.

I can't be 100% sure the described bug is the root cause of this bz (IMO it most probably is), but I am pretty sure it is a bug.
I will reuse (reopen) the existing upstream ticket https://pagure.io/SSSD/sssd/issue/2660 to track fixing it.

Comment 10 Alexey Tikhonov 2019-11-06 18:51:15 UTC
```
(gdb) frame 1
#1  0x00007f5b40066398 in sbus_watch_handler (ev=<optimized out>, fde=<optimized out>, 
    flags=<optimized out>, data=<optimized out>) at src/sbus/sssd_dbus_common.c:94
94                  dbus_watch_handle(watch->dbus_write_watch, DBUS_WATCH_WRITABLE);


(gdb) p *watch
$1 = {prev = 0x0, next = 0x0, conn = 0x55efe0f1e2f0, fde = 0x55efe0f1ed70, fd = 12, 
  dbus_read_watch = 0x0, dbus_write_watch = 0xa0}


(gdb) p *(watch->conn)
$2 = {ev = 0x55efe0f0e9b0, type = SBUS_CONNECTION, dbus = {server = 0x55efe0f1de10, 
    conn = 0x55efe0f1de10}, 
  address = 0x55efe0f1ef10 "unix:path=/var/lib/sss/pipes/private/sbus-dp_XXXXX.com", 
  connection_type = 2, disconnect = 0, managed_paths = 0x55efe0f1e3f0, 
  nodes_fns = 0x55efe0f1e620, incoming_signals = 0x55efe0f1e850, retries = 0, max_retries = 3, 
  reconnect_callback = 0x55efe0c21bf0 <nss_dp_reconnect_init>, reconnect_pvt = 0x55efe0f1d440, 
  symlink = 0x0, srv_init_fn = 0x0, srv_init_data = 0x0, clients = 0x55efe0f1ea80, 
  watch_list = 0x0, last_request_time = 0x0, client_destructor_data = 0x0}


(gdb) p *(watch->conn->watch_list)
Cannot access memory at address 0x0
```

sbus_connection::watch_list = 0x0  --  this assures me `watch_destructor()` had been called and `watch` removed from the list (leaving the list empty) before `sbus_watch_handler()` was called (i.e. this really is a use-after-free)
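
A destructor of roughly the following shape would explain that observation; this is a hypothetical sketch inferred from the gdb dump above (sssd's real code uses DLIST_REMOVE from dlinklist.h; the unlink is spelled out here for self-containment), not the verbatim source:
```
struct sbus_watch_ctx;
struct sbus_connection { struct sbus_watch_ctx *watch_list; };
struct sbus_watch_ctx {
    struct sbus_watch_ctx *prev, *next;
    struct sbus_connection *conn;
};

static int watch_destructor(struct sbus_watch_ctx *watch)
{
    /* Unlink from the connection's watch list. */
    if (watch->prev) watch->prev->next = watch->next;
    if (watch->next) watch->next->prev = watch->prev;
    if (watch->conn->watch_list == watch) {
        /* With a single element this leaves watch_list == NULL,
         * which is exactly what gdb shows. */
        watch->conn->watch_list = watch->next;
    }
    return 0;   /* allow talloc to proceed with the free */
}
```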


But now I am not so sure how this happens.

(In reply to Alexey Tikhonov from comment #5)
>  - the watch is freed, but `watch->fde` (the tevent fd monitor set up in
> `sbus_add_watch()`) is not removed

  -- the tevent fd should be removed together with `watch`, because `watch` is the parent context of the tevent fd...

And even if `tevent_common_fd_destructor()` fails (for example, because it was called from the context of `tevent_common_invoke_fd_handler()`), `epoll_event_fd_destructor()` should still call `epoll_update_event()` -> `epoll_del_event()` -> `epoll_ctl(EPOLL_CTL_DEL)` unconditionally...

But it seems that upstream PR930 doesn't make much sense, because calling talloc_free(fde) from `watch_destructor()` is more or less the same as what libtalloc does on its own. So maybe the order in which it is done in sssd 2.x - https://github.com/SSSD/sssd/commit/f1f9af528f71f42ac41bb7a272f4f7d940fd3a0f - makes some sense, but I do not have an explanation at the moment.
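
What libtalloc "does on its own" here can be shown standalone: freeing a talloc parent also frees its children, running their destructors first. A minimal runnable example (illustrative names only; assumes libtalloc is installed):
```
/* Build: gcc demo.c $(pkg-config --cflags --libs talloc) */
#include <stdio.h>
#include <talloc.h>

struct fake_fde { int fd; };

static int fde_destructor(struct fake_fde *fde)
{
    /* In real code this is where tevent would stop monitoring fde->fd. */
    printf("child (fde, fd=%d) destructor ran\n", fde->fd);
    return 0;   /* 0 = allow the free to proceed */
}

int main(void)
{
    void *watch = talloc_new(NULL);                          /* stands in for the watch ctx */
    struct fake_fde *fde = talloc(watch, struct fake_fde);   /* child of watch */
    fde->fd = 12;
    talloc_set_destructor(fde, fde_destructor);

    talloc_free(watch);   /* frees watch AND fde; fde_destructor runs first */
    return 0;
}
```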

Comment 12 Alexey Tikhonov 2019-11-07 12:14:49 UTC
(1) Ok, it seems I figured out what is happening. The mechanics are different from what I assumed initially.
```
#3  0x00007f5b3c93e527 in epoll_event_loop (tvalp=0x7ffe3ad15da0, epoll_ev=0x55efe0f0ec40) at ../tevent_epoll.c:736
        ...
        events = {{events = 29, data = {ptr = 0x55efe0f1ed70, ...}}}
```
events = 29 = 0x1D = 0x1 + 0x4 + 0x8 + 0x10 = EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP

tevent_common_invoke_fd_handler() -> sbus_watch_handler() -> dbus_watch_handle(R) -> ... libdbus detects the connection is closed ... -> sbus_remove_watch() -> talloc_free(watch) ... back to libdbus and back into sbus_watch_handler() -> "if (watch->dbus_write_watch) dbus_watch_handle(W)"  =>  *use-after-free*
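
One common way to break such a reentrancy cycle is a stack-local "freed" flag that the context's destructor flips, re-checked after every call that might free the context. Below is a runnable sketch of that guard pattern with illustrative names; it shows the idea only and is not necessarily the fix that was committed:
```
#include <stdbool.h>
#include <stdio.h>
#include <talloc.h>

struct watch_ctx {
    bool *freed_flag;            /* points into the in-flight handler's stack frame */
    int dummy_write_watch;
};

static int watch_dtor(struct watch_ctx *w)
{
    if (w->freed_flag != NULL) {
        *w->freed_flag = true;   /* tell the in-flight handler we are gone */
    }
    return 0;
}

/* Stands in for dbus_watch_handle(R): the "connection closed" path frees the ctx. */
static void handle_read(struct watch_ctx *w)
{
    talloc_free(w);
}

static void watch_handler(struct watch_ctx *w)
{
    bool freed = false;
    w->freed_flag = &freed;

    handle_read(w);              /* may re-enter and free 'w' */

    if (freed) {                 /* guard: do NOT dereference 'w' anymore */
        printf("watch freed during read handling; skipping write\n");
        return;
    }
    w->freed_flag = NULL;
    printf("write watch is %d\n", w->dummy_write_watch);
}

int main(void)
{
    struct watch_ctx *w = talloc_zero(NULL, struct watch_ctx);
    talloc_set_destructor(w, watch_dtor);
    watch_handler(w);
    return 0;
}
```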


(2) It seems there is an additional issue: a call to `talloc_free(watch)` in the context of `tevent_common_invoke_fd_handler()` will make the destructor of the attached `tevent_fd` fail:
https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L124
and
https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L52
thus leaking memory. Fixing this leak is tricky, because libtevent requires destroying the tevent_fd before close(fd), and the close will most probably be executed by libdbus right after sbus_remove_watch()...
But I have to check the details with a maintainer of, or someone knowledgeable about, libtevent.

Comment 16 Alexey Tikhonov 2019-11-12 14:42:20 UTC
*** Bug 1731577 has been marked as a duplicate of this bug. ***

Comment 24 Alexey Tikhonov 2019-11-27 13:27:07 UTC
 * master
  * 337a1adf78beefb5dde93a733ab2caf86d65edb9
 * sssd-1-16
  * f845355e32127c5e8f2bf700cdaa5b8721804232

Comment 32 Abhijit Roy 2020-02-03 14:01:47 UTC
*** Bug 1797159 has been marked as a duplicate of this bug. ***

Comment 43 errata-xmlrpc 2020-09-29 19:49:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3904

