Bug 1767159
| Field | Value |
|---|---|
| Summary: | SSSD is crashing: dbus_watch_handle() is invoked with corrupted 'watch' value |
| Product: | Red Hat Enterprise Linux 7 |
| Component: | sssd |
| Version: | 7.7 |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Reporter: | Vinay Mishra <vmishra> |
| Assignee: | Alexey Tikhonov <atikhono> |
| QA Contact: | sssd-qe <sssd-qe> |
| CC: | abroy, apeetham, atikhono, dominik.mierzejewski, grajaiya, jhrozek, lslebodn, mkosek, mzidek, pbrezina, sgoveas, thalman, tscherf |
| Target Milestone: | rc |
| Target Release: | 7.9 |
| Keywords: | Triaged, ZStream |
| Hardware: | All |
| OS: | Linux |
| Whiteboard: | sync-to-jira |
| Fixed In Version: | sssd-1.16.5-1.el7 |
| Last Closed: | 2020-09-29 19:49:11 UTC |
| Type: | Bug |
| Bug Blocks: | 1816031 |
Comment 3
Alexey Tikhonov
2019-11-04 15:48:19 UTC
> (In reply to Alexey Tikhonov from comment #3)
> I was not able to find any similar issues with other apps using dbus.

I mean I was not able to find "dbus_watch_handle (watch=0xa0)" with other apps. Of course, plenty of generic SIGSEGVs in dbus_watch_handle() can be found.

My guess is the following is happening:

- libdbus removes the dbus watch by calling the `sbus_remove_watch()` callback
- the watch is freed, but `watch->fde` (the tevent fd monitoring set up in `sbus_add_watch()`) is not removed (https://tevent.samba.org/group__tevent.html#gadc52787f3daf49e589066d37a5cdb18c: "To cancel the monitoring of a file descriptor, call talloc_free() on the object returned by this function.")
- tevent calls `sbus_watch_handler()` to report that the fd became writable, but the watch context is already freed => use-after-free

Indirect confirmation is that sssd 2.x has this fixed: https://github.com/SSSD/sssd/blob/master/src/sbus/connection/sbus_watch.c#L283 (though I would prefer to put this into the watch destructor). I need to think about it yet.

> (In reply to Alexey Tikhonov from comment #5)
> My guess is the following is happening:
>
> - libdbus removes the dbus watch by calling the `sbus_remove_watch()` callback
> - the watch is freed, but `watch->fde` (tevent fd monitoring; set up in `sbus_add_watch()`) is not removed
> - tevent calls `sbus_watch_handler()` to report the fd became writable, but the watch context is already freed => use-after-free
>
> I need to think about it yet.

I can't be 100% sure the described bug is the root cause of this bz (IMO most probably it is), but I am pretty sure it is a bug. I will reuse (reopen) the existing upstream ticket https://pagure.io/SSSD/sssd/issue/2660 to track fixing it.
```
(gdb) frame 1
#1  0x00007f5b40066398 in sbus_watch_handler (ev=<optimized out>, fde=<optimized out>,
    flags=<optimized out>, data=<optimized out>) at src/sbus/sssd_dbus_common.c:94
94          dbus_watch_handle(watch->dbus_write_watch, DBUS_WATCH_WRITABLE);
(gdb) p *watch
$1 = {prev = 0x0, next = 0x0, conn = 0x55efe0f1e2f0, fde = 0x55efe0f1ed70, fd = 12,
  dbus_read_watch = 0x0, dbus_write_watch = 0xa0}
(gdb) p *(watch->conn)
$2 = {ev = 0x55efe0f0e9b0, type = SBUS_CONNECTION, dbus = {server = 0x55efe0f1de10, conn = 0x55efe0f1de10},
  address = 0x55efe0f1ef10 "unix:path=/var/lib/sss/pipes/private/sbus-dp_XXXXX.com", connection_type = 2,
  disconnect = 0, managed_paths = 0x55efe0f1e3f0, nodes_fns = 0x55efe0f1e620,
  incoming_signals = 0x55efe0f1e850, retries = 0, max_retries = 3,
  reconnect_callback = 0x55efe0c21bf0 <nss_dp_reconnect_init>, reconnect_pvt = 0x55efe0f1d440,
  symlink = 0x0, srv_init_fn = 0x0, srv_init_data = 0x0, clients = 0x55efe0f1ea80,
  watch_list = 0x0, last_request_time = 0x0, client_destructor_data = 0x0}
(gdb) p *(watch->conn->watch_list)
Cannot access memory at address 0x0
```

`sbus_connection::watch_list == 0x0` assures me that `watch_destructor()` had been called and `watch` removed from the list (making the list empty) before `sbus_watch_handler()` was called, i.e. this really is a use-after-free.

But now I am not so sure how this happens.

> (In reply to Alexey Tikhonov from comment #5)
> - the watch is freed, but `watch->fde` (tevent fd monitoring; set up in `sbus_add_watch()`) is not removed

The tevent fd should be removed alongside `watch`, because `watch` is the parent context of the tevent fd... And even if `tevent_common_fd_destructor()` fails (for example, because it was called from the context of `tevent_common_invoke_fd_handler()`), `epoll_event_fd_destructor()` should still call `epoll_update_event()` -> `epoll_del_event()` -> `epoll_ctl(EPOLL_CTL_DEL)` unconditionally...
But it seems that upstream PR930 doesn't make much sense, because calling talloc_free(fde) from `watch_destructor()` is more or less the same as what libtalloc would do on its own. So maybe the order in which it is done in sssd-2.x — https://github.com/SSSD/sssd/commit/f1f9af528f71f42ac41bb7a272f4f7d940fd3a0f — makes some sense, but I do not have an explanation at the moment.

(1) Ok, it seems I figured out what is happening. The mechanics are different than I assumed initially.

```
#3 0x00007f5b3c93e527 in epoll_event_loop (tvalp=0x7ffe3ad15da0, epoll_ev=0x55efe0f0ec40)
   at ../tevent_epoll.c:736
...
events = {{events = 29, data = {ptr = 0x55efe0f1ed70, ...}}}
```

events = 29 = 0x1D = 0x1 | 0x4 | 0x8 | 0x10 = EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP

The sequence is:

tevent_common_invoke_fd_handler() -> sbus_watch_handler() -> dbus_watch_handle(R) -> ...libdbus detects the connection is closed... -> sbus_remove_watch() -> talloc_free(watch) -> ...back to libdbus and back to sbus_watch_handler() -> `if (watch->dbus_write_watch) dbus_watch_handle(W)` => *use-after-free*

(2) It seems there is an additional issue: a call to `talloc_free(watch)` in the context of `tevent_common_invoke_fd_handler()` makes the destructor of the attached `tevent_fd` fail (https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L124 and https://gitlab.com/samba-team/samba/blob/master/lib/tevent/tevent_fd.c#L52), thus leaking memory. Fixing this leak is tricky, because libtevent requires destroying the tevent_fd before close(fd), which most probably will be executed by libdbus right after sbus_remove_watch()... But I have to check the details with a maintainer of, or person knowledgeable about, libtevent.

*** Bug 1731577 has been marked as a duplicate of this bug. ***

Pushed upstream:

* master: 337a1adf78beefb5dde93a733ab2caf86d65edb9
* sssd-1-16: f845355e32127c5e8f2bf700cdaa5b8721804232

*** Bug 1797159 has been marked as a duplicate of this bug. ***
Verified the bug on SSSD version sssd-1.16.5-7.el7.x86_64. The bug is verified Sanity Only and no related regression was detected. Here are the relevant sanity runs:

- https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/IdM-Pipelines/job/sssd-rhel-prod/job/RHEL7.9/12/allsanity_20HTML_20Report/
- https://idm-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/IdM-Pipelines/job/sssd-rhel-prod/job/RHEL7.9/12/testReport/test_journald/TestJournald/
- https://beaker.engineering.redhat.com/jobs/4282980
- https://beaker.engineering.redhat.com/jobs/4282982

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3904