Bug 799971

Summary: sssd_be crashes on shutdown
Product: Red Hat Enterprise Linux 6 Reporter: Stephen Gallagher <sgallagh>
Component: sssd Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA QA Contact: IDM QE LIST <seceng-idm-qe-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.3 CC: apeetham, ashishks, grajaiya, jgalipea, prc
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-1.8.0-12.el6 Doc Type: Bug Fix
Doc Text:
No technical note required
Story Points: ---
Clone Of:
: 811912 (view as bug list) Environment:
Last Closed: 2012-06-20 11:55:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 811912    

Description Stephen Gallagher 2012-03-05 14:20:53 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/1226

We have a memory hierarchy bug that is visible in the IPA provider but may also be present, though dormant, in all of the other providers.

When we receive a SIGTERM, we invoke our graceful exit handler, which starts calling talloc_free() on the top-level contexts, including the be_ctx.

The problem is that the be_ctx has two branches of children: the provider-specific data and any sbus_connections that may be currently active.

The crash occurs when talloc frees the provider-specific data before the sbus_connections. In some circumstances, one or more destructors within the sbus_connection memory branch may then attempt to access the already-freed provider-specific data. When this happens, talloc calls abort() due to the access-after-free (older talloc versions erroneously report this as a double-free).

The proposed approach is to allocate be_req atop the provider-specific data instead of directly on the sbus_connection. We will then add a talloc_spy to the sbus_connection so that if the connection is freed (for example, because it is dropped), it will explicitly call talloc_free() on the pending be_req.

In this way, whichever path is followed first at shutdown will still have a guarantee that the provider-specific data remains available until all pending requests have been safely cancelled.
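
The ownership scheme described above can be illustrated with a small toy model. This is only a sketch: it does not use talloc and is not the SSSD code. The types `provider`, `connection`, and `request` and the function `simulate_shutdown` are hypothetical stand-ins for the provider-specific data, an sbus_connection, and a be_req. The point it tries to show is that when the request is owned by the provider data and the connection only holds a "spy" reference, cancelling the request always happens while the provider data is still live, regardless of which side is freed first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins only: these are NOT the real SSSD or talloc structures. */
struct request;

struct provider {            /* plays the role of the provider-specific data */
    bool alive;
    struct request *req;     /* pending request owned by the provider */
};

struct connection {          /* plays the role of an sbus_connection */
    struct request *spy;     /* request to cancel if the connection goes first */
};

struct request {             /* plays the role of a be_req */
    struct provider *prov;
    struct connection *conn;
};

static int safe_cancels;     /* cancellations that saw live provider data */

/* Cancel a request; this path reads provider data, so it must still be live. */
static void request_free(struct request *r)
{
    if (r == NULL) {
        return;
    }
    assert(r->prov != NULL && r->prov->alive);
    safe_cancels++;
    r->prov->req = NULL;         /* detach from the owner */
    if (r->conn != NULL) {
        r->conn->spy = NULL;     /* detach from the connection's spy */
    }
    free(r);
}

static void connection_free(struct connection *c)
{
    request_free(c->spy);        /* the spy cancels the pending request */
    free(c);
}

static void provider_free(struct provider *p)
{
    request_free(p->req);        /* cancel children before the data goes away */
    p->alive = false;
    free(p);
}

/* Run one shutdown in the given order.
 * Returns the number of safe cancellations, or -1 on allocation failure. */
int simulate_shutdown(bool provider_first)
{
    struct provider *p = calloc(1, sizeof(*p));
    struct connection *c = calloc(1, sizeof(*c));
    struct request *r = calloc(1, sizeof(*r));
    if (p == NULL || c == NULL || r == NULL) {
        free(p);
        free(c);
        free(r);
        return -1;
    }
    p->alive = true;
    p->req = r;
    c->spy = r;
    r->prov = p;
    r->conn = c;

    safe_cancels = 0;
    if (provider_first) {
        provider_free(p);
        connection_free(c);
    } else {
        connection_free(c);
        provider_free(p);
    }
    return safe_cancels;
}
```

In this model, both `simulate_shutdown(true)` and `simulate_shutdown(false)` cancel the request exactly once, and the second free in each order finds nothing left to cancel because the request detached itself from both sides.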

{{{
#0  0x00000032fa832885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = 2067
        selftid = 2067
#1  0x00000032fa834065 in abort () at abort.c:92
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7fff5ca91b30, sa_sigaction = 0x7fff5ca91b30}, sa_mask = {__val = {16, 21, 218993089241, 140734747974448, 218993085322, 140734747974768, 218993089483, 140733193388396, 218995357968,
              11, 13782216, 37, 218995358144, 13, 218980449216, 140734747974687}}, sa_flags = -401273743, sa_restorer = 0xffff}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00000032fc401ddc in talloc_abort (reason=0x32fc407bc0 "Bad talloc magic value - double free") at talloc.c:199
No locals.
#3  0x00000032fc401f18 in talloc_abort_double_free (ptr=<value optimized out>) at talloc.c:218
No locals.
#4  talloc_chunk_from_ptr (ptr=<value optimized out>) at talloc.c:239
        pp = <value optimized out>
        tc = <value optimized out>
#5  talloc_get_name (ptr=<value optimized out>) at talloc.c:937
        tc = 0x0
#6  0x00000032fc401f5e in talloc_check_name (ptr=0xa77330, name=0x7f3bd9f78131 "struct ipa_access_ctx") at talloc.c:956
        pname = <value optimized out>
#7  0x00007f3bd9f19e5c in hbac_sysdb_save (req=0x43ff3b10) at src/providers/ipa/ipa_access.c:427
        ret = <value optimized out>
        in_transaction = false
        hbac_ctx = 0xacc8e0
        domain = 0xa57900
        sysdb = <value optimized out>
        base_dn = <value optimized out>
        be_ctx = <value optimized out>
        access_ctx = <value optimized out>
        tmp_ctx = <value optimized out>
        __FUNCTION__ = "hbac_sysdb_save"
#8  0x0000003301c0447e in tevent_req_finish (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:133
No locals.
#9  _tevent_req_error (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:188
No locals.
#10 0x00007f3bd9f1f4ca in ipa_hbac_rule_info_done (subreq=<value optimized out>) at src/providers/ipa/ipa_hbac_rules.c:205
        ret = 5
        req = 0x43ff3b10
        state = <value optimized out>
        __FUNCTION__ = "ipa_hbac_rule_info_done"
#11 0x0000003301c0447e in tevent_req_finish (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:133
No locals.
#12 _tevent_req_error (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:188
No locals.
#13 0x00007f3bd9f3521a in sdap_get_generic_done (op=<value optimized out>, reply=0x0, error=5, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:932
        req = 0xd6d4ee0
        state = 0xacc010
        attrs = <value optimized out>
        errmsg = 0x0
        result = <value optimized out>
        ret = <value optimized out>
        lret = <value optimized out>
        total_count = <value optimized out>
        cookie = {bv_len = 36738272, bv_val = 0x7f3bd9f8bb38 "src/providers/ldap/sdap_fd_events.c:57"}
        returned_controls = 0x0
        page_control = <value optimized out>
        __FUNCTION__ = "sdap_get_generic_done"
#14 0x00007f3bd9f36515 in sdap_handle_release (mem=<value optimized out>) at src/providers/ldap/sdap_async.c:117
        op = 0x2434390
#15 sdap_handle_destructor (mem=<value optimized out>) at src/providers/ldap/sdap_async.c:94
        sh = 0x23b8890
#16 0x00000032fc402d9e in _talloc_free_internal (ptr=0x23b8890, location=0x32fc407b1d "talloc.c:1893") at talloc.c:600
        d = 0x7f3bd9f364a0 <sdap_handle_destructor>
        tc = 0x7f3bd9f364a0
#17 0x00000032fc402c2b in _talloc_free_internal (ptr=0x23b5f50, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0x23b8890
        new_parent = 0x0
        tc = 0x23b8890
#18 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa75190, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0x23b5f50
        new_parent = 0x0
        tc = 0x23b5f50
#19 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa72ea0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa75190
        new_parent = 0x0
        tc = 0xa75190
#20 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa6e590, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa72ea0
        new_parent = 0x0
        tc = 0xa72ea0
#21 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa56ed0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa6e590
        new_parent = 0x0
        tc = 0xa6e590
#22 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa545f0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa56ed0
        new_parent = 0x0
        tc = 0xa56ed0
#23 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa53480, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa545f0
        new_parent = 0x0
        tc = 0xa545f0
#24 0x00000032fc401abb in _talloc_free_internal (ptr=0xa532a0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa53480
        new_parent = 0x0
#25 _talloc_free (ptr=0xa532a0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:1133
        tc = 0xa53480
#26 0x00000032fa835d92 in __run_exit_handlers (status=0) at exit.c:78
        atfct = <value optimized out>
        onfct = <value optimized out>
        cxafct = <value optimized out>
        f = <value optimized out>
#27 exit (status=0) at exit.c:100
No locals.
#28 0x0000000000436267 in sig_term (sig=<value optimized out>) at src/util/server.c:232
        done_sigterm = 0
        __FUNCTION__ = "sig_term"
#29 0x00007f3bd9f68dfc in krb5_finalize (ev=<value optimized out>, se=<value optimized out>, signum=15, count=<value optimized out>, siginfo=<value optimized out>, private_data=<value optimized out>)
    at src/providers/krb5/krb5_common.c:652
        realm = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "krb5_finalize"
#30 0x0000003301c03aac in tevent_common_check_signal (ev=0xa53480) at tevent_signal.c:343
        ofs = 0
        j = <value optimized out>
        se = 0xa73c30
        count = 1
        sl = <value optimized out>
        next = 0xa54410
        counter = {count = <value optimized out>, seen = 0}
        clear_processed_siginfo = <value optimized out>
        i = <value optimized out>
#31 0x0000003301c052f7 in std_event_loop_once (ev=0xa53480, location=<value optimized out>) at tevent_standard.c:528
        std_ev = 0xa53540
        tval = {tv_sec = 0, tv_usec = 0}
#32 0x0000003301c026d0 in _tevent_loop_once (ev=0xa53480, location=0x4446b5 "src/util/server.c:526") at tevent.c:490
        ret = <value optimized out>
        nesting_stack_ptr = 0x0
#33 0x0000003301c0273b in tevent_common_loop_wait (ev=0xa53480, location=0x4446b5 "src/util/server.c:526") at tevent.c:591
        ret = <value optimized out>
#34 0x0000000000436111 in server_loop (main_ctx=0xa545f0) at src/util/server.c:526
No locals.
#35 0x000000000040eeab in main (argc=6, argv=<value optimized out>) at src/providers/data_provider_be.c:1333
        opt = <value optimized out>
        pc = <value optimized out>
        be_domain = 0xa52490 "no.ep.corp.local"
        srv_name = <value optimized out>
        conf_entry = <value optimized out>
        main_ctx = 0xa545f0
        ret = 0
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x64ae40, val = 0, descrip = 0x43b132 "Help options:", argDescrip = 0x0}, {longName = 0x43b140 "debug-level", shortName = 100 'd', argInfo = 2,
            arg = 0x64af20, val = 0, descrip = 0x43b111 "Debug level", argDescrip = 0x0}, {longName = 0x43b14c "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x64af24, val = 0,
            descrip = 0x43bda8 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x43b15b "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x64ae00, val = 0,
            descrip = 0x43b11d "Add debug timestamps", argDescrip = 0x0}, {longName = 0x43c720 "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fff5ca923f8, val = 0,
            descrip = 0x43bde0 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}}
        __FUNCTION__ = "main"
}}}

Comment 1 Stephen Gallagher 2012-03-05 14:22:28 UTC
We haven't been able to actually reproduce this crash, but careful analysis of the source revealed the most likely culprit to be an order-of-operations error at shutdown of the back-end process. We've fixed this hierarchy issue upstream.

We recommend verifying only that this introduces no regressions.

Comment 4 Stephen Gallagher 2012-04-10 16:50:49 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No technical note required

Comment 5 Amith 2012-05-29 15:51:16 UTC
Verified on sssd-1.8.0-31.el6.
This bug has been sanity-verified only; no related regressions were detected.

Comment 7 errata-xmlrpc 2012-06-20 11:55:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0747.html