Bug 799971

Summary: sssd_be crashes on shutdown
Product: Red Hat Enterprise Linux 6
Reporter: Stephen Gallagher <sgallagh>
Component: sssd
Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA
QA Contact: IDM QE LIST <seceng-idm-qe-list>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.3
CC: apeetham, ashishks, grajaiya, jgalipea, prc
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: sssd-1.8.0-12.el6
Doc Type: Bug Fix
Doc Text: No technical note required
Story Points: ---
Clone Of:
: 811912 (view as bug list)
Environment:
Last Closed: 2012-06-20 11:55:36 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 811912

Description Stephen Gallagher 2012-03-05 14:20:53 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/1226

We have a memory hierarchy bug that is visible in the IPA provider and may also be present, but dormant, in all of the providers.

When we receive a SIGTERM, we invoke our graceful exit handler, which starts calling talloc_free() on the top-level contexts, including the be_ctx.

The problem is that the be_ctx has two branches of children: the provider-specific data and any sbus_connections that may be currently active.

The crash occurs when talloc frees the provider-specific data before the sbus_connections. In some circumstances, one or more destructors within the sbus_connection memory branch then try to access the provider-specific data. When this happens, talloc calls abort() because of the use-after-free (older talloc versions erroneously reported this as a double free).
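
The ordering hazard can be illustrated with a small toy model. This is only a sketch: `Ctx` is a hypothetical stand-in for a talloc context and is not the talloc API, the order in which siblings are freed is chosen by hand through an assumed `provider_first` flag, and the connection destructor only checks a liveness flag where the real code would dereference provider state.

```python
class Ctx:
    """Toy stand-in for a talloc context; this is not the talloc API."""

    def __init__(self, name, parent=None, destructor=None):
        self.name = name
        self.alive = True
        self.children = []
        self.destructor = destructor
        if parent is not None:
            parent.children.append(self)

    def free(self):
        """Run the destructor, free children in list order, then die."""
        if not self.alive:
            return
        if self.destructor is not None:
            self.destructor(self)
        for child in list(self.children):
            child.free()
        self.alive = False


def shutdown_hits_freed_data(provider_first):
    """Free be_ctx; report whether a destructor saw freed provider data."""
    stale = []
    be_ctx = Ctx("be_ctx")
    # Assumption: sibling order is picked by the caller for illustration.
    order = ["provider_data", "sbus_connection"]
    if not provider_first:
        order.reverse()
    nodes = {name: Ctx(name, be_ctx) for name in order}
    provider = nodes["provider_data"]

    def conn_destructor(ctx):
        # The real destructor would dereference provider state here.
        if not provider.alive:
            stale.append(ctx.name)

    nodes["sbus_connection"].destructor = conn_destructor
    be_ctx.free()
    return bool(stale)


print(shutdown_hits_freed_data(provider_first=True))
print(shutdown_hits_freed_data(provider_first=False))
```

In this model the first call reports a stale access and the second does not, which mirrors why the crash depends on which branch of be_ctx happens to be freed first.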

The proposed approach is to allocate be_req atop the provider-specific data instead of directly on the sbus_connection. We will then add a talloc_spy to the sbus_connection so that if it is freed (for example, if the connection is dropped), it will explicitly call talloc_free() on the pending be_req.

In this way, whichever path is followed first at shutdown will still have a guarantee that the provider-specific data remains available until all pending requests have been safely cancelled.
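
A toy model of the proposed hierarchy, as a sketch only: `Ctx` is a hypothetical stand-in for a talloc context (not the talloc API), the `spy` node and the `provider_first` flag are illustrative names, and the `alive` guard in `free()` stands in for whatever the real code must do to avoid freeing be_req twice.

```python
class Ctx:
    """Toy stand-in for a talloc context; this is not the talloc API."""

    def __init__(self, name, parent=None, destructor=None):
        self.name = name
        self.alive = True
        self.children = []
        self.destructor = destructor
        if parent is not None:
            parent.children.append(self)

    def free(self):
        """Run the destructor, free children in list order, then die."""
        if not self.alive:
            return
        if self.destructor is not None:
            self.destructor(self)
        for child in list(self.children):
            child.free()
        self.alive = False


def cancel_observations(provider_first):
    """Free be_ctx; return what each request cancellation observed."""
    seen = []
    be_ctx = Ctx("be_ctx")
    # Assumption: sibling order is picked by the caller for illustration.
    order = ["provider_data", "sbus_connection"]
    if not provider_first:
        order.reverse()
    nodes = {name: Ctx(name, be_ctx) for name in order}
    provider, conn = nodes["provider_data"], nodes["sbus_connection"]

    def req_destructor(_req):
        # Cancelling the request needs live provider data.
        seen.append(provider.alive)

    # be_req hangs off the provider data, not off the connection.
    be_req = Ctx("be_req", provider, destructor=req_destructor)
    # The spy is a child of the connection; if the connection goes away
    # first, the spy's destructor frees the pending request explicitly.
    Ctx("spy", conn, destructor=lambda _spy: be_req.free())

    be_ctx.free()
    return seen


print(cancel_observations(provider_first=True))
print(cancel_observations(provider_first=False))
```

In both orderings the model cancels the request exactly once, while the provider data is still alive. When the provider branch goes first, be_req is freed as its child before the provider itself is released; when the connection goes first, the spy frees be_req before the provider branch is touched.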

{{{
#0  0x00000032fa832885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = 2067
        selftid = 2067
#1  0x00000032fa834065 in abort () at abort.c:92
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7fff5ca91b30, sa_sigaction = 0x7fff5ca91b30}, sa_mask = {__val = {16, 21, 218993089241, 140734747974448, 218993085322, 140734747974768, 218993089483, 140733193388396, 218995357968,
              11, 13782216, 37, 218995358144, 13, 218980449216, 140734747974687}}, sa_flags = -401273743, sa_restorer = 0xffff}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00000032fc401ddc in talloc_abort (reason=0x32fc407bc0 "Bad talloc magic value - double free") at talloc.c:199
No locals.
#3  0x00000032fc401f18 in talloc_abort_double_free (ptr=<value optimized out>) at talloc.c:218
No locals.
#4  talloc_chunk_from_ptr (ptr=<value optimized out>) at talloc.c:239
        pp = <value optimized out>
        tc = <value optimized out>
#5  talloc_get_name (ptr=<value optimized out>) at talloc.c:937
        tc = 0x0
#6  0x00000032fc401f5e in talloc_check_name (ptr=0xa77330, name=0x7f3bd9f78131 "struct ipa_access_ctx") at talloc.c:956
        pname = <value optimized out>
#7  0x00007f3bd9f19e5c in hbac_sysdb_save (req=0x43ff3b10) at src/providers/ipa/ipa_access.c:427
        ret = <value optimized out>
        in_transaction = false
        hbac_ctx = 0xacc8e0
        domain = 0xa57900
        sysdb = <value optimized out>
        base_dn = <value optimized out>
        be_ctx = <value optimized out>
        access_ctx = <value optimized out>
        tmp_ctx = <value optimized out>
        __FUNCTION__ = "hbac_sysdb_save"
#8  0x0000003301c0447e in tevent_req_finish (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:133
No locals.
#9  _tevent_req_error (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:188
No locals.
#10 0x00007f3bd9f1f4ca in ipa_hbac_rule_info_done (subreq=<value optimized out>) at src/providers/ipa/ipa_hbac_rules.c:205
        ret = 5
        req = 0x43ff3b10
        state = <value optimized out>
        __FUNCTION__ = "ipa_hbac_rule_info_done"
#11 0x0000003301c0447e in tevent_req_finish (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:133
No locals.
#12 _tevent_req_error (req=<value optimized out>, error=<value optimized out>, location=<value optimized out>) at tevent_req.c:188
No locals.
#13 0x00007f3bd9f3521a in sdap_get_generic_done (op=<value optimized out>, reply=0x0, error=5, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:932
        req = 0xd6d4ee0
        state = 0xacc010
        attrs = <value optimized out>
        errmsg = 0x0
        result = <value optimized out>
        ret = <value optimized out>
        lret = <value optimized out>
        total_count = <value optimized out>
        cookie = {bv_len = 36738272, bv_val = 0x7f3bd9f8bb38 "src/providers/ldap/sdap_fd_events.c:57"}
        returned_controls = 0x0
        page_control = <value optimized out>
        __FUNCTION__ = "sdap_get_generic_done"
#14 0x00007f3bd9f36515 in sdap_handle_release (mem=<value optimized out>) at src/providers/ldap/sdap_async.c:117
        op = 0x2434390
#15 sdap_handle_destructor (mem=<value optimized out>) at src/providers/ldap/sdap_async.c:94
        sh = 0x23b8890
#16 0x00000032fc402d9e in _talloc_free_internal (ptr=0x23b8890, location=0x32fc407b1d "talloc.c:1893") at talloc.c:600
        d = 0x7f3bd9f364a0 <sdap_handle_destructor>
        tc = 0x7f3bd9f364a0
#17 0x00000032fc402c2b in _talloc_free_internal (ptr=0x23b5f50, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0x23b8890
        new_parent = 0x0
        tc = 0x23b8890
#18 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa75190, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0x23b5f50
        new_parent = 0x0
        tc = 0x23b5f50
#19 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa72ea0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa75190
        new_parent = 0x0
        tc = 0xa75190
#20 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa6e590, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa72ea0
        new_parent = 0x0
        tc = 0xa72ea0
#21 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa56ed0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa6e590
        new_parent = 0x0
        tc = 0xa6e590
#22 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa545f0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa56ed0
        new_parent = 0x0
        tc = 0xa56ed0
#23 0x00000032fc402c2b in _talloc_free_internal (ptr=0xa53480, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa545f0
        new_parent = 0x0
        tc = 0xa545f0
#24 0x00000032fc401abb in _talloc_free_internal (ptr=0xa532a0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:631
        child = 0xa53480
        new_parent = 0x0
#25 _talloc_free (ptr=0xa532a0, location=0x32fc407b1d "talloc.c:1893") at talloc.c:1133
        tc = 0xa53480
#26 0x00000032fa835d92 in __run_exit_handlers (status=0) at exit.c:78
        atfct = <value optimized out>
        onfct = <value optimized out>
        cxafct = <value optimized out>
        f = <value optimized out>
#27 exit (status=0) at exit.c:100
No locals.
#28 0x0000000000436267 in sig_term (sig=<value optimized out>) at src/util/server.c:232
        done_sigterm = 0
        __FUNCTION__ = "sig_term"
#29 0x00007f3bd9f68dfc in krb5_finalize (ev=<value optimized out>, se=<value optimized out>, signum=15, count=<value optimized out>, siginfo=<value optimized out>, private_data=<value optimized out>)
    at src/providers/krb5/krb5_common.c:652
        realm = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "krb5_finalize"
#30 0x0000003301c03aac in tevent_common_check_signal (ev=0xa53480) at tevent_signal.c:343
        ofs = 0
        j = <value optimized out>
        se = 0xa73c30
        count = 1
        sl = <value optimized out>
        next = 0xa54410
        counter = {count = <value optimized out>, seen = 0}
        clear_processed_siginfo = <value optimized out>
        i = <value optimized out>
#31 0x0000003301c052f7 in std_event_loop_once (ev=0xa53480, location=<value optimized out>) at tevent_standard.c:528
        std_ev = 0xa53540
        tval = {tv_sec = 0, tv_usec = 0}
#32 0x0000003301c026d0 in _tevent_loop_once (ev=0xa53480, location=0x4446b5 "src/util/server.c:526") at tevent.c:490
        ret = <value optimized out>
        nesting_stack_ptr = 0x0
#33 0x0000003301c0273b in tevent_common_loop_wait (ev=0xa53480, location=0x4446b5 "src/util/server.c:526") at tevent.c:591
        ret = <value optimized out>
#34 0x0000000000436111 in server_loop (main_ctx=0xa545f0) at src/util/server.c:526
No locals.
#35 0x000000000040eeab in main (argc=6, argv=<value optimized out>) at src/providers/data_provider_be.c:1333
        opt = <value optimized out>
        pc = <value optimized out>
        be_domain = 0xa52490 "no.ep.corp.local"
        srv_name = <value optimized out>
        conf_entry = <value optimized out>
        main_ctx = 0xa545f0
        ret = 0
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x64ae40, val = 0, descrip = 0x43b132 "Help options:", argDescrip = 0x0}, {longName = 0x43b140 "debug-level", shortName = 100 'd', argInfo = 2,
            arg = 0x64af20, val = 0, descrip = 0x43b111 "Debug level", argDescrip = 0x0}, {longName = 0x43b14c "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x64af24, val = 0,
            descrip = 0x43bda8 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x43b15b "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x64ae00, val = 0,
            descrip = 0x43b11d "Add debug timestamps", argDescrip = 0x0}, {longName = 0x43c720 "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fff5ca923f8, val = 0,
            descrip = 0x43bde0 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}}
        __FUNCTION__ = "main"
}}}

Comment 1 Stephen Gallagher 2012-03-05 14:22:28 UTC
We haven't been able to actually reproduce this crash, but careful analysis of the source revealed the most likely culprit to be an order-of-operations error at shutdown of the back-end process. We've fixed this hierarchy issue upstream.

We recommend verifying only that this introduces no regressions.

Comment 4 Stephen Gallagher 2012-04-10 16:50:49 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No technical note required

Comment 5 Amith 2012-05-29 15:51:16 UTC
Verified on sssd-1.8.0-31.el6.
This bug was verified with sanity testing only; no related regressions were detected.

Comment 7 errata-xmlrpc 2012-06-20 11:55:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0747.html