Bug 1538994

Summary: [abrt] [faf] 389-ds-base: tc_calloc(): /usr/sbin/ns-slapd killed by 11
Product: Red Hat Enterprise Linux 8 Reporter: Viktor Ashirov <vashirov>
Component: 389-ds-baseAssignee: mreynolds
Status: CLOSED WONTFIX QA Contact: RHDS QE <ds-qe-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 8.0CC: gparente, lkrispen, lslebodn, nkinder, rmeggins, spichugi, tbordaz, vashirov
Target Milestone: pre-dev-freeze   
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/5b7ed1bc20ce308e93fe9ab96e4d90bca09f6832/
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-15 00:30:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Viktor Ashirov 2018-01-26 10:45:08 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer.

Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/5b7ed1bc20ce308e93fe9ab96e4d90bca09f6832/

Comment 1 thierry bordaz 2018-01-26 10:51:56 UTC
backtrace is 

(gdb) where
#0  0x00007f84f621d0df in tc_calloc () at /lib64/libtcmalloc.so.4
#1  0x00007f84f73cafe4 in slapi_ch_calloc (nelem=1, size=232) at ldap/servers/slapd/ch_malloc.c:180
#2  0x00007f84f76baece in new_ns_job (tp=tp@entry=0x5559930c8cc0, fd=fd@entry=0x555994eed980, job_type=job_type@entry=276, func=func@entry=0x555991a5bfc0 <ns_handle_pr_read_ready>, data=data@entry=0x555993b9b290) at src/nunc-stans/ns/ns_thrpool.c:695
#3  0x00007f84f76bb1c8 in alloc_io_context (data=0x555993b9b290, func=0x555991a5bfc0 <ns_handle_pr_read_ready>, job_type=276, fd=0x555994eed980, tp=0x5559930c8cc0) at src/nunc-stans/ns/ns_thrpool.c:726
#4  0x00007f84f76bb1c8 in ns_add_io_timeout_job (tp=0x5559930c8cc0, fd=0x555994eed980, tv=tv@entry=0x7f84de543b80, job_type=job_type@entry=260, func=func@entry=0x555991a5bfc0 <ns_handle_pr_read_ready>, data=data@entry=0x555993b9b290, job=job@entry=0x555993b9b3f0)
    at src/nunc-stans/ns/ns_thrpool.c:954
#5  0x0000555991a5bdf6 in ns_connection_post_io_or_closing (conn=0x555993b9b290) at ldap/servers/slapd/daemon.c:1751
#6  0x0000555991a5d230 in ns_handle_new_connection (job=0x555993b49a40) at ldap/servers/slapd/daemon.c:2470
#7  0x00007f84f76bac89 in work_job_execute (job=0x555993b49a40) at src/nunc-stans/ns/ns_thrpool.c:291
#8  0x00007f84f76bbbe3 in event_cb (fd=<optimized out>, event=<optimized out>, arg=<optimized out>)
    at src/nunc-stans/ns/ns_event_fw_event.c:118
#9  0x00007f84f4710a14 in event_base_loop () at /lib64/libevent-2.0.so.5
#10 0x00007f84f76bbeae in ns_event_fw_loop (ns_event_fw_ctx=<optimized out>) at src/nunc-stans/ns/ns_event_fw_event.c:308
#11 0x00007f84f76baac9 in event_loop_thread_func (arg=0x5559930c8cc0) at src/nunc-stans/ns/ns_thrpool.c:581
#12 0x00007f84f4b7fdd5 in start_thread () at /lib64/libpthread.so.0
#13 0x00007f84f422daed in clone () at /lib64/libc.so.6


Something specific to this crash (compare to similar one https://bugzilla.redhat.com/show_bug.cgi?id=1517383) is that it is in the middle of ns_handle_new_connection

Retesting with ASAN would help because it looks a heap corruption

Comment 2 thierry bordaz 2018-01-26 16:55:51 UTC
According to the latest operation in the core. The crash occurred at just after 25/Jan/2018:17:02:35.488798428.
The error logs are showing others operation after the crash that are likely due to ipa installer that was continuing

Comment 3 thierry bordaz 2018-01-26 16:59:03 UTC
Last operations just before the crash were:
...
[25/Jan/2018:17:02:35.153276363 -0500] conn=63 fd=70 slot=70 connection from 10.16.46.53 to 10.16.46.53
[25/Jan/2018:17:02:35.161447480 -0500] conn=63 op=0 BIND dn="cn=Directory Manager" method=128 version=2
[25/Jan/2018:17:02:35.161534444 -0500] conn=63 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0008006553 dn="cn=directory manager"
[25/Jan/2018:17:02:35.163161348 -0500] conn=64 fd=74 slot=74 connection from 10.16.46.53 to 10.16.46.53
[25/Jan/2018:17:02:35.165369347 -0500] conn=64 op=0 BIND dn="cn=Directory Manager" method=128 version=3
[25/Jan/2018:17:02:35.165450276 -0500] conn=64 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0000249121 dn="cn=directory manager"
[25/Jan/2018:17:02:35.166722817 -0500] conn=64 op=1 SRCH base="ou=certificateProfiles,ou=ca,o=ipaca" scope=0 filter="(objectClass=*)" attrs=ALL
[25/Jan/2018:17:02:35.166862624 -0500] conn=64 op=1 RESULT err=0 tag=101 nentries=1 etime=0.0000212450
[25/Jan/2018:17:02:35.167547072 -0500] conn=64 op=2 SRCH base="ou=certificateProfiles,ou=ca,o=ipaca" scope=2 filter="(objectClass=*)" attrs="* entryusn nsUniqueId numSubordinates aci" options=persistent
[25/Jan/2018:17:02:35.478741848 -0500] conn=63 op=1 UNBIND
[25/Jan/2018:17:02:35.478774334 -0500] conn=63 op=1 fd=70 closed - U1
[25/Jan/2018:17:02:35.479422386 -0500] conn=65 fd=70 slot=70 connection from 10.16.46.53 to 10.16.46.53
[25/Jan/2018:17:02:35.479804147 -0500] conn=65 op=0 BIND dn="cn=Directory Manager" method=128 version=2
[25/Jan/2018:17:02:35.479893600 -0500] conn=65 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0000226436 dn="cn=directory manager"
[25/Jan/2018:17:02:35.480454340 -0500] conn=66 fd=75 slot=75 connection from 10.16.46.53 to 10.16.46.53
[25/Jan/2018:17:02:35.480862636 -0500] conn=66 op=0 BIND dn="cn=Directory Manager" method=128 version=3
[25/Jan/2018:17:02:35.480917975 -0500] conn=66 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0000236591 dn="cn=directory manager"
[25/Jan/2018:17:02:35.487832755 -0500] conn=67 fd=76 slot=76 connection from 10.16.46.53 to 10.16.46.53
[25/Jan/2018:17:02:35.488300505 -0500] conn=68 fd=77 slot=77 connection from 10.16.46.53 to 10.16.46.53
[25/Jan/2018:17:02:35.488798428 -0500] conn=69 fd=78 slot=78 connection from 10.16.46.53 to 10.16.46.53

Comment 17 Lukas Slebodnik 2018-02-19 15:26:39 UTC
Another crash:
(gdb) bt
#0  0x00007ff839a6b29b in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () at /lib64/libtcmalloc.so.4
#1  0x00007ff839a6b68c in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () at /lib64/libtcmalloc.so.4
#2  0x00007ff83ac283a1 in slapi_ch_free (ptr=ptr@entry=0x7ffeaaf3ad08) at ldap/servers/slapd/ch_malloc.c:270
#3  0x0000555b65a7ad63 in nunc_stans_free (ptr=0x555b73093860) at ldap/servers/slapd/main.c:176
#4  0x00007ff83af17bff in ns_job_done (job=0x555b73093860) at src/nunc-stans/ns/ns_thrpool.c:802
#5  0x0000555b65a74cd7 in ns_connection_post_io_or_closing (conn=conn@entry=0x555b7186bd28) at ldap/servers/slapd/daemon.c:1693
#6  0x0000555b65a70b04 in disconnect_server_nomutex_ext (conn=conn@entry=0x555b7186bd28, opconnid=<optimized out>, opid=opid@entry=-1, reason=reason@entry=-5000, error=error@entry=125, schedule_closure_job=schedule_closure_job@entry=1) at ldap/servers/slapd/connection.c:2301
#7  0x0000555b65a713cb in disconnect_server_nomutex (conn=conn@entry=0x555b7186bd28, opconnid=<optimized out>, opid=opid@entry=-1, reason=reason@entry=-5000, error=error@entry=125)
    at ldap/servers/slapd/connection.c:2314
#8  0x0000555b65a727a4 in connection_table_disconnect_all (ct=0x555b6c720e00) at ldap/servers/slapd/conntable.c:101
#9  0x0000555b65a76cfe in slapd_daemon (ports=ports@entry=0x7ffeaaf3b0f0, tp=tp@entry=0x555b67526300) at ldap/servers/slapd/daemon.c:1190
#10 0x0000555b65a6880f in main (argc=<optimized out>, argv=0x7ffeaaf3b568) at ldap/servers/slapd/main.c:1200

Comment 20 Viktor Ashirov 2018-02-20 16:37:23 UTC
Looking at the dates of reports and versions of 389-ds-base, it looks like the problem started to appear in build 389-ds-base-1.3.7.5-13.el7 and higher. It might be another regression introduced by the coverity fixes, that this version contains.

Comment 21 Viktor Ashirov 2018-02-20 16:41:43 UTC
Further looking at the code, it seems that this assertion was added in 1418fc327245a9b214f107ae3ba2e192b48d16d3, upstream ticket: https://pagure.io/389-ds-base/issue/48184, bz1517383. Thierry already pointed out that it might be related.

Comment 25 Lukas Slebodnik 2018-02-28 11:34:30 UTC
Another crash:
(gdb) bt full
#0  0x00007fee88ef929b in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () at /lib64/libtcmalloc.so.4
#1  0x00007fee88ef968c in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () at /lib64/libtcmalloc.so.4
#2  0x00007fee8a0b63a1 in slapi_ch_free (ptr=ptr@entry=0x7ffcbdc2be18) at ldap/servers/slapd/ch_malloc.c:270
#3  0x000055aa9c13dd63 in nunc_stans_free (ptr=0x55aaaaa87590) at ldap/servers/slapd/main.c:176
#4  0x00007fee8a3a5bff in ns_job_done (job=0x55aaaaa87590) at src/nunc-stans/ns/ns_thrpool.c:802
        shutdown_state = 1
#5  0x000055aa9c137cd7 in ns_connection_post_io_or_closing (conn=conn@entry=0x55aaa8572480) at ldap/servers/slapd/daemon.c:1693
        tv = {tv_sec = 94191457084544, tv_usec = 112}
#6  0x000055aa9c133b04 in disconnect_server_nomutex_ext (conn=conn@entry=0x55aaa8572480, opconnid=<optimized out>, opid=opid@entry=-1, reason=reason@entry=-5000, error=error@entry=125, schedule_closure_job=schedule_closure_job@entry=1) at ldap/servers/slapd/connection.c:2301
#7  0x000055aa9c1343cb in disconnect_server_nomutex (conn=conn@entry=0x55aaa8572480, opconnid=<optimized out>, opid=opid@entry=-1, reason=reason@entry=-5000, error=error@entry=125)
    at ldap/servers/slapd/connection.c:2314
#8  0x000055aa9c1357a4 in connection_table_disconnect_all (ct=0x55aaa7820e00) at ldap/servers/slapd/conntable.c:101
        c = 0x55aaa8572480
        i = 112
#9  0x000055aa9c139cfe in slapd_daemon (ports=ports@entry=0x7ffcbdc2c200, tp=tp@entry=0x55aa9dbf0300) at ldap/servers/slapd/daemon.c:1190
        n_tcps = 0x0
        s_tcps = 0x0
        i_unix = 0x0
        fdesp = <optimized out>
        num_poll = 3
        pr_timeout = 250
        threads = <optimized out>
        in_referral_mode = 0
#10 0x000055aa9c12b80f in main (argc=<optimized out>, argv=0x7ffcbdc2c678) at ldap/servers/slapd/main.c:1200
        return_value = <optimized out>
        mcfg = 
          {extraname = 0x0, slapd_exemode = 1, n_port = 389, i_port = 1, s_port = 636, myname = 0x55aa9da62060 "ns-slapd", ldif_file = 0x0, ldif_files = 0, cmd_line_instance_name = 0x0, cmd_line_instance_names = 0x0, skip_db_protect_check = 0, db2ldif_include = 0x0, db2ldif_exclude = 0x0, ldif2db_removedupvals = 1, ldif2db_noattrindexes = 0, db2index_attrs = 0x0, ldif_printkey = 5, archive_name = 0x0, db2ldif_dump_replica = 0, db2ldif_dump_uniqueid = 1, ldif2db_generate_uniqueid = 1, ldif2db_namespaceid = 0x0, importexport_encrypt = 0, upgradedb_flags = 0, upgradednformat_dryrun = 0, is_quiet = 0, dbverify_verbose = 0, dbverify_dbdir = 0x0}
        slapdFrontendConfig = <optimized out>
        ports_info = 
          {n_port = 389, s_port = 636, n_listenaddr = 0x0, s_listenaddr = 0x0, n_socket = 0x55aa9dc300d0, i_listenaddr = 0x0, i_port = 1, i_socket = 0x55aa9de5bfe0, s_socket = 0x55aa9dc300b0}
        tp = 0x55aa9dbf0300
        m = <optimized out>
        notify = <optimized out>
(gdb)

Comment 41 mreynolds 2019-11-15 00:30:10 UTC
Since this crash occurs in nunc-stans, and nunc-stans has been abandoned, I am going to close this bug as won't fix.