Bug 1517383

Summary: ns-slapd segfaults with ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired
Product: Red Hat Enterprise Linux 7 Reporter: Viktor Ashirov <vashirov>
Component: 389-ds-baseAssignee: thierry bordaz <tbordaz>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: high Docs Contact: Marc Muehlfeld <mmuehlfe>
Priority: high    
Version: 7.5CC: enewland, gparente, mreynolds, nkinder, rmeggins, tbordaz, tmihinto
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.7.5-12 Doc Type: Known Issue
Doc Text:
Directory Server can terminate unexpectedly during shutdown Directory Server uses the *nunc-stans* framework to manage connection events. If a connection is closed when shutting down the server, a *nunc-stans* job can access a freed connection structure. As a consequence, Directory Server can terminate unexpectedly. Because this situation occurs in a late state of the shutdown process, data is not corrupted or lost. Currently, no workaround is available.
Story Points: ---
Clone Of:
: 1567042 (view as bug list) Environment:
Last Closed: 2018-04-10 14:22:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1567042    
Attachments:
Description Flags
stacktrace.txt
none
coredump.bz2 none

Description Viktor Ashirov 2017-11-24 18:48:52 UTC
Created attachment 1358745 [details]
stacktrace.txt

Description of problem:
Crash has been occurred during the acceptance tests at server shutdown.

Version-Release number of selected component (if applicable):
389-ds-base-1.3.7.5-10.el7.x86_64

Stacktrace and coredump are attached.

Error logs:
Nov 23 21:37:37 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:37.630902090 -0500] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 39076 for LDAP requests
Nov 23 21:37:40 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:40.204250822 -0500] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.390167823 -0500] - INFO - op_thread_cleanup - slapd shutting down - signaling operation threads - op stack size 1 max work q size 1 max work q stack size 1
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.392253179 -0500] - INFO - slapd_daemon - slapd shutting down - closing down internal subsystems and plugins
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.396564343 -0500] - INFO - dblayer_pre_close - Waiting for 4 database threads to stop
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.584486955 -0500] - INFO - dblayer_pre_close - All database threads now stopped
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.593159090 -0500] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594238459 -0500] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 1 work q stack objects - freed 1 op stack objects
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594979876 -0500] - INFO - main - slapd stopped.
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.596373869 -0500] - ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired
Nov 23 21:37:42 server.example.com kernel: ns-slapd[30647]: segfault at e0 ip 000055da1afd589a sp 00007f3fff66fb90 error 6 in ns-slapd[55da1afba000+54000]
Nov 23 21:37:42 server.example.com abrt-hook-ccpp[32240]: Process 30629 (ns-slapd) of user 1001 killed by SIGSEGV - dumping core

Comment 2 Viktor Ashirov 2017-11-24 18:52:09 UTC
Created attachment 1358746 [details]
coredump.bz2

Comment 3 Viktor Ashirov 2017-11-27 18:25:55 UTC
=================================================================
==5707== ERROR: AddressSanitizer: heap-use-after-free on address 0x7ff71dc2abc8 at pc 0x55878c6cacc5 bp 0x7ff743788af0 sp 0x7ff743788ae0
READ of size 8 at 0x7ff71dc2abc8 thread T25
    #0 0x55878c6cacc4 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1642
    #1 0x7ff763c06cc0 in ns_job_done /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:291
    #2 0x7ff763c09808 in event_cb /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:118
    #3 0x7ff760e65a13 in event_base_loop /usr/src/debug/libevent-2.0.21-stable/event.c:1350
    #4 0x7ff763c099dd in ns_event_fw_loop /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:308
    #5 0x7ff763c06724 in event_loop_thread_func /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:581
    #6 0x7ff763e29867 in _ZN6__asan10AsanThread11ThreadStartEv _asan_rtl_
    #7 0x7ff7612d4dd4 in start_thread /usr/src/debug/glibc-2.17-c758a686/nptl/pthread_create.c:308
    #8 0x7ff7609829bc in __clone /usr/src/debug////////glibc-2.17-c758a686/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113
0x7ff71dc2abc8 is located 25544 bytes inside of 376832-byte region [0x7ff71dc24800,0x7ff71dc80800)
freed by thread T0 here:
    #0 0x7ff763e25dd9 in __interceptor_free _asan_rtl_
    #1 0x7ff7637946c8 in slapi_ch_free /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:270
    #2 0x55878c6c4c0c in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:75
    #3 0x55878c6ceb56 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1260
    #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187
    #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
previously allocated by thread T0 here:
    #0 0x7ff763e25ff5 in calloc _asan_rtl_
    #1 0x7ff763794288 in slapi_ch_calloc /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:180
    #2 0x55878c6c47a8 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:27
    #3 0x55878c6cd72e in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:918
    #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187
    #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
Thread T25 created by T0 here:
    #0 0x7ff763e1aa0a in __interceptor_pthread_create _asan_rtl_
    #1 0x7ff763c08d6c in ns_thrpool_new /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:1466
    #2 0x55878c6ad69a in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:233
    #3 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
Shadow bytes around the buggy address:
  0x0fff63b7d520: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d530: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d540: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d550: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d560: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0fff63b7d570: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
  0x0fff63b7d580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==5707== ABORTING

Comment 4 thierry bordaz 2017-11-28 08:29:52 UTC
upstream ticket https://pagure.io/389-ds-base/issue/49450

Comment 5 wibrown@redhat.com 2017-11-28 08:52:58 UTC
https://pagure.io/389-ds-base/issue/48184 This is actually the cause of the asan fault.

What is happening is that during shutdown we call in daemon.c:

1260     connection_table_free(the_connection_table);

This frees the ct, however, nunc-stans *is still running*. Because NS is still running, and potentially has outstaining IO's remaining, these events *can and will fire*.

This triggers an io event, and during this event we try to access the connection: of course, triggering the heap-use-after free in daemon.c:1642

The correct solution is that before we free the ct, we need to loop over it's content and call ns_job_done on *all* remaining connections to shutdown correctly.

I am developing a patch now,

Comment 6 wibrown@redhat.com 2017-11-28 08:53:29 UTC
Please note: While this asan fault is indeed valid, it may not be the root cause of the described issue.

Comment 11 Viktor Ashirov 2018-01-30 12:52:29 UTC
Build tested: 
389-ds-base-1.3.7.5-14.el7.x86_64 (rebuilt with ASAN)

This crash didn't occur in acceptance tests, marking as VERIFIED, SanityOnly.

Comment 12 thierry bordaz 2018-02-05 10:26:16 UTC
Note, we have a crash (in 389-ds-base-1.3.7.5-14.el7.x86_64) that could be related to this BZ.

see https://bugzilla.redhat.com/show_bug.cgi?id=1538994#c11

It differs from this https://bugzilla.redhat.com/show_bug.cgi?id=1517383 because it looks a heap corruption.

It is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1517383 as it occurred at shutdown

Comment 21 errata-xmlrpc 2018-04-10 14:22:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0811