Bugzilla will be upgraded to version 5.0 on a still to be determined date in the near future. The original upgrade date has been delayed.
Bug 1517383 - ns-slapd segfaults with ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired
ns-slapd segfaults with ERR - connection_release_nolock_ext - conn=0 fd=0 Att...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base (Show other bugs)
7.5
Unspecified Unspecified
high Severity high
: rc
: ---
Assigned To: thierry bordaz
Viktor Ashirov
Marc Muehlfeld
:
Depends On:
Blocks: 1567042
  Show dependency treegraph
 
Reported: 2017-11-24 13:48 EST by Viktor Ashirov
Modified: 2018-04-25 06:34 EDT (History)
7 users (show)

See Also:
Fixed In Version: 389-ds-base-1.3.7.5-12
Doc Type: Known Issue
Doc Text:
Directory Server can terminate unexpectedly during shutdown Directory Server uses the *nunc-stans* framework to manage connection events. If a connection is closed when shutting down the server, a *nunc-stans* job can access a freed connection structure. As a consequence, Directory Server can terminate unexpectedly. Because this situation occurs in a late state of the shutdown process, data is not corrupted or lost. Currently, no workaround is available.
Story Points: ---
Clone Of:
: 1567042 (view as bug list)
Environment:
Last Closed: 2018-04-10 10:22:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
stacktrace.txt (6.65 KB, text/plain)
2017-11-24 13:48 EST, Viktor Ashirov
no flags Details
coredump.bz2 (1.04 MB, application/x-bzip)
2017-11-24 13:52 EST, Viktor Ashirov
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0811 None None None 2018-04-10 10:23 EDT

  None (edit)
Description Viktor Ashirov 2017-11-24 13:48:52 EST
Created attachment 1358745 [details]
stacktrace.txt

Description of problem:
Crash has been occurred during the acceptance tests at server shutdown.

Version-Release number of selected component (if applicable):
389-ds-base-1.3.7.5-10.el7.x86_64

Stacktrace and coredump are attached.

Error logs:
Nov 23 21:37:37 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:37.630902090 -0500] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 39076 for LDAP requests
Nov 23 21:37:40 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:40.204250822 -0500] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.390167823 -0500] - INFO - op_thread_cleanup - slapd shutting down - signaling operation threads - op stack size 1 max work q size 1 max work q stack size 1
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.392253179 -0500] - INFO - slapd_daemon - slapd shutting down - closing down internal subsystems and plugins
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.396564343 -0500] - INFO - dblayer_pre_close - Waiting for 4 database threads to stop
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.584486955 -0500] - INFO - dblayer_pre_close - All database threads now stopped
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.593159090 -0500] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594238459 -0500] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 1 work q stack objects - freed 1 op stack objects
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594979876 -0500] - INFO - main - slapd stopped.
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.596373869 -0500] - ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired
Nov 23 21:37:42 server.example.com kernel: ns-slapd[30647]: segfault at e0 ip 000055da1afd589a sp 00007f3fff66fb90 error 6 in ns-slapd[55da1afba000+54000]
Nov 23 21:37:42 server.example.com abrt-hook-ccpp[32240]: Process 30629 (ns-slapd) of user 1001 killed by SIGSEGV - dumping core
Comment 2 Viktor Ashirov 2017-11-24 13:52 EST
Created attachment 1358746 [details]
coredump.bz2
Comment 3 Viktor Ashirov 2017-11-27 13:25:55 EST
=================================================================
==5707== ERROR: AddressSanitizer: heap-use-after-free on address 0x7ff71dc2abc8 at pc 0x55878c6cacc5 bp 0x7ff743788af0 sp 0x7ff743788ae0
READ of size 8 at 0x7ff71dc2abc8 thread T25
    #0 0x55878c6cacc4 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1642
    #1 0x7ff763c06cc0 in ns_job_done /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:291
    #2 0x7ff763c09808 in event_cb /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:118
    #3 0x7ff760e65a13 in event_base_loop /usr/src/debug/libevent-2.0.21-stable/event.c:1350
    #4 0x7ff763c099dd in ns_event_fw_loop /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:308
    #5 0x7ff763c06724 in event_loop_thread_func /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:581
    #6 0x7ff763e29867 in _ZN6__asan10AsanThread11ThreadStartEv _asan_rtl_
    #7 0x7ff7612d4dd4 in start_thread /usr/src/debug/glibc-2.17-c758a686/nptl/pthread_create.c:308
    #8 0x7ff7609829bc in __clone /usr/src/debug////////glibc-2.17-c758a686/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113
0x7ff71dc2abc8 is located 25544 bytes inside of 376832-byte region [0x7ff71dc24800,0x7ff71dc80800)
freed by thread T0 here:
    #0 0x7ff763e25dd9 in __interceptor_free _asan_rtl_
    #1 0x7ff7637946c8 in slapi_ch_free /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:270
    #2 0x55878c6c4c0c in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:75
    #3 0x55878c6ceb56 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1260
    #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187
    #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
previously allocated by thread T0 here:
    #0 0x7ff763e25ff5 in calloc _asan_rtl_
    #1 0x7ff763794288 in slapi_ch_calloc /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:180
    #2 0x55878c6c47a8 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:27
    #3 0x55878c6cd72e in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:918
    #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187
    #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
Thread T25 created by T0 here:
    #0 0x7ff763e1aa0a in __interceptor_pthread_create _asan_rtl_
    #1 0x7ff763c08d6c in ns_thrpool_new /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:1466
    #2 0x55878c6ad69a in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:233
    #3 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
Shadow bytes around the buggy address:
  0x0fff63b7d520: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d530: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d540: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d550: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d560: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0fff63b7d570: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
  0x0fff63b7d580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==5707== ABORTING
Comment 4 thierry bordaz 2017-11-28 03:29:52 EST
upstream ticket https://pagure.io/389-ds-base/issue/49450
Comment 5 wibrown@redhat.com 2017-11-28 03:52:58 EST
https://pagure.io/389-ds-base/issue/48184 This is actually the cause of the asan fault.

What is happening is that during shutdown we call in daemon.c:

1260     connection_table_free(the_connection_table);

This frees the ct, however, nunc-stans *is still running*. Because NS is still running, and potentially has outstaining IO's remaining, these events *can and will fire*.

This triggers an io event, and during this event we try to access the connection: of course, triggering the heap-use-after free in daemon.c:1642

The correct solution is that before we free the ct, we need to loop over it's content and call ns_job_done on *all* remaining connections to shutdown correctly.

I am developing a patch now,
Comment 6 wibrown@redhat.com 2017-11-28 03:53:29 EST
Please note: While this asan fault is indeed valid, it may not be the root cause of the described issue.
Comment 11 Viktor Ashirov 2018-01-30 07:52:29 EST
Build tested: 
389-ds-base-1.3.7.5-14.el7.x86_64 (rebuilt with ASAN)

This crash didn't occur in acceptance tests, marking as VERIFIED, SanityOnly.
Comment 12 thierry bordaz 2018-02-05 05:26:16 EST
Note, we have a crash (in 389-ds-base-1.3.7.5-14.el7.x86_64) that could be related to this BZ.

see https://bugzilla.redhat.com/show_bug.cgi?id=1538994#c11

It differs from this https://bugzilla.redhat.com/show_bug.cgi?id=1517383 because it looks a heap corruption.

It is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1517383 as it occurred at shutdown
Comment 21 errata-xmlrpc 2018-04-10 10:22:34 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0811

Note You need to log in before you can comment on or make changes to this bug.