Bug 1517383
Summary: | ns-slapd segfaults with ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Viktor Ashirov <vashirov> | ||||||
Component: | 389-ds-base | Assignee: | thierry bordaz <tbordaz> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Viktor Ashirov <vashirov> | ||||||
Severity: | high | Docs Contact: | Marc Muehlfeld <mmuehlfe> | ||||||
Priority: | high | ||||||||
Version: | 7.5 | CC: | enewland, gparente, mreynolds, nkinder, rmeggins, tbordaz, tmihinto | ||||||
Target Milestone: | rc | ||||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | 389-ds-base-1.3.7.5-12 | Doc Type: | Known Issue | ||||||
Doc Text: |
Directory Server can terminate unexpectedly during shutdown
Directory Server uses the *nunc-stans* framework to manage connection events. If a connection is closed when shutting down the server, a *nunc-stans* job can access a freed connection structure. As a consequence, Directory Server can terminate unexpectedly. Because this situation occurs in a late state of the shutdown process, data is not corrupted or lost. Currently, no workaround is available.
|
Story Points: | --- | ||||||
Clone Of: | |||||||||
: | 1567042 (view as bug list) | Environment: | |||||||
Last Closed: | 2018-04-10 14:22:34 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1567042 | ||||||||
Attachments: |
|
Created attachment 1358746 [details] coredump.bz2 ================================================================= ==5707== ERROR: AddressSanitizer: heap-use-after-free on address 0x7ff71dc2abc8 at pc 0x55878c6cacc5 bp 0x7ff743788af0 sp 0x7ff743788ae0 READ of size 8 at 0x7ff71dc2abc8 thread T25 #0 0x55878c6cacc4 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1642 #1 0x7ff763c06cc0 in ns_job_done /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:291 #2 0x7ff763c09808 in event_cb /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:118 #3 0x7ff760e65a13 in event_base_loop /usr/src/debug/libevent-2.0.21-stable/event.c:1350 #4 0x7ff763c099dd in ns_event_fw_loop /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:308 #5 0x7ff763c06724 in event_loop_thread_func /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:581 #6 0x7ff763e29867 in _ZN6__asan10AsanThread11ThreadStartEv _asan_rtl_ #7 0x7ff7612d4dd4 in start_thread /usr/src/debug/glibc-2.17-c758a686/nptl/pthread_create.c:308 #8 0x7ff7609829bc in __clone /usr/src/debug////////glibc-2.17-c758a686/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113 0x7ff71dc2abc8 is located 25544 bytes inside of 376832-byte region [0x7ff71dc24800,0x7ff71dc80800) freed by thread T0 here: #0 0x7ff763e25dd9 in __interceptor_free _asan_rtl_ #1 0x7ff7637946c8 in slapi_ch_free /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:270 #2 0x55878c6c4c0c in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:75 #3 0x55878c6ceb56 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1260 #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187 #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274 previously allocated by thread T0 here: #0 0x7ff763e25ff5 in calloc _asan_rtl_ #1 0x7ff763794288 in slapi_ch_calloc /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:180 #2 0x55878c6c47a8 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:27 #3 0x55878c6cd72e in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:918 #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187 #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274 Thread T25 created by T0 here: #0 0x7ff763e1aa0a in __interceptor_pthread_create _asan_rtl_ #1 0x7ff763c08d6c in ns_thrpool_new /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:1466 #2 0x55878c6ad69a in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:233 #3 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274 Shadow bytes around the buggy address: 0x0fff63b7d520: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d530: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d540: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d550: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d560: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd =>0x0fff63b7d570: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd 0x0fff63b7d580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d5a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d5b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0fff63b7d5c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap righ redzone: fb Freed Heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 ASan internal: fe ==5707== ABORTING upstream ticket https://pagure.io/389-ds-base/issue/49450 https://pagure.io/389-ds-base/issue/48184 This is actually the cause of the asan fault. What is happening is that during shutdown we call in daemon.c: 1260 connection_table_free(the_connection_table); This frees the ct, however, nunc-stans *is still running*. Because NS is still running, and potentially has outstaining IO's remaining, these events *can and will fire*. This triggers an io event, and during this event we try to access the connection: of course, triggering the heap-use-after free in daemon.c:1642 The correct solution is that before we free the ct, we need to loop over it's content and call ns_job_done on *all* remaining connections to shutdown correctly. I am developing a patch now, Please note: While this asan fault is indeed valid, it may not be the root cause of the described issue. Build tested: 389-ds-base-1.3.7.5-14.el7.x86_64 (rebuilt with ASAN) This crash didn't occur in acceptance tests, marking as VERIFIED, SanityOnly. Note, we have a crash (in 389-ds-base-1.3.7.5-14.el7.x86_64) that could be related to this BZ. see https://bugzilla.redhat.com/show_bug.cgi?id=1538994#c11 It differs from this https://bugzilla.redhat.com/show_bug.cgi?id=1517383 because it looks a heap corruption. It is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1517383 as it occurred at shutdown Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0811 |
Created attachment 1358745 [details] stacktrace.txt Description of problem: Crash has been occurred during the acceptance tests at server shutdown. Version-Release number of selected component (if applicable): 389-ds-base-1.3.7.5-10.el7.x86_64 Stacktrace and coredump are attached. Error logs: Nov 23 21:37:37 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:37.630902090 -0500] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 39076 for LDAP requests Nov 23 21:37:40 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:40.204250822 -0500] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000 Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.390167823 -0500] - INFO - op_thread_cleanup - slapd shutting down - signaling operation threads - op stack size 1 max work q size 1 max work q stack size 1 Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.392253179 -0500] - INFO - slapd_daemon - slapd shutting down - closing down internal subsystems and plugins Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.396564343 -0500] - INFO - dblayer_pre_close - Waiting for 4 database threads to stop Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.584486955 -0500] - INFO - dblayer_pre_close - All database threads now stopped Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.593159090 -0500] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594238459 -0500] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 1 work q stack objects - freed 1 op stack objects Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594979876 -0500] - INFO - main - slapd stopped. Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.596373869 -0500] - ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired Nov 23 21:37:42 server.example.com kernel: ns-slapd[30647]: segfault at e0 ip 000055da1afd589a sp 00007f3fff66fb90 error 6 in ns-slapd[55da1afba000+54000] Nov 23 21:37:42 server.example.com abrt-hook-ccpp[32240]: Process 30629 (ns-slapd) of user 1001 killed by SIGSEGV - dumping core