1517383 – ns-slapd segfaults with ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1517383 - ns-slapd segfaults with ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired

Summary: ns-slapd segfaults with ERR - connection_release_nolock_ext - conn=0 fd=0 Att...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	389-ds-base
Sub Component:
Version:	7.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	thierry bordaz
QA Contact:	Viktor Ashirov
Docs Contact:	Marc Muehlfeld
URL:
Whiteboard:
Depends On:
Blocks:	1567042
TreeView+	depends on / blocked

Reported:	2017-11-24 18:48 UTC by Viktor Ashirov
Modified:	2021-09-09 12:51 UTC (History)
CC List:	7 users (show)
Fixed In Version:	389-ds-base-1.3.7.5-12
Doc Type:	Known Issue
Doc Text:	Directory Server can terminate unexpectedly during shutdown Directory Server uses the nunc-stans framework to manage connection events. If a connection is closed when shutting down the server, a nunc-stans job can access a freed connection structure. As a consequence, Directory Server can terminate unexpectedly. Because this situation occurs in a late state of the shutdown process, data is not corrupted or lost. Currently, no workaround is available.
Clone Of:
Clones:	1567042 (view as bug list)
Environment:
Last Closed:	2018-04-10 14:22:34 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
stacktrace.txt (6.65 KB, text/plain) 2017-11-24 18:48 UTC, Viktor Ashirov	no flags	Details
coredump.bz2 (1.04 MB, application/x-bzip) 2017-11-24 18:52 UTC, Viktor Ashirov	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Github	389ds 389-ds-base issues 1515	None	closed	clean up and delete connections at shutdown	2020-10-21 18:36:36 UTC
Github	389ds 389-ds-base issues 2509	None	closed	invalid management of connection refcnt	2020-10-21 18:36:36 UTC
Red Hat Product Errata	RHBA-2018:0811	None	None	None	2018-04-10 14:23:27 UTC

Description Viktor Ashirov 2017-11-24 18:48:52 UTC

Created attachment 1358745 [details]
stacktrace.txt

Description of problem:
Crash has been occurred during the acceptance tests at server shutdown.

Version-Release number of selected component (if applicable):
389-ds-base-1.3.7.5-10.el7.x86_64

Stacktrace and coredump are attached.

Error logs:
Nov 23 21:37:37 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:37.630902090 -0500] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 39076 for LDAP requests
Nov 23 21:37:40 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:40.204250822 -0500] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.390167823 -0500] - INFO - op_thread_cleanup - slapd shutting down - signaling operation threads - op stack size 1 max work q size 1 max work q stack size 1
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.392253179 -0500] - INFO - slapd_daemon - slapd shutting down - closing down internal subsystems and plugins
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.396564343 -0500] - INFO - dblayer_pre_close - Waiting for 4 database threads to stop
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.584486955 -0500] - INFO - dblayer_pre_close - All database threads now stopped
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.593159090 -0500] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594238459 -0500] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 1 work q stack objects - freed 1 op stack objects
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.594979876 -0500] - INFO - main - slapd stopped.
Nov 23 21:37:42 server.example.com ns-slapd[30629]: [23/Nov/2017:21:37:42.596373869 -0500] - ERR - connection_release_nolock_ext - conn=0 fd=0 Attempt to release connection that is not acquired
Nov 23 21:37:42 server.example.com kernel: ns-slapd[30647]: segfault at e0 ip 000055da1afd589a sp 00007f3fff66fb90 error 6 in ns-slapd[55da1afba000+54000]
Nov 23 21:37:42 server.example.com abrt-hook-ccpp[32240]: Process 30629 (ns-slapd) of user 1001 killed by SIGSEGV - dumping core

Comment 2 Viktor Ashirov 2017-11-24 18:52:09 UTC

Created attachment 1358746 [details]
coredump.bz2

Comment 3 Viktor Ashirov 2017-11-27 18:25:55 UTC

=================================================================
==5707== ERROR: AddressSanitizer: heap-use-after-free on address 0x7ff71dc2abc8 at pc 0x55878c6cacc5 bp 0x7ff743788af0 sp 0x7ff743788ae0
READ of size 8 at 0x7ff71dc2abc8 thread T25
    #0 0x55878c6cacc4 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1642
    #1 0x7ff763c06cc0 in ns_job_done /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:291
    #2 0x7ff763c09808 in event_cb /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:118
    #3 0x7ff760e65a13 in event_base_loop /usr/src/debug/libevent-2.0.21-stable/event.c:1350
    #4 0x7ff763c099dd in ns_event_fw_loop /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_event_fw_event.c:308
    #5 0x7ff763c06724 in event_loop_thread_func /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:581
    #6 0x7ff763e29867 in _ZN6__asan10AsanThread11ThreadStartEv _asan_rtl_
    #7 0x7ff7612d4dd4 in start_thread /usr/src/debug/glibc-2.17-c758a686/nptl/pthread_create.c:308
    #8 0x7ff7609829bc in __clone /usr/src/debug////////glibc-2.17-c758a686/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113
0x7ff71dc2abc8 is located 25544 bytes inside of 376832-byte region [0x7ff71dc24800,0x7ff71dc80800)
freed by thread T0 here:
    #0 0x7ff763e25dd9 in __interceptor_free _asan_rtl_
    #1 0x7ff7637946c8 in slapi_ch_free /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:270
    #2 0x55878c6c4c0c in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:75
    #3 0x55878c6ceb56 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:1260
    #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187
    #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
previously allocated by thread T0 here:
    #0 0x7ff763e25ff5 in calloc _asan_rtl_
    #1 0x7ff763794288 in slapi_ch_calloc /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/ch_malloc.c:180
    #2 0x55878c6c47a8 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/conntable.c:27
    #3 0x55878c6cd72e in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/daemon.c:918
    #4 0x55878c6af864 in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:1187
    #5 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
Thread T25 created by T0 here:
    #0 0x7ff763e1aa0a in __interceptor_pthread_create _asan_rtl_
    #1 0x7ff763c08d6c in ns_thrpool_new /usr/src/debug/389-ds-base-1.3.7.5/src/nunc-stans/ns/ns_thrpool.c:1466
    #2 0x55878c6ad69a in ?? /usr/src/debug/389-ds-base-1.3.7.5/ldap/servers/slapd/main.c:233
    #3 0x7ff7608a637a in __libc_start_main /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:274
Shadow bytes around the buggy address:
  0x0fff63b7d520: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d530: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d540: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d550: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d560: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0fff63b7d570: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
  0x0fff63b7d580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0fff63b7d5c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==5707== ABORTING

Comment 4 thierry bordaz 2017-11-28 08:29:52 UTC

upstream ticket https://pagure.io/389-ds-base/issue/49450

Comment 5 wibrown@redhat.com 2017-11-28 08:52:58 UTC

https://pagure.io/389-ds-base/issue/48184 This is actually the cause of the asan fault.

What is happening is that during shutdown we call in daemon.c:

1260     connection_table_free(the_connection_table);

This frees the ct, however, nunc-stans *is still running*. Because NS is still running, and potentially has outstaining IO's remaining, these events *can and will fire*.

This triggers an io event, and during this event we try to access the connection: of course, triggering the heap-use-after free in daemon.c:1642

The correct solution is that before we free the ct, we need to loop over it's content and call ns_job_done on *all* remaining connections to shutdown correctly.

I am developing a patch now,

Comment 6 wibrown@redhat.com 2017-11-28 08:53:29 UTC

Please note: While this asan fault is indeed valid, it may not be the root cause of the described issue.

Comment 11 Viktor Ashirov 2018-01-30 12:52:29 UTC

Build tested: 
389-ds-base-1.3.7.5-14.el7.x86_64 (rebuilt with ASAN)

This crash didn't occur in acceptance tests, marking as VERIFIED, SanityOnly.

Comment 12 thierry bordaz 2018-02-05 10:26:16 UTC

Note, we have a crash (in 389-ds-base-1.3.7.5-14.el7.x86_64) that could be related to this BZ.

see https://bugzilla.redhat.com/show_bug.cgi?id=1538994#c11

It differs from this https://bugzilla.redhat.com/show_bug.cgi?id=1517383 because it looks a heap corruption.

It is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1517383 as it occurred at shutdown

Comment 21 errata-xmlrpc 2018-04-10 14:22:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0811

Note You need to log in before you can comment on or make changes to this bug.