Bug 1744623 - DB Deadlock on modrdn appears to corrupt database and entry cache
Summary: DB Deadlock on modrdn appears to corrupt database and entry cache
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
: ---
Assignee: mreynolds
QA Contact: RHDS QE
URL:
Whiteboard:
Depends On:
Blocks: 1744146 1744662 1749289
 
Reported: 2019-08-22 14:40 UTC by mreynolds
Modified: 2020-09-13 22:09 UTC (History)

Fixed In Version: 389-ds-base-1.3.10.1-5.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1744662 1749289 (view as bug list)
Environment:
Last Closed: 2020-03-31 19:46:15 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github 389ds 389-ds-base issues 2683 None None None 2020-09-13 22:09:09 UTC
Red Hat Product Errata RHBA-2020:1064 None None None 2020-03-31 19:46:55 UTC

Description mreynolds 2019-08-22 14:40:36 UTC
This bug is created as a clone of upstream ticket:
https://pagure.io/389-ds-base/issue/49624

#### Issue Description

If a DB deadlock error occurs during a MODRDN, the operation is retried, but the second pass of that same operation goes wrong.

To reproduce, do a modrdn that moves an entry to a new superior, then try to move it back to the original subtree.  Note: I instrumented the code to always trigger a single deadlock error.  When I try to move the entry back to the original subtree/superior, I get an error 68 (entry already exists)!
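To make the failure mode concrete, here is a minimal Python sketch of a retry-on-deadlock loop. It is not the 389-ds-base code and not the upstream patch; `Deadlock`, `rename_with_retry`, and `move_entry` are hypothetical names, and the entry is modelled as a plain dict. It only illustrates the general point that every retry has to start from the pre-operation state rather than from whatever the failed attempt left behind.

```python
import copy


class Deadlock(Exception):
    """Hypothetical stand-in for a database deadlock error."""


def rename_with_retry(entry, new_parent, attempt_fn, max_retries=3):
    # Snapshot taken once, before the first attempt touches anything.
    original = copy.deepcopy(entry)
    for attempt in range(max_retries + 1):
        # Each pass works on a fresh copy, so a failed pass leaves no residue.
        working = copy.deepcopy(original)
        try:
            attempt_fn(working, new_parent, attempt)
            return working
        except Deadlock:
            continue
    raise RuntimeError("gave up after repeated deadlocks")


def move_entry(working, new_parent, attempt):
    # Mutate first, then fail once, mimicking a partial update before a deadlock
    # (the report instrumented the server to trigger exactly one deadlock).
    working["parent"] = new_parent["dn"]
    working["parentid"] = new_parent["id"]
    working["pending_mods"].append("parentid")
    if attempt == 0:
        raise Deadlock()


entry = {
    "rdn": "cn=Accounting Managers",
    "parent": "ou=MyOU,ou=Groups,dc=example,dc=com",
    "parentid": 10,
    "pending_mods": [],
}
groups = {"dn": "ou=Groups,dc=example,dc=com", "id": 3}

result = rename_with_retry(entry, groups, move_entry)
print(result["parentid"], result["pending_mods"])
```

If the loop reused `working` across passes instead of copying `original`, the second pass would see `pending_mods` already populated by the failed first pass, which is the kind of leftover state this report is about.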

ldapsearch shows the entry was not moved, which is what we would expect given the error:

 ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=accoun* \* \+
dn: cn=Accounting Managers,ou=MyOU,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10
entrydn: cn=accounting managers,ou=myou,ou=groups,dc=example,dc=com

If I restart the server, the entry is now in the original subtree (even though the operation reported a failure):

ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=accoun* \* \+

dn: cn=Accounting Managers,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
...

Performing ldapsearch using various scopes also gives inconsistent results for this entry:

[root@localhost BUILD]# ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s one -xLLL cn=account*

---> no results

[root@localhost BUILD]# ldapsearch -D cn=dm -w password -b "ou=groups,dc=example,dc=com" -s sub -xLLL cn=account*
dn: cn=Accounting Managers,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: groupOfUniqueNames
cn: Accounting Managers
ou: groups
description: People who can manage accounting entries
uniqueMember: cn=dm
nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
creatorsName:
modifiersName: cn=dm
createTimestamp: 20180323123313Z
modifyTimestamp: 20180323130251Z
entryid: 6
parentid: 10
entrydn: cn=accounting managers,ou=groups,dc=example,dc=com


dbscan shows that the entry's parentid is still pointing to the old subtree:

	rdn: cn=Accounting Managers
	objectClass: top
	objectClass: groupOfUniqueNames
	cn: Accounting Managers
	ou: groups
	description: People who can manage accounting entries
	uniqueMember: cn=dm
	nsUniqueId: 5a508aab-2e9611e8-b333e893-f12dcd9f
	creatorsName:
	modifiersName: cn=dm
	createTimestamp: 20180323123313Z
	modifyTimestamp: 20180323130251Z
	entryid: 6
	parentid: 10

parentid should be 3 (not 10) in this case.  Perhaps that is messing up the scoped search?

If I export and reimport the ldif, the parentid is adjusted to the correct value of 3, and the entry is found under the original subtree.

So we are seeing database & entry cache corruption when a db deadlock occurs on modrdn operations.
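The parentid mismatch described above can be checked mechanically. The following Python sketch is an illustration only: it works on a hand-built list of dicts rather than real dbscan output, the helper names are hypothetical, the entryid of the suffix entry (1) is an assumption, and modelling a one-level search as a pure parentid lookup is an assumption that mirrors the hypothesis in this report rather than a statement about how the server implements scoped searches.

```python
def parent_dn(dn):
    # Naive split on the first comma; assumes no escaped commas in the RDN.
    return dn.split(",", 1)[1] if "," in dn else None


def find_parentid_mismatches(entries):
    """Return (dn, stored_parentid, expected_parentid) for inconsistent entries."""
    by_dn = {e["dn"].lower(): e for e in entries}
    mismatches = []
    for e in entries:
        parent = by_dn.get(parent_dn(e["dn"].lower()))
        if parent is not None and e["parentid"] != parent["entryid"]:
            mismatches.append((e["dn"], e["parentid"], parent["entryid"]))
    return mismatches


def one_level_search(entries, base_dn):
    # Assumption: one-level scope is resolved through parentid only.
    base = next(e for e in entries if e["dn"].lower() == base_dn.lower())
    return [e["dn"] for e in entries if e["parentid"] == base["entryid"]]


# State after the restart, using the ids from the report (suffix id is assumed).
entries = [
    {"dn": "dc=example,dc=com", "entryid": 1, "parentid": None},
    {"dn": "ou=Groups,dc=example,dc=com", "entryid": 3, "parentid": 1},
    {"dn": "ou=MyOU,ou=Groups,dc=example,dc=com", "entryid": 10, "parentid": 3},
    {"dn": "cn=Accounting Managers,ou=Groups,dc=example,dc=com",
     "entryid": 6, "parentid": 10},
]

mismatches = find_parentid_mismatches(entries)
children = one_level_search(entries, "ou=Groups,dc=example,dc=com")
print(mismatches)
print(children)
```

Under these assumptions the checker flags entry 6 (stored parentid 10, expected 3), and the one-level lookup under ou=Groups misses it, which is consistent with the `-s one` search returning nothing while `-s sub` finds the entry.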

Comment 2 mreynolds 2019-08-22 16:02:24 UTC
Cloned to RHEL 8:  

https://bugzilla.redhat.com/show_bug.cgi?id=1744662

Comment 7 Viktor Ashirov 2020-01-14 13:01:57 UTC
Build tested: 389-ds-base-1.3.10.1-4.el7

Using the automated reproducer from https://pagure.io/389-ds-base/pull-request/50821,
I'm getting error 68 on MODRDN. With an ASAN build I'm also getting the following error:

=================================================================
==4427==ERROR: AddressSanitizer: heap-use-after-free on address 0x60400024f3a0 at pc 0x7f9b0b8093c3 bp 0x7f9aeb07e1b0 sp 0x7f9aeb07e1a0
READ of size 8 at 0x60400024f3a0 thread T20
    #0 0x7f9b0b8093c2 in slapi_sdn_get_dn (/usr/lib64/dirsrv/libslapd.so.0+0xeb3c2)
    #1 0x7f9b0b809728 in slapi_sdn_dup (/usr/lib64/dirsrv/libslapd.so.0+0xeb728)
    #2 0x7f9afcb4e457 in ldbm_back_modrdn ldap/servers/slapd/back-ldbm/ldbm_modrdn.c:254
    #3 0x7f9b0b895805 in op_shared_rename ldap/servers/slapd/modrdn.c:612
    #4 0x7f9b0b896deb in do_modrdn (/usr/lib64/dirsrv/libslapd.so.0+0x178deb)
    #5 0x5594c1e03c1a in connection_dispatch_operation ldap/servers/slapd/connection.c:620
    #6 0x5594c1e03c1a in connection_threadmain ldap/servers/slapd/connection.c:1791
    #7 0x7f9b09953bfa in _pt_root ../../../nspr/pr/src/pthreads/ptthread.c:201
    #8 0x7f9b092f3ea4 in start_thread /usr/src/debug/glibc-2.17-c758a686/nptl/pthread_create.c:307
    #9 0x7f9b0899f8dc in __clone (/lib64/libc.so.6+0xfe8dc)

0x60400024f3a0 is located 16 bytes inside of 40-byte region [0x60400024f390,0x60400024f3b8)
freed by thread T20 here:
    #0 0x7f9b0bf79020 in __interceptor_free (/lib64/libasan.so.5+0xee020)
    #1 0x7f9b0b7f11e8 in slapi_ch_free (/usr/lib64/dirsrv/libslapd.so.0+0xd31e8)
    #2 0x7f9afcb4e40e in ldbm_back_modrdn ldap/servers/slapd/back-ldbm/ldbm_modrdn.c:252
    #3 0x7f9b0b895805 in op_shared_rename ldap/servers/slapd/modrdn.c:612
    #4 0x7f9b0b896deb in do_modrdn (/usr/lib64/dirsrv/libslapd.so.0+0x178deb)
    #5 0x5594c1e03c1a in connection_dispatch_operation ldap/servers/slapd/connection.c:620
    #6 0x5594c1e03c1a in connection_threadmain ldap/servers/slapd/connection.c:1791
    #7 0x7f9b09953bfa in _pt_root ../../../nspr/pr/src/pthreads/ptthread.c:201

previously allocated by thread T20 here:
    #0 0x7f9b0bf793e0 in malloc (/lib64/libasan.so.5+0xee3e0)
    #1 0x7f9b0b7f0a03 in slapi_ch_malloc (/usr/lib64/dirsrv/libslapd.so.0+0xd2a03)
    #2 0x7f9b0b807d52 in slapi_sdn_new (/usr/lib64/dirsrv/libslapd.so.0+0xe9d52)
    #3 0x7f9b0b808b7e in slapi_sdn_new_normdn_byval (/usr/lib64/dirsrv/libslapd.so.0+0xeab7e)
    #4 0x7f9afcb545a7 in ldbm_back_modrdn ldap/servers/slapd/back-ldbm/ldbm_modrdn.c:955
    #5 0x7f9b0b895805 in op_shared_rename ldap/servers/slapd/modrdn.c:612
    #6 0x7f9b0b896deb in do_modrdn (/usr/lib64/dirsrv/libslapd.so.0+0x178deb)
    #7 0x5594c1e03c1a in connection_dispatch_operation ldap/servers/slapd/connection.c:620
    #8 0x5594c1e03c1a in connection_threadmain ldap/servers/slapd/connection.c:1791
    #9 0x7f9b09953bfa in _pt_root ../../../nspr/pr/src/pthreads/ptthread.c:201

Thread T20 created by T0 here:
    #0 0x7f9b0bedce9f in pthread_create (/lib64/libasan.so.5+0x51e9f)
    #1 0x7f9b099538cb in _PR_CreateThread ../../../nspr/pr/src/pthreads/ptthread.c:433

SUMMARY: AddressSanitizer: heap-use-after-free (/usr/lib64/dirsrv/libslapd.so.0+0xeb3c2) in slapi_sdn_get_dn
Shadow bytes around the buggy address:
  0x0c0880041e20: fa fa 00 00 00 00 04 fa fa fa 00 00 00 00 04 fa
  0x0c0880041e30: fa fa 00 00 00 00 04 fa fa fa 00 00 00 00 06 fa
  0x0c0880041e40: fa fa 00 00 00 00 06 fa fa fa 00 00 00 00 06 fa
  0x0c0880041e50: fa fa 00 00 00 00 04 fa fa fa 00 00 00 00 04 fa
  0x0c0880041e60: fa fa 00 00 00 00 04 fa fa fa 00 00 00 00 04 fa
=>0x0c0880041e70: fa fa fd fd[fd]fd fd fa fa fa fd fd fd fd fd fa
  0x0c0880041e80: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
  0x0c0880041e90: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c0880041ea0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c0880041eb0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c0880041ec0: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4427==ABORTING

Marking as ASSIGNED.

Comment 9 Viktor Ashirov 2020-02-07 09:48:40 UTC
Patch is upstream https://pagure.io/389-ds-base/c/7abd73c62cc04c38977c119b0d3254ec9e0d496f?branch=389-ds-base-1.3.10
I've tested a scratch ASAN build with this patch, tests passed.

Comment 11 Viktor Ashirov 2020-02-10 10:53:55 UTC
Build tested:
389-ds-base-1.3.10.1-5.el7 with ASAN

No errors reported during the test. Marking as VERIFIED.

Comment 13 errata-xmlrpc 2020-03-31 19:46:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1064

