Bug 1809160

Summary: Entry cache contention during base search [rhel-7.8.z]
Product: Red Hat Enterprise Linux 7
Reporter: RAD team bot copy to z-stream <autobot-eus-copy>
Component: 389-ds-base
Assignee: mreynolds
Status: CLOSED ERRATA
QA Contact: RHDS QE <ds-qe-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.6
CC: abokovoy, afarley, cobrown, ddas, jvilicic, lkrispen, mreynolds, mrhodes, msauton, nkinder, spichugi, tbordaz, tmihinto, unixsystems, vashirov
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: 389-ds-base-1.3.10.1-8.el7_8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1724761
Environment:
Last Closed: 2020-05-12 18:37:34 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1724761
Bug Blocks:
Attachments:
Description Flags
coredump none

Description RAD team bot copy to z-stream 2020-03-02 14:13:11 UTC
This bug has been copied from bug #1724761 and has been proposed to be backported to 7.8 z-stream (EUS).

Comment 5 Viktor Ashirov 2020-03-05 09:54:38 UTC
Build tested: 389-ds-base-1.3.10.1-7.el7_8.x86_64

ns-slapd crashes in dirsrvtests/tests/suites/filter/filter_match_test.py:


#0  0x00007fe7981cb454 in ldbm_back_entry_release (pb=pb@entry=0x55c8f7edf080, backend_info_ptr=0x0)
    at ldap/servers/slapd/back-ldbm/ldbm_search.c:1901
#1  0x00007fe7a682898c in cache_return_target_entry (pb=pb@entry=0x55c8f7edf080, operation=0x55c8f60dd860, be=<optimized out>)
    at ldap/servers/slapd/opshared.c:213
#2  0x00007fe7a682a72f in op_shared_search (pb=pb@entry=0x55c8f7edf080, send_result=send_result@entry=1) at ldap/servers/slapd/opshared.c:918
#3  0x00007fe7a683c347 in search_internal_callback_pb (pb=pb@entry=0x55c8f7edf080, callback_data=callback_data@entry=0x7fe787239c60, prc=prc@entry=0x0, psec=psec@entry=0x7fe79af8ab90 <cos_dn_defs_cb>, prec=prec@entry=0x0)
    at ldap/servers/slapd/plugin_internal_op.c:727
#4  0x00007fe7a683c869 in slapi_search_internal_callback_pb (pb=pb@entry=0x55c8f7edf080, callback_data=callback_data@entry=0x7fe787239c60, prc=prc@entry=0x0, psec=psec@entry=0x7fe79af8ab90 <cos_dn_defs_cb>, prec=prec@entry=0x0)
    at ldap/servers/slapd/plugin_internal_op.c:518
#5  0x00007fe79af8cd76 in cos_cache_add_dn_defs (pDefs=0x55c8f7545200, dn=0x55c8f86b1a60 "dc=example,dc=com")
    at ldap/servers/plugins/cos/cos_cache.c:1051
#6  0x00007fe79af8cd76 in cos_cache_build_definition_list (vattr_cacheable=0x55c8f7545228, pDefs=0x55c8f7545200)
    at ldap/servers/plugins/cos/cos_cache.c:666
#7  0x00007fe79af8cd76 in cos_cache_create_unlock () at ldap/servers/plugins/cos/cos_cache.c:458
#8  0x00007fe79af8cd76 in cos_cache_creation_lock () at ldap/servers/plugins/cos/cos_cache.c:586
#9  0x00007fe79af8d16d in cos_cache_wait_on_change (arg=<optimized out>) at ldap/servers/plugins/cos/cos_cache.c:419
#10 0x00007fe7a45eabfb in _pt_root (arg=0x55c8f7538480) at ../../../nspr/pr/src/pthreads/ptthread.c:201
#11 0x00007fe7a3f8aea5 in start_thread (arg=0x7fe78723a700) at pthread_create.c:307
#12 0x00007fe7a36368dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111


Moving to ASSIGNED.

Comment 6 Viktor Ashirov 2020-03-05 10:07:04 UTC
Created attachment 1667696
coredump

Comment 7 thierry bordaz 2020-03-05 13:42:04 UTC
Upstream ticket (#50542) was fixed with two commits. The second commit is present in 389-ds-base-1.3.10 (95acf7a083d67d7c9aa2d0ca3e01435d20d81981) but was not backported to the rhel-7.8 branch.
Moving the BZ back to POST.

Comment 8 Viktor Ashirov 2020-03-10 15:38:57 UTC
Build tested: 389-ds-base-1.3.10.1-8.el7_8.x86_64

To reproduce, I used a group with 5000 members.
Then I ran parallel searches in a loop, using ldapsearch with the deref control:

ldapsearch -D 'cn=directory manager' -w password -b cn=admins,ou=groups,dc=example,dc=com -LLL -x -h localhost:38901 -o ldif-wrap=no '(objectClass=*)' -E 'deref=member:cn,userpassword'
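
For anyone wanting to repeat this, here is a minimal sketch of one way to drive the parallel loop. The ldapsearch arguments are copied from the command above; the helper names (`search_once`, `run_parallel`) and the worker and round counts are assumptions for illustration, not what was actually used in this verification.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Arguments copied from the ldapsearch command in this comment; the host,
# port, bind DN and password are the test instance's values.
LDAPSEARCH_CMD = [
    "ldapsearch",
    "-D", "cn=directory manager",
    "-w", "password",
    "-b", "cn=admins,ou=groups,dc=example,dc=com",
    "-LLL", "-x",
    "-h", "localhost:38901",
    "-o", "ldif-wrap=no",
    "(objectClass=*)",
    "-E", "deref=member:cn,userpassword",
]


def search_once(cmd=LDAPSEARCH_CMD):
    """Run one search, discard its output, and return the exit code."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def run_parallel(workers, rounds, cmd=LDAPSEARCH_CMD):
    """Run `rounds` batches of `workers` concurrent searches.

    Both numbers are hypothetical knobs; returns the list of exit codes.
    """
    codes = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            codes.extend(pool.map(lambda _i: search_once(cmd), range(workers)))
    return codes
```

Calling something like `run_parallel(8, 100)` while the tracing script is attached should generate concurrent deref searches against the group; the right numbers depend on the machine.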

I ran the systemtap script from bz1724761 for tracing.

On 389-ds-base-1.3.10.1-5.el7.x86_64 I was seeing:


slapi_search_internal_pb: 11215552 of 11485329 [ 97.6511 %]
         send_results_ext: 6002483 of 11485329 [ 52.2622 %]
             ldbm_back_next_search_entry: 5375008 of 11485329 [ 46.7989 %]
                 cache_lock: 4112189 of 11485329 [ 35.8038 %]
                 cache_unlock: 56385 of 11485329 [ 0.490931 %]

But after upgrading to 389-ds-base-1.3.10.1-8.el7_8.x86_64:

slapi_search_internal_pb: 7897172 of 8365846 [ 94.3978 %]
         send_results_ext: 2194928 of 8365846 [ 26.2368 %]
             ldbm_back_next_search_entry: 492290 of 8365846 [ 5.88452 %]

There is no longer any entry cache lock contention, marking as VERIFIED.
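
To make the comparison easier to check, here is a small sketch that parses the two trace summaries quoted above and recomputes each function's share from the raw counts. The `name: count of total` line format is taken from the output in this comment; the parser and its names (`parse_profile`, `LINE_RE`) are assumptions for illustration and are not part of the systemtap script.

```python
import re

# Matches lines such as "cache_lock: 4112189 of 11485329 [ 35.8038 %]".
LINE_RE = re.compile(r"^\s*(\w+):\s+(\d+) of (\d+)")


def parse_profile(text):
    """Map function name -> share of the total count, in percent."""
    shares = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name = match.group(1)
            count, total = int(match.group(2)), int(match.group(3))
            shares[name] = 100.0 * count / total
    return shares


BEFORE = """
slapi_search_internal_pb: 11215552 of 11485329 [ 97.6511 %]
         send_results_ext: 6002483 of 11485329 [ 52.2622 %]
             ldbm_back_next_search_entry: 5375008 of 11485329 [ 46.7989 %]
                 cache_lock: 4112189 of 11485329 [ 35.8038 %]
                 cache_unlock: 56385 of 11485329 [ 0.490931 %]
"""

AFTER = """
slapi_search_internal_pb: 7897172 of 8365846 [ 94.3978 %]
         send_results_ext: 2194928 of 8365846 [ 26.2368 %]
             ldbm_back_next_search_entry: 492290 of 8365846 [ 5.88452 %]
"""

before = parse_profile(BEFORE)
after = parse_profile(AFTER)

for name, share in before.items():
    if name in after:
        print(f"{name}: {share:.2f}% -> {after[name]:.2f}%")
    else:
        # The function does not appear in the second summary at all.
        print(f"{name}: {share:.2f}% -> not listed")
```

The functions that are absent from the second summary are reported as "not listed" rather than as zero, since the output only shows that they no longer appear.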

Comment 10 errata-xmlrpc 2020-05-12 18:37:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2079