RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1724761 - Entry cache contention during base search
Summary: Entry cache contention during base search
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.6
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: 7.8
Assignee: thierry bordaz
QA Contact: RHDS QE
URL:
Whiteboard:
Depends On:
Blocks: 1788833 1790981 1809160
TreeView+ depends on / blocked
 
Reported: 2019-06-27 17:58 UTC by Josip Vilicic
Modified: 2023-12-15 16:35 UTC (History)
15 users (show)

Fixed In Version: 389-ds-base-1.3.10.2-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1790981 1809160 (view as bug list)
Environment:
Last Closed: 2020-09-29 19:46:50 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Valid script to measure contention on entry cache lock during deref internal searches (6.28 KB, text/plain)
2019-08-01 11:55 UTC, thierry bordaz
no flags Details
ldap.entrycache_on_deref.stp (6.28 KB, text/plain)
2020-05-14 17:36 UTC, Marc Sauton
no flags Details
Script shell to interprete the stap output (6.46 KB, application/x-shellscript)
2020-05-14 17:36 UTC, thierry bordaz
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github 389ds 389-ds-base issues 3598 0 None closed Entry cache contention during base search 2021-02-16 18:36:12 UTC
Red Hat Product Errata RHBA-2020:3894 0 None None None 2020-09-29 19:47:59 UTC

Comment 7 mreynolds 2019-07-15 20:00:18 UTC
Initial pstack analysis shows two issues....

[1] So about half of the pstacks how FreeIPA's schema compatibility plugin is blocking all the worker threads while it's doing some kind of work during a modify operation:

Thread 10 (Thread 0x7fc93e222700 (LWP 2235)):
#0  0x00007fc98418d463 in csnset_dup () at /usr/lib64/dirsrv/libslapd.so.0
#1  0x00007fc98421487e in slapi_value_dup () at /usr/lib64/dirsrv/libslapd.so.0
#2  0x00007fc984216c35 in valueset_set_valueset () at /usr/lib64/dirsrv/libslapd.so.0
#3  0x00007fc984182ff7 in slapi_attr_dup () at /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007fc984199598 in slapi_entry_dup () at /usr/lib64/dirsrv/libslapd.so.0
#5  0x00007fc973e6d5c1 in wrap_search_internal_get_entry_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#6  0x00007fc9841f28f4 in send_ldap_search_entry_ext () at /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007fc9841f30ec in send_ldap_search_entry () at /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007fc9841cdf34 in send_entry.isra.0 () at /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007fc9841ce498 in send_results_ext.constprop.4 () at /usr/lib64/dirsrv/libslapd.so.0
#10 0x00007fc9841d0361 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0
#11 0x00007fc9841e17a7 in search_internal_callback_pb () at /usr/lib64/dirsrv/libslapd.so.0
#12 0x00007fc973e6d8dd in wrap_search_internal_get_entry () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#13 0x00007fc973e620f7 in format_maybe_add_sdn_list () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#14 0x00007fc973e646e5 in format_deref_rx.isra.6 () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#15 0x00007fc973e64ce9 in format_deref_r () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#16 0x00007fc973e62c89 in format_expand () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#17 0x00007fc973e63dd6 in format_format () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#18 0x00007fc973e659e9 in format_get_data_set () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#19 0x00007fc973e582c9 in backend_set_entry_from () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#20 0x00007fc973e5c33d in backend_shr_set_config_entry_set_one_dn.isra.0 () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#21 0x00007fc973e5e926 in backend_shr_update_references_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#22 0x00007fc973e6b66f in map_data_foreach_map () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#23 0x00007fc973e5bbeb in backend_shr_update_references () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#24 0x00007fc973e5f1a1 in backend_shr_modify_cb.part.19 () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#25 0x00007fc973e5f581 in backend_shr_betxn_post_modify_cb () at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#26 0x00007fc9841dc798 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#27 0x00007fc9841dca53 in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0
#28 0x00007fc975e361a5 in ldbm_back_modify () at /usr/lib64/dirsrv/plugins/libback-ldbm.so
#29 0x00007fc9841c8d76 in op_shared_modify () at /usr/lib64/dirsrv/libslapd.so.0
#30 0x00007fc9841ca29b in do_modify () at /usr/lib64/dirsrv/libslapd.so.0


So it's not completely hung, but thread 10 is holding the write lock and this thread is running for a very long time.  It's this thread that is blocking all the other worker threads and creating a condition where the server seems hung.  Someone more familiar with this plugin needs to investigate this, but this does sounds like a known problem.


[2]  The other pstacks show what appears to be a known issue with contention around virtual attributes:

    https://pagure.io/389-ds-base/issue/49873

Thierry, I know you worked on the vattr contention issue, looks to be the same, but what about the schema compat hang, is that also a known issue?

Comment 9 Alexander Bokovoy 2019-07-16 06:51:32 UTC
(In reply to mreynolds from comment #7)
> Initial pstack analysis shows two issues....
> 
> [1] So about half of the pstacks how FreeIPA's schema compatibility plugin
> is blocking all the worker threads while it's doing some kind of work during
> a modify operation:
> 
> Thread 10 (Thread 0x7fc93e222700 (LWP 2235)):
> #0  0x00007fc98418d463 in csnset_dup () at /usr/lib64/dirsrv/libslapd.so.0
> #1  0x00007fc98421487e in slapi_value_dup () at
> /usr/lib64/dirsrv/libslapd.so.0
> #2  0x00007fc984216c35 in valueset_set_valueset () at
> /usr/lib64/dirsrv/libslapd.so.0
> #3  0x00007fc984182ff7 in slapi_attr_dup () at
> /usr/lib64/dirsrv/libslapd.so.0
> #4  0x00007fc984199598 in slapi_entry_dup () at
> /usr/lib64/dirsrv/libslapd.so.0
> #5  0x00007fc973e6d5c1 in wrap_search_internal_get_entry_cb () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #6  0x00007fc9841f28f4 in send_ldap_search_entry_ext () at
> /usr/lib64/dirsrv/libslapd.so.0
> #7  0x00007fc9841f30ec in send_ldap_search_entry () at
> /usr/lib64/dirsrv/libslapd.so.0
> #8  0x00007fc9841cdf34 in send_entry.isra.0 () at
> /usr/lib64/dirsrv/libslapd.so.0
> #9  0x00007fc9841ce498 in send_results_ext.constprop.4 () at
> /usr/lib64/dirsrv/libslapd.so.0
> #10 0x00007fc9841d0361 in op_shared_search () at
> /usr/lib64/dirsrv/libslapd.so.0
> #11 0x00007fc9841e17a7 in search_internal_callback_pb () at
> /usr/lib64/dirsrv/libslapd.so.0
> #12 0x00007fc973e6d8dd in wrap_search_internal_get_entry () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #13 0x00007fc973e620f7 in format_maybe_add_sdn_list () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #14 0x00007fc973e646e5 in format_deref_rx.isra.6 () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #15 0x00007fc973e64ce9 in format_deref_r () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #16 0x00007fc973e62c89 in format_expand () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #17 0x00007fc973e63dd6 in format_format () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #18 0x00007fc973e659e9 in format_get_data_set () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #19 0x00007fc973e582c9 in backend_set_entry_from () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #20 0x00007fc973e5c33d in backend_shr_set_config_entry_set_one_dn.isra.0 ()
> at /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #21 0x00007fc973e5e926 in backend_shr_update_references_cb () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #22 0x00007fc973e6b66f in map_data_foreach_map () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #23 0x00007fc973e5bbeb in backend_shr_update_references () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #24 0x00007fc973e5f1a1 in backend_shr_modify_cb.part.19 () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #25 0x00007fc973e5f581 in backend_shr_betxn_post_modify_cb () at
> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
> #26 0x00007fc9841dc798 in plugin_call_func () at
> /usr/lib64/dirsrv/libslapd.so.0
> #27 0x00007fc9841dca53 in plugin_call_plugins () at
> /usr/lib64/dirsrv/libslapd.so.0
> #28 0x00007fc975e361a5 in ldbm_back_modify () at
> /usr/lib64/dirsrv/plugins/libback-ldbm.so
> #29 0x00007fc9841c8d76 in op_shared_modify () at
> /usr/lib64/dirsrv/libslapd.so.0
> #30 0x00007fc9841ca29b in do_modify () at /usr/lib64/dirsrv/libslapd.so.0
> 
> 
> So it's not completely hung, but thread 10 is holding the write lock and
> this thread is running for a very long time.  It's this thread that is
> blocking all the other worker threads and creating a condition where the
> server seems hung.  Someone more familiar with this plugin needs to
> investigate this, but this does sounds like a known problem.
It the stacktrace above slapi-nis is performing re-calculation of an entry that includes dereferencing other entries via an attribute value that points to those entries (%deref_r() function in compat tree definitions). However, the top of the stacktrace is back in slapd itself. The code in slapi-nis calls wrap_search_internal_get_entry_cb() which is:

static int
wrap_search_internal_get_entry_cb(Slapi_Entry *e, void *cb)
{
        Slapi_Entry **ret = cb;
        if (*ret) {
                slapi_entry_free(*ret);
        }
        *ret = slapi_entry_dup(e);
        return 0;
}

and we are stuck in slapi_entry_dup() here in one of loops that copy attributes:

    for (a = e->e_attrs; a != NULL; a = a->a_next) {
        Slapi_Attr *newattr = slapi_attr_dup(a);
        if (lastattr == NULL) {
            ec->e_attrs = newattr;
        } else {
            lastattr->a_next = newattr;
        }
        lastattr = newattr;
    }
    lastattr = NULL;
    for (a = e->e_deleted_attrs; a != NULL; a = a->a_next) {
        Slapi_Attr *newattr = slapi_attr_dup(a);
        if (lastattr == NULL) {
            ec->e_deleted_attrs = newattr;
        } else {
            lastattr->a_next = newattr;
        }
        lastattr = newattr;
    }


In both slapi_attr_dup() calls we would get to the following code:

Slapi_Attr *
slapi_attr_dup(const Slapi_Attr *attr)
{
    Slapi_Attr *newattr = slapi_attr_new();
    slapi_attr_init(newattr, attr->a_type);
    slapi_valueset_set_valueset(&newattr->a_deleted_values, &attr->a_deleted_values);
    slapi_valueset_set_valueset(&newattr->a_present_values, &attr->a_present_values);
    newattr->a_deletioncsn = csn_dup(attr->a_deletioncsn);
    return newattr;
}

According to the traceback, it is one of slapi_valueset_set_valueset() calls and then we end up in valueset_set_valueset() which copies value array:

            for (i = 0; i < vs1->num; i++) {
                vs1->va[i] = slapi_value_dup(vs2->va[i]);
            }


Slapi_Value *
slapi_value_dup(const Slapi_Value *v)
{
    Slapi_Value *newvalue = value_new(&v->bv, CSN_TYPE_UNKNOWN, NULL);
    newvalue->v_csnset = csnset_dup(v->v_csnset);
    newvalue->v_flags = v->v_flags;
    return newvalue;
}

and, finally, csnset_dup() has this loop:

CSNSet *
csnset_dup(const CSNSet *csnset)
{
    CSNSet *newcsnset = NULL;
    CSNSet **curnode = &newcsnset;
    const CSNSet *n = csnset;
    while (n != NULL) {
        csnset_add_csn(curnode, n->type, &n->csn);
        n = n->next;
        curnode = &((*curnode)->next);
    }
    return newcsnset;
}

Perhaps, we are seeing an infinite loop here? E.g. CSNSet list is damaged and points back to some existing set element?

Comment 29 thierry bordaz 2019-08-01 10:04:41 UTC
Created attachment 1596918 [details]
script to measure contention on entry cache lock during deref internal searches

This script can collect information about contention on entry cache lock.
It focus on internal searches triggers by deref callback during a direct search

Should be run as root. Collected data are function names and timestamps

Comment 30 thierry bordaz 2019-08-01 11:55:31 UTC
Created attachment 1596987 [details]
Valid script to measure contention on entry cache lock during deref internal searches

This script replace the previous script that contained invalid library path

Comment 36 thierry bordaz 2019-08-08 14:44:09 UTC
upstream ticket https://pagure.io/389-ds-base/issue/50542

Comment 53 Viktor Ashirov 2020-04-14 17:34:44 UTC
Build tested: 389-ds-base-1.3.10.2-1.el7.x86_64

To reproduce I used a group with 5000 members.
Then I ran in a loop parallel searches using ldapsearch with deref control:

ldapsearch -D 'cn=directory manager' -w password -b cn=admins,ou=groups,dc=example,dc=com -LLL -x -h localhost:38901 -o ldif-wrap=no '(objectClass=*)' -E 'deref=member:cn,userpassword'

I ran the attached systemtap script for tracing. 

On 389-ds-base-1.3.10.1-5.el7.x86_64 I was seeing 


slapi_search_internal_pb: 11215552 of 11485329 [ 97.6511 %]
         send_results_ext: 6002483 of 11485329 [ 52.2622 %]
             ldbm_back_next_search_entry: 5375008 of 11485329 [ 46.7989 %]
                 cache_lock: 4112189 of 11485329 [ 35.8038 %]
                 cache_unlock: 56385 of 11485329 [ 0.490931 %]

But after upgrade to 389-ds-base-1.3.10.2-1.el7.x86_64:

slapi_search_internal_pb: 9507705 of 9753673 [ 97.4782 %]
         send_results_ext: 3214766 of 9753673 [ 32.9595 %]
             ldbm_back_next_search_entry: 1445493 of 9753673 [ 14.82 %]


There is no longer an entry cache lock contention, marking as VERIFIED.

Comment 54 Marc Sauton 2020-05-14 17:36:03 UTC
Created attachment 1688559 [details]
ldap.entrycache_on_deref.stp

Comment 55 thierry bordaz 2020-05-14 17:36:45 UTC
Created attachment 1688560 [details]
Script shell to interprete the stap output

Comment 64 errata-xmlrpc 2020-09-29 19:46:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3894


Note You need to log in before you can comment on or make changes to this bug.