Bug 2234613 - RHDS-11 investigate long etime and error "Retry count exceeded" on BIND/ADD/DEL/MOD from revert_cache [12.4]
Summary: RHDS-11 investigate long etime and error "Retry count exceeded" on BIND/ADD/D...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Directory Server
Classification: Red Hat
Component: 389-ds-base
Version: 11.7
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: DS12.4
Target Release: dirsrv-12.4
Assignee: thierry bordaz
QA Contact: LDAP QA Team
Docs Contact: Evgenia Martynyuk
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks: 2268177 2268183 2268186
 
Reported: 2023-08-24 21:47 UTC by Marc Sauton
Modified: 2024-11-14 07:50 UTC

Fixed In Version: redhat-ds-12-9040020240116164822.1674d574
Doc Type: Bug Fix
Doc Text:
.Directory Server now flushes the entry cache less frequently
Previously, Directory Server flushed its entry cache even when it was not necessary. As a result, in certain situations, Directory Server was unresponsive and performed poorly. With this update, Directory Server flushes the entry cache only when it is necessary.
Clone Of:
Clones: 2268177
Environment:
Last Closed: 2024-05-07 00:15:24 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github 389ds 389-ds-base issues | 5939 | 0 | None | closed | During an update, if the target entry is reverted in the entry cache, the server should not retry to lock it. | 2024-01-03 16:55:23 UTC
Red Hat Issue Tracker | IDMDS-4015 | 0 | None | None | None | 2024-01-03 16:59:57 UTC
Red Hat Issue Tracker | IDMDS-4255 | 0 | None | None | None | 2024-03-26 08:22:13 UTC
Red Hat Product Errata | RHEA-2024:2718 | 0 | None | None | None | 2024-05-07 00:15:37 UTC

Description Marc Sauton 2023-08-24 21:47:15 UTC
Description of problem:

Investigate a long etime situation on RHDS-11 with the error "Retry count exceeded" on BIND/ADD/DEL/MOD, coming from revert_cache (entry cache).


Version-Release number of selected component (if applicable):

RHDS-11.7 on RHEL-8.8
389-ds-base-1.4.3.34-1.module+el8dsrv+18528+22f7779f.x86_64
redhat-release-8.8-0.8.el8.x86_64


How reproducible:
N/A, high traffic, and other unknowns in environment.

Steps to Reproduce:
1. N/A
2.
3.

Actual results:

pattern event in errors log:

ERR - find_entry_internal_dn - Retry count exceeded (uid=
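To gauge how often this pattern shows up, a small script can count matching lines per minute in the errors log. This is only a sketch: the function name `retry_exceeded_per_minute` is made up for illustration, and the bracketed timestamp prefix it parses (for example `[24/Aug/2023:21:47:15.123456789 +0000]`) is an assumption about the log layout. Lines without such a prefix are counted under "unknown".

```python
import re
from collections import Counter

# Message text taken from the errors log excerpt above.
PATTERN = "find_entry_internal_dn - Retry count exceeded"

# Assumed timestamp prefix, e.g. [24/Aug/2023:21:47:15.123456789 +0000].
# Only day/month/year:hour:minute is captured, to bucket by minute.
TS_MINUTE = re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")


def retry_exceeded_per_minute(lines):
    """Count 'Retry count exceeded' lines, grouped by minute."""
    counts = Counter()
    for line in lines:
        if PATTERN not in line:
            continue
        match = TS_MINUTE.match(line)
        # Fall back to a single bucket when no timestamp is recognised.
        key = match.group(1) if match else "unknown"
        counts[key] += 1
    return counts
```

Usage would be something like `retry_exceeded_per_minute(open(path))`, where `path` points to the instance errors log. A burst concentrated in a few minutes would suggest correlating with the access log etime values for the same window.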

thread signature:
Contention on the backend lock while reverting a TXN failure
Many threads (update) are stuck waiting for the backend lock => can contribute to worker starvation

        Thread 36 (Thread 0x7f16839fe700 (LWP 1672443)):
        #0  0x00007f191177ee92 in flush_hash () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #1  0x00007f191177f103 in revert_cache () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #2  0x00007f19117aea7c in ldbm_back_modify () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #3  0x00007f19204604d0 in op_shared_modify () at target:/usr/lib64/dirsrv/libslapd.so.0
        #4  0x00007f192046112b in modify_internal_pb () at target:/usr/lib64/dirsrv/libslapd.so.0
        #5  0x00007f1920486479 in pw_apply_mods () at target:/usr/lib64/dirsrv/libslapd.so.0
        #6  0x00007f1920486686 in set_retry_cnt_and_time.constprop () at target:/usr/lib64/dirsrv/libslapd.so.0
        #7  0x00007f19204867fb in update_pw_retry () at target:/usr/lib64/dirsrv/libslapd.so.0
        #8  0x00007f192048c2cb in send_ldap_result_ext () at target:/usr/lib64/dirsrv/libslapd.so.0
        #9  0x00007f192048c54f in send_ldap_result () at target:/usr/lib64/dirsrv/libslapd.so.0
        #10 0x00007f1920473267 in slapi_send_ldap_result () at target:/usr/lib64/dirsrv/libslapd.so.0
        #11 0x00007f191179c01b in ldbm_back_bind () at target:/usr/lib64/dirsrv/plugins/libback-ldbm.so
        #12 0x0000562dfa8c4ed2 in pw_verify_be_dn ()
        #13 0x0000562dfa8b1449 in do_bind ()
        #14 0x0000562dfa8b64b5 in connection_threadmain ()
        #15 0x00007f191ce97968 in _pt_root () at target:/lib64/libnspr4.so
        #16 0x00007f191c8321cf in start_thread () at target:/lib64/libpthread.so.0
        #17 0x00007f191eae5dd3 in clone () at target:/lib64/libc.so.6
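To see how many threads share this signature in a full stack dump, the text can be scanned for thread headers and frame lines. The sketch below assumes the dump uses the same layout as the excerpt above (`Thread N (` headers followed by `#k 0x... in function ()` frames). The helper name `threads_by_function` is hypothetical.

```python
import re

# "Thread 36 (Thread 0x7f16839fe700 (LWP 1672443)):"
THREAD_RE = re.compile(r"^\s*Thread\s+(\d+)\s+\(")
# "#1  0x00007f191177f103 in revert_cache () at ..."
FRAME_RE = re.compile(r"^\s*#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([\w.]+)\s*\(")


def threads_by_function(text, functions):
    """Return {function: sorted thread ids whose stack contains it}."""
    hits = {name: set() for name in functions}
    current = None  # thread id of the stack being read
    for line in text.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(1) in hits:
            hits[frame.group(1)].add(current)
    return {name: sorted(ids) for name, ids in hits.items()}
```

For example, `threads_by_function(dump, ["revert_cache", "flush_hash"])` lists the threads that are inside the cache revert path, which gives a rough measure of how many workers are tied up there.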


Expected results:
yes


Additional info:

RHDS-11.6 related fix: bz 2051476 - high contention in find_entry_internal_dn on mixed load
https://bugzilla.redhat.com/2051476
https://access.redhat.com/errata/RHBA-2023:0186
"
Cause: Cache c_mutex type was changed from PR_Monitor to pthread recursive mutex implementation. It brought a minor performance boost but also proved to be a less stable solution in its current way.
Additionally, another issue happens when updating the parent entry of a deleted entry (numsubordinates), if it fails to lock the parent it does not return the parent entry.

Consequence: "find_entry_internal_dn - Retry count exceeded" error appears in the error log with high concurrent mixed operations load on a flat tree.
And when the other issue happens, refcnt becomes invalid. Which may lead to other cache locking issues.

Fix: Change cache c_mutex type to PR_Monitor.
In the case of the failure to lock the parent entry, the entry should be returned.

Result: "find_entry_internal_dn - Retry count exceeded" error doesn't appear. And the cache structure exists in the correct state with the correct refcnt.
"

so
ERR - find_entry_internal_dn - Retry count exceeded
will happen again, and there have been more reports
https://bugzilla.redhat.com/show_bug.cgi?id=2051476#c45
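
The linked upstream issue (389-ds-base issue 5939) states that during an update, if the target entry is reverted in the entry cache, the server should not retry to lock it. The toy model below illustrates that idea only. It is not the server code: `CacheEntry`, `find_entry`, the `reverted` flag and the `MAX_RETRIES` value are all hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass

MAX_RETRIES = 1000  # hypothetical bound, not the server's real value


@dataclass
class CacheEntry:
    dn: str
    locked: bool = False    # held by another operation
    reverted: bool = False  # invalidated after a failed transaction


def find_entry(cache, dn, max_retries=MAX_RETRIES):
    """Toy lookup. Returns (entry_or_None, attempts, reason)."""
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        entry = cache.get(dn)
        if entry is None:
            return None, attempts, "not found"
        if entry.reverted:
            # Retrying cannot help: this cached copy is no longer valid,
            # so give up at once instead of spinning until the bound.
            return None, attempts, "reverted"
        if not entry.locked:
            entry.locked = True
            return entry, attempts, "ok"
        # Entry is busy: loop and try again.
    return None, attempts, "retry count exceeded"
```

In this model a busy entry burns through every retry before failing, while a reverted entry fails on the first attempt. Skipping the pointless retries is what shortens the time an operation spends in the lookup.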

Comment 31 Colum Gaynor 2024-03-20 19:29:58 UTC
@tbordaz Thanks - Colum

Comment 36 errata-xmlrpc 2024-05-07 00:15:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (redhat-ds:12 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2024:2718

Comment 37 Red Hat Bugzilla 2024-09-05 04:25:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

