Bug 2051476 - high contention in find_entry_internal_dn on mixed load
Summary: high contention in find_entry_internal_dn on mixed load
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Directory Server
Classification: Red Hat
Component: 389-ds-base
Version: 11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: DS11.6
Target Release: dirsrv-11.6
Assignee: Simon Pichugin
QA Contact: RHDS QE
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
 
Reported: 2022-02-07 11:01 UTC by Anton Bobrov
Modified: 2023-07-04 14:06 UTC
CC List: 12 users

Fixed In Version: redhat-ds-11-8070020230103174318.95958119
Doc Type: Bug Fix
Doc Text:
Cause: The cache c_mutex type was changed from PR_Monitor to a pthread recursive mutex implementation. This brought a minor performance boost but proved less stable in its current form. Additionally, when updating the parent entry of a deleted entry (to maintain numsubordinates), a failure to lock the parent meant the parent entry was not returned. Consequence: Under a high concurrent mixed-operation load on a flat tree, the "find_entry_internal_dn - Retry count exceeded" error appears in the error log. When the second issue occurs, the refcnt becomes invalid, which may lead to other cache locking issues. Fix: Change the cache c_mutex type back to PR_Monitor, and return the parent entry even when locking it fails. Result: The "find_entry_internal_dn - Retry count exceeded" error no longer appears, and the cache structure stays in a correct state with a correct refcnt.
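
For reference, a minimal sketch of the two locking primitives the Doc Text compares: an NSPR monitor (PR_Monitor) versus a POSIX recursive mutex. This is illustrative only, not the actual 389-ds-base cache code.

/* Illustrative only: the two primitives compared in the Doc Text,
 * not the actual 389-ds-base cache code.
 * Build with: cc demo.c $(pkg-config --cflags --libs nspr) -pthread */
#include <prmon.h>    /* NSPR monitor API */
#include <pthread.h>

/* NSPR monitor: re-entrant by design; this is what the fix restores. */
static void monitor_example(void)
{
    PRMonitor *c_mutex = PR_NewMonitor();
    PR_EnterMonitor(c_mutex);
    /* ... critical section; the same thread may re-enter safely ... */
    PR_ExitMonitor(c_mutex);
    PR_DestroyMonitor(c_mutex);
}

/* POSIX recursive mutex: what the cache had been switched to. */
static void pthread_example(void)
{
    pthread_mutex_t c_mutex;
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&c_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&c_mutex);
    /* ... critical section; the same thread may lock again ... */
    pthread_mutex_unlock(&c_mutex);
    pthread_mutex_destroy(&c_mutex);
}

int main(void)
{
    monitor_example();
    pthread_example();
    return 0;
}
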
Clone Of:
Environment:
Last Closed: 2023-01-17 08:52:07 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub 389ds/389-ds-base issue 5262 (open): high contention in find_entry_internal_dn on mixed load (last updated 2022-04-13 09:41:01 UTC)
- Red Hat Issue Tracker IDMDS-2193 (last updated 2022-04-11 14:13:10 UTC)
- Red Hat Issue Tracker IDMDS-2343 (last updated 2022-06-27 14:48:59 UTC)
- Red Hat Issue Tracker IDMDS-2707 (last updated 2023-01-09 15:09:53 UTC)
- Red Hat Product Errata RHBA-2023:0186 (last updated 2023-01-17 08:52:13 UTC)

Description Anton Bobrov 2022-02-07 11:01:33 UTC
Description of problem:

With a high concurrent mixed-operation load on a flat tree, there is very high contention on the parent entry in the find_entry_internal_dn function.

The flat tree consists of a single parent entry with a number of immediate children underneath.

The mixed load consists of multiple asynchronous operations executed on separate worker threads, resembling the following (or a similar) scenario:

SRCH base="ou=transactionlog,dc=example,dc=com" SORT -createTimestamp
ADD dn="transnum=<TIMESTAMP-SOMETHING>,ou=transactionlog,dc=example,dc=com"
SRCH base="transnum=<TIMESTAMP-SOMETHING>,ou=TransactionLog,dc=example,dc=com"
MOD dn="transnum=<TIMESTAMP-SOMETHING>,ou=TransactionLog,dc=example,dc=com"

Since the parent entry needs to be locked, some contention is expected, but the (problematic) way the locking is currently implemented creates unnecessarily high contention, which leads to operation failures like these:

[...] - ERR - find_entry_internal_dn - Retry count exceeded (ou=transactionlog,dc=example,dc=com)

[...] conn=85763 op=8 ADD dn="transnum=<TIMESTAMP-SOMETHING>,ou=transactionlog,dc=example,dc=com"
[...] conn=85763 op=8 RESULT err=32 tag=105 nentries=0 wtime=0.005113090 optime=100.712226228 etime=100.717335582

The "Retry count exceeded" error indicates that find_entry_internal_dn() has tried and failed 1000 times to lock the entry.

The result code of the failed operation is also incorrect: err=32 (noSuchObject) misleads the client. The code should be LDAP_OTHER (80) or similar, indicating an internal server problem, together with a message that tells the client the nature of the error (e.g. "failed to acquire entry lock").

Note that if find_entry_internal_dn() succeeds in acquiring the lock after hundreds of tries, no error message is logged at all, so the contention problem stays hidden, making it harder to troubleshoot and address.
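
For context, here is a self-contained toy sketch of the retry-until-locked pattern described above. Every name and the single-entry "cache" are invented for illustration; the actual ldbm back-end code differs in detail.

/* Toy sketch of the retry-until-locked pattern described above.
 * All names and the single-entry "cache" are invented; the real
 * 389-ds-base ldbm code differs in detail. Build with: cc -pthread */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRIES 1000

struct backentry {
    const char *dn;
    pthread_mutex_t lock;
};

/* Single-entry stand-in for the backend/entry cache. */
static struct backentry parent = {
    "ou=transactionlog,dc=example,dc=com", PTHREAD_MUTEX_INITIALIZER
};

static struct backentry *fetch_entry_by_dn(const char *dn)
{
    /* The real code performs a backend lookup here on EVERY retry,
     * which is the wasted work comment 1, point 1 objects to. */
    return strcmp(dn, parent.dn) == 0 ? &parent : NULL;
}

static struct backentry *find_entry_sketch(const char *dn)
{
    for (int tries = 0; tries < MAX_TRIES; tries++) {
        struct backentry *e = fetch_entry_by_dn(dn);
        if (e == NULL)
            return NULL; /* genuinely nonexistent */
        if (pthread_mutex_trylock(&e->lock) == 0)
            return e; /* locked; nothing is logged even after 999 misses */
    }
    /* Only when all 1000 attempts fail is the error logged, and the
     * operation then surfaces as err=32 instead of LDAP_OTHER (80). */
    fprintf(stderr, "find_entry_internal_dn - Retry count exceeded (%s)\n", dn);
    return NULL;
}

int main(void)
{
    struct backentry *e = find_entry_sketch(parent.dn);
    if (e != NULL) {
        puts("entry found and locked");
        pthread_mutex_unlock(&e->lock);
    }
    return 0;
}
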

How reproducible:

This should be reproducible with a sufficiently concurrent mixed load (as per the description) driven by slamd or ldclt clients.

Comment 1 Anton Bobrov 2022-02-07 11:44:32 UTC
There are multiple problems with the related code, which break down into the following major points:

1) The locking is backwards.
2) The locking is exclusive.
3) The locking is unfair.

(1) The locking logic works by first retrieving an entry from the backend and then locking its cache representation. Instead, it should take a logical lock on the entry before retrieving it. To do so, entry locking should either be decoupled from the entry, or a dummy entry should be created first on which a lock can be held before the real entry is retrieved from the backend. In essence, the locking should be based on the entry DN, not on the entry itself. This would stop wasting CPU cycles and other resources, because a logical entry lock (by DN) costs nothing, while a backend retrieval plus logical lock always costs resources, especially when it is retried hundreds of times as in this case.

(2) The locking is currently exclusive. It should be reworked into read locks and write locks, taken according to the operation. Taking exclusive locks here worsens the contention severely and is completely unnecessary: an operation that takes a read lock creates no contention with other read operations.

(3) Fair locking should be implemented here to avoid situations where the worker threads that have waited on a lock the longest never get a chance to acquire it. This would also smooth out etimes on the operations involved, i.e. in a high-contention situation all threads would incur waiting costs roughly equally. That gives better predictability and more consistent behavior for the operations involved, and avoids single-operation failures. (A combined sketch of points 1 and 2 follows below.)
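
As promised, a self-contained toy sketch of points 1 and 2: DN-keyed locks taken before any backend retrieval, with shared (read) versus exclusive (write) modes. Fairness (point 3) is not shown; pthread rwlocks alone do not guarantee it, and it would need something like a ticket or FIFO scheme on top. Every name here is invented for illustration, not a 389-ds-base API.

/* Toy sketch of DN-based locking (point 1) with shared vs. exclusive
 * modes (point 2). Invented names, not 389-ds-base APIs. Build: cc -pthread */
#include <pthread.h>
#include <stdio.h>

#define BUCKETS 64

/* One rwlock per hash bucket, keyed by entry DN: the logical lock is
 * taken BEFORE any backend retrieval, so a failed attempt costs nothing. */
static pthread_rwlock_t dn_locks[BUCKETS];

static void dn_locks_init(void)
{
    for (int i = 0; i < BUCKETS; i++)
        pthread_rwlock_init(&dn_locks[i], NULL);
}

static unsigned dn_hash(const char *dn)
{
    unsigned h = 5381;
    for (; *dn != '\0'; dn++)
        h = h * 33 + (unsigned char)*dn; /* djb2 */
    return h % BUCKETS;
}

/* Read operations (e.g. SRCH) share the parent's lock; only operations
 * that must modify the parent take the exclusive form. */
static void lock_dn_read(const char *dn)  { pthread_rwlock_rdlock(&dn_locks[dn_hash(dn)]); }
static void lock_dn_write(const char *dn) { pthread_rwlock_wrlock(&dn_locks[dn_hash(dn)]); }
static void unlock_dn(const char *dn)     { pthread_rwlock_unlock(&dn_locks[dn_hash(dn)]); }

int main(void)
{
    const char *parent = "ou=transactionlog,dc=example,dc=com";

    dn_locks_init();
    lock_dn_read(parent);  /* many readers can hold this concurrently */
    /* ... fetch the entry from the backend only AFTER the lock is held ... */
    unlock_dn(parent);
    lock_dn_write(parent); /* an update to the parent entry excludes others */
    unlock_dn(parent);
    puts("done");
    return 0;
}
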

Furthermore, this can be drastically improved by making it configurable whether operational attributes like 'numSubordinates' and 'hasSubordinates' are returned at all. If there is no reason to update the parent entry, only a read lock needs to be taken on it (to prevent orphaned entries), and then write operations on its children create no contention on the parent. For example, in OpenDS we implemented 'numSubordinates' as a virtual attribute, computed/generated on the fly only when explicitly requested, without storing it as part of the entry. It's worth looking into how a similar approach could be implemented here as well; a toy sketch of the idea follows.
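
To illustrate the virtual-attribute idea, here is a self-contained toy sketch that computes numSubordinates on demand instead of storing it on the parent. The flat in-memory "directory" is invented for illustration; real code would consult something like the parentid index instead.

/* Toy sketch: numSubordinates computed on demand, never stored on the
 * parent entry. The in-memory "directory" is invented for illustration. */
#include <stdio.h>
#include <string.h>

struct toy_entry {
    const char *dn;
    const char *parent_dn;
};

static const struct toy_entry directory[] = {
    { "ou=transactionlog,dc=example,dc=com", "dc=example,dc=com" },
    { "transnum=1,ou=transactionlog,dc=example,dc=com",
      "ou=transactionlog,dc=example,dc=com" },
    { "transnum=2,ou=transactionlog,dc=example,dc=com",
      "ou=transactionlog,dc=example,dc=com" },
};

/* Generated only when explicitly requested, so no write lock on the
 * parent is ever needed just to keep the value current. */
static int num_subordinates(const char *dn)
{
    int n = 0;
    for (size_t i = 0; i < sizeof(directory) / sizeof(directory[0]); i++)
        if (strcmp(directory[i].parent_dn, dn) == 0)
            n++;
    return n;
}

int main(void)
{
    const char *parent = "ou=transactionlog,dc=example,dc=com";
    printf("numSubordinates(%s) = %d\n", parent, num_subordinates(parent));
    return 0;
}
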

Please note that these comments/suggestions are not hand-wavy; they are based on work I previously did addressing similar issues in the OpenDS implementation. I hope they are taken on board with that in mind and ultimately help improve the current implementation going forward.

Comment 2 Anton Bobrov 2022-02-07 14:02:51 UTC
The find_entry_internal_uniqueid() function suffers from exactly the same problems. It is actually even worse if UUIDCACHE_ON is undefined.

Comment 23 joel 2022-05-25 17:16:14 UTC
Hello Petr,

If there is a hotfix, could I offer it to the customer for a test? I am not sure if they will be able/willing to do so.

Thanks

Comment 43 errata-xmlrpc 2023-01-17 08:52:07 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (redhat-ds:11 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0186

Comment 44 Têko Mihinto 2023-06-12 16:33:41 UTC
Hi,

A customer is using 389-ds-1.4.3.31-11 (the version in errata https://access.redhat.com/errata/RHBA-2023:0186)
but is still getting the "find_entry_internal_dn - Retry count exceeded" error message:
===============================================
errors:[08/Jun/2023:12:46:06.612302100 +051800] - ERR - find_entry_internal_dn - Retry count exceeded (uid=<REDACTED>)
errors:[09/Jun/2023:11:49:48.933452279 +051800] - ERR - find_entry_internal_dn - Retry count exceeded (uid=<REDACTED>)
errors:[10/Jun/2023:12:32:48.203596735 +051800] - ERR - find_entry_internal_dn - Retry count exceeded (uid=<REDACTED>)
errors:[10/Jun/2023:12:32:52.062239379 +051800] - ERR - find_entry_internal_dn - Retry count exceeded (uid=<REDACTED>)
errors:[12/Jun/2023:15:31:52.128013373 +051800] - ERR - find_entry_internal_dn - Retry count exceeded (uid=<REDACTED>)
===============================================

Is this message still expected with the fix in place?

What debug data would be useful for further troubleshooting?

Regards,
Têko.

Comment 45 mreynolds 2023-06-12 18:48:05 UTC
(In reply to Têko Mihinto from comment #44)
> Is this message still expected with the fix in place?

This message can happen whenever the database is really busy (or under-tuned). There is no promise that it will never happen again. Looking at your log clip, it happens only sporadically (once or twice per day). IMHO this is nothing to worry about unless it happens thousands of times per day, or in large clusters.

The only debug info I can think of, which will be very hard to time, is a pstack taken right as the error is happening. Since it only happens once per day, it will be almost impossible to time correctly.

Other than that, I would check the entry cache tuning and make sure the entire database fits in memory.

