Bug 1812286

Summary: RFE - Monitor the current DB locks ( nsslapd-db-current-locks ).
Product: Red Hat Enterprise Linux 8 Reporter: Têko Mihinto <tmihinto>
Component: 389-ds-base Assignee: Simon Pichugin <spichugi>
Status: CLOSED ERRATA QA Contact: RHDS QE <ds-qe-bugs>
Severity: medium Docs Contact: Marc Muehlfeld <mmuehlfe>
Priority: unspecified    
Version: 8.2 CC: aakkiang, bsmejkal, ekasprzy, mharmsen, mreynolds, msauton, ofalk, pasik, sgouvern, spichugi, tbordaz, vashirov
Target Milestone: rc Keywords: FutureFeature, Triaged
Target Release: 8.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 389-ds-1.4-8050020210531183345.1a75f91c Doc Type: Enhancement
Doc Text:
.Directory Server provides monitoring settings that can prevent database corruption caused by lock exhaustion

This update adds the `nsslapd-db-locks-monitoring-enable` parameter to the `cn=bdb,cn=config,cn=ldbm database,cn=plugins,cn=config` entry. If it is enabled, which is the default, Directory Server aborts all of the searches if the number of active database locks is higher than the percentage threshold configured in `nsslapd-db-locks-monitoring-threshold`. If an issue is encountered, the administrator can increase the number of database locks in the `nsslapd-db-locks` parameter in the `cn=bdb,cn=config,cn=ldbm database,cn=plugins,cn=config` entry. This can prevent data corruption. Additionally, the administrator now can set a time interval in milliseconds that the thread sleeps between the checks. For further details, see the parameter descriptions in the link:https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/configuration_command_and_file_reference/index[Red Hat Directory Server Configuration, Command, and File Reference].
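
The threshold comparison described in the Doc Text can be illustrated with a short Python sketch. This is not the server's implementation: the function names and the sample numbers are hypothetical, and the only thing taken from the text above is the rule that searches are aborted when the number of active locks is higher than the configured percentage of `nsslapd-db-locks`.

```python
def lock_usage_percent(current_locks: int, configured_locks: int) -> float:
    """Return active locks as a percentage of the configured lock count."""
    if configured_locks <= 0:
        raise ValueError("configured_locks must be positive")
    return 100.0 * current_locks / configured_locks


def should_abort_searches(
    current_locks: int,
    configured_locks: int,
    threshold_percent: float,
    enabled: bool = True,
) -> bool:
    """Hypothetical helper mirroring the rule in the Doc Text.

    enabled            stands in for nsslapd-db-locks-monitoring-enable
    threshold_percent  stands in for nsslapd-db-locks-monitoring-threshold
    configured_locks   stands in for nsslapd-db-locks
    """
    if not enabled:
        return False
    # "higher than the percentage threshold" is read here as a strict comparison.
    return lock_usage_percent(current_locks, configured_locks) > threshold_percent


if __name__ == "__main__":
    # Illustrative numbers only, not measured values.
    print(should_abort_searches(9500, 10000, 90))
    print(should_abort_searches(5000, 10000, 90))
```

With monitoring disabled, the helper never reports a problem, which matches the role of the enable switch as described above.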
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-09 18:10:45 UTC Type: Bug

Description Têko Mihinto 2020-03-10 22:40:13 UTC
Description of problem:
There could be a shortage of DB locks, and in some cases this could cause DB corruption issues.

Version-Release number of selected component (if applicable):
RHDS 10.

How reproducible:
A few times.

Steps to Reproduce:
Run LDAP operations that lead to an exhaustion of the DB locks.

Actual results:
Possible DB issues that could require a full reinitialization.

Expected results:
Monitor the value of the "nsslapd-db-current-locks" attribute and maybe shut the RHDS instance down.
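
As a rough illustration of the monitoring requested here, the sketch below extracts `nsslapd-db-current-locks` from LDIF text such as the output of an `ldapsearch` against the database monitor entry. The function name, the sample DN, and the sample value are assumptions made for the example; only the attribute name comes from this report.

```python
# Hypothetical sample output. The DN and the value 42 are illustrative only.
SAMPLE_LDIF = """\
dn: cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config
nsslapd-db-current-locks: 42
"""


def read_current_locks(ldif_text: str, attr: str = "nsslapd-db-current-locks") -> int:
    """Return the integer value of `attr` from simple (non-base64) LDIF text."""
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        # LDAP attribute names are case-insensitive, so compare in lower case.
        if sep and name.strip().lower() == attr.lower():
            return int(value.strip())
    raise KeyError(attr)


if __name__ == "__main__":
    print(read_current_locks(SAMPLE_LDIF))
```

An external script could poll this value on a schedule and alert the administrator before the lock table fills up.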

Additional info:

Comment 3 thierry bordaz 2020-06-18 08:42:25 UTC
DB locks get exhausted because of unindexed internal searches (under a transaction). Indexing the attributes used by those searches is the way to prevent exhaustion.
To prevent db lock exhaustion and help the admin, possible solutions would be:

- If the db locks get exhausted during a txn, it leads to a db panic, and the later recovery can fail. That leads to a full reinit of the instance where the db locks got exhausted. The server should monitor the db locks and trigger a server shutdown (similar to disk full) if the db locks are close to being exhausted. Because of the performance impact, the monitoring should be limited to unindexed (allids candidate list) internal searches (under a txn) and done periodically (after every 1000 evaluated candidates). Unindexed searches should be flagged in ldbm_back_search, the transaction can be tested with pblock(SLAPI_TXN), monitoring should be done in iterate, and an internal op is identified by the operation flag OP_FLAG_INTERNAL.

- To help index the appropriate attributes, an unindexed internal search (under a txn) should log a warning with the search filter.

- A config parameter should toggle monitoring/shutdown. It should be enabled by default.

- Monitoring returns a value that may not be exact. The threshold that triggers the shutdown should take into account that the value is imprecise.
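
The proposal above can be sketched in Python as follows. This is an illustration of the idea, not the 389-ds-base code, which is written in C: the generator, the exception class, and all parameter names are hypothetical. The sketch checks lock usage only for unindexed internal searches running under a transaction, and only after every 1000 evaluated candidates. It raises an exception where the proposal calls for a server shutdown.

```python
from typing import Callable, Iterable, Iterator

CHECK_INTERVAL = 1000  # from the proposal: check after every 1000 evaluated candidates


class LockPressureError(RuntimeError):
    """Raised when lock usage is above the configured threshold (hypothetical)."""


def iterate_candidates(
    candidates: Iterable[int],
    *,
    unindexed: bool,
    in_txn: bool,
    internal: bool,
    lock_usage_percent: Callable[[], float],
    threshold_percent: float,
    check_interval: int = CHECK_INTERVAL,
) -> Iterator[int]:
    """Yield candidates, periodically checking lock usage when it matters."""
    # Limit the overhead to the risky case named in the proposal.
    monitored = unindexed and in_txn and internal
    for evaluated, candidate in enumerate(candidates, start=1):
        if monitored and evaluated % check_interval == 0:
            usage = lock_usage_percent()
            if usage > threshold_percent:
                raise LockPressureError(
                    f"lock usage {usage:.1f}% is above {threshold_percent:.1f}%"
                )
        yield candidate


if __name__ == "__main__":
    # Illustrative run: usage stays low, so every candidate is returned.
    found = list(
        iterate_candidates(
            range(2500),
            unindexed=True,
            in_txn=True,
            internal=True,
            lock_usage_percent=lambda: 10.0,
            threshold_percent=90.0,
        )
    )
    print(len(found))
```

Because the reported lock count may be inexact, a threshold well below 100% leaves a safety margin before the lock table is actually full.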

Comment 9 Simon Pichugin 2021-04-20 12:19:55 UTC
*** Bug 1831812 has been marked as a duplicate of this bug. ***

Comment 17 bsmejkal 2021-06-02 14:19:25 UTC
============================= test session starts ==============================
platform linux -- Python 3.6.8, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3.6
cachedir: .pytest_cache
metadata: {'Python': '3.6.8', 'Platform': 'Linux-4.18.0-308.el8.x86_64-x86_64-with-redhat-8.5-Ootpa', 'Packages': {'pytest': '6.2.4', 'py': '1.10.0', 'pluggy': '0.13.1'}, 'Plugins': {'metadata': '1.11.0', 'html': '3.1.1', 'libfaketime': '0.1.2', 'flaky': '3.7.0'}}
389-ds-base: 1.4.3.23-2.module+el8.5.0+11209+cb479c8d
nss: 3.53.1-17.el8_3
nspr: 4.25.0-2.el8_2
openldap: 2.4.46-16.el8
cyrus-sasl: 2.1.27-5.el8
FIPS: disabled
rootdir: /mnt/tests/rhds/tests/upstream/ds/dirsrvtests, configfile: pytest.ini
plugins: metadata-1.11.0, html-3.1.1, libfaketime-0.1.2, flaky-3.7.0
collected 4 items

dirsrvtests/tests/suites/monitor/db_locks_monitor_test.py::test_exhaust_db_locks_basic[70] PASSED [ 25%]
dirsrvtests/tests/suites/monitor/db_locks_monitor_test.py::test_exhaust_db_locks_basic[80] PASSED [ 50%]
dirsrvtests/tests/suites/monitor/db_locks_monitor_test.py::test_exhaust_db_locks_basic[95] PASSED [ 75%]
dirsrvtests/tests/suites/monitor/db_locks_monitor_test.py::test_exhaust_db_locks_big_pause PASSED [100%]

===================== 4 passed in 1009.19s (0:16:49) ======================

Marking as Verified:Tested.

Comment 20 sgouvern 2021-06-03 12:27:20 UTC
As per comment 17, marking as VERIFIED

Comment 27 errata-xmlrpc 2021-11-09 18:10:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4203