Bug 975250
Summary: | Changelog deadlock replication failures with DNA | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Venkat Mahadevan <venkmaha> |
Component: | 389-ds-base | Assignee: | Rich Megginson <rmeggins> |
Status: | CLOSED ERRATA | QA Contact: | Sankar Ramalingam <sramling> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.4 | CC: | jgalipea, nhosoi, nkinder, rmeggins |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 389-ds-base-1.2.11.15-22.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause: Under certain conditions, a mix of concurrent search, update, and outgoing replication operations can deadlock in the changelog db, leading to error messages like this:
NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (XXXXXXX); db error - -30994 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
This is caused by a deadlock between the changelog readers, writers, and main database writers.
Consequence: Update operations will fail with the above error message in the directory server errors log.
Fix: A new configuration parameter is introduced:
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
nsslapd-db-deadlock-policy: 9
With the default policy 9 (DB_LOCK_YOUNGEST), the youngest (most recent) locker gets killed when there is a deadlock. If that locker is the changelog writer, the write will fail, and the entire update will fail.
Users who frequently see the above errors in the errors log are advised to change this setting to 6 (DB_LOCK_MINWRITE), which will instead kill the locker that holds the fewest write locks (that is, the changelog reader). The changelog reader code has been changed to handle this deadlock condition and retry. The setting can be changed like this:
ldapmodify -x -D "cn=directory manager" -W <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-deadlock-policy
nsslapd-db-deadlock-policy: 6
EOF
You may ask why the default is not changed to 6. The answer is that the setting will apply to _all_ threads, so that changing this setting could cause regular search requests to fail, if the directory server is under a heavy update load. In our testing, we did not see this happen, but we cannot guarantee that changing this value to 6 will not impact regular search requests.
Result: After changing nsslapd-db-deadlock-policy to 6, updates will succeed and no longer cause errors like the above.
|
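The modify operation described in the Doc Text can also be generated programmatically, for example when the same change has to be applied to several masters. The following is a minimal sketch, not part of the fix itself: the function name `build_policy_ldif` and the `DEADLOCK_POLICIES` table are hypothetical, and the table lists only the two policy values named in the Doc Text.

```python
# Sketch only: builds the LDIF shown in the Doc Text for a chosen policy.
# build_policy_ldif and DEADLOCK_POLICIES are illustrative names, not part
# of 389-ds-base.

# Policy numbers and names exactly as given in the Doc Text.
DEADLOCK_POLICIES = {
    9: "DB_LOCK_YOUNGEST",  # default
    6: "DB_LOCK_MINWRITE",  # suggested when changelog deadlocks are frequent
}

CONFIG_DN = "cn=config,cn=ldbm database,cn=plugins,cn=config"


def build_policy_ldif(policy: int) -> str:
    """Return LDIF that replaces nsslapd-db-deadlock-policy with `policy`."""
    if policy not in DEADLOCK_POLICIES:
        raise ValueError(f"policy {policy} is not one of {sorted(DEADLOCK_POLICIES)}")
    lines = [
        f"dn: {CONFIG_DN}",
        "changetype: modify",
        "replace: nsslapd-db-deadlock-policy",
        f"nsslapd-db-deadlock-policy: {policy}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The output can be piped to: ldapmodify -x -D "cn=directory manager" -W
    print(build_policy_ldif(6), end="")
```

The sketch rejects any value other than 6 and 9 on purpose, since those are the only two policies this bug report discusses.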
Last Closed: | 2013-11-21 21:09:43 UTC | Type: | Bug |
Description
Venkat Mahadevan
2013-06-17 22:30:23 UTC
Upstream ticket: https://fedorahosted.org/389/ticket/47410

*** Bug 979168 has been marked as a duplicate of this bug. ***

NOTE: doc text is the same as https://bugzilla.redhat.com/show_bug.cgi?id=979169

1). Configured 2 way MMR with the latest 389-ds-base.

2). Enabled DNA plugin and added DNA entries like this...for M1
dn: cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-pluginEnabled
nsslapd-pluginEnabled: On

dn: cn=Posix IDs,cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: Posix IDs
dnafilter: (|(objectclass=posixAccount)(objectClass=posixGroup))
dnamagicregen: 999
dnamaxvalue: 4294967295
dnanextvalue: 131073
dnascope: dc=passsync,dc=com
dnasharedcfgdn: cn=posix-ids,cn=dna,cn=plugins,cn=configuration,ou=ELDAP,ou=Services,dc=passsync,dc=com
dnathreshold: 1000
dnatype: uidNumber
dnatype: gidNumber

3). Added the following entry to M2...
dn: cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-pluginEnabled
nsslapd-pluginEnabled: On

dn: cn=Posix IDs,cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: Posix IDs
dnafilter: (|(objectclass=posixAccount)(objectClass=posixGroup))
dnamagicregen: 999
dnamaxvalue: 0
dnanextvalue: 0
dnascope: dc=passsync,dc=com
dnasharedcfgdn: cn=posix-ids,cn=dna,cn=plugins,cn=configuration,ou=ELDAP,ou=Services,dc=passsync,dc=com
dnathreshold: 1000
dnatype: uidNumber
dnatype: gidNumber

4). Restarted both M1 and M2.

5). Simultaneous add/delete of user entries to M1.
dn: uid=bug975250new7666,ou=people,dc=passsync,dc=com
telephoneNumber: 989898197666
mail: bug975250new7666
uid: bug975250new7666
givenName: bug975250new7666
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetorgperson
objectclass: posixAccount
sn: bug975250new7666
cn: bug975250new7666
homeDirectory: /home/bug975250new7666
loginShell: /bin/bash
userPassword: {SSHA}aBIV4atRWyMZqiWucSiZgYGVEw1bJa7V
uidNumber: 999
gidNumber: 999

6). No error messages encountered while running the tests. Hence, marking the bug as Verified. Checked both M1 and M2 error logs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1653.html
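The error-log check from the verification steps can be automated. The sketch below is one possible approach, not the procedure QA actually used: it scans errors-log lines for the `_cl5WriteOperationTxn` / `DB_LOCK_DEADLOCK` message quoted in the Doc Text and collects the affected CSNs. The function names are hypothetical, and the log file paths are supplied by the caller because this report does not state them.

```python
# Sketch only: find changelog deadlock errors in directory server errors logs.
# Function names are illustrative; log paths must be supplied by the caller.
import re
import sys
from typing import Iterable, List

# Matches the message quoted in the Doc Text and captures the CSN in parentheses.
DEADLOCK_RE = re.compile(
    r"_cl5WriteOperationTxn: failed to write entry with csn "
    r"\((?P<csn>[^)]*)\).*DB_LOCK_DEADLOCK"
)


def find_deadlock_csns(lines: Iterable[str]) -> List[str]:
    """Return the CSN of every changelog write that failed with a deadlock."""
    csns = []
    for line in lines:
        match = DEADLOCK_RE.search(line)
        if match:
            csns.append(match.group("csn"))
    return csns


def scan_log(path: str) -> List[str]:
    """Scan one errors log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_deadlock_csns(handle)


if __name__ == "__main__":
    # Usage: python scan_deadlocks.py <M1 errors log> <M2 errors log>
    for log_path in sys.argv[1:]:
        found = scan_log(log_path)
        print(f"{log_path}: {len(found)} changelog deadlock error(s)")
```

A count of zero on both masters after the add/delete run corresponds to the "no error messages" outcome reported in step 6.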