Bug 972976
| Summary: | ldbm errors when adding/modifying/deleting entries | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Venkat Mahadevan <venkmaha> | ||||
| Component: | 389-ds-base | Assignee: | Rich Megginson <rmeggins> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Sankar Ramalingam <sramling> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.4 | CC: | jgalipea, lnovich, mkubik, nhosoi, nkinder, rmeggins, venkmaha | ||||
| Target Milestone: | rc | Flags: | mkubik:
needinfo+
|
||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | 389-ds-base-1.2.11.15-22.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: |
Cause: The directory server is using replication and DNA. The client does a sequence of ADD/DELETE requests for the same entry over and over again.
Consequence: The directory server will give errors containing "modify_switch_entries failed".
Fix: When replication is used, there is an entry in the directory server that holds the metadata for the operation (the RUV), which is updated and written to the database for each operation. This includes internal operations, such as DNA. The nested DNA operation was overwriting the pointer to the RUV held by the main operation (ADD, DELETE). The fix is for the outer operation to acquire a pointer to this RUV _after_ the inner operation has completed.
Result: There are no more "modify_switch_entries failed" errors when doing many ADD/DELETE operations.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-11-21 21:08:57 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Sorry, forgot to attach actual error messages: In the error log I see things like: [10/Jun/2013:14:46:43 -0700] - ldbm_back_delete: modify_switch_entries failed [10/Jun/2013:14:50:58 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:50:59 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:50:59 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:50:59 -0700] - ldbm_back_add: modify_switch_entries failed [10/Jun/2013:14:50:59 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:51:00 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:51:00 -0700] - ldbm_back_add: modify_switch_entries failed [10/Jun/2013:14:51:00 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:51:01 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:51:02 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:51:04 -0700] - ldbm_back_modify: modify_switch_entries failed [10/Jun/2013:14:51:04 -0700] - ldbm_back_add: modify_switch_entries failed In the access log I see things like: [10/Jun/2013:14:50:59 -0700] conn=3516 SSL 256-bit AES [10/Jun/2013:14:50:59 -0700] conn=3516 op=0 BIND dn="cn=Directory Manager" method=128 version=3 [10/Jun/2013:14:50:59 -0700] conn=3516 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager" [10/Jun/2013:14:50:59 -0700] conn=3516 op=1 ADD dn="uid=jmeter25,dc=dev,dc=id,dc=ubc,dc=ca" [10/Jun/2013:14:50:59 -0700] conn=3516 op=1 RESULT err=1 tag=105 nentries=0 etime=0 csn=51b64a46001b01f50000 [10/Jun/2013:14:50:59 -0700] conn=3516 op=2 DEL dn="uid=jmeter25,dc=dev,dc=id,dc=ubc,dc=ca" [10/Jun/2013:14:50:59 -0700] conn=3516 op=2 RESULT err=32 tag=107 nentries=0 etime=0 [10/Jun/2013:14:50:59 -0700] conn=3516 op=3 UNBIND [10/Jun/2013:14:50:59 -0700] conn=3516 op=3 fd=107 closed - U1 [10/Jun/2013:14:51:00 -0700] conn=3533 SSL 256-bit AES [10/Jun/2013:14:51:00 -0700] conn=3533 op=0 BIND dn="cn=Directory Manager" method=128 version=3 [10/Jun/2013:14:51:00 -0700] conn=3533 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="cn=directory manager" [10/Jun/2013:14:51:00 -0700] conn=3533 op=1 ADD dn="uid=jmeter57,dc=dev,dc=id,dc=ubc,dc=ca" [10/Jun/2013:14:51:00 -0700] conn=3533 op=1 RESULT err=1 tag=105 nentries=0 etime=0 csn=51b64a46004d01f50000 [10/Jun/2013:14:51:00 -0700] conn=3533 op=2 DEL dn="uid=jmeter57,dc=dev,dc=id,dc=ubc,dc=ca" [10/Jun/2013:14:51:00 -0700] conn=3533 op=2 RESULT err=32 tag=107 nentries=0 etime=0 [10/Jun/2013:14:51:00 -0700] conn=3533 op=3 UNBIND [10/Jun/2013:14:51:00 -0700] conn=3533 op=3 fd=110 closed - U1 Just an update on this. Using a suggestion from Thierry Bordaz here (https://lists.fedoraproject.org/pipermail/389-users/2013-June/016001.html), I increased the nsslapd-cachememsize and nsslapd-dbcachesize from their default values of about 10 megabytes to 1 gigabyte. The database was very small (about 400 entries), but doing this seems to have resolved the issue and my test does not trigger the error anymore. Upstream ticket: https://fedorahosted.org/389/ticket/47392 *** Bug 979170 has been marked as a duplicate of this bug. *** 1). Configured 2 way MMR with the latest 389-ds-base. 2). Enabled DNA plugin, AutoMembers and managedEntry plugins. Added DNA entries like this...to M1 dn: cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config changetype: modify replace: nsslapd-pluginEnabled nsslapd-pluginEnabled: On dn: cn=Posix IDs,cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config changetype: add objectClass: top objectClass: extensibleObject cn: Posix IDs dnafilter: (|(objectclass=posixAccount)(objectClass=posixGroup)) dnamagicregen: 999 dnamaxvalue: 4294967295 dnanextvalue: 131073 dnascope: dc=passsync,dc=com dnasharedcfgdn: cn=posix-ids,cn=dna,cn=plugins,cn=configuration,ou=ELDAP,ou=Services,dc=passsync,dc=com dnathreshold: 1000 dnatype: uidNumber dnatype: gidNumber 3).Added the following entry to M2... dn: cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config changetype: modify replace: nsslapd-pluginEnabled nsslapd-pluginEnabled: On dn: cn=Posix IDs,cn=Distributed Numeric Assignment Plugin,cn=plugins,cn=config changetype: add objectClass: top objectClass: extensibleObject cn: Posix IDs dnafilter: (|(objectclass=posixAccount)(objectClass=posixGroup)) dnamagicregen: 999 dnamaxvalue: 0 dnanextvalue: 0 dnascope: dc=passsync,dc=com dnasharedcfgdn: cn=posix-ids,cn=dna,cn=plugins,cn=configuration,ou=ELDAP,ou=Services,dc=passsync,dc=com dnathreshold: 1000 dnatype: uidNumber dnatype: gidNumber 4). Restarted both M1 and M2. 5). Simultaneous add/delete/search of user entries to M1. dn: uid=bug975250new7666,ou=people,dc=passsync,dc=com telephoneNumber: 989898197666 mail: bug975250new7666 uid: bug975250new7666 givenName: bug975250new7666 objectClass: top objectClass: person objectClass: organizationalPerson objectClass: inetorgperson objectclass: posixAccount sn: bug975250new7666 cn: bug975250new7666 homeDirectory: /home/bug975250new7666 loginShell: /bin/bash userPassword: {SSHA}aBIV4atRWyMZqiWucSiZgYGVEw1bJa7V uidNumber: 999 gidNumber: 999 6) No error messages encountered while running the tests. Hence, marking the bug as Verified. Checked both M1 and M2 error logs. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1653.html |
Created attachment 759394 [details] JMeter test plan to reproduce bug Description of problem: The issue I am encountering are the following errors when an entry is being added to the directory or deleted from the directory – it seems an operations error err=1 corresponds to a failure in the backend ldbm (see log snippets below). Version-Release number of selected component (if applicable): 389-ds-base.x86_64 1.2.11.15-14.el6_4 @rhel-x86_64-server-6 Linux 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Feb 20 12:17:37 EST 2013 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 6.4 (Santiago) How reproducible: Consistently reproducible. Steps to Reproduce: I can reproduce these errors consistently by running a JMeter add/delete loop test to add and delete entries (see the attached testplan_. It always transpires that some entries will fail to add or delete and will throw an operations error err=1 and a corresponding ldbm error. Actual results: err=1 and ldbm errors; entry fails to add or delete. Expected results: Entries should be added and deleted successfully. Additional info: The DNA plugin is active on the server. When sent a uidNumber of 999 or gidNumber of 999, it will generate a new POSIX uid/gid. The servers are in a replicated environment (2 MMR masters and 3 replicas). The error seems to occur regardless of whether the DNA plugin is used or not (i.e. when sending a non-POSIX entry vs. a POSIX entry to LDAP).