Bug 870158
Summary: | slapd entered to infinite loop during new index addition | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Noriko Hosoi <nhosoi> |
Component: | 389-ds-base | Assignee: | Rich Megginson <rmeggins> |
Status: | CLOSED ERRATA | QA Contact: | Sankar Ramalingam <sramling> |
Severity: | unspecified | Docs Contact: | |
Priority: | high | ||
Version: | 6.4 | CC: | jgalipea, nkinder, nsoman, pspacek, syeghiay |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 389-ds-base-1.2.11.15-3.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause: When MMR is under the heavy stress, there was a chance to delete an entry already deleted.
Consequence: It breaks the entry's parent/child relationship.
Fix: This fix adds an missing check if the retrieved entry is already deleted or not.
Result: There is no more chance to break the entry's parent/child relationship.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-02-21 08:21:16 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Noriko Hosoi
2012-10-25 16:40:42 UTC
Committed to the external 389-ds-base-1.2.11 branch. Started to take steps below to verify: 1. install ipa-server 2.2 on a machine using hostname mgmt13.e.org 2. from https://fedorahosted.org/389/ticket/494, downloaded E-ORG-userRoot-2012_10_15_184158.ldif​ 3. import this ldif into the ipa-server # hostname mgmt13.e.org # ipactl stop <all services stopped successfully> # /var/lib/dirsrv/scripts-E-ORG/db2ldif -n userRoot Exported ldif file: /var/lib/dirsrv/slapd-E-ORG/ldif/E-ORG-userRoot-2012_11_29_151133.ldif [29/Nov/2012:15:11:33 -0500] - /etc/dirsrv/slapd-E-ORG/dse.ldif: nsslapd-maxdescriptors: nsslapd-maxdescriptors: invalid value "8192", maximum file descriptors must range from 1 to 4096 (the current process limit). Server will use a setting of 4096. [29/Nov/2012:15:11:33 -0500] - Config Warning: - nsslapd-maxdescriptors: invalid value "8192", maximum file descriptors must range from 1 to 4096 (the current process limit). Server will use a setting of 4096. ldiffile: /var/lib/dirsrv/slapd-E-ORG/ldif/E-ORG-userRoot-2012_11_29_151133.ldif [29/Nov/2012:15:11:33 -0500] - export userRoot: Processed 224 entries (100%). [29/Nov/2012:15:11:33 -0500] - All database threads now stopped Ques: Should I be running the command, and if yes - should it be as above? The steps I plan to take to verify this: After importing the data, upgrade IPA to 3.0, and verify it upgrades successfully. First of all - the problem is/was a race condition, so any test results can't be taken as definitive ... This problem can potentially be reproduced with following files: - Import following DNS zone to IPA: https://fedorahosted.org/389/attachment/ticket/494/e.org.ldif - Download following scripts to some directory: https://fedorahosted.org/389/attachment/ticket/494/loop.sh https://fedorahosted.org/389/attachment/ticket/494/nsupdate_add_del_gen.py - Modify NPROC value in loop.sh to 10 or so - kinit admin - run `bash loop.sh` - let it running for several hours or before the script finishes These scripts try to do chaotically 10000*NPROC DNS updates in DNS zone e.org. The whole thing runs in NPROC processes in parallel. It should put 389 and BIND under heavy load and hopefully it will triger the right condition. Nothing is sure... I would recommend to use latest version of IPA and bind-dyndb-ldap (ideally version which goes to RHEL 6.4). Persistent search and serial_autoincrement should be enabled in /etc/named.conf before the test. Both of them should be enabled in default IPA 3.0 install, please, check it to be sure. Expected results: - zone e.org contain some DNS names like c<number> - associated TXT records have randomly generated values - BIND not crashed during the whole test (please let me know if it thows some error messages or somethig suspicious) - 389 is able to export the database Ran the script. Didn't see any hangs or errors. /var/log/messages had messages like: Nov 30 16:40:36 cloud-qe-10 named[1634]: update_record (psearch) failed, dn 'idnsName=c55,idnsName=e.org,cn=dns,dc=testrelm,dc=com' change type 0x1. Records can be outdated, run `rndc reload`: not found Went through list of expected results as listed above - Zone e.org contained DNS names like c<number> Associated TXT records had randomly generated values There were 100 entries, and an example: Record name: c96 TXT record: "19" # service named status indicated it was still running todo: 389 is able to export db (In reply to comment #10) > Didn't see any hangs or errors. /var/log/messages had messages like: > Nov 30 16:40:36 cloud-qe-10 named[1634]: update_record (psearch) failed, dn > 'idnsName=c55,idnsName=e.org,cn=dns,dc=testrelm,dc=com' change type 0x1. > Records can be outdated, run `rndc reload`: not found This particular error message is acceptable. Records are added and deleted too quickly and BIND is not able to process them at full speed. To be more specific: 1) Some record is added to LDAP 2) BIND receives change notification and saves this notification to internal "to be processed" queue 3) The same record is deleted from LDAP 4) BIND pulls the notification from internal queue and tries to load new data from LDAP - but there are no data anymore > todo: 389 is able to export db This will break the tie. was able to export the db: # /var/lib/dirsrv/scripts-TESTRELM-COM/db2ldif -a /tmp/output.ldif -s "dc=testrelm,dc=com" Exported ldif file: /tmp/output.ldif [03/Dec/2012:08:24:40 -0500] - /etc/dirsrv/slapd-TESTRELM-COM/dse.ldif: nsslapd-maxdescriptors: nsslapd-maxdescriptors: invalid value "8192", maximum file descriptors must range from 1 to 4096 (the current process limit). Server will use a setting of 4096. [03/Dec/2012:08:24:40 -0500] - Config Warning: - nsslapd-maxdescriptors: invalid value "8192", maximum file descriptors must range from 1 to 4096 (the current process limit). Server will use a setting of 4096. [03/Dec/2012:08:24:40 -0500] - Backend Instance(s): [03/Dec/2012:08:24:40 -0500] - userRoot ldiffile: /tmp/output.ldif [03/Dec/2012:08:24:43 -0500] - export userRoot: Processed 365 entries (100%). [03/Dec/2012:08:24:44 -0500] - All database threads now stopped # ls -l /tmp/output.ldif -rw------- 1 dirsrv dirsrv 239692 Dec 3 08:24 /tmp/output.ldif verfied using: ipa-server-3.0.0-8.el6.x86_64 389-ds-base-1.2.11.15-4.el6.x86_64 bind-dyndb-ldap-2.3-1.el6.x86_64 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0503.html |