Bug 772779

Summary: bak2db gets stuck in infinite loop
Product: Red Hat Enterprise Linux 6 Reporter: Rich Megginson <rmeggins>
Component: 389-ds-base Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: IDM QE LIST <seceng-idm-qe-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.3 CC: amsharma, jgalipea
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.10.0-1.el6 Doc Type: Bug Fix
Doc Text:
Cause: Doing a restore from a database backup. Consequence: The restore command would hang. Fix: The restore command could get into an infinite loop in certain conditions. The code was rewritten to remove the infinite loop code path. Result: Database restores will not hang.
Last Closed: 2012-06-20 07:11:55 UTC Type: ---

Description Rich Megginson 2012-01-09 23:09:11 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/4

Reported on http://lists.fedoraproject.org/pipermail/389-users/2012-January/013944.html

In my environment I have a total of 4 directory servers, 2 multi-masters in production (ServerA, ServerB) and 2 multi-masters to test with (ServerC, ServerD).  Basically here’s what I did:

Took a backup of one of the production directory servers, ServerA

Copied ServerA’s backup to ServerC (test).

Deleted the replication agreement on ServerC to ServerD (but not the agreement from ServerD to ServerC)

Ran /usr/lib64/dirsrv/slapd-ServerC/bak2db 2011_12_29_15_27_35

The restore started, and never stopped running.  I eventually killed it and tried again, this time capturing the output:
# /usr/lib64/dirsrv/slapd-ServerC/bak2db 2011_12_29_15_27_35

[03/Jan/2012:15:06:43 -0700] 389-Directory/1.2.9.9 - debug level: backend (524288)

[03/Jan/2012:15:06:43 -0700] - Deleting log file: (/var/lib/dirsrv/slapd-ServerC/db/log.0000000021)

[03/Jan/2012:15:06:43 -0700] - Restoring file 1 (/var/lib/dirsrv/slapd-ServerC/db/DBVERSION)

[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/DBVERSION to /var/lib/dirsrv/slapd-ServerC/db/DBVERSION

[03/Jan/2012:15:06:43 -0700] - Restoring file 2 (/var/lib/dirsrv/slapd-ServerC/db/log.0000000021)

[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/log.0000000021 to /var/lib/dirsrv/slapd-ServerC/db/log.0000000021

[ lines removed to reduce size ]

[03/Jan/2012:15:06:43 -0700] - Restoring file 33 (/var/lib/dirsrv/slapd-ServerC/db/userRoot/uid.db4)

[03/Jan/2012:15:06:43 -0700] - Copying /var/lib/dirsrv/slapd-ServerC/bak/2011_12_29_15_27_35/userRoot/uid.db4 to /var/lib/dirsrv/slapd-ServerC/db/userRoot/uid.db4

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=entryrdn,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=entryrdn,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nscpEntryDN,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nscpEntryDN,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsds5ReplConflict,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsds5ReplConflict,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsuniqueid,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=nsuniqueid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=numsubordinates,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=numsubordinates,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=objectclass,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=objectclass,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=parentid,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=parentid,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config

[03/Jan/2012:15:06:43 -0700] - Del Index Config Entry cn=aci,cn=index,cn=userRoot,cn=ldbm database,cn=plugins,cn=config

That is, after the message about deleting cn=parentid, it starts over again with cn=aci, skipping the other default indexes: cn=seealso, cn=sn, cn=telephoneNumber, cn=uid, and cn=uniquemember.
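
To confirm this pattern in a captured log without reading it by eye, a small script can report the first index entry that is deleted a second time. This is only a sketch: the function name and regular expression are hypothetical helpers written for this report, and they assume the "Del Index Config Entry <DN>" message format shown in the output above.

```python
import re
from typing import Iterable, Optional

# Assumes the message format seen in the bak2db output above:
#   [timestamp] - Del Index Config Entry <DN>
DEL_INDEX_RE = re.compile(r"Del Index Config Entry (?P<dn>\S.*)$")


def first_repeated_index_delete(lines: Iterable[str]) -> Optional[str]:
    """Return the first index DN that is deleted twice, or None if none repeats."""
    seen = set()
    for line in lines:
        match = DEL_INDEX_RE.search(line.strip())
        if match is None:
            continue  # not an index-deletion message
        dn = match.group("dn")
        if dn in seen:
            return dn  # the same index entry was deleted again: likely a loop
        seen.add(dn)
    return None
```

Fed the output above, this would report the cn=aci entry for NetscapeRoot, which is the point where the restore starts over.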

Environment: 389-ds-base-1.2.9.9-1.el5 on Red Hat Enterprise Linux 5.5

Comment 2 Rich Megginson 2012-04-19 16:28:28 UTC
svn ci -m "test for bug772779 - bak2db gets stuck in infinite loop"
Sending        clu/cluUtil.sh
Transmitting file data .
Committed revision 6473.

Comment 3 Amita Sharma 2012-05-17 10:57:39 UTC
############## Result  for  backend test :   Clu run
    Clu run elapse time : 00:12:58
    Clu run Tests  FAIL       : 2% (7/261)
    Clu run Tests  NORESULT       : 0% (1/261)
    Clu run Tests  PASS       : 96% (253/261)


Which test case is for this bug?

Comment 4 Rich Megginson 2012-05-17 14:32:59 UTC
(In reply to comment #3)
> ############## Result  for  backend test :   Clu run
>     Clu run elapse time : 00:12:58
>     Clu run Tests  FAIL       : 2% (7/261)
>     Clu run Tests  NORESULT       : 0% (1/261)
>     Clu run Tests  PASS       : 96% (253/261)
> 
> 
> Which test case is for this bug?

The test is cluUtil_restore_no_hang (ic6), in the file clu/cluUtil.sh on the TET RHEL63 branch.
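
For anyone who wants to check for the hang outside the TET suite, the general idea of a "no hang" check can be sketched as running the command under a time limit. This is not the code of cluUtil_restore_no_hang; the function name and the timeout value are assumptions made for illustration.

```python
import subprocess
from typing import Sequence


def finishes_within(cmd: Sequence[str], timeout_seconds: float) -> bool:
    """Run cmd and return True if it exits before the timeout, False if it hangs."""
    try:
        subprocess.run(
            cmd,
            timeout=timeout_seconds,  # the child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # only termination matters here, not the exit status
        )
    except subprocess.TimeoutExpired:
        return False
    return True
```

For example, finishes_within(["/usr/lib64/dirsrv/slapd-ServerC/bak2db", "2011_12_29_15_27_35"], 600) would return False on an affected build, where 600 seconds is an arbitrary limit chosen for this sketch.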

Comment 5 Amita Sharma 2012-05-21 09:06:13 UTC
I executed only ic6 to verify this bug, and it passed.
Marking this bug as VERIFIED.

Comment 6 Rich Megginson 2012-05-24 22:37:24 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Doing a restore from a database backup.
Consequence: The restore command would hang.
Fix: The restore command could get into an infinite loop in certain conditions.  The code was rewritten to remove the infinite loop code path.
Result: Database restores will not hang.

Comment 7 errata-xmlrpc 2012-06-20 07:11:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0813.html