Bug 947583 - ldapdelete returns non-leaf entry error while trying to remove a leaf entry
Summary: ldapdelete returns non-leaf entry error while trying to remove a leaf entry
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: 389-ds-base
Version: 6.5
Hardware: All
OS: Linux
medium
high
Target Milestone: rc
: ---
Assignee: Noriko Hosoi
QA Contact: Sankar Ramalingam
URL:
Whiteboard:
: 982326 1012942 (view as bug list)
Depends On:
Blocks: 1013063 1053766
TreeView+ depends on / blocked
 
Reported: 2013-04-02 19:16 UTC by Najmuddin Chirammal
Modified: 2018-12-09 16:58 UTC (History)
11 users (show)

Fixed In Version: 389-ds-base-1.2.11.15-22.el6 389-ds-base-1.2.11.15-26.el6
Doc Type: Bug Fix
Doc Text:
Cause: The replication conflict resolution code had issues such as in conflict DN generation, retrieving deleted parent entry, and examining the scope of a deleted entry. Consequence: These issues confused the parent - children relationship and accidentally created an intermediate node entry which has a positive child count but has no children. The server refused to remove such an entry. Fix: The cause issues were fixed. Result: The replication conflict resolution code is more robust and does not get confused under the stress test.
Clone Of:
: 1053766 (view as bug list)
Environment:
Last Closed: 2013-11-21 21:06:46 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:1653 normal SHIPPED_LIVE 389-ds-base bug fix update 2013-11-20 21:53:19 UTC

Description Najmuddin Chirammal 2013-04-02 19:16:17 UTC
Description of problem: 

Unable to remove an ou from the directory server, Looking at the ldapsearch outputs and the db2ldif dump I can confirm there are no child entries, but the server does not allow me to remove this particular entry.. it returns err 66 (non-leaf entry).

# ldapdelete -x -D cn=Directory\ manager -w XXXXX  -r ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com
ldap_delete: Operation not allowed on non-leaf (66)

Version-Release number of selected component (if applicable): 389-ds-base-1.2.10.2-20.el6_3.x86_64

How reproducible: Unknown

Steps to Reproduce:
1. Setup multi-master DS environment
2. create user/group entries, run the server for few days?
3. try to remove an ou, this might reproduce the issue (return non-leaf entry) if you are lucky :)
  
Actual results: Directory server returns non-leaf entry while trying to remove a leaf entry.

Expected results: Since there are no entries under ou=people, the entry get removed and no non-leaf error returned.


Additional info:

* The error: 
# ldapdelete -x -D cn=Directory\ manager -w XXXX  -r ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com
ldap_delete: Operation not allowed on non-leaf (66)

* stopped ldap server.
[root@vm190 ~]# service dirsrv stop
Shutting down dirsrv: 
    vm190...                                               [  OK  ]
* captured current ldif with -r option.

[root@vm190 ~]# /usr/lib64/dirsrv/slapd-vm190/db2ldif  -r -n userRoot
Exported ldif file: /var/lib/dirsrv/slapd-vm190/ldif/vm190-userRoot-2013_04_02_142032.ldif
ldiffile: /var/lib/dirsrv/slapd-vm190/ldif/vm190-userRoot-2013_04_02_142032.ldif
[02/Apr/2013:14:20:32 -0400] - export userRoot: Processed 15 entries (100%).
[02/Apr/2013:14:20:32 -0400] - Waiting for 4 database threads to stop
[02/Apr/2013:14:20:33 -0400] - All database threads now stopped

* searched for entries that has any reference to ou=people.

[root@vm190 ~]# grep -A 1 -B 1 -i ou=people /var/lib/dirsrv/slapd-vm190/ldif/vm190-userRoot-2013_04_02_142032.ldif
 dc=gsslab,dc=pnq,dc=redhat,dc=com") (version 3.0;acl "ldap-admin-access";allo
 w (all)(userdn = "ldap:///uid=ladmin,ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=
 com");)
--
uniqueMember;adcsn-514e1ff30000014d0000;vucsn-514e1b1d0000014d0000;vdcsn-514e1f820000014d0000;deletedattribute;deleted: 
 uid=ffas,ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com
uniqueMember;vucsn-514e1d420000014d0000;vdcsn-514e1d340000014d0000;deletedattribute;deleted: 
 uid=t1,ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com

--
 ory manager
uniqueMember;vucsn-514e19270000014d0000: uid=chettan,ou=people,dc=gsslab,dc=pn
 q,dc=redhat,dc=com
--
# entry-id: 6
dn: ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com
nsUniqueId: 84697a02-0f0e11e2-81dbd582-df1e031c
--
 ory manager
uniqueMember;vucsn-514e20600000014d0000: uid=ps,ou=people,dc=gsslab,dc=pnq,dc=
 redhat,dc=com

* Number of subordinates from ldapsearch..
[root@vm190 ~]# ldapsearch -LLL -x -b ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com -D cn=directory\ manager -w XXXXX hassubordinates numsubordinates tombstonenumsubordinates
dn: ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com
hassubordinates: TRUE
numsubordinates: 8
tombstonenumsubordinates: 0

* ldapsearch output for ou=people, this shows only the base (ie, leaf ) entry.

[root@vm190 ~]# ldapsearch -LLL -x -b ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com -D cn=directory\ manager -w XXXX  '(|(objectclass=*)(objectclass=nstombstone)(objectclass=ldapsubentry))'
dn: ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com
objectClass: top
objectClass: organizationalunit
ou: people
postalCode: 673643

Note: I had few COS and ppolicy entries under this ou, all of them are removed sometime ago.

Comment 1 Noriko Hosoi 2013-04-15 18:46:48 UTC
Unfortunately, we have no luck to reproduce the problem, so far.  Obviously, there's some bug in handling numsubordinates and hassubordinates.

Did you have any chance to notice anything unusual when the problem was observed?  For instance, any errors or warnings in the error log?  Or do you see any conflicts in the replication?

We'd like to analyse the operations you did on the subtree "ou=people".

When the problem occurred, could you dump the index files as well as the changelog file?
cd /var/lib/dirsrv/slapd-<ID>
dbscan -f db/<backend>/id2entry.db#
dbscan -f db/<backend>/entryrdn.db#
dbscan -f db/<backend>/numsubordinates.db# -r -n
dbscan -f changelogdb/*.db#
and attach the output to the bug?

Thank you,
--noriko

Comment 3 Najmuddin Chirammal 2013-04-16 09:57:27 UTC
AFAIR, there were no errors, currently the replication is not happening as the other node is not up/reachable, I'll go through the logs and update you if I find any errors/warnings.

Comment 4 Noriko Hosoi 2013-04-16 17:53:42 UTC
Najmuddin, thank you for attaching the files.

In addition, could you please attach the config file (dse.ldif)?  I'm particularly interested in the plugin configuration, such as, memberof and referential integrity plugins.

Comment 6 Noriko Hosoi 2013-05-17 01:40:49 UTC
Najmuddin, I've been trying to duplicate the bug, but still no luck...

You reported you are running:
Version-Release number of selected component (if applicable): 389-ds-base-1.2.10.2-20.el6_3.x86_64

Our latest version for RHEL6.4 is 389-ds-base-1.2.11.15-14.el6_4 (https://brewweb.devel.redhat.com/buildinfo?buildID=266203).  Could it be possible to upgrade your 389-ds-base to this version and run the test?

Thanks,
--noriko

Comment 7 Venkat Mahadevan 2013-05-21 17:31:12 UTC
Hello,

Just wanted to add that I too have experienced this particular bug under RHEL 6.4 with 389-ds-base.x86_64 1.2.11.15-12.el6_4 
It only occurred on 3 servers out of 5 in a multi-master replication environment. I had to re-initialize 3 of the servers in order to get rid of the OU entry that would not delete (Operation not allowed on non-leaf (66)).

Kind regards,

VM

Comment 8 Noriko Hosoi 2013-05-21 19:27:10 UTC
Thank you for the input, Venkat.

Do you think the symptom is MMR topology dependent?  I'm testing only with 2 masters.  How do you configure the 5 masters?  Every master has agreements to the other 4 masters?  Or with less agreements?

I think Najmuddin mentioned no errors were logged in the error log.  You didn't see any naming conflict reports, either?

Thanks,
--noriko

Comment 9 Venkat Mahadevan 2013-05-21 22:28:30 UTC
Hi Noriko,

No problem, glad if I can provide any info. Here is how the replication is
setup (each master is called eldap1 -- eldap5) and has a replication agreement
with each of the other 4 masters over LDAPS/636.

server     has replication agreement with
-----------------------------------------
eldap1 --> eldap2, eldap3, eldap4, eldap5
eldap2 --> eldap1, eldap3, eldap4, eldap5
eldap3 --> eldap1, eldap2, eldap4, eldap5
eldap4 --> eldap1, eldap2, eldap3, eldap5
eldap5 --> eldap1, eldap2, eldap3, edlap4

I did see some errors like the following in the error log:

errors.20130514-130536-[16/May/2013:14:32:29 -0700] NSMMReplicationPlugin - agmt="cn=eldap3" (eldap3:636): replay_update: Consumer successfully sent operation with csn 5195506f000601f50000
errors.20130514-130536-[16/May/2013:14:32:29 -0700] agmt="cn=eldap3" (eldap3:636) - clcache_load_buffer: rc=-30988
errors.20130514-130536-[16/May/2013:14:32:29 -0700] NSMMReplicationPlugin - agmt="cn=eldap3" (eldap3:636): No more updates to send (cl5GetNextOperationToReplay)
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_waitfor_async_results: 1850 1853
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1850, (null)
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1453, (null)
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1454
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1454, (null)
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1455
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: result 2, 66, 1, 1455, (null)
errors.20130514-130536:[16/May/2013:14:32:29 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): Consumer failed to replay change (uniqueid e9e75aa6-a2e511e2-850beb61-1d38f3ef, CSN 5195506f000601f50000): Operation not allowed on non-leaf (66). Skipping.
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1455
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1211
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1851
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1455
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1851, (null)
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1852
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1852, (null)
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1455
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1211
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1853
errors.20130514-130536-[16/May/2013:14:32:29 -0700] NSMMReplicationPlugin - conn=10572 op=155 Acquired consumer connection extension
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1455
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: result 2, 66, 1, 1853, (null)
errors.20130514-130536-[16/May/2013:14:32:29 -0700] NSMMReplicationPlugin - conn=10572 op=155 repl="dc=dev,dc=id,dc=ubc,dc=ca": Released replica held by locking_purl=conn=10572 id=154
errors.20130514-130536-[16/May/2013:14:32:29 -0700] NSMMReplicationPlugin - conn=10572 op=155 Relinquishing consumer connection extension
errors.20130514-130536:[16/May/2013:14:32:29 -0700] NSMMReplicationPlugin - agmt="cn=eldap3" (eldap3:636): Consumer failed to replay change (uniqueid e9e75aa6-a2e511e2-850beb61-1d38f3ef, CSN 5195506f000601f50000): Operation not allowed on non-leaf (66). Skipping.
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1211
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1853
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1853
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1455
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1853
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1853
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1211
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1853
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1455
errors.20130514-130536-[16/May/2013:14:32:29 -0700] - repl5_inc_result_threadmain: read result for message_id 1853
--
errors.20130514-130536-[16/May/2013:14:32:31 -0700] NSMMReplicationPlugin - changelog program - agmt="cn=eldap4" (eldap4:636): CSN 5195506f000501f50000 found, position set for replay
errors.20130514-130536-[16/May/2013:14:32:31 -0700] agmt="cn=eldap4" (eldap4:636) - load=1 rec=1 csn=5195506f000601f50000
errors.20130514-130536-[16/May/2013:14:32:31 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): replay_update: Sending delete operation (dn="ou=People,dc=dev,dc=id,dc=ubc,dc=ca" csn=5195506f000601f50000)
errors.20130514-130536-[16/May/2013:14:32:31 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): replay_update: Consumer successfully sent operation with csn 5195506f000601f50000
errors.20130514-130536-[16/May/2013:14:32:31 -0700] agmt="cn=eldap4" (eldap4:636) - clcache_load_buffer: rc=-30988
errors.20130514-130536-[16/May/2013:14:32:31 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): No more updates to send (cl5GetNextOperationToReplay)
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_waitfor_async_results: 0 1458
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain starting
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1856
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: result 2, 66, 1, 1856, (null)
errors.20130514-130536:[16/May/2013:14:32:31 -0700] NSMMReplicationPlugin - agmt="cn=eldap3" (eldap3:636): Consumer failed to replay change (uniqueid e9e75aa6-a2e511e2-850beb61-1d38f3ef, CSN 5195506f000601f50000): Operation not allowed on non-leaf (66). Skipping.
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1856
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1458
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1214
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1214, (null)
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1214
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1856
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: result 2, 66, 1, 1458, (null)
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1214
errors.20130514-130536:[16/May/2013:14:32:31 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): Consumer failed to replay change (uniqueid e9e75aa6-a2e511e2-850beb61-1d38f3ef, CSN 5195506f000601f50000): Operation not allowed on non-leaf (66). Skipping.
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1856
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1458
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1214
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1458
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1856
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1214
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1458
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1458
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1856
errors.20130514-130536-[16/May/2013:14:32:31 -0700] - repl5_inc_result_threadmain: read result for message_id 1214
--
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): replay_update: Consumer successfully sent operation with csn 5195506f000601f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap4" (eldap4:636) - load=1 rec=2 csn=51955078000001f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): replay_update: Sending delete operation (dn="ou=Special Users,dc=dev,dc=id,dc=ubc,dc=ca" csn=51955078000001f50000)
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): replay_update: Consumer successfully sent operation with csn 51955078000001f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap4" (eldap4:636) - clcache_load_buffer: rc=-30988
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): No more updates to send (cl5GetNextOperationToReplay)
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_waitfor_async_results: 0 1462
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain starting
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: read result for message_id 1461
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: result 2, 66, 1, 1461, (null)
errors.20130514-130536:[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap4" (eldap4:636): Consumer failed to replay change (uniqueid e9e75aa6-a2e511e2-850beb61-1d38f3ef, CSN 5195506f000601f50000): Operation not allowed on non-leaf (66). Skipping.
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: read result for message_id 1462
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1462, (null)
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: read result for message_id 1462
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap3" (eldap3:636) - clcache_get_buffer: _pool is ea4610 _pool->pl_busy_lists is 7fd5400545e0 _pool->pl_busy_lists->bl_buffers is 7fd54407dec0
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap3" (eldap3:636) - session start: anchorcsn=5195506f000501f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - changelog program - agmt="cn=eldap3" (eldap3:636): CSN 5195506f000501f50000 found, position set for replay
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap3" (eldap3:636) - load=1 rec=1 csn=5195506f000601f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap3" (eldap3:636): replay_update: Sending delete operation (dn="ou=People,dc=dev,dc=id,dc=ubc,dc=ca" csn=5195506f000601f50000)
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap3" (eldap3:636): replay_update: Consumer successfully sent operation with csn 5195506f000601f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap3" (eldap3:636) - load=1 rec=2 csn=51955078000001f50000
--
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap2" (eldap2:636) - load=1 rec=1 csn=51955078000001f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap2" (eldap2:636): replay_update: Sending delete operation (dn="ou=Special Users,dc=dev,dc=id,dc=ubc,dc=ca" csn=51955078000001f50000)
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap2" (eldap2:636): replay_update: Consumer successfully sent operation with csn 51955078000001f50000
errors.20130514-130536-[16/May/2013:14:32:38 -0700] agmt="cn=eldap2" (eldap2:636) - clcache_load_buffer: rc=-30988
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap2" (eldap2:636): No more updates to send (cl5GetNextOperationToReplay)
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_waitfor_async_results: 0 1217
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain starting
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: read result for message_id 1462
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: read result for message_id 1859
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: result 2, 66, 1, 1859, (null)
errors.20130514-130536:[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - agmt="cn=eldap3" (eldap3:636): Consumer failed to replay change (uniqueid e9e75aa6-a2e511e2-850beb61-1d38f3ef, CSN 5195506f000601f50000): Operation not allowed on non-leaf (66). Skipping.
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: read result for message_id 1860
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - repl5_inc_result_threadmain: result 2, 0, 0, 1860, (null)
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - conn=10572 op=157 Acquired consumer connection extension
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - conn=10572 op=157 repl="dc=dev,dc=id,dc=ubc,dc=ca": Begin incremental protocol
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - csngen_adjust_time: gen state before 519550790004:1368739958:1:2
errors.20130514-130536-[16/May/2013:14:32:38 -0700] - csngen_adjust_time: gen state after 5195507a0001:1368739958:2:2
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - conn=10572 op=157 repl="dc=dev,dc=id,dc=ubc,dc=ca": Replica in use locking_purl=conn=10573 id=1018
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - conn=10572 op=157 replica="dc=dev,dc=id,dc=ubc,dc=ca": Unable to acquire replica: error: replica busy locked by conn=10573 id=1018 for incremental update
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - conn=10572 op=157 repl="dc=dev,dc=id,dc=ubc,dc=ca": StartNSDS90ReplicationRequest: response=1 rc=0
errors.20130514-130536-[16/May/2013:14:32:38 -0700] NSMMReplicationPlugin - conn=10572 op=157 Relinquishing consumer connection extension

Comment 10 Noriko Hosoi 2013-05-23 22:54:20 UTC
Finally, I could reproduce the problem.  The key is naming conflict.

This is an tombstone entry.  It says it has 3 ordinary child entries and 9 tombstoned child entries.
dn: nsuniqueid=1965a88d-c3f611e2-935ac747-051f73f9,dc=gsslab,dc=pnq,dc=redhat,
 dc=com
numsubordinates: 3
hassubordinates: TRUE
tombstonenumsubordinates: 9

But actually, all of them were turned to be tombstone entries in my tests.  If I delete all the tombstoned child entries, then the entry has no child, but its numsubordinates value is 3.
dn: nsuniqueid=1965a88d-c3f611e2-935ac747-051f73f9,dc=gsslab,dc=pnq,dc=redhat,
 dc=com
numsubordinates: 3
hassubordinates: TRUE
tombstonenumsubordinates: 0

Now attempting to delete the entry fails with "Operation not allowed on nonleaf".
# ldapdelete ...
nsuniqueid=1965a88d-c3f611e2-935ac747-051f73f9,dc=gsslab,dc=pnq,dc=redhat,dc=com
ldap_delete: Operation not allowed on nonleaf

Comment 11 Noriko Hosoi 2013-05-23 23:10:10 UTC
Upstream ticket:
https://fedorahosted.org/389/ticket/47367

Comment 12 Venkat Mahadevan 2013-05-24 17:05:48 UTC
Thank you for the explanation Noriko, that's quite helpful in understanding the issue. 

cheers,

VM

Comment 13 Noriko Hosoi 2013-05-24 17:18:25 UTC
Thank YOU for your patience.  

To double check we are sharing the same issue, could you run this command line when you run into the non-leaf entry problem next time?
$ ldapsearch [...] -b "ou=people,dc=gsslab,dc=pnq,dc=redhat,dc=com" "(|(objectclass=*)(objectclass=nstombstone))"  numsubordinates tombstoneNumSubordinates hassubordinates nsds5ReplConflict

Thanks!

Comment 14 Venkat Mahadevan 2013-05-24 21:41:53 UTC
No problem, glad if I can help in any way. I will run that query and post the result if I get the non-leaf entry problem again.

cheers,

VM

Comment 19 Rich Megginson 2013-07-10 16:16:26 UTC
*** Bug 982326 has been marked as a duplicate of this bug. ***

Comment 21 Sankar Ramalingam 2013-08-18 08:10:18 UTC
Hi Noriko,
Is it consistently reproducible? Is yes, can you give us steps or add automated tests in TET?

Comment 22 Noriko Hosoi 2013-08-19 18:39:47 UTC
(In reply to Sankar Ramalingam from comment #21)
> Hi Noriko,
> Is it consistently reproducible? Is yes, can you give us steps or add
> automated tests in TET?

Here's what I did:
1) set up 3-way MMR
2) prepare an ldif file containing (add, modify and delete operations.  Note: you need to have enough amount of operations.  My test case has 30 adds, 30 modifies and 30 deletes.)
3) run the "ldapmodify -f <ldif_created_in_2>" against the 3 masters at the same time and generate conflicts.
4) delete leaves, then the intermediate nodes.  If all the intermediate nodes are deleted successfully, the bug is verified.

Comment 23 Amita Sharma 2013-09-26 10:10:56 UTC
[root@ibm-x3650m4-01-vm-11 export]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.5 Beta (Santiago)

Followed steps in https://bugzilla.redhat.com/show_bug.cgi?id=947583#c22 and I could delete nodes without any fail::
M1 - 1189
M2 - 1289
[root@ibm-x3650m4-01-vm-11 export]# ldapdelete -x -h localhost -p 1189 -D "cn=Directory Manager" -w Secret123 -r ou=groups,dc=passsync,dc=com
[root@ibm-x3650m4-01-vm-11 export]# ldapdelete -x -h localhost -p 1289 -D "cn=Directory Manager" -w Secret123 -r ou=groups,dc=passsync,dc=com
ldap_search: No such object (32)
ldap_delete: No such object (32)

[root@ibm-x3650m4-01-vm-11 export]# ldapdelete -x -h localhost -p 1189 -D "cn=Directory Manager" -w Secret123 -r ou=people,dc=passsync,dc=com
[root@ibm-x3650m4-01-vm-11 export]# ldapdelete -x -h localhost -p 1289 -D "cn=Directory Manager" -w Secret123 -r ou=people,dc=passsync,dc=com
ldap_search: No such object (32)
ldap_delete: No such object (32)


Hence marking it VERIFIED.

Comment 24 Nathan Kinder 2013-09-27 18:49:25 UTC
Reopening this bug, as QE encountered a regression introduced by the fix for this bug.  Here are the details on the regression as encountered by Ján Rusnačko from QE:

Description of problem:
Replication monitor searches for RUV tombstone entries to get values such as maxCSN. These searches fail to return any entries and replication monitor generates empty report.

When line 368 of repl-monitor.pl 

    $ruv = $conn->search($replicaroot, "one",
         "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nsTombstone))",
	 0, qw(nsds50ruv nsruvReplicaLastModified));

is changed to 
 
    $ruv = $conn->search($replicaroot, "sub",
         "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nsTombstone))",
	 0, qw(nsds50ruv nsruvReplicaLastModified));

replication monitor generates full report. Indeed search 

[root@hp-dl380pgen8-02-vm-1 BE_0]# ldapsearch ... -b "o=my_suffix.com" "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nsTombstone))" -s one

returns no entries, while 

[root@hp-dl380pgen8-02-vm-1 BE_0]# ldapsearch ... -b "o=my_suffix.com" "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nsTombstone))" -s sub

returns tombstone entries with RUV information.

Comment 25 Nathan Kinder 2013-09-27 18:51:18 UTC
*** Bug 1012942 has been marked as a duplicate of this bug. ***

Comment 27 Noriko Hosoi 2013-09-27 21:56:58 UTC
Npte: 389-ds-base-1.2.11.15-26.el6 contains the fix for bz 1012942 which was introduced by fixing this original bug.

Comment 28 Ján Rusnačko 2013-09-30 13:22:38 UTC
With 389-ds-base-1.2.11.15-26.el6.x86_64:

[jrusnack@dhcp131-10 /]$ ldapsearch -h 192.168.122.88 -p 22221 -D"cn=directory manager" -w Secret123 -b "ou=foo,dc=example,dc=com" "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nsTombstone))" -LLL -s sub

dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,ou=foo,dc=example,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 52494108000000010000
nsds50ruv: {replica 1 ldap://dstet.example.com:22221} 52494135000000010000 524
 94135000000010000
nsds50ruv: {replica 2 ldap://dstet2.example.com:22222} 52494149000000020000 52
 494149000000020000
ou: foo
nsruvReplicaLastModified: {replica 1 ldap://dstet.example.com:22221} 00000000
nsruvReplicaLastModified: {replica 2 ldap://dstet2.example.com:22222} 00000000

[jrusnack@dhcp131-10 /]$ ldapsearch -h 192.168.122.88 -p 22221 -D"cn=directory manager" -w Secret123 -b "ou=foo,dc=example,dc=com" "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nsTombstone))" -LLL -s one
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,ou=foo,dc=example,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 52494108000000010000
nsds50ruv: {replica 1 ldap://dstet.example.com:22221} 52494135000000010000 524
 94135000000010000
nsds50ruv: {replica 2 ldap://dstet2.example.com:22222} 52494149000000020000 52
 494149000000020000
ou: foo
nsruvReplicaLastModified: {replica 1 ldap://dstet.example.com:22221} 00000000
nsruvReplicaLastModified: {replica 2 ldap://dstet2.example.com:22222} 00000000

Ami: once you successfully re-run steps from c#22, you can verify this bug.

Comment 29 Amita Sharma 2013-09-30 13:26:43 UTC
Thanks Jan, I have re-verified and can confirm steps are running just fine in #23.
Hence marking bug as VERIFIED.

Comment 30 Ján Rusnačko 2013-10-04 10:45:55 UTC
Automated comment#24 regression part in mmrepl/accept as bug947583 - trac ticket 964.

First part - steps in comment#22 - is yet to be automated. Corresponding trac ticket #777.

Comment 31 errata-xmlrpc 2013-11-21 21:06:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1653.html

Comment 32 Amita Sharma 2014-06-26 12:58:55 UTC
First part - steps in comment#22 -automated in /home/amsharma/RHEL7/testcases/DS/6.0/mmrepl/fourwaymmr as  [bug947583]


Note You need to log in before you can comment on or make changes to this bug.