Bug 1013738

Summary: CLEANALLRUV doesn't run across all replicas
Product: Red Hat Enterprise Linux 7
Component: 389-ds-base
Version: 7.0
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Target Release: ---
Reporter: Nathan Kinder <nkinder>
Assignee: Rich Megginson <rmeggins>
QA Contact: Sankar Ramalingam <sramling>
CC: jgalipea, mreynolds
Fixed In Version: 389-ds-base-1.3.1.6-5.el7
Doc Type: Bug Fix
Doc Text:
Cause: Running the CLEANALLRUV task in a replication environment where one of the replicas does not support the CLEANALLRUV task.
Consequence: The task never completes.
Fix: Do not prevent the task from completing if it runs across a replica that does not support the CLEANALLRUV task.
Result: The CLEANALLRUV task completes after it cleans all the replicas that do support the task.
Clone Of: 1013735
Bug Depends On: 1013735
Last Closed: 2014-06-13 10:21:15 UTC

Description Nathan Kinder 2013-09-30 16:52:12 UTC
+++ This bug was initially created as a clone of Bug #1013735 +++

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47509

While running CLEANALLRUV from the masters, it fails to execute across some replicas with the following errors:
CleanAllRUV Task: Replica cn=example-lx9078,cn=replica,cn=o\3Dexample.com,cn=mapping tree,cn=config does not support the CLEANALLRUV task.  Sending replica CLEANRUV task...
[10/Sep/2013:12:19:37 +0000] NSMMReplicationPlugin - CleanAllRUV Task: Failed to add CLEANRUV task replica (agmt="cn=example-lx9078" (example-lx9078:636)).  You will need to manually run the CLEANRUV task on this replica (example-lx9078.examplea.com) error (32)
[10/Sep/2013:12:19:37 +0000] NSMMReplicationPlugin - CleanAllRUV Task: Failed to send task to replica (agmt="cn=example-lx9078" (example-lx9078:636))
[10/Sep/2013:12:19:37 +0000] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas have received the cleanallruv extended op,
The error continues even after running the ABORT CLEANALLRUV task and manually running the CLEANRUV task on the replica.
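
For triage, the replicas that need a manual CLEANRUV can be pulled out of the errors log with a small script. The following is only a sketch: the regular expression is derived from the message text quoted above, and the function name is hypothetical. Other server versions may word the message differently.

```python
import re

# Pattern taken from the log message quoted in this report (an assumption:
# other 389-ds-base versions may phrase the message differently).
MANUAL_CLEANRUV = re.compile(
    r"manually run the CLEANRUV task on this replica \(([^)]+)\) error \((\d+)\)"
)


def replicas_needing_manual_cleanruv(log_lines):
    """Return (hostname, ldap_result_code) pairs found in the given log lines."""
    found = []
    for line in log_lines:
        match = MANUAL_CLEANRUV.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found
```

The number in "error (N)" looks like a standard LDAP result code (32 is noSuchObject, 50 is insufficientAccessRights), which can help explain why the fallback CLEANRUV task could not be added on the older replica.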

Comment 1 Nathan Kinder 2013-10-01 02:40:20 UTC
To reproduce/verify:

----------------------------------------
- Set up replication with one older 389-ds-base instance that doesn't support CLEANALLRUV and two newer instances that do support CLEANALLRUV.  Use a 3-master full-mesh topology.

- Run remove-ds.pl to remove one of the newer instances.

- Remove any replication agreements that point to the deleted instance.

- Run the CLEANALLRUV task on the one remaining newer master to remove the RUV for the removed master.
----------------------------------------

The bug is that the task never completes and you can't abort the task.  With the fix, the task should not hang.

Comment 3 Sankar Ramalingam 2014-02-20 15:34:28 UTC
To verify this bug, I followed these steps:
1. Set up a RHEL7 machine and created 2 masters and 2 consumers.
2. Set up a RHEL63 machine with an older version, 389-ds-base-1.2.10.2-15.
3. Set up a new master on the RHEL63 machine to talk to M1 and M2 on RHEL7.
4. M1 (Replica Id 1231) and M2 (Replica Id 1232) are on RHEL7. M3 (Replica Id 1291) is on RHEL63.
5. Verified that replication works fine. All entries synced from each master.
6. Removed the replication agreements for M1 from M2 and M3.
7. Removed M1 from the RHEL7 machine.
8. Ran the cleanallruv task from M2 using the following cleanallruv.ldif:
dn: cn=1013735,cn=cleanallruv,cn=tasks,cn=config
cn: bug1013735
objectclass: extensibleObject
replica-base-dn: dc=passsync,dc=com
replica-id: 1231

9. The cleanallruv task completed. The error logs report some other issues that are not relevant to the cleanallruv task running for Replica Id 1231.

[20/Feb/2014:10:08:04 -0500] NSMMReplicationPlugin - agmt_delete: begin
[20/Feb/2014:10:09:44 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Initiating CleanAllRUV Task...
[20/Feb/2014:10:09:44 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Retrieving maxcsn...
[20/Feb/2014:10:09:44 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Found maxcsn (53060f1c000704cf0000)
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning rid (1231)...
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting to process all the updates from the deleted replica...
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be online...
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to receive all the deleted replica updates...
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Sending cleanAllRUV task to all the replicas...
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Replica cn=1289_to_1616_on_hp-dl380pgen8-02-vm-4.lab.bos.redhat.com,cn=replica,cn=dc\3Dpasssync\2Cdc\3Dcom,cn=mapping tree,cn=config does not support the CLEANALLRUV task.  Sending replica CLEANRUV task...
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Failed to add CLEANRUV task (cn=replica,cn=dc\3Dpasssync\2Cdc\3Dcom,cn=mapping tree,cn=config) to replica (agmt="cn=1289_to_1616_on_hp-dl380pgen8-02-vm-4.lab.bos.redhat.com" (hp-dl380pgen8-02-vm-4:1616)).  You will need to manually run the CLEANRUV task on this replica (hp-dl380pgen8-02-vm-4.lab.bos.redhat.com) error (50)
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning local ruv's...
[20/Feb/2014:10:09:45 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be cleaned...
[20/Feb/2014:10:09:46 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to finish cleaning...
[20/Feb/2014:10:09:46 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Successfully cleaned rid(1231).
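
Since the fix is about the task running to completion, a verification run can check the log for the final "Successfully cleaned rid" message. This is a minimal sketch with a hypothetical helper name; the pattern is based only on the log lines shown above.

```python
import re

# Matches the completion message shown in the log excerpt above.
CLEANED = re.compile(r"CleanAllRUV Task: Successfully cleaned rid\s*\((\d+)\)")


def cleaned_rids(log_lines):
    """Return the set of replica IDs the log reports as successfully cleaned."""
    return {
        int(match.group(1))
        for line in log_lines
        for match in CLEANED.finditer(line)
    }
```

A check such as `1231 in cleaned_rids(open(path))` would then indicate that the task did not hang, where `path` is the location of the errors log on the master that ran the task.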




Also, the nsds50ruv attribute has not been cleaned. I see these entries in all the other replication agreements. Are they expected to be cleaned from the replication agreements?

nsds50ruv: {replica 1231 ldap://ibm-hs23-01.rhts.eng.bos.redhat.com:3189} 5306
 0eab000004cf0000 53060f1c000704cf0000
nsruvReplicaLastModified: {replica 1231 ldap://ibm-hs23-01.rhts.eng.bos.redhat
 .com:3189} 00000000
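
To check whether a replica ID is still present in search output like the above, the folded LDIF lines have to be joined first (a line starting with a single space continues the previous one). The sketch below does that and extracts the RUV elements. The function names are hypothetical, and the element layout ({replica ID URL} followed by two CSNs) is inferred from the output in this comment.

```python
import re


def unfold_ldif(text):
    """Join LDIF continuation lines (a leading single space continues the previous line)."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


# Layout assumed from the output above: {replica <rid> <url>} <csn> <csn>
RUV_ELEMENT = re.compile(
    r"^nsds50ruv:\s*\{replica (\d+)(?: ([^}]*))?\}\s*(\S+)?\s*(\S+)?"
)


def ruv_elements(ldif_text):
    """Map each replica ID to (url, first_csn, second_csn) from nsds50ruv values."""
    result = {}
    for line in unfold_ldif(ldif_text):
        match = RUV_ELEMENT.match(line)
        if match:
            result[int(match.group(1))] = match.group(2, 3, 4)
    return result
```

If `1231 in ruv_elements(output)` is still true after the task has finished, the RUV element for the removed master is still present on that server.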

Comment 4 Sankar Ramalingam 2014-02-20 15:37:47 UTC
These error messages were observed on the RHEL6.3 master, but replication does not have any issues. I could sync entries from M3 to M1 and vice versa.


[20/Feb/2014:10:33:25 -0500] agmt="cn=1189_to_1626_on_ibm-hs23-01.rhts.eng.bos.redhat.com" (ibm-hs23-01:1626) - Can't locate CSN 53060eab000004cf0000 in the changelog (DB rc=-30988). The consumer may need to be reinitialized.
[20/Feb/2014:10:33:27 -0500] agmt="cn=1189_to_1626_on_ibm-hs23-01.rhts.eng.bos.redhat.com" (ibm-hs23-01:1626) - Can't locate CSN 53060eab000004cf0000 in the changelog (DB rc=-30988). The consumer may need to be reinitialized.
[20/Feb/2014:10:33:36 -0500] agmt="cn=1189_to_1626_on_ibm-hs23-01.rhts.eng.bos.redhat.com" (ibm-hs23-01:1626) - Can't locate CSN 53060eab000004cf0000 in the changelog (DB rc=-30988). The consumer may need to be reinitialized.
[20/Feb/2014:10:33:38 -0500] agmt="cn=1189_to_1626_on_ibm-hs23-01.rhts.eng.bos.redhat.com" (ibm-hs23-01:1626) - Can't locate CSN 53060eab000004cf0000 in the changelog (DB rc=-30988). The consumer may need to be reinitialized.
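
The CSN in these messages can be decoded to see which replica it came from. The sketch below assumes the CSN string is 20 hex digits laid out as an 8-digit timestamp, a 4-digit sequence number, a 4-digit replica ID and a 4-digit subsequence number. That layout is an assumption, but it is consistent with the values in this report: the replica ID field 04cf is 1231 in decimal, the ID of the removed master.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the epoch
    seqnum: int
    rid: int         # replica ID that generated the change
    subseqnum: int


def parse_csn(csn):
    """Split a 20-hex-digit CSN string into its fields (assumed 8/4/4/4 layout)."""
    if len(csn) != 20:
        raise ValueError("expected a 20-character CSN, got %r" % (csn,))
    return CSN(
        timestamp=int(csn[0:8], 16),
        seqnum=int(csn[8:12], 16),
        rid=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )
```

For example, `parse_csn("53060eab000004cf0000").rid` gives 1231, which suggests the older master is still looking for changes that originated on the removed replica.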

Comment 5 mreynolds 2014-02-20 16:16:06 UTC
(In reply to Sankar Ramalingam from comment #3)

This all looks correct.  The fix for this bug is that the task does not get stuck when it encounters a replica that does not support CLEANALLRUV.  As we can see, the task moves on and finishes its processing.

As for the RUV not being cleaned: as stated in the logging, you need to manually run CLEANRUV on the replica that does not support CLEANALLRUV.  So although the RUV was successfully cleaned, the older replica (which was not cleaned) polluted the RUV again.  This is expected.
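
For reference, the manual CLEANRUV mentioned above is usually requested by setting nsds5task to CLEANRUV followed by the replica ID on the replica configuration entry of the older server. The sketch below only builds the LDIF for that modify operation. The helper names are hypothetical, and the nsds5task mechanism should be checked against the documentation for the older 389-ds-base version before use.

```python
def mapping_tree_replica_dn(suffix):
    """Build the replica config DN, escaping '=' and ',' in the suffix
    the same way the DNs appear in the log messages in this report."""
    escaped = suffix.replace("=", r"\3D").replace(",", r"\2C")
    return f"cn=replica,cn={escaped},cn=mapping tree,cn=config"


def cleanruv_ldif(suffix, rid):
    """Return LDIF that asks a single replica to run CLEANRUV for one replica ID.

    Assumption: the server accepts 'nsds5task: CLEANRUV<rid>' on the replica entry.
    """
    return "\n".join([
        f"dn: {mapping_tree_replica_dn(suffix)}",
        "changetype: modify",
        "replace: nsds5task",
        f"nsds5task: CLEANRUV{rid}",
        "",
    ])
```

The output of `cleanruv_ldif("dc=passsync,dc=com", 1231)` could then be applied with ldapmodify on the older replica, bound as a user that is allowed to modify cn=config.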

Anyway, you have successfully verified this bug fix.

Comment 6 Sankar Ramalingam 2014-02-21 02:58:38 UTC
As per your above comments, marking the bug as Verified.

Comment 7 Ludek Smid 2014-06-13 10:21:15 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.