Bug 1013735

Summary: CLEANALLRUV doesn't run across all replicas
Product: Red Hat Enterprise Linux 6 Reporter: Nathan Kinder <nkinder>
Component: 389-ds-base Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Sankar Ramalingam <sramling>
Severity: high Docs Contact:
Priority: high    
Version: 6.4 CC: jgalipea, mreynolds, nhosoi, nkinder, tlavigne
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.11.15-27.el6 Doc Type: Bug Fix
Doc Text:
Cause: A server in the replication environment does not support the CLEANALLRUV task. Consequence: The task never finishes. Fix: Ignore replicas that do not support the task. Result: The CLEANALLRUV task completes when all the replicas that support the task have been cleaned.
Story Points: ---
Clone Of:
: 1013738 Environment:
Last Closed: 2013-11-21 21:12:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1013738    

Description Nathan Kinder 2013-09-30 16:45:58 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47509

While running CLEANALLRUV from the masters, it fails to execute across some replicas with the following errors:
CleanAllRUV Task: Replica cn=example-lx9078,cn=replica,cn=o\3Dexample.com,cn=mapping tree,cn=config does not support the CLEANALLRUV task.  Sending replica CLEANRUV task...
[10/Sep/2013:12:19:37 +0000] NSMMReplicationPlugin - CleanAllRUV Task: Failed to add CLEANRUV task replica (agmt="cn=example-lx9078" (example-lx9078:636)).  You will need to manually run the CLEANRUV task on this replica (example-lx9078.examplea.com) error (32)
[10/Sep/2013:12:19:37 +0000] NSMMReplicationPlugin - CleanAllRUV Task: Failed to send task to replica (agmt="cn=example-lx9078" (example-lx9078:636))
[10/Sep/2013:12:19:37 +0000] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas have received the cleanallruv extended op,
The error continues even after running the ABORT CLEANALLRUV task and manually running the CLEANRUV task on the replica.
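
For reference, a minimal sketch of the two recovery operations mentioned above, assuming the standard 389-ds task containers and attribute names (verify against the version in use; the suffix, replica ID, and DN values below are placeholders, not values from this report):

# Abort a running CLEANALLRUV task
dn: cn=abort 1252,cn=abort cleanallruv,cn=tasks,cn=config
objectclass: extensibleObject
replica-base-dn: dc=example,dc=com
replica-id: 1252

# Manually clean the RUV on an individual replica by setting nsds5task
# on its replica configuration entry
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: modify
replace: nsds5task
nsds5task: CLEANRUV1252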

Comment 1 Nathan Kinder 2013-10-01 02:37:21 UTC
To reproduce/verify:

----------------------------------------
- Set up replication with an older 389-ds-base instance that doesn't support CLEANALLRUV and two newer instances that do support it.  Use a 3-master full-mesh topology.

- Run remove-ds.pl to remove one of the newer instances.

- Remove any replication agreements that point to the deleted instance.

- Run the CLEANALLRUV task on the remaining newer master to remove the RUV for the removed master (see the sample task entry below).
----------------------------------------

The bug is that the task never completes and you can't abort the task.  With the fix, the task should not hang.
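
For the last step, a minimal sketch of the task entry, assuming the standard cn=cleanallruv task container (the suffix, replica ID, and connection details are placeholders):

dn: cn=clean 1252,cn=cleanallruv,cn=tasks,cn=config
objectclass: extensibleObject
replica-base-dn: dc=example,dc=com
replica-id: 1252

Added, for example, with:

ldapmodify -x -h localhost -p 389 -D "cn=Directory Manager" -W -a -f cleanallruv.ldif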

Comment 3 Sankar Ramalingam 2013-10-17 15:35:41 UTC
This cannot be automated in TET since it requires multiple machines to verify. Hence, removing the qe_test_coverage+ flag.

Comment 4 Sankar Ramalingam 2013-10-24 15:30:25 UTC
1). As per comment #1, I configured replication between 389-ds-base-1.2.11.15-29 (two masters) and 389-ds-base-1.2.10.2-15 (one master). I then removed M2 and initiated a cleanallruv task on M1 for replica ID 1252 (M2).

cat cleanruv.ldif 
dn: cn=1013735,cn=cleanallruv,cn=tasks,cn=config
cn: bug1013735
objectclass: extensibleObject
replica-base-dn: dc=passsync,dc=com
replica-id: 1252


2). ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 -avf /export/cleanruv.ldif 
ldap_initialize( ldap://localhost:1189 )
add cn:
	bug1013735
add objectclass:
	extensibleObject
add replica-base-dn:
	dc=passsync,dc=com
add replica-id:
	1252
adding new entry "cn=1013735,cn=cleanallruv,cn=tasks,cn=config"
modify complete

3). Though the task completes immediately, it keeps retrying to check whether the replica is online.
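
One way to follow that retry behaviour is to watch the CleanAllRUV messages in the instance error log; a sketch assuming the default log location (replace <instance> with the actual instance name):

grep "CleanAllRUV Task" /var/log/dirsrv/slapd-<instance>/errors | tail -n 20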

Also, I could re-run the same task repeatedly. Is this a problem?

Comment 5 Sankar Ramalingam 2013-10-24 15:55:32 UTC
(In reply to Sankar Ramalingam from comment #4)
> 1). As per the comment #1, I configured replication between
> 389-ds-base-1.2.11.15-29(Two Masters) and 389-ds-base-1.2.10.2-15(One
> Master). Then removed M2 and initiated a cleanallruv task on M1 for replica
> Id-1252(M2)
> 
> cat cleanruv.ldif 
> dn: cn=1013735,cn=cleanallruv,cn=tasks,cn=config
> cn: bug1013735
> objectclass: extensibleObject
> replica-base-dn: dc=passsync,dc=com
> replica-id: 1252
> 
> 
> 2). ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w
> Secret123 -avf /export/cleanruv.ldif 
> ldap_initialize( ldap://localhost:1189 )
> add cn:
> 	bug1013735
> add objectclass:
> 	extensibleObject
> add replica-base-dn:
> 	dc=passsync,dc=com
> add replica-id:
> 	1252
> adding new entry "cn=1013735,cn=cleanallruv,cn=tasks,cn=config"
> modify complete
> 
> 3). Though, the job completes immediately, the retry keeps checking whether
> the replica is on-line.
Deleting the replication agreement resolves the problem. As per the design doc, the replication agreement should be removed before running the cleanallruv task (an example agreement deletion is sketched at the end of this comment).
> 
> Also, I could re-run the same task repeatedly. Is this a problem?
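
For completeness, removing an agreement that points to the deleted replica is an ldapdelete of the agreement entry under the replica configuration (DN shape as in the log in the description; the agreement name, suffix, and bind details below are placeholders):

ldapdelete -x -h localhost -p 1189 -D "cn=Directory Manager" -W \
    "cn=example-agreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config"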

Comment 6 mreynolds 2013-10-24 15:56:41 UTC
Revised steps to verify the fix:

[1]  Create two instances (1.2.11.x):  replicas A & B
[2]  Create a third instance (1.2.10) on a different host:  replica C
[3]  Set up replication between all three
[4]  Make some updates on each replica to make sure everything is working
[5]  Remove all the agreements that point to replica B from replicas A & C
[6]  Remove replica B
[7]  Run the cleanallruv task against replica A

The task should complete.  It's OK if replica C was not cleaned; this fix was only to make sure the task does not loop endlessly.
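
To confirm the cleanup on replica A, the database RUV can be inspected via the RUV tombstone entry; a sketch assuming the standard tombstone filter and nsds50ruv attribute (suffix and connection details are placeholders):

ldapsearch -xLLL -h localhost -p 389 -D "cn=Directory Manager" -W -b "dc=example,dc=com" \
    "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" nsds50ruv

The cleaned replica ID should no longer appear in the nsds50ruv values on replica A (it may still appear on replica C, as noted above).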

Comment 7 Sankar Ramalingam 2013-10-24 15:58:45 UTC
As per comments #5 and #6, I am marking the bug as Verified.

Comment 8 errata-xmlrpc 2013-11-21 21:12:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1653.html