Red Hat Bugzilla – Bug 837365
CLEANALLRUV must deal with offline replicas and older replicas
Last modified: 2013-02-21 04:16:24 EST
This bug is created as a clone of upstream ticket: https://fedorahosted.org/freeipa/ticket/2890

Support a more robust CLEANALLRUV task to handle offline replicas, per 389-ds-base ticket https://fedorahosted.org/389/ticket/403. This will enhance or replace the CLEANRUV work done for ticket #2303.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Fixed upstream, leveraging the updated CLEANALLRUV task in 389-ds-base 1.2.11. These are the commits from ticket 2303:

master: c9c55a2845fd8471bc609a23f5a32d252f7df04c
ipa-3-0: 40582a1f1e40607e7a1d1950dd07f638b156251e

There are 3 new commands:

ipa-replica-manage clean-ruv
ipa-replica-manage abort-clean-ruv
ipa-replica-manage list-clean-ruv

An explicit clean-ruv shouldn't normally be necessary; we start a CLEANALLRUV task automatically when deleting a master. A clean or abort CLEANALLRUV task will remain around, retrying indefinitely, until it completes.
What steps can we use to test/verify this? Is this the basic scenario we're talking about here?

1. Set up 3 or more IPA servers
2. Shut down IPA on one
3. Delete one replica from the environment
4. Start IPA back up on the server from step 2

Then how do we confirm this worked? When are the new commands needed, and how do we test them?
There are quite a few scenarios to test. The simplest case is:

- Install a master and a single replica
- On the master run: ipa-replica-manage list-ruv. There will be two entries, one for the master, one for the replica
- On the master run: ipa-replica-manage del replica
- Once the task is done, re-run ipa-replica-manage list-ruv. Only the master should be listed

Basically you want to verify that the RUV task is run when a master is deleted, and confirm that its RUV entry is actually removed from all the servers.

Some other possible scenarios when there are 3 masters:

- Shut down 2 of them, force deletion of one on the remaining master. Bring up the undeleted downed server to make sure it gets the RUV task run on it
- Shut down 1 of them, force deletion of it, verify that its RUV is cleaned up
- Delete a server, re-add it, verify that it gets a new RUV

You may also want to manually check the LDAP objects from the RUV docs to be sure that list-ruv is returning the right data, and that things really are as they should be.
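The simplest-case check above lends itself to a quick script. A hedged sketch follows; the hostnames and the inlined `ruv_after_del` sample are placeholders standing in for real `ipa-replica-manage list-ruv` output captured after the delete:

```shell
# Sketch: verify a deleted replica's RUV entry no longer appears in list-ruv
# output. "master.example.com"/"replica.example.com" are example hostnames;
# ruv_after_del stands in for `ipa-replica-manage list-ruv` output.
ruv_after_del='master.example.com:389: 4'
deleted='replica.example.com'

if printf '%s\n' "$ruv_after_del" | grep -q "^${deleted}:"; then
    echo "FAIL: RUV entry for ${deleted} still present"
else
    echo "PASS: RUV entry for ${deleted} cleaned"
fi
```

On a real deployment you would run this on every remaining master, since the point of CLEANALLRUV is that the entry disappears everywhere, not just where the delete was issued.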
Verified.

Version :: ipa-server-3.0.0-8.el6.x86_64

Manual Test Results ::

####################################################################
# 1. Start with 4 servers in a square
####################################################################
ENV:
[rhel6-1]--[rhel6-2]
    |          |
[rhel6-4]--[rhel6-3]

[root@rhel6-1 ~]# echo Secret123 | kinit admin
Password for admin@TESTRELM.COM:
[root@rhel6-1 ~]# ipa-replica-manage list
rhel6-1.testrelm.com: master
rhel6-2.testrelm.com: master
rhel6-3.testrelm.com: master
rhel6-4.testrelm.com: master
[root@rhel6-1 ~]# ipa-replica-manage list rhel6-1.testrelm.com
rhel6-2.testrelm.com: replica
rhel6-4.testrelm.com: replica
[root@rhel6-1 ~]# ipa-replica-manage list rhel6-2.testrelm.com
rhel6-1.testrelm.com: replica
rhel6-3.testrelm.com: replica
[root@rhel6-1 ~]# ipa-replica-manage list rhel6-3.testrelm.com
rhel6-2.testrelm.com: replica
rhel6-4.testrelm.com: replica
[root@rhel6-1 ~]# ipa-replica-manage list rhel6-4.testrelm.com
rhel6-1.testrelm.com: replica
rhel6-3.testrelm.com: replica

####################################################################
# 2. Check current RUV info
####################################################################
[root@rhel6-1 ~]# ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b694ee000000040000
nsds50ruv: {replica 3 ldap://rhel6-2.testrelm.com:389} 50b4efda000000030000 50b7bd4f000000030000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7bd73000000050000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7bd79000000060000
dc: testrelm
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b694ec
nsruvReplicaLastModified: {replica 3 ldap://rhel6-2.testrelm.com:389} 50b7bd50
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7bd75
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7bd78
[root@rhel6-1 ~]# ipa-replica-manage list-ruv
rhel6-1.testrelm.com:389: 4
rhel6-2.testrelm.com:389: 3
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6

####################################################################
# 3. Delete server2 and make sure it's removed from remaining servers
####################################################################

##########
# 3.1 Delete replica (rhel6-2) from master (rhel6-1):
##########
[root@rhel6-1 ~]# ipa-replica-manage del rhel6-2.testrelm.com
Deleting a master is irreversible.
To reconnect to the remote master you will need to prepare a new replica file
and re-install.
Continue to delete? [no]: yes
Deleting replication agreements between rhel6-2.testrelm.com and rhel6-1.testrelm.com, rhel6-3.testrelm.com
ipa: INFO: Setting agreement cn=meTorhel6-1.testrelm.com,cn=replica,cn=dc\=testrelm\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTorhel6-1.testrelm.com,cn=replica,cn=dc\=testrelm\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Deleted replication agreement from 'rhel6-1.testrelm.com' to 'rhel6-2.testrelm.com'
ipa: INFO: Setting agreement cn=meTorhel6-3.testrelm.com,cn=replica,cn=dc\=testrelm\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTorhel6-3.testrelm.com,cn=replica,cn=dc\=testrelm\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Deleted replication agreement from 'rhel6-3.testrelm.com' to 'rhel6-2.testrelm.com'
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C

##########
# 3.2 Check RUV list on master (rhel6-1):
##########
[root@rhel6-1 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-1.testrelm.com:389: 4
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b7c2f7001300040000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7c2fc000a00050000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7c2fb000000060000
dc: testrelm
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b7c2f5
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7c2ff
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7c2fb

##########
# 3.3 Check on the deleted server
# This will show old data because it's been removed from the environment
##########
[root@rhel6-2 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-2.testrelm.com:389: 3
rhel6-3.testrelm.com:389: 5
rhel6-1.testrelm.com:389: 4
rhel6-4.testrelm.com:389: 6
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 3 ldap://rhel6-2.testrelm.com:389} 50b4efda000000030000 50b7bd4f000000030000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7bd73000000050000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b694ee000000040000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7bd79000000060000
dc: testrelm
nsruvReplicaLastModified: {replica 3 ldap://rhel6-2.testrelm.com:389} 50b7bd4c
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7bd71
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b694eb
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7bd77
[root@rhel6-2 ~]# ipa-replica-manage list
rhel6-1.testrelm.com: master
rhel6-2.testrelm.com: master
rhel6-3.testrelm.com: master
rhel6-4.testrelm.com: master
[root@rhel6-2 ~]# ipa-replica-manage list rhel6-1.testrelm.com
rhel6-4.testrelm.com: replica
[root@rhel6-2 ~]# ipa-replica-manage list rhel6-3.testrelm.com
rhel6-4.testrelm.com: replica

So, here we can see that the server is no longer in the environment for IPA. Its directory data is just stale because it was deleted, so it would have to be uninstalled/re-installed. As expected.
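The `nsds50ruv` tombstone values returned by ldapsearch can be cross-checked against `ipa-replica-manage list-ruv` mechanically. A sketch, with a couple of sample attribute lines from this test run inlined so it is self-contained (the awk field positions assume the value layout shown above):

```shell
# Cross-check: extract "host:port: id" pairs from nsds50ruv values, in the
# same shape that ipa-replica-manage list-ruv prints them. ruv_entry holds
# sample attribute lines copied from the tombstone search above.
ruv_entry='nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b694ee000000040000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7bd73000000050000'

printf '%s\n' "$ruv_entry" |
awk '/\{replica [0-9]+/ {
    id  = $3                       # replica ID
    url = $4                       # ldap://host:port}
    sub(/^ldap:\/\//, "", url)     # strip scheme
    sub(/\}$/, "", url)            # strip trailing brace
    print url ": " id
}'
```

The `{replicageneration}` line is skipped because it carries no replica ID; any host listed here but missing from `list-ruv` (or vice versa) would be worth investigating.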
##########
# 3.4 Check RUV list on replica (rhel6-3):
##########
[root@rhel6-3 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
rhel6-1.testrelm.com:389: 4
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7bd73000000050000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7bd79000000060000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b694ee000000040000
dc: testrelm
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7bd71
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7bd7b
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b694eb

##########
# 3.5 Check RUV list on replica (rhel6-4):
##########
[root@rhel6-4 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-4.testrelm.com:389: 6
rhel6-3.testrelm.com:389: 5
rhel6-1.testrelm.com:389: 4
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7bd79000000060000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7bd73000000050000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b694ee000000040000
dc: testrelm
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7bd77
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7bd71
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b694eb

####################################################################
# 4. Re-add server2 and make sure its RUV ID is different
####################################################################

##########
# 4.1 Uninstall and re-install replica (rhel6-2)
##########
uninstall ran cleanly.
ipa-replica-prepare ran cleanly.
[root@rhel6-2 ~]# ipa-replica-install -U --setup-ca --setup-dns --forwarder=$DNSFORWARD -w $ADMINPW -p $ADMINPW /var/lib/ipa/replica-info-rhel6-2.testrelm.com.gpg
Install above ran cleanly.
[root@rhel6-2 ~]# kinit admin
Password for admin@TESTRELM.COM:
[root@rhel6-2 ~]# ipa-replica-manage connect rhel6-3.testrelm.com
ipa: INFO: Getting ldap service principals for conversion: (krbprincipalname=ldap/rhel6-2.testrelm.com@TESTRELM.COM) and (krbprincipalname=ldap/rhel6-3.testrelm.com@TESTRELM.COM)
Connected 'rhel6-2.testrelm.com' to 'rhel6-3.testrelm.com'

##########
# 4.2 Check RUV on self:
##########
[root@rhel6-2 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-2.testrelm.com:389: 7
rhel6-1.testrelm.com:389: 4
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c68c000000070000 50b7c71f000000070000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b7c710000500040000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7c762000000050000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7c712000200060000
dc: testrelm
nsruvReplicaLastModified: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c71d
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b7c70f
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7c760
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7c714

##########
# 4.3 Check RUV on master (rhel6-1):
##########
[root@rhel6-1 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-1.testrelm.com:389: 4
rhel6-2.testrelm.com:389: 7
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b7c710000500040000
nsds50ruv: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c68c000000070000 50b7c71f000000070000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7c762000000050000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7c712000200060000
dc: testrelm
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b7c70e
nsruvReplicaLastModified: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c71d
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7c760
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7c711

##########
# 4.4 Check RUV on replica (rhel6-3):
##########
[root@rhel6-3 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
rhel6-1.testrelm.com:389: 4
rhel6-2.testrelm.com:389: 7
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7c762000000050000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7c712000200060000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b7c710000500040000
nsds50ruv: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c68c000000070000 50b7c71f000000070000
dc: testrelm
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7c760
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7c711
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b7c713
nsruvReplicaLastModified: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c71d

##########
# 4.5 Check RUV on replica (rhel6-4):
##########
[root@rhel6-4 ~]# ipa-replica-manage list-ruv ; ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
rhel6-4.testrelm.com:389: 6
rhel6-3.testrelm.com:389: 5
rhel6-1.testrelm.com:389: 4
rhel6-2.testrelm.com:389: 7
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b7c712000200060000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b7c762000000050000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b7c710000500040000
nsds50ruv: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c68c000000070000 50b7c71f000000070000
dc: testrelm
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b7c710
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b7c760
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b694eb
nsruvReplicaLastModified: {replica 7 ldap://rhel6-2.testrelm.com:389} 50b7c71d

Ok, so we can see that the deleted replica gets a new RUV when it is re-added.

####################################################################
# 5. Make sure CLEANALLRUV runs after the server is brought back up
# (Requires an additional del when the good server is brought up)
####################################################################

##########
# 5.1 Stop IPA on the replica to delete (rhel6-2):
##########
[root@rhel6-2 ~]# ipactl stop
Stopping CA Service
Stopping pki-ca: [ OK ]
Stopping HTTP Service
Stopping httpd: [ OK ]
Stopping MEMCACHE Service
Stopping ipa_memcached: [ OK ]
Stopping DNS Service
Stopping named: . [ OK ]
Stopping KPASSWD Service
Stopping Kerberos 5 Admin Server: [ OK ]
Stopping KDC Service
Stopping Kerberos 5 KDC: [ OK ]
Stopping Directory Service
Shutting down dirsrv:
    PKI-IPA... [ OK ]
    TESTRELM-COM... [ OK ]

##########
# 5.2 Stop IPA on the replica to bring back up (rhel6-3):
##########
[root@rhel6-3 ~]# ipactl stop
Stopping CA Service
Stopping pki-ca: [ OK ]
Stopping HTTP Service
Stopping httpd: [ OK ]
Stopping MEMCACHE Service
Stopping ipa_memcached: [ OK ]
Stopping DNS Service
Stopping named: . [ OK ]
Stopping KPASSWD Service
Stopping Kerberos 5 Admin Server: [ OK ]
Stopping KDC Service
Stopping Kerberos 5 KDC: [ OK ]
Stopping Directory Service
Shutting down dirsrv:
    PKI-IPA... [ OK ]
    TESTRELM-COM... [ OK ]

##########
# 5.3 Delete 1st replica brought down (rhel6-2):
##########
[root@rhel6-1 ~]# ipa-replica-manage del rhel6-2.testrelm.com --force
Connection to 'rhel6-2.testrelm.com' failed: Can't contact LDAP server
Forcing removal of rhel6-2.testrelm.com
Skipping calculation to determine if one or more masters would be orphaned.
Deleting replication agreements between rhel6-2.testrelm.com and rhel6-1.testrelm.com, rhel6-3.testrelm.com, rhel6-4.testrelm.com
Failed to get list of agreements from 'rhel6-2.testrelm.com': Can't contact LDAP server
Forcing removal on 'rhel6-1.testrelm.com'
Deleted replication agreement from 'rhel6-1.testrelm.com' to 'rhel6-2.testrelm.com'
Failed to determine agreement type for 'rhel6-3.testrelm.com': Can't contact LDAP server
Unable to remove replication agreement for rhel6-2.testrelm.com from rhel6-3.testrelm.com.
'rhel6-4.testrelm.com' has no replication agreement for 'rhel6-2.testrelm.com'
Unable to remove replication agreement for rhel6-2.testrelm.com from rhel6-4.testrelm.com.
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C

This one looked like it waited until the CLEANALLRUV was finished.

##########
# 5.4 While waiting for the delete, run list-clean-ruv on master (rhel6-1):
##########
[root@rhel6-1 ~]# ipa-replica-manage list-clean-ruv
CLEANALLRUV tasks
RID 7: Replicas have not been cleaned yet, retrying in 40 seconds
No abort CLEANALLRUV tasks running
[root@rhel6-1 ~]# ipa-replica-manage list-clean-ruv
CLEANALLRUV tasks
RID 7: Replicas have not been cleaned yet, retrying in 80 seconds
No abort CLEANALLRUV tasks running
[root@rhel6-1 ~]# ipa-replica-manage list-clean-ruv
CLEANALLRUV tasks
RID 7: Replicas have not been cleaned yet, retrying in 160 seconds
No abort CLEANALLRUV tasks running
[root@rhel6-1 ~]# ipa-replica-manage list-clean-ruv
CLEANALLRUV tasks
RID 7: Replicas have not been cleaned yet, retrying in 320 seconds
No abort CLEANALLRUV tasks running
[root@rhel6-1 ~]# ipa-replica-manage list-ruv
rhel6-1.testrelm.com:389: 4
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6

##########
# 5.5 Bring second replica back up (rhel6-3)
##########
[root@rhel6-3 ~]# ipactl start
Starting Directory Service
Starting dirsrv:
    PKI-IPA... [ OK ]
    TESTRELM-COM... [ OK ]
Starting KDC Service
Starting Kerberos 5 KDC: [ OK ]
Starting KPASSWD Service
Starting Kerberos 5 Admin Server: [ OK ]
Starting DNS Service
Starting named: [ OK ]
Starting MEMCACHE Service
Starting ipa_memcached: [ OK ]
Starting HTTP Service
Starting httpd: [ OK ]
Starting CA Service
Starting pki-ca: [ OK ]
[root@rhel6-3 ~]#

##########
# 5.6 Delete the replication agreement to the deleted server
# This step is necessary because the server was down.
# If you wait, you'll see the server brought up
# trying to talk to the deleted one before it'll
# finish the CLEANALLRUV.
##########
[root@rhel6-3 ~]# ipa-replica-manage del rhel6-2.testrelm.com --force
Connection to 'rhel6-2.testrelm.com' failed: Can't contact LDAP server
Forcing removal of rhel6-2.testrelm.com
Skipping calculation to determine if one or more masters would be orphaned.
Deleting replication agreements between rhel6-2.testrelm.com and rhel6-1.testrelm.com, rhel6-3.testrelm.com, rhel6-4.testrelm.com
'rhel6-1.testrelm.com' has no replication agreement for 'rhel6-2.testrelm.com'
Unable to remove replication agreement for rhel6-2.testrelm.com from rhel6-1.testrelm.com.
Failed to get list of agreements from 'rhel6-2.testrelm.com': Can't contact LDAP server
Forcing removal on 'rhel6-3.testrelm.com'
Deleted replication agreement from 'rhel6-3.testrelm.com' to 'rhel6-2.testrelm.com'
'rhel6-4.testrelm.com' has no replication agreement for 'rhel6-2.testrelm.com'
Unable to remove replication agreement for rhel6-2.testrelm.com from rhel6-4.testrelm.com.
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C

##########
# 5.7 Check that CLEANALLRUV ran on second replica (rhel6-3):
##########
[root@rhel6-3 ~]# ipa-replica-manage list-ruv
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
rhel6-1.testrelm.com:389: 4

Ok, so after learning that I needed to delete the agreement on the server brought back up, it seems to work as expected.
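The list-clean-ruv output in step 5.4 shows the task's retry interval doubling (40, 80, 160, 320 seconds), and step 6.5 below shows a 10-second interval early on. The snippet below just prints that apparent exponential-backoff schedule; it is inferred from the command output in this run, not taken from the 389-ds source:

```shell
# Apparent CLEANALLRUV retry backoff: the interval seems to double each
# attempt (10, 20, 40, 80, 160, 320 seconds). This is an inference from the
# list-clean-ruv output above, printed here for illustration only.
interval=10
schedule=""
for attempt in 1 2 3 4 5 6; do
    schedule="${schedule}${interval} "
    interval=$((interval * 2))
done
echo "retry intervals (s): ${schedule}"
```

The practical consequence is that a stuck task can wait several minutes between retries, which is why the downed replica in section 5 appears to hang until the agreement is cleaned up on the restarted server.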
####################################################################
# 6. Delete replica with all replicas down to test abort-clean-ruv
####################################################################

##########
# 6.1 Check replica connections and shut down IPA for 1st replica (rhel6-2)
##########
[root@rhel6-2 ~]# ipa-replica-manage list $(hostname)
rhel6-1.testrelm.com: replica
rhel6-3.testrelm.com: replica
[root@rhel6-2 ~]# ipactl stop
Stopping CA Service
Stopping pki-ca: [ OK ]
Stopping HTTP Service
Stopping httpd: [ OK ]
Stopping MEMCACHE Service
Stopping ipa_memcached: [ OK ]
Stopping DNS Service
Stopping named: . [ OK ]
Stopping KPASSWD Service
Stopping Kerberos 5 Admin Server: [ OK ]
Stopping KDC Service
Stopping Kerberos 5 KDC: [ OK ]
Stopping Directory Service
Shutting down dirsrv:
    PKI-IPA... [ OK ]
    TESTRELM-COM... [ OK ]

##########
# 6.2 Check replica connections and shut down IPA for 2nd replica (rhel6-3)
##########
[root@rhel6-3 ~]# ipa-replica-manage list $(hostname)
rhel6-2.testrelm.com: replica
rhel6-4.testrelm.com: replica
[root@rhel6-3 ~]# ipactl stop
Stopping CA Service
Stopping pki-ca: [ OK ]
Stopping HTTP Service
Stopping httpd: [ OK ]
Stopping MEMCACHE Service
Stopping ipa_memcached: [ OK ]
Stopping DNS Service
Stopping named: . [ OK ]
Stopping KPASSWD Service
Stopping Kerberos 5 Admin Server: [ OK ]
Stopping KDC Service
Stopping Kerberos 5 KDC: [ OK ]
Stopping Directory Service
Shutting down dirsrv:
    PKI-IPA... [ OK ]
    TESTRELM-COM... [ OK ]

##########
# 6.3 Check replica connections and shut down IPA for 3rd replica (rhel6-4)
##########
[root@rhel6-4 ~]# ipa-replica-manage list $(hostname)
rhel6-1.testrelm.com: replica
rhel6-3.testrelm.com: replica
[root@rhel6-4 ~]# ipactl stop
Stopping CA Service
Stopping pki-ca: [ OK ]
Stopping HTTP Service
Stopping httpd: [ OK ]
Stopping MEMCACHE Service
Stopping ipa_memcached: [ OK ]
Stopping DNS Service
Stopping named: . [ OK ]
Stopping KPASSWD Service
Stopping Kerberos 5 Admin Server: [ OK ]
Stopping KDC Service
Stopping Kerberos 5 KDC: [ OK ]
Stopping Directory Service
Shutting down dirsrv:
    PKI-IPA... [ OK ]
    TESTRELM-COM... [ OK ]

##########
# 6.4 Delete 1st replica (rhel6-2) on master (rhel6-1)
##########
[root@rhel6-1 ~]# ipa-replica-manage del rhel6-2.testrelm.com --force
Connection to 'rhel6-2.testrelm.com' failed: Can't contact LDAP server
Forcing removal of rhel6-2.testrelm.com
Skipping calculation to determine if one or more masters would be orphaned.
Deleting replication agreements between rhel6-2.testrelm.com and rhel6-1.testrelm.com, rhel6-3.testrelm.com, rhel6-4.testrelm.com
Failed to get list of agreements from 'rhel6-2.testrelm.com': Can't contact LDAP server
Forcing removal on 'rhel6-1.testrelm.com'
Deleted replication agreement from 'rhel6-1.testrelm.com' to 'rhel6-2.testrelm.com'
Failed to determine agreement type for 'rhel6-3.testrelm.com': Can't contact LDAP server
Unable to remove replication agreement for rhel6-2.testrelm.com from rhel6-3.testrelm.com.
Failed to determine agreement type for 'rhel6-4.testrelm.com': Can't contact LDAP server
Unable to remove replication agreement for rhel6-2.testrelm.com from rhel6-4.testrelm.com.
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C

You will note above that we cannot talk to any of our replicas. That's expected. It should also be noted that this doesn't end until the abort is run.

##########
# 6.5 Check if CLEANALLRUV is running on master (rhel6-1)
##########
[root@rhel6-1 ~]# ipa-replica-manage list-clean-ruv
CLEANALLRUV tasks
RID 9: Not all replicas online, retrying in 10 seconds...
No abort CLEANALLRUV tasks running

##########
# 6.6 Abort CLEANALLRUV task
##########
[root@rhel6-1 ~]# ipa-replica-manage abort-clean-ruv 9
Aborting the clean Replication Update Vector task for rhel6-2.testrelm.com:389
Background task created. This may take a while.
This may be safely interrupted with Ctrl+C
Cleanup task stopped

##########
# 6.7 List RUV to see that the RUV is still there
##########
[root@rhel6-1 ~]# ipa-replica-manage list-ruv
rhel6-1.testrelm.com:389: 4
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
rhel6-2.testrelm.com:389: 9
[root@rhel6-1 ~]# ldapsearch -xLLL -D "cn=Directory Manager" -w Secret123 -b dc=testrelm,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=testrelm,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 50b4efd5000000040000
nsds50ruv: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b4f049000000040000 50b8f2d7000c00040000
nsds50ruv: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b4f31b000000050000 50b8f1be000300050000
nsds50ruv: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b4f84c000000060000 50b8ee5d000000060000
nsds50ruv: {replica 9 ldap://rhel6-2.testrelm.com:389} 50b8e3c9000000090000 50b8f1bb000200090000
dc: testrelm
nsruvReplicaLastModified: {replica 4 ldap://rhel6-1.testrelm.com:389} 50b8f2d5
nsruvReplicaLastModified: {replica 5 ldap://rhel6-3.testrelm.com:389} 50b8f1f1
nsruvReplicaLastModified: {replica 6 ldap://rhel6-4.testrelm.com:389} 50b8ee5b
nsruvReplicaLastModified: {replica 9 ldap://rhel6-2.testrelm.com:389} 50b8f1f1

##########
# 6.8 Check log to confirm abort ran
##########
From /var/log/dirsrv/slapd-TESTRELM-COM/errors:
[30/Nov/2012:12:53:37 -0500] NSMMReplicationPlugin - agmt_delete: begin
[30/Nov/2012:12:53:57 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning rid (9)...
[30/Nov/2012:12:53:57 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting to process all the updates from the deleted replica...
[30/Nov/2012:12:53:57 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be online...
[30/Nov/2012:12:53:57 -0500] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 107 (Transport endpoint is not connected)
[30/Nov/2012:12:53:57 -0500] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[30/Nov/2012:12:53:57 -0500] NSMMReplicationPlugin - agmt="cn=meTorhel6-4.testrelm.com" (rhel6-4:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[30/Nov/2012:12:53:57 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas online, retrying in 10 seconds...
[30/Nov/2012:12:54:07 -0500] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 107 (Transport endpoint is not connected)
[30/Nov/2012:12:54:07 -0500] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[30/Nov/2012:12:54:07 -0500] NSMMReplicationPlugin - agmt="cn=meTorhel6-4.testrelm.com" (rhel6-4:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[30/Nov/2012:12:54:07 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Not all replicas online, retrying in 20 seconds...
[30/Nov/2012:12:54:25 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Task aborted for rid(9).
[30/Nov/2012:12:54:25 -0500] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 107 (Transport endpoint is not connected)
[30/Nov/2012:12:54:26 -0500] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[30/Nov/2012:12:54:26 -0500] NSMMReplicationPlugin - agmt="cn=meTorhel6-4.testrelm.com" (rhel6-4:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[30/Nov/2012:12:54:26 -0500] NSMMReplicationPlugin - Abort CleanAllRUV Task: Failed to connect to replica(agmt="cn=meTorhel6-4.testrelm.com" (rhel6-4:389)).
[30/Nov/2012:12:54:26 -0500] NSMMReplicationPlugin - Abort CleanAllRUV Task: Successfully aborted cleanAllRUV task for rid(9)

You can see that the CleanAllRUV task was aborted there for RUV (9), the server that was deleted earlier (rhel6-2).

####################################################################
# 7. Run clean-ruv to clean up after the previous abort.
####################################################################

##########
# 7.1 Check env as expected after the previous abort-clean-ruv
##########
[root@rhel6-1 ~]# ipa-replica-manage list-ruv
rhel6-1.testrelm.com:389: 4
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
rhel6-2.testrelm.com:389: 9
[root@rhel6-1 ~]# ipa-replica-manage list
rhel6-1.testrelm.com: master
rhel6-3.testrelm.com: master
rhel6-4.testrelm.com: master

So, the master was deleted, but its RUV is still in place. Time to clean it up.

##########
# 7.2 Start up 2nd replica (rhel6-3)
##########
[root@rhel6-3 ~]# ipactl start
Starting Directory Service
Starting dirsrv:
    PKI-IPA...                             [ OK ]
    TESTRELM-COM...                        [ OK ]
Starting KDC Service
Starting Kerberos 5 KDC:                   [ OK ]
Starting KPASSWD Service
Starting Kerberos 5 Admin Server:          [ OK ]
Starting DNS Service
Starting named:                            [ OK ]
Starting MEMCACHE Service
Starting ipa_memcached:                    [ OK ]
Starting HTTP Service
Starting httpd:                            [ OK ]
Starting CA Service
Starting pki-ca:                           [ OK ]

##########
# 7.3 Start up 3rd replica (rhel6-4)
##########
[root@rhel6-4 ~]# ipactl start
Starting Directory Service
Starting dirsrv:
    PKI-IPA...                             [ OK ]
    TESTRELM-COM...                        [ OK ]
Starting KDC Service
Starting Kerberos 5 KDC:                   [ OK ]
Starting KPASSWD Service
Starting Kerberos 5 Admin Server:          [ OK ]
Starting DNS Service
Starting named:                            [ OK ]
Starting MEMCACHE Service
Starting ipa_memcached:                    [ OK ]
Starting HTTP Service
Starting httpd:                            [ OK ]
Starting CA Service
Starting pki-ca:                           [ OK ]

##########
# 7.4 Delete 2nd replica's (rhel6-3) agreement to 1st replica (rhel6-2)
##########
[root@rhel6-3 ~]# ipa-replica-manage del rhel6-2.testrelm.com --force
Deleting replication agreements between rhel6-2.testrelm.com and rhel6-1.testrelm.com, rhel6-3.testrelm.com
'rhel6-1.testrelm.com' has no replication agreement for 'rhel6-2.testrelm.com'
Unable to remove replication agreement for rhel6-2.testrelm.com from rhel6-1.testrelm.com.
ipa: INFO: Setting agreement cn=meTorhel6-3.testrelm.com,cn=replica,cn=dc\=testrelm\,dc\=com,cn=mapping tree,cn=config schedule to 2358-2359 0 to force synch
ipa: INFO: Deleting schedule 2358-2359 0 from agreement cn=meTorhel6-3.testrelm.com,cn=replica,cn=dc\=testrelm\,dc\=com,cn=mapping tree,cn=config
ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 0: end: 0
Deleted replication agreement from 'rhel6-3.testrelm.com' to 'rhel6-2.testrelm.com'
Background task created to clean replication data. This may take a while.
This may be safely interrupted with Ctrl+C

##########
# 7.5 Run cleanAllRUV from master
##########
[root@rhel6-1 ~]# ipa-replica-manage clean-ruv 9
Clean the Replication Update Vector for rhel6-2.testrelm.com:389
Cleaning the wrong replica ID will cause that server to no
longer replicate so it may miss updates while the process
is running. It would need to be re-initialized to maintain
consistency. Be very careful.
Continue to clean? [no]: yes
CLEANALLRUV task for replica id 9 already exists.
This may be safely interrupted with Ctrl+C

##########
# 7.6 Check that RUV is gone from master
##########
[root@rhel6-1 ~]# ipa-replica-manage list-ruv
rhel6-1.testrelm.com:389: 4
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6

##########
# 7.7 Check RUV list on 2nd replica (rhel6-3)
##########
[root@rhel6-3 ~]# ipa-replica-manage list-ruv
rhel6-3.testrelm.com:389: 5
rhel6-4.testrelm.com:389: 6
rhel6-1.testrelm.com:389: 4

##########
# 7.8 Check RUV list on 3rd replica (rhel6-4)
##########
[root@rhel6-4 ~]# ipa-replica-manage list-ruv
rhel6-4.testrelm.com:389: 6
rhel6-3.testrelm.com:389: 5
rhel6-1.testrelm.com:389: 4

##########
# 7.9 Check log on master (rhel6-1) to be sure CleanAllRUV completed
##########
From /var/log/dirsrv/slapd-TESTRELM-COM/errors:

[30/Nov/2012:13:54:18 -0500] NSMMReplicationPlugin - cleanAllRUV_task: launching cleanAllRUV thread...
[30/Nov/2012:13:54:18 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning rid (9)...
[30/Nov/2012:13:54:18 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting to process all the updates from the deleted replica...
[30/Nov/2012:13:54:18 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be online...
[30/Nov/2012:13:54:18 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to receive all the deleted replica updates...
[30/Nov/2012:13:54:19 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Sending cleanAllRUV task to all the replicas...
[30/Nov/2012:13:54:19 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning local ruv's...
[30/Nov/2012:13:54:19 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be cleaned...
[30/Nov/2012:13:54:20 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Successfully cleaned rid(9).
[30/Nov/2012:13:58:00 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Sending cleanAllRUV task to all the replicas...
[30/Nov/2012:13:58:00 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Cleaning local ruv's...
[30/Nov/2012:13:58:00 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Waiting for all the replicas to be cleaned...
[30/Nov/2012:13:58:01 -0500] NSMMReplicationPlugin - CleanAllRUV Task: Successfully cleaned rid(9).

So, it looks like clean-ruv worked here once everything was back up, as expected.
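To cross-check the list-ruv output above against the raw replication data on each server, you can fetch the RUV tombstone entry with the standard ldapsearch query and pull the replica IDs out of its nsds50ruv values. A minimal sketch of that parsing step — the regex and helper name are my own, and the sample values are illustrative only, shaped like real nsds50ruv data before rid 9 is cleaned:

```python
import re

# Fetch the raw values on a server with something like:
#   ldapsearch -x -D "cn=Directory Manager" -W -b "dc=testrelm,dc=com" \
#     "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" nsds50ruv
#
# Data values look like "{replica <rid> ldap://host:port} <minCSN> <maxCSN>";
# the leading "{replicageneration} ..." value carries no replica ID.
RUV_RE = re.compile(r"\{replica (\d+) ldap://([^:}]+):(\d+)\}")

def parse_ruv(values):
    """Map each replica ID found in nsds50ruv values to its host:port."""
    ruvs = {}
    for value in values:
        m = RUV_RE.search(value)
        if m:
            rid, host, port = m.groups()
            ruvs[int(rid)] = "%s:%s" % (host, port)
    return ruvs

# Illustrative values (CSNs truncated), as they might appear in step 7.1:
sample = [
    "{replicageneration} 4f9e6a1c000000040000",
    "{replica 4 ldap://rhel6-1.testrelm.com:389} 4f9e6a2d 50b91a55",
    "{replica 5 ldap://rhel6-3.testrelm.com:389} 4f9e6b01 50b91a60",
    "{replica 6 ldap://rhel6-4.testrelm.com:389} 4f9e6b44 50b91a71",
    "{replica 9 ldap://rhel6-2.testrelm.com:389} 4f9e6c12 50b90f03",
]

print(parse_ruv(sample))
```

After a successful clean-ruv 9, replica ID 9 should no longer appear in the parsed map on any server, matching what list-ruv reports in steps 7.6 through 7.8.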
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0528.html