590826 – Reloading database from ldif causes changelog to emit "data no longer matches" errors

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	389
Classification:	Retired
Component:	Replication - General
Sub Component:
Version:	1.2.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Rich Megginson
QA Contact:	Ben Levenson
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	690319 389_1.2.10 781500

Reported:	2010-05-10 18:51 UTC by John Bryson
Modified:	2015-12-10 18:37 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Clones:	736712 737174 781500
Environment:
Last Closed:	2015-12-10 18:37:44 UTC
Embargoed:

Attachments
output from cl-dump (20.78 KB, text/plain) 2011-09-07 20:56 UTC, Robert Viduya	no flags
output from ldapsearch objectclass=nstombstone (4.48 KB, text/plain) 2011-09-07 20:57 UTC, Robert Viduya	no flags
0001-Bug-590826-Reloading-database-from-ldif-causes-chang.patch (15.09 KB, patch) 2011-09-09 16:19 UTC, Rich Megginson	nhosoi: review+
0001-Bug-590826-Reloading-database-from-ldif-causes-chang.patch (15.45 KB, patch) 2011-09-09 19:00 UTC, Rich Megginson	nkinder: review+

Description John Bryson 2010-05-10 18:51:08 UTC

Description of problem:

We had issues and reinstalled two new masters, gtedm1.iam and gtedm2.iam. Now, if we restart either new master, we get this error in the error log.

Reinitializing doesn't matter. If we reinitialize from a presumably clean master (gertrude.iam), we still get this result.

 
[10/May/2010:14:47:01 -0400] - slapd stopped.
        389-Directory/1.2.2 B2009.237.2054
        gtedm3.iam.gatech.edu:636 (/etc/dirsrv/slapd-gtedm3)

[10/May/2010:14:47:12 -0400] - 389-Directory/1.2.2 B2009.237.2054 starting up
[10/May/2010:14:47:13 -0400] - cache autosizing. found 2059480k physical memory
[10/May/2010:14:47:13 -0400] - cache autosizing: db cache: 617844k, each entry cache (4 total): 123568k
[10/May/2010:14:47:14 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[10/May/2010:14:47:14 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=gted,dc=gatech,dc=edu was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[10/May/2010:14:47:14 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[10/May/2010:14:47:15 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[10/May/2010:14:47:15 -0400] - Listening on All Interfaces port 636 for LDAPS requests
[root@gtedm3 slapd-gtedm3]#  


Version-Release number of selected component (if applicable):
        389-Directory/1.2.2 B2009.237.2054


How reproducible:

  Every time we stop and then start dirsrv. We see it on gtedm1 and gtedm2.

Steps to Reproduce:
1. Stop and start one of the new masters (reinitializing them doesn't help)
  
Actual results:


Expected results:


Additional info:

Comment 7 John Bryson 2011-09-01 17:53:49 UTC

We got bitten by this bug again. 

We are in the process of upgrading to 1.2.8.3 and just finished doing one of our hubs (briggs.iam.gatech.edu). It's showing the following in the logs whenever it starts up:

[01/Sep/2011:13:13:49 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e5ff459000000320000) > changelog (4e5ff459000000320000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[01/Sep/2011:13:13:53 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e5ff469000100330000) > changelog (4e5ff469000100330000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[01/Sep/2011:13:13:57 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e5ff460000000340000) > changelog (4e5ff460000000340000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.

Recall that our MMR replication setup is 3 masters --> 2 replication hubs --> 12 replication slaves.

Comment 8 Rich Megginson 2011-09-01 18:45:27 UTC

hmm - the upgrade may have made changes to the database that invalidated the changelog - what's in the errors log on the hub from around the time of the upgrade?

Comment 9 John Bryson 2011-09-01 19:00:46 UTC

briggs.iam.gatech.edu: pwd
/var/log/dirsrv/slapd-briggs
briggs.iam.gatech.edu: ls -l
total 22284
-rw-r--r-- 1 nobody nobody 20402660 Sep  1 15:00 access
-rw-r--r-- 1 nobody nobody  2298370 Aug 31 17:17 access.20110831-143520
-rw-r--r-- 1 nobody nobody      164 Sep  1 10:09 access.rotationinfo
-rw-r--r-- 1 nobody nobody        0 Aug 31 14:35 audit
-rw-r--r-- 1 nobody nobody       63 Aug 31 14:35 audit.rotationinfo
-rw-r--r-- 1 nobody nobody        0 Sep  1 04:02 dropped
-rw-r--r-- 1 nobody nobody     5885 Sep  1 13:14 errors
-rw-r--r-- 1 nobody nobody    21716 Aug 31 18:14 errors.20110831-143519
-rw-r--r-- 1 nobody nobody      162 Sep  1 10:04 errors.rotationinfo
briggs.iam.gatech.edu: 

Well this is the error file showing imports to build the 3 dbs (gted, accounts, people)....

briggs.iam.gatech.edu: 
briggs.iam.gatech.edu: cat errors.20110831-143519
	389-Directory/1.2.8.3 B2011.122.1636
	briggs.iam.gatech.edu:389 (/etc/dirsrv/slapd-briggs)

[31/Aug/2011:14:35:20 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[31/Aug/2011:14:35:20 -0400] - check_and_set_import_cache: pagesize: 4096, pages: 6171388, procpages: 46718
[31/Aug/2011:14:35:20 -0400] - Import allocates 9874220KB import cache.
[31/Aug/2011:14:35:21 -0400] - Setting ncache to: 3 to keep each chunk below 4Gbytes
[31/Aug/2011:14:35:21 -0400] - import userRoot: Beginning import job...
[31/Aug/2011:14:35:21 -0400] - import userRoot: Index buffering enabled with bucket size 100
[31/Aug/2011:14:35:21 -0400] - import userRoot: Processing file "/tmp/ldife3ntJ0.ldif"
[31/Aug/2011:14:35:21 -0400] - import userRoot: Finished scanning file "/tmp/ldife3ntJ0.ldif" (9 entries)
[31/Aug/2011:14:35:22 -0400] - import userRoot: Workers finished; cleaning up...
[31/Aug/2011:14:35:22 -0400] - import userRoot: Workers cleaned up.
[31/Aug/2011:14:35:22 -0400] - import userRoot: Cleaning up producer thread...
[31/Aug/2011:14:35:22 -0400] - import userRoot: Indexing complete.  Post-processing...
[31/Aug/2011:14:35:22 -0400] - import userRoot: Flushing caches...
[31/Aug/2011:14:35:23 -0400] - import userRoot: Closing files...
[31/Aug/2011:14:35:23 -0400] - All database threads now stopped
[31/Aug/2011:14:35:23 -0400] - import userRoot: Import complete.  Processed 9 entries in 2 seconds. (4.50 entries/sec)
[31/Aug/2011:14:35:23 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[31/Aug/2011:14:35:23 -0400] - I'm resizing my cache now...cache was 1521266688 and is now 8000000
[31/Aug/2011:14:35:23 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[31/Aug/2011:14:37:25 -0400] - slapd shutting down - signaling operation threads
[31/Aug/2011:14:37:26 -0400] - slapd shutting down - closing down internal subsystems and plugins
[31/Aug/2011:14:37:26 -0400] - Waiting for 4 database threads to stop
[31/Aug/2011:14:37:27 -0400] - All database threads now stopped
[31/Aug/2011:14:37:27 -0400] - slapd stopped.
[31/Aug/2011:14:38:42 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[31/Aug/2011:14:38:43 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[31/Aug/2011:14:40:07 -0400] - slapd shutting down - signaling operation threads
[31/Aug/2011:14:40:08 -0400] - slapd shutting down - waiting for 20 threads to terminate
[31/Aug/2011:14:40:08 -0400] - slapd shutting down - closing down internal subsystems and plugins
[31/Aug/2011:14:40:09 -0400] - Waiting for 4 database threads to stop
[31/Aug/2011:14:40:09 -0400] - All database threads now stopped
[31/Aug/2011:14:40:10 -0400] - slapd stopped.
[31/Aug/2011:14:40:18 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[31/Aug/2011:14:40:18 -0400] attrcrypt - No symmetric key found for cipher AES in backend userRoot, attempting to create one...
[31/Aug/2011:14:40:18 -0400] attrcrypt - Key for cipher AES successfully generated and stored
[31/Aug/2011:14:40:19 -0400] attrcrypt - No symmetric key found for cipher 3DES in backend userRoot, attempting to create one...
[31/Aug/2011:14:40:19 -0400] attrcrypt - Key for cipher 3DES successfully generated and stored
[31/Aug/2011:14:40:19 -0400] attrcrypt - No symmetric key found for cipher AES in backend NetscapeRoot, attempting to create one...
[31/Aug/2011:14:40:19 -0400] attrcrypt - Key for cipher AES successfully generated and stored
[31/Aug/2011:14:40:20 -0400] attrcrypt - No symmetric key found for cipher 3DES in backend NetscapeRoot, attempting to create one...
[31/Aug/2011:14:40:20 -0400] attrcrypt - Key for cipher 3DES successfully generated and stored
[31/Aug/2011:14:40:20 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[31/Aug/2011:14:40:21 -0400] - Listening on All Interfaces port 636 for LDAPS requests
[31/Aug/2011:14:40:36 -0400] - slapd shutting down - signaling operation threads
[31/Aug/2011:14:40:36 -0400] - slapd shutting down - waiting for 29 threads to terminate
[31/Aug/2011:14:40:36 -0400] - slapd shutting down - closing down internal subsystems and plugins
[31/Aug/2011:14:40:36 -0400] - Waiting for 4 database threads to stop
[31/Aug/2011:14:40:37 -0400] - All database threads now stopped
[31/Aug/2011:14:40:38 -0400] - slapd stopped.
[31/Aug/2011:14:57:20 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[31/Aug/2011:14:57:20 -0400] - cache autosizing. found 24685552k physical memory
[31/Aug/2011:14:57:20 -0400] - cache autosizing: db cache: 7405664k, each entry cache (4 total): 1481132k
[31/Aug/2011:14:57:20 -0400] - I'm resizing my cache now...cache was 8000000 and is now 3288432640
[31/Aug/2011:14:57:40 -0400] attrcrypt - No symmetric key found for cipher AES in backend people, attempting to create one...
[31/Aug/2011:15:03:14 -0400] attrcrypt - Key for cipher AES successfully generated and stored
[31/Aug/2011:15:03:15 -0400] attrcrypt - No symmetric key found for cipher 3DES in backend people, attempting to create one...
[31/Aug/2011:15:03:15 -0400] attrcrypt - Key for cipher 3DES successfully generated and stored
[31/Aug/2011:15:03:19 -0400] attrcrypt - No symmetric key found for cipher AES in backend gted, attempting to create one...
[31/Aug/2011:15:03:19 -0400] attrcrypt - Key for cipher AES successfully generated and stored
[31/Aug/2011:15:03:19 -0400] attrcrypt - No symmetric key found for cipher 3DES in backend gted, attempting to create one...
[31/Aug/2011:15:03:19 -0400] attrcrypt - Key for cipher 3DES successfully generated and stored
[31/Aug/2011:15:03:21 -0400] attrcrypt - No symmetric key found for cipher AES in backend accounts, attempting to create one...
[31/Aug/2011:15:03:22 -0400] attrcrypt - Key for cipher AES successfully generated and stored
[31/Aug/2011:15:03:22 -0400] attrcrypt - No symmetric key found for cipher 3DES in backend accounts, attempting to create one...
[31/Aug/2011:15:03:22 -0400] attrcrypt - Key for cipher 3DES successfully generated and stored
[31/Aug/2011:15:03:25 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[31/Aug/2011:15:03:25 -0400] - Listening on All Interfaces port 636 for LDAPS requests
[31/Aug/2011:15:03:33 -0400] - slapd shutting down - signaling operation threads
[31/Aug/2011:15:03:33 -0400] - slapd shutting down - waiting for 29 threads to terminate
[31/Aug/2011:15:03:33 -0400] - slapd shutting down - closing down internal subsystems and plugins
[31/Aug/2011:15:03:33 -0400] - Waiting for 4 database threads to stop
[31/Aug/2011:15:03:35 -0400] - All database threads now stopped
[31/Aug/2011:15:03:35 -0400] - slapd stopped.
[31/Aug/2011:15:04:53 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[31/Aug/2011:15:04:54 -0400] - cache autosizing. found 24685552k physical memory
[31/Aug/2011:15:04:55 -0400] - cache autosizing: db cache: 7405664k, each entry cache (4 total): 1481132k
[31/Aug/2011:15:04:55 -0400] - I'm resizing my cache now...cache was 3288432640 and is now 3288432640
[31/Aug/2011:15:05:07 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[31/Aug/2011:15:11:17 -0400] - Listening on All Interfaces port 636 for LDAPS requests
[31/Aug/2011:17:17:02 -0400] - slapd shutting down - signaling operation threads
[31/Aug/2011:17:17:02 -0400] - slapd shutting down - closing down internal subsystems and plugins
[31/Aug/2011:17:17:02 -0400] - Waiting for 4 database threads to stop
[31/Aug/2011:17:17:04 -0400] - All database threads now stopped
[31/Aug/2011:17:17:04 -0400] - slapd stopped.
[31/Aug/2011:17:30:30 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[31/Aug/2011:17:30:35 -0400] - check_and_set_import_cache: pagesize: 4096, pages: 6171388, procpages: 93674
[31/Aug/2011:17:30:38 -0400] - Import allocates 9874220KB import cache.
[31/Aug/2011:17:30:40 -0400] - Setting ncache to: 3 to keep each chunk below 4Gbytes
[31/Aug/2011:17:30:48 -0400] - import gted: Beginning import job...
[31/Aug/2011:17:30:49 -0400] - import gted: Index buffering enabled with bucket size 100
[31/Aug/2011:17:30:52 -0400] - import gted: Processing file "/var/tmp/g.ldif"
[31/Aug/2011:17:31:19 -0400] - import gted: Processed 17042 entries -- average rate 631.2/sec, recent rate 631.1/sec, hit ratio 0%
[31/Aug/2011:17:31:43 -0400] - import gted: Processed 47906 entries -- average rate 939.3/sec, recent rate 939.3/sec, hit ratio 100%
[31/Aug/2011:17:32:04 -0400] - import gted: Processed 73513 entries -- average rate 1021.0/sec, recent rate 1254.9/sec, hit ratio 100%
[31/Aug/2011:17:32:25 -0400] - import gted: Processed 99605 entries -- average rate 1071.0/sec, recent rate 1230.9/sec, hit ratio 100%
[31/Aug/2011:17:32:46 -0400] - import gted: Processed 121482 entries -- average rate 1065.6/sec, recent rate 1142.1/sec, hit ratio 100%
[31/Aug/2011:17:33:00 -0400] - import gted: Finished scanning file "/var/tmp/g.ldif" (135798 entries)
[31/Aug/2011:17:33:00 -0400] - import gted: Workers finished; cleaning up...
[31/Aug/2011:17:33:01 -0400] - import gted: Workers cleaned up.
[31/Aug/2011:17:33:01 -0400] - import gted: Cleaning up producer thread...
[31/Aug/2011:17:33:01 -0400] - import gted: Indexing complete.  Post-processing...
[31/Aug/2011:17:33:02 -0400] - import gted: Flushing caches...
[31/Aug/2011:17:33:02 -0400] - import gted: Closing files...
[31/Aug/2011:17:34:25 -0400] - All database threads now stopped
[31/Aug/2011:17:34:25 -0400] - import gted: Import complete.  Processed 135798 entries in 217 seconds. (625.80 entries/sec)
[31/Aug/2011:17:34:26 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[31/Aug/2011:17:34:26 -0400] - check_and_set_import_cache: pagesize: 4096, pages: 6171388, procpages: 93673
[31/Aug/2011:17:34:26 -0400] - Import allocates 9874220KB import cache.
[31/Aug/2011:17:34:26 -0400] - Setting ncache to: 3 to keep each chunk below 4Gbytes
[31/Aug/2011:17:34:26 -0400] - import accounts: Beginning import job...
[31/Aug/2011:17:34:26 -0400] - import accounts: Index buffering enabled with bucket size 100
[31/Aug/2011:17:34:26 -0400] - import accounts: Processing file "/var/tmp/a.ldif"
[31/Aug/2011:17:34:48 -0400] - import accounts: Processed 20160 entries -- average rate 916.4/sec, recent rate 916.3/sec, hit ratio 0%
[31/Aug/2011:17:35:10 -0400] - import accounts: Processed 39937 entries -- average rate 907.7/sec, recent rate 907.6/sec, hit ratio 100%

:
(deleted lots of the same)
:
[31/Aug/2011:17:40:01 -0400] - import accounts: Processed 299307 entries -- average rate 893.5/sec, recent rate 822.5/sec, hit ratio 100%
[31/Aug/2011:17:40:18 -0400] - import accounts: Finished scanning file "/var/tmp/a.ldif" (311535 entries)
[31/Aug/2011:17:40:18 -0400] - import accounts: Workers finished; cleaning up...
[31/Aug/2011:17:40:18 -0400] - import accounts: Workers cleaned up.
[31/Aug/2011:17:40:18 -0400] - import accounts: Cleaning up producer thread...
[31/Aug/2011:17:40:18 -0400] - import accounts: Indexing complete.  Post-processing...
[31/Aug/2011:17:40:20 -0400] - import accounts: Flushing caches...
[31/Aug/2011:17:40:20 -0400] - import accounts: Closing files...
[31/Aug/2011:17:45:30 -0400] - All database threads now stopped
[31/Aug/2011:17:45:30 -0400] - import accounts: Import complete.  Processed 311535 entries in 664 seconds. (469.18 entries/sec)
[31/Aug/2011:17:45:31 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[31/Aug/2011:17:45:31 -0400] - check_and_set_import_cache: pagesize: 4096, pages: 6171388, procpages: 93673
[31/Aug/2011:17:45:31 -0400] - Import allocates 9874220KB import cache.
[31/Aug/2011:17:45:31 -0400] - Setting ncache to: 3 to keep each chunk below 4Gbytes
[31/Aug/2011:17:45:32 -0400] - import people: Beginning import job...
[31/Aug/2011:17:45:32 -0400] - import people: Index buffering enabled with bucket size 100
[31/Aug/2011:17:45:32 -0400] - import people: Processing file "/var/tmp/p.ldif"
[31/Aug/2011:17:45:54 -0400] - import people: Processed 29283 entries -- average rate 1331.0/sec, recent rate 1331.0/sec, hit ratio 0%
[31/Aug/2011:17:46:15 -0400] - import people: Processed 56123 entries -- average rate 1305.2/sec, recent rate 1305.2/sec, hit ratio 100%
:
(deleted lots of the same)
:
[31/Aug/2011:18:03:11 -0400] - import people: Processed 1904597 entries -- average rate 1798.5/sec, recent rate 1271.2/sec, hit ratio 100%
[31/Aug/2011:18:03:33 -0400] - import people: Processed 1933007 entries -- average rate 1788.2/sec, recent rate 1279.3/sec, hit ratio 100%
[31/Aug/2011:18:03:40 -0400] - import people: Finished scanning file "/var/tmp/p.ldif" (1944503 entries)
[31/Aug/2011:18:03:41 -0400] - import people: Workers finished; cleaning up...
[31/Aug/2011:18:03:41 -0400] - import people: Workers cleaned up.
[31/Aug/2011:18:03:41 -0400] - import people: Cleaning up producer thread...
[31/Aug/2011:18:03:41 -0400] - import people: Indexing complete.  Post-processing...
[31/Aug/2011:18:03:46 -0400] - import people: Flushing caches...
[31/Aug/2011:18:03:46 -0400] - import people: Closing files...
[31/Aug/2011:18:14:20 -0400] - All database threads now stopped
[31/Aug/2011:18:14:20 -0400] - import people: Import complete.  Processed 1944503 entries in 1728 seconds. (1125.29 entries/sec)
briggs.iam.gatech.edu: 

and this is the current error log:

briggs.iam.gatech.edu: cat errors
	389-Directory/1.2.8.3 B2011.122.1636
	briggs.iam.gatech.edu:636 (/etc/dirsrv/slapd-briggs)

[01/Sep/2011:10:04:47 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[01/Sep/2011:10:04:47 -0400] - cache autosizing. found 24685552k physical memory
[01/Sep/2011:10:04:47 -0400] - cache autosizing: db cache: 7405664k, each entry cache (4 total): 1481132k
[01/Sep/2011:10:04:47 -0400] - I'm resizing my cache now...cache was 1521266688 and is now 3288432640
[01/Sep/2011:10:09:25 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[01/Sep/2011:10:09:25 -0400] - Listening on All Interfaces port 636 for LDAPS requests
[01/Sep/2011:10:25:10 -0400] NSMMReplicationPlugin - conn=402 op=3 replica="ou=people,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:10 -0400] NSMMReplicationPlugin - conn=401 op=3 replica="ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:10 -0400] NSMMReplicationPlugin - conn=403 op=3 replica="dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:13 -0400] NSMMReplicationPlugin - conn=403 op=4 replica="dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:13 -0400] NSMMReplicationPlugin - conn=402 op=4 replica="ou=people,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:13 -0400] NSMMReplicationPlugin - conn=401 op=4 replica="ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:19 -0400] NSMMReplicationPlugin - conn=402 op=5 replica="ou=people,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:20 -0400] NSMMReplicationPlugin - conn=401 op=5 replica="ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:21 -0400] NSMMReplicationPlugin - conn=403 op=5 replica="dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:31 -0400] NSMMReplicationPlugin - conn=401 op=7 replica="ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:31 -0400] NSMMReplicationPlugin - conn=402 op=6 replica="ou=people,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:31 -0400] NSMMReplicationPlugin - conn=403 op=7 replica="dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:56 -0400] NSMMReplicationPlugin - conn=401 op=9 replica="ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:56 -0400] NSMMReplicationPlugin - conn=403 op=9 replica="dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:25:56 -0400] NSMMReplicationPlugin - conn=402 op=8 replica="ou=people,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:26:43 -0400] NSMMReplicationPlugin - conn=401 op=10 replica="ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:26:45 -0400] NSMMReplicationPlugin - conn=402 op=10 replica="ou=people,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:10:26:45 -0400] NSMMReplicationPlugin - conn=403 op=10 replica="dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: permission denied
[01/Sep/2011:13:04:34 -0400] - slapd shutting down - signaling operation threads
[01/Sep/2011:13:04:34 -0400] - slapd shutting down - closing down internal subsystems and plugins
[01/Sep/2011:13:04:35 -0400] - Waiting for 4 database threads to stop
[01/Sep/2011:13:04:36 -0400] - All database threads now stopped
[01/Sep/2011:13:04:36 -0400] - slapd stopped.
[01/Sep/2011:13:05:21 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[01/Sep/2011:13:05:22 -0400] - cache autosizing. found 24685552k physical memory
[01/Sep/2011:13:05:22 -0400] - cache autosizing: db cache: 7405664k, each entry cache (4 total): 1481132k
[01/Sep/2011:13:05:22 -0400] - I'm resizing my cache now...cache was 3288432640 and is now 3288432640
[01/Sep/2011:13:13:49 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e5ff459000000320000) > changelog (4e5ff459000000320000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[01/Sep/2011:13:13:53 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e5ff469000100330000) > changelog (4e5ff469000100330000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[01/Sep/2011:13:13:57 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e5ff460000000340000) > changelog (4e5ff460000000340000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[01/Sep/2011:13:14:03 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[01/Sep/2011:13:14:04 -0400] - Listening on All Interfaces port 636 for LDAPS requests
briggs.iam.gatech.edu:

Comment 10 John Bryson 2011-09-01 19:15:05 UTC

(The permission-denied errors are from when Robert set up a replication agreement and mistyped the password.)

Comment 11 Robert Viduya 2011-09-06 17:00:11 UTC

I should mention that we used the term "upgrade" loosely.  What I've done is to delete the old version of the directory server, including all the database files and configurations and then installed the latest version.  Once the directory server was all set up and configured, I then imported the ldif files from one of the masters into a clean database.

It's disturbing to me that after a clean install and import, we get this error right away.

Comment 12 Rich Megginson 2011-09-06 18:32:23 UTC

(In reply to comment #11)
> I should mention that we used the term "upgrade" loosely.  What I've done is to
> delete the old version of the directory server, including all the database
> files and configurations

Including the changelog database directory and files?

> and then installed the latest version.  Once the
> directory server was all set up and configured, I then imported the ldif files
> from one of the masters into a clean database.
> 
> It's disturbing to me that after a clean install and import, we get this error
> right away.

Comment 13 Robert Viduya 2011-09-06 18:41:34 UTC

Yes.   I used the following command lines to remove the old directory server:

# yum remove 389-admin-console-doc 389-admin-console 389-admin 389-adminutil 389-console \
389-ds-base-devel 389-ds-base 389-ds-console-doc 389-ds-console 389-ds 389-dsgw \
idm-console-framework

# rm -r /usr/lib64/dirsrv /usr/share/dirsrv /etc/sysconfig/dirsrv-admin.rpmsave \
/etc/sysconfig/dirsrv.rpmnew /etc/sysconfig/dirsrv.rpmsave /etc/dirsrv /var/run/dirsrv \
/var/log/dirsrv /var/lib/dirsrv /var/cache/yum/dirsrv-noarch /var/cache/yum/dirsrv /var/lock/dirsrv

I'm pretty sure that got everything.

Comment 14 Rich Megginson 2011-09-06 19:18:06 UTC

(In reply to comment #13)
> Yes.   I used the following command lines to remove the old directory server:
> 
> # yum remove 389-admin-console-doc 389-admin-console 389-admin 389-adminutil
> 389-console \
> 389-ds-base-devel 389-ds-base 389-ds-console-doc 389-ds-console 389-ds 389-dsgw
> \
> idm-console-framework
> 
> # rm -r /usr/lib64/dirsrv /usr/share/dirsrv /etc/sysconfig/dirsrv-admin.rpmsave
> \
> /etc/sysconfig/dirsrv.rpmnew /etc/sysconfig/dirsrv.rpmsave /etc/dirsrv
> /var/run/dirsrv \
> /var/log/dirsrv /var/lib/dirsrv /var/cache/yum/dirsrv-noarch
> /var/cache/yum/dirsrv /var/lock/dirsrv
> 
> I'm pretty sure that got everything.

Yes, that would do it.

I realize the error message is distressing, but is there any data loss?  One idea is that the changelog creation may happen very early - then, when the initialization happens, the replication code says "hey, there is a changelog db here, and it may contain data, and I can't use it any more since I'm being initialized, so anything in there already will be wiped out".

Comment 15 Robert Viduya 2011-09-07 17:44:25 UTC

No, that's not what is happening.  The changelog is getting recreated _every_ time the server is stopped and restarted.  _Every_ time, without fail, on 2 of our masters and our 2 hubs.  This means that any data that made it into the changelog but didn't get propagated to all the consumers downstream when the server is stopped is lost.  If that's not data loss, I don't know what is.

Just to illustrate, I just stopped and restarted the server again.  This is what's in the logs:

[07/Sep/2011:13:28:51 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up
[07/Sep/2011:13:28:52 -0400] - cache autosizing. found 24685552k physical memory
[07/Sep/2011:13:28:52 -0400] - cache autosizing: db cache: 7405664k, each entry cache (4 total): 1481132k
[07/Sep/2011:13:28:52 -0400] - I'm resizing my cache now...cache was 3288432640 and is now 3288432640
[07/Sep/2011:13:32:05 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e67e1cd000000320000) > changelog (4e67e1cd000000320000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[07/Sep/2011:13:32:06 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e67e1ce000000330000) > changelog (4e67e1ce000000330000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[07/Sep/2011:13:32:06 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e67e1cb000000340000) > changelog (4e67e1cb000000340000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[07/Sep/2011:13:32:07 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[07/Sep/2011:13:32:07 -0400] - Listening on All Interfaces port 636 for LDAPS requests

We hoped that upgrading from 1.2.2 to 1.2.8.3 would fix the problem, but it clearly hasn't.

We are currently working around this problem by ensuring that at least one master and one hub are up at all times, so that all changes get propagated downstream.  But one power outage and we're faced with having to rebuild all our slave servers.
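
A minimal sketch of how the warnings quoted above could be pulled out of an errors log to check, for each suffix, whether the two CSNs printed are actually identical. It assumes the 1.2.8.3-style message that prints both CSNs. The function name and regular expression are illustrative only and are not part of 389 Directory Server.

```python
import re

# Matches the 1.2.8.3-style warning, which prints both CSNs.
# The older 1.2.2 message has no CSNs and is not matched.
WARNING_RE = re.compile(
    r"replica_check_for_data_reload: Warning: data for replica (?P<suffix>.+?) "
    r"does not match the data in the changelog "
    r"\(replica data \((?P<replica_csn>[0-9a-f]+)\) > "
    r"changelog \((?P<changelog_csn>[0-9a-f]+)\)\)"
)


def find_reload_warnings(lines):
    """Return (suffix, replica_csn, changelog_csn) for each warning line."""
    hits = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            hits.append(
                (
                    match.group("suffix"),
                    match.group("replica_csn"),
                    match.group("changelog_csn"),
                )
            )
    return hits


# Two lines taken from the restart log in this comment.
sample_lines = [
    "[07/Sep/2011:13:28:51 -0400] - 389-Directory/1.2.8.3 B2011.122.1636 starting up",
    "[07/Sep/2011:13:32:05 -0400] NSMMReplicationPlugin - "
    "replica_check_for_data_reload: Warning: data for replica "
    "dc=gted,dc=gatech,dc=edu does not match the data in the changelog "
    "(replica data (4e67e1cd000000320000) > changelog (4e67e1cd000000320000)). "
    "Recreating the changelog file.",
]

for suffix, replica_csn, changelog_csn in find_reload_warnings(sample_lines):
    verdict = "identical" if replica_csn == changelog_csn else "different"
    print(f"{suffix}: replica={replica_csn} changelog={changelog_csn} ({verdict})")
```

In practice the lines would come from the instance's errors file (for example, by iterating over an open file handle) rather than from an inline list.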

Comment 16 Rich Megginson 2011-09-07 19:23:00 UTC

What platform is this?  EL5?  32-bit or 64-bit?

I'm trying to reproduce this problem with the latest code, but simply setting up a 2 master + 1 hub scenario doesn't give the error.

1) use setup-ds.pl to create instance for master m1, master m2, and hub
2) load m1 with the example data file
3) setup m1 and m2 as masters, set up hub as a replica hub
4) create replication agreements from m1 to m2, m2 to m1, m1 to hub, m2 to hub
5) do a replica init from m1 to m2
6) do a replica init from m1 to hub
7) do some modify operations on m1 and m2 to verify replication is working and updates reach the hub
8) restart the servers, one at a time

no errors are seen

I'll keep trying different things, but in the meantime, any additional information you could provide about how I can reproduce this problem would be appreciated.

Comment 17 Robert Viduya 2011-09-07 19:40:43 UTC

We're running RHEL5, 64-bit.

Are the CSNs being printed out in the error lines significant?  I noticed that those are new, the older version of 389 wasn't printing those out.

Comment 18 Rich Megginson 2011-09-07 19:57:40 UTC

(In reply to comment #17)
> We're running RHEL5, 64-bit.
> 
> Are the CSNs being printed out in the error lines significant?  I noticed that
> those are new, the older version of 389 wasn't printing those out.

I'm not sure what you mean by significant.  I believe they were added at some point to help with debugging this issue.

Comment 19 Rich Megginson 2011-09-07 20:17:48 UTC

on a server that has the problem, before you shut it down, do this:
ldapsearch -xLLL -D "cn=directory manager" -w password -s sub -b dc=gted,dc=gatech,dc=edu objectclass=nsTombstone

and

cl-dump -D "cn=directory manager" -w password > /tmp/changelogdump.ldif

I'm especially interested in how the nsds50ruv and nsruvReplicaLastModified match up to the clpurgeruv and clmaxruv in the changelog dump.

Comment 20 Robert Viduya 2011-09-07 20:56:53 UTC

Created attachment 522005 [details]
output from cl-dump

Comment 21 Robert Viduya 2011-09-07 20:57:37 UTC

Created attachment 522006 [details]
output from ldapsearch objectclass=nstombstone

Comment 22 Robert Viduya 2011-09-07 21:02:46 UTC

By significant, I mean that the CSNs printed out in the error log file are reported as being the same between the replica data and the changelog.  But the error is that they don't match.

I did the following commands and attached the output from them.  I removed anything in the ldif files that looked like private data that shouldn't be shared; if you need to see them, we'll have to arrange some other method of getting them to you.

# ldapsearch -xLLL -D "cn=directory manager" -w ________ -s sub -b dc=gted,dc=gatech,dc=edu objectclass=nsTombstone > /var/tmp/nst.ldif
# cl-dump -D "cn=directory manager" -w ________ > /var/tmp/changelogdump.ldif
# /etc/init.d/dirsrv stop

The error log had the following in it after restarting, in case you needed those CSNs:

[07/Sep/2011:16:47:55 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e6810ea000000320000) > changelog (4e6810ea000000320000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[07/Sep/2011:16:47:56 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e6810eb000000330000) > changelog (4e6810eb000000330000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[07/Sep/2011:16:47:56 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica ou=people,dc=gted,dc=gatech,dc=edu does not match the data in the changelog (replica data (4e6810f8000000340000) > changelog (4e6810f8000000340000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.

Comment 23 Rich Megginson 2011-09-07 21:43:27 UTC

Excellent, thank you sir.  I was able to reproduce the problem.  Steps to reproduce:
1) set up replication with 2 masters - add entries to verify replication is working, and to populate the changelog db (i.e. make sure it has clmaxruv)
2) on m1, db2ldif -n userRoot -r -a /tmp/userRoot.ldif
3) add a bogus nsds50ruv - you can just copy the first element and change it
- make sure the replica ID and host:port are unique
- make sure the replica ID in the min and max csn match your chosen replica ID
- make sure the timestamp portion of the min and max csns are less than any of the real timestamp portion of any of the other min and max csns in the real ruv elements
4) ldif2db -n userRoot -i /tmp/userRoot.ldif
5) start m1

In the errors log on m1 you will see messages like this:
[07/Sep/2011:15:26:27 -0600] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=example,dc=com does not match the data in the changelog (replica data (4e67cd76000000010000) > changelog (4e67cd76000000010000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.

Where 4e67cd76000000010000 is a real, valid max csn from one of the real, valid ruv elements.  The error message is misleading because the real problem is neither 4e67cd76000000010000 nor any ruv element that contains it.

The problem is due to these extra elements in the database RUV that do not match the RUV in the changelog. The changelog is missing these elements:

nsds50ruv: {replica 56 ldap://treilhard.iam.gatech.edu:389} 4af8d9900000003800
 00 4bbab41a000000380000
nsds50ruv: {replica 61 ldap://gtedm1.iam.gatech.edu:389} 4bbc64c00000003d0000 
 4bdeee570000003d0000
nsds50ruv: {replica 64 ldap://gtedm2.iam.gatech.edu:389} 4bbe07eb000000400000 
 4be30006000000400000
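
To read the CSNs in these elements, a small decoder can help. This is only a sketch: the field layout it assumes (8 hex digits of timestamp, then 4 hex digits each for sequence number, replica ID and sub-sequence number) is inferred from the values in this report, where replica 56 carries 0038, replica 61 carries 003d and replica 64 carries 0040. The names CSN and parse_csn are made up for illustration and are not part of 389.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the epoch (assumed)
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four assumed fields."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    return CSN(
        int(text[0:8], 16),
        int(text[8:12], 16),
        int(text[12:16], 16),
        int(text[16:20], 16),
    )


# Max CSN of the treilhard element quoted above.
max_csn = parse_csn("4bbab41a000000380000")
print(max_csn.replica_id)  # 56
```

The same decoder applied to the CSNs in the warning lines gives replica IDs 50, 51 and 52, none of which is one of the three retired masters.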

The function replica_check_for_data_reload looks through all of the RUV elements to see if the changelog has them.  Because the changelog does not have these elements, the function thinks the database is newer than the changelog, and the changelog is therefore invalid.  The errors log message is misleading, in that it doesn't give you this information.  It stupidly just finds the max csn of all of the maxcsns in the clmaxruv, and the max csn of all the maxcsns in the nsds50ruv, and prints those out.
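
Why the two CSNs in the warning are identical can be shown with a simplified model. This is not the server code: old_check and max_of_maxcsns are hypothetical names, each RUV is reduced to a dict of replica ID to max CSN, and the values are illustrative ones taken from this report.

```python
def max_of_maxcsns(ruv):
    """Largest max CSN in a RUV; fixed-width lowercase hex compares correctly as text."""
    return max(ruv.values())


def old_check(db_ruv, cl_ruv):
    """Return the warning detail if the database RUV has a replica the changelog lacks."""
    missing = [rid for rid in db_ruv if rid not in cl_ruv]
    if missing:
        # The message reports only the overall maxima, not the missing replicas.
        return "replica data (%s) > changelog (%s)" % (
            max_of_maxcsns(db_ruv),
            max_of_maxcsns(cl_ruv),
        )
    return None


# replica ID -> max CSN
db_ruv = {50: "4e67e1cd000000320000", 56: "4bbab41a000000380000"}
cl_ruv = {50: "4e67e1cd000000320000"}

print(old_check(db_ruv, cl_ruv))
# replica data (4e67e1cd000000320000) > changelog (4e67e1cd000000320000)
```

The stale element for replica 56 triggers the mismatch, but because its max CSN is older than the live one, it never shows up in the printed values.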

One thing you could try is this:
1) for each database named dbname, do db2ldif -n dbname -r -a /tmp/dbname.ldif
2) for each ldif file, edit the nsds50ruv to remove the obsolete elements
3) for each database named dbname, do ldif2db -n dbname -i /tmp/dbname.ldif to reload the database from ldif

You will have to initialize the other masters, hubs, and consumers from this new, clean database.
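
Step 2 above (editing the LDIF) could be scripted along these lines. This is a rough sketch, not a supported tool: drop_obsolete and unfold are hypothetical helpers, the script assumes the RUV values are stored as plain text (not base64, i.e. not "nsds50ruv::"), and it also drops nsruvReplicaLastModified values for the same replica IDs, which is an assumption about what should be cleaned. The sample lines, including the replicageneration value and the m1.example.com host, are illustrative. Check the result by hand before running ldif2db.

```python
import re

# Matches RUV values that name a numeric replica ID, e.g. "{replica 56 ldap://...}".
# "{replicageneration}" does not match because a space and digits must follow "replica".
RUV_VALUE = re.compile(
    r"^(nsds50ruv|nsruvReplicaLastModified): \{replica (\d+) ", re.IGNORECASE
)


def unfold(lines):
    """Join LDIF continuation lines (starting with one space) onto the previous line."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" ") and out:
            out[-1] += line[1:]
        else:
            out.append(line)
    return out


def drop_obsolete(lines, obsolete_ids):
    """Return unfolded LDIF lines without RUV values for the given replica IDs."""
    kept = []
    for line in unfold(lines):
        m = RUV_VALUE.match(line)
        if m and int(m.group(2)) in obsolete_ids:
            continue
        kept.append(line)
    return kept


sample = [
    "dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=example,dc=com\n",
    "nsds50ruv: {replicageneration} 4af8d990000000380000\n",  # illustrative value
    "nsds50ruv: {replica 56 ldap://treilhard.iam.gatech.edu:389} 4af8d9900000003800\n",
    " 00 4bbab41a000000380000\n",
    "nsds50ruv: {replica 50 ldap://m1.example.com:389} 4e67e1cd000000320000 4e67e1cd000000320000\n",
]

for line in drop_obsolete(sample, {56, 61, 64}):
    print(line)
```

The output is written unfolded (one logical line per value), which LDIF allows.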

Comment 24 Rich Megginson 2011-09-07 21:59:34 UTC

And I apologize for not getting on this earlier - I was not aware that this problem could lead to data loss until https://bugzilla.redhat.com/show_bug.cgi?id=590826#c15

Comment 25 Rich Megginson 2011-09-07 23:11:45 UTC

Another data point - attempting to use ldapmodify to fix the ruv deadlocks the server, e.g.
ldapmodify -x -h localhost -p 1389 -D "cn=directory manager" -w password <<EOF
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=example,dc=com
changetype: modify
delete: nsds50ruv
nsds50ruv: {replica 3 ldap://localhost.localdomain:3389} 3e67cd37000000030000 3e67cd76000000030000
EOF

This causes a deadlock here:
#0  0x000000305280dfe4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000305280934e in _L_lock_995 () from /lib64/libpthread.so.0
#2  0x00000030528092b6 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000003061023bd9 in PR_Lock (lock=0x901d20)
    at ../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:206
#4  0x00007f24b9f512a3 in cache_lock_entry (cache=0x96d308, e=0xae0ed0)
    at ../ds.git/ldap/servers/slapd/back-ldbm/cache.c:1444
#5  0x00007f24b9f664fa in find_entry_internal_uniqueid (pb=0xae69f0, 
    be=0x96b010, uniqueid=0xae1f00 "ffffffff-ffffffff-ffffffff-ffffffff", 
    lock=1, txn=0x0) at ../ds.git/ldap/servers/slapd/back-ldbm/findentry.c:228
#6  0x00007f24b9f66723 in find_entry_internal (pb=0xae69f0, be=0x96b010, 
    addr=0x7f24afcef900, lock=1, txn=0x0, really_internal=1)
    at ../ds.git/ldap/servers/slapd/back-ldbm/findentry.c:275
#7  0x00007f24b9f668d8 in find_entry2modify_only (pb=0xae69f0, be=0x96b010, 
    addr=0x7f24afcef900, txn=0x0)
    at ../ds.git/ldap/servers/slapd/back-ldbm/findentry.c:337
#8  0x00007f24b9fb217f in ldbm_txn_ruv_modify_context (pb=0xae69f0, 
    mc=0x7f24afcef980) at ../ds.git/ldap/servers/slapd/back-ldbm/misc.c:431
#9  0x00007f24b9f9e741 in ldbm_back_modify (pb=0xae69f0)
    at ../ds.git/ldap/servers/slapd/back-ldbm/ldbm_modify.c:390
#10 0x00007f24be0bd2e4 in op_shared_modify (pb=0xae69f0, pw_change=0, 
    old_pw=0x0) at ../ds.git/ldap/servers/slapd/modify.c:888
#11 0x00007f24be0bc06d in do_modify (pb=0xae69f0)
    at ../ds.git/ldap/servers/slapd/modify.c:384
#12 0x0000000000413f17 in connection_dispatch_operation (conn=0x7f24b06fd410, 
    op=0xac9200, pb=0xae69f0) at ../ds.git/ldap/servers/slapd/connection.c:583
#13 0x00000000004158e4 in connection_threadmain ()
    at ../ds.git/ldap/servers/slapd/connection.c:2328
#14 0x0000003061029633 in _pt_root (arg=0xadbf70)
    at ../../../mozilla/nsprpub/pr/src/pthreads/ptthread.c:187
#15 0x00000030528077e1 in start_thread () from /lib64/libpthread.so.0
#16 0x0000003051ce577d in clone () from /lib64/libc.so.6

The entry has already been locked for the modify, and ldbm_txn_ruv_modify_context attempts to lock it again.
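
The pattern is the usual self-deadlock on a non-reentrant lock. A minimal sketch, with hypothetical function names and a timeout added so the demonstration returns instead of hanging (the server blocks indefinitely in pthread_mutex_lock, as in the backtrace):

```python
import threading


def modify_ruv_again(lock):
    """Inner step that tries to take the same entry lock; returns whether it got it."""
    got = lock.acquire(timeout=0.2)  # the real code would wait forever here
    if got:
        lock.release()
    return got


def modify_entry(lock):
    """Outer modify: locks the entry, then calls a helper that locks it again."""
    with lock:
        return modify_ruv_again(lock)


print(modify_entry(threading.Lock()))   # False: the second acquire cannot succeed
print(modify_entry(threading.RLock()))  # True: a reentrant lock allows nesting
```

The RLock line is only there for contrast; it is not meant as a description of how the server should be fixed.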

Comment 26 Robert Viduya 2011-09-08 14:48:40 UTC

Those three entries, treilhard, gtedm1 and gtedm2, are from old masters that have long been retired.  What we've been doing whenever we need to swap out hardware for newer machines is to bring the new machine in, set it up with a new name, install the software and set up replication agreements to it.  Then we run it in parallel with the old machine for a week or more until we're comfortable.  Then we shutdown the old server and remove any replication agreements to it.

Obviously this is leaving traces of the old server around and that's causing problems.  However, there's no documented procedure for removing an old master server from the pool.

Your suggested procedure of exporting, modifying the nsds50ruv entry and then reimporting and re-initializing all downstream servers might work, but if we're going to do that, we may as well just delete the entry entirely and clean all other internal information from the ldif file (CSNs and internal attributes) and then do a clean import.  As long as we have to re-initialize all downstream servers, it's essentially the same amount of work.

We'll have to look into scheduling a time to do that.  But it's still a lot of work to do every time we retire a master server.

Comment 27 Rich Megginson 2011-09-08 15:14:08 UTC

(In reply to comment #26)
> Those three entries, treilhard, gtedm1 and gtedm2, are from old masters that
> have long been retired.  What we've been doing whenever we need to swap out
> hardware for newer machines is to bring the new machine in, set it up with a
> new name, install the software and set up replication agreements to it.  Then
> we run it in parallel with the old machine for a week or more until we're
> comfortable.  Then we shutdown the old server and remove any replication
> agreements to it.
> 
> Obviously this is leaving traces of the old server around and that's causing
> problems.  However, there's no documented procedure for removing an old master
> server from the pool.
> 
> Your suggested procedure of exporting, modifying the nsds50ruv entry and then
> reimporting and re-initializing all downstream servers might work, but if we're
> going to do that, we may as well just delete the entry entirely and clean all
> other internal information from the ldif file (CSNs and internal attributes)
> and then do a clean import.  As long as we have to re-initialize all downstream
> servers, it's essentially the same amount of work.
> 
> We'll have to look into scheduling a time to do that.  But it's still a lot of
> work to do every time we retire a master server.

Right.  Note that if you use db2ldif -r and deal with the LDIF with the embedded replication information, you might not need to re-initialize all downstream servers.  However, you will need to do the export/fix/reimport procedure on all masters and hubs to avoid the "wipe out changelog at startup" problem.

We're working on a solution for both of these problems.

We're going to fix this bug so that when it hits this condition, it will just warn that there are entries in the ruv that do not match the entries in the changelog, and will not wipe out the changelog.

https://bugzilla.redhat.com/show_bug.cgi?id=736712 will make it easy to use ldapmodify to remove the obsolete entries from the ruv

Comment 28 Rich Megginson 2011-09-09 16:19:04 UTC

Created attachment 522361 [details]
0001-Bug-590826-Reloading-database-from-ldif-causes-chang.patch

Comment 29 Rich Megginson 2011-09-09 16:22:08 UTC

You can use the CLEANRUV task to remove the obsolete elements from the database RUV.  See http://directory.fedoraproject.org/wiki/Howto:CLEANRUV for more information.

Comment 30 Noriko Hosoi 2011-09-09 16:43:29 UTC

Comment on attachment 522361 [details]
0001-Bug-590826-Reloading-database-from-ldif-causes-chang.patch

It would be nice if you put short comments here to distinguish these 2 for future reference... :)
+enum 
+{
+	RUV_COMP_RUV1_MISSING,
+	RUV_COMP_RUV2_MISSING

Comment 31 Rich Megginson 2011-09-09 19:00:50 UTC

Created attachment 522385 [details]
0001-Bug-590826-Reloading-database-from-ldif-causes-chang.patch

updated patch with comments

Comment 32 Rich Megginson 2011-09-09 19:43:51 UTC

To ssh://git.fedorahosted.org/git/389/ds.git
   379c164..6bac1a7  master -> master
commit 6bac1a7876ddd2a1fe986505f16aa0330ab4a671
Author: Rich Megginson <rmeggins>
Date:   Thu Sep 8 20:03:14 2011 -0600
    Reviewed by: nkinder, nhosoi (Thanks!)
    Branch: master
    Fix Description: When there are obsolete or decommissioned masters in the
    RUV, this should not invalidate the changelog.  Instead, warn the user that
    there are replicas in the database RUV that are not in the changelog
    max RUV.  In this case, the CLEANRUV task can be used to remove the
    obsolete masters from the database RUV.  I had to add a function to
    generate a string representation of the replica RUVElement* for logging
    purposes, and used that function elsewhere.  The new function
    ruv_compare_ruv should be used instead of ruv_contains_ruv since it gives
    more flexibility about logging and handling different cases.
    Platforms tested: RHEL6 x86_64
    Flag Day: no
    Doc impact: no
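
The behaviour described in the fix can be pictured roughly as follows. This is a sketch under assumptions, not the patch: RuvDiff, compare_ruvs and startup_warnings are hypothetical names only loosely modelled on the RUV_COMP_RUV1_MISSING / RUV_COMP_RUV2_MISSING idea, each RUV is reduced to a dict of replica ID to max CSN, and the warning text is invented.

```python
from enum import Enum, auto


class RuvDiff(Enum):
    DB_ONLY = auto()         # replica in the database RUV, absent from the changelog RUV
    CHANGELOG_ONLY = auto()  # replica in the changelog RUV, absent from the database RUV


def compare_ruvs(db_ruv, cl_ruv):
    """List (kind, replica ID) pairs for replicas present on one side only."""
    diffs = []
    for rid in sorted(db_ruv.keys() - cl_ruv.keys()):
        diffs.append((RuvDiff.DB_ONLY, rid))
    for rid in sorted(cl_ruv.keys() - db_ruv.keys()):
        diffs.append((RuvDiff.CHANGELOG_ONLY, rid))
    return diffs


def startup_warnings(db_ruv, cl_ruv):
    """One warning per replica found only in the database RUV; the changelog is kept."""
    return [
        f"replica {rid} is in the database RUV but not in the changelog max RUV; "
        "consider the CLEANRUV task if it is obsolete"
        for kind, rid in compare_ruvs(db_ruv, cl_ruv)
        if kind is RuvDiff.DB_ONLY
    ]


db_ruv = {
    50: "4e67e1cd000000320000",
    56: "4bbab41a000000380000",
    61: "4bdeee570000003d0000",
    64: "4be30006000000400000",
}
cl_ruv = {50: "4e67e1cd000000320000"}

for warning in startup_warnings(db_ruv, cl_ruv):
    print(warning)
```

Naming the offending replica IDs in the message is the point of the sketch: it tells the administrator which elements to clean, which the old message did not.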

Comment 33 Rich Megginson 2012-01-10 20:18:57 UTC

Upstream ticket:
https://fedorahosted.org/389/ticket/248
