Bug 681617

Summary: Incomplete replication leaves state in dse.ldif
Product: [Retired] 389
Component: Directory Server
Version: 1.2.8
Status: CLOSED NOTABUG
Severity: unspecified
Priority: medium
Hardware: Unspecified
OS: Unspecified
Reporter: Martin Poole <mpoole>
Assignee: Rich Megginson <rmeggins>
QA Contact: Chandrasekar Kannan <ckannan>
CC: benl, ckannan, nhosoi, nkinder, rmeggins
Doc Type: Bug Fix
Last Closed: 2012-11-08 17:33:12 UTC
Bug Blocks: 512820, 690319

Description Martin Poole 2011-03-02 18:29:02 UTC
Description of problem:

A replication initialisation which did not complete left an attribute in the agreement which confused the server on subsequent startups.

The replication agreement in the dse.ldif contains

nsds5BeginReplicaRefresh: start

and on startup ...


[01/Mar/2011:15:25:41 +0000] - Config Warning: - nsslapd-maxdescriptors: invalid value "8192", maximum file descriptors must range from 1 to 1024 (the current process limit).  Server will use a setting of 1024.
[01/Mar/2011:15:25:42 +0000] - Red Hat-Directory/8.2.4 B2011.028.1837 starting up
[01/Mar/2011:15:25:43 +0000] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[01/Mar/2011:15:25:43 +0000] - Listening on All Interfaces port 636 for LDAPS requests
[01/Mar/2011:15:30:06 +0000] NSMMReplicationPlugin - Total update aborted: Replication agreement for "agmt="cn=Rep2dir01" (dir01:389)" can not be updated while the replica is disabled
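
To check whether a dse.ldif still carries this leftover state, something like the following could be used. It is only a minimal sketch: the function name find_pending_refresh is made up for illustration, the parser is a plain-text scan rather than a full LDIF implementation (base64-encoded values are not decoded), and the only attribute name it relies on is the nsds5BeginReplicaRefresh shown above.

```python
def find_pending_refresh(ldif_text):
    """Return (dn, value) pairs for entries that carry nsds5BeginReplicaRefresh.

    Hypothetical helper for illustration; not part of the directory server.
    """
    # Unfold LDIF continuation lines: a line starting with a single space
    # continues the previous line.
    unfolded = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    results = []
    current_dn = None
    for line in unfolded:
        if not line.strip():
            # A blank line ends the current entry.
            current_dn = None
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name = name.strip().lower()
        value = value.strip()
        if name == "dn":
            current_dn = value
        elif name == "nsds5beginreplicarefresh" and current_dn is not None:
            results.append((current_dn, value))
    return results
```

Passing the contents of dse.ldif to the function (for example, the text read from the file with open(path).read()) returns one pair per agreement that still has the attribute, so a value of "start" left over from an incomplete initialisation is easy to spot before the server is restarted.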

Comment 1 Martin Kosek 2012-01-04 13:28:03 UTC
Upstream ticket:
https://fedorahosted.org/389/ticket/63

Comment 2 Noriko Hosoi 2012-10-15 23:02:32 UTC
I cannot reproduce the problem...

To stop the consumer initialization, "nsds5beginreplicarefresh: cancel" is usually set on the agreement. The attribute value remains in the agreement, but it does no harm.

I manually replaced "cancel" with "start" and restarted the server. The consumer initialization was then automatically restarted and completed:
.. - slapd started. Listening on All Interfaces port 389 for LDAP requests
.. NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=aaa" (hostB:390)".
.. NSMMReplicationPlugin - Finished total update of replica "agmt="cn=aaa" (hostB:390)". Sent 50002 entries.

Could you please tell us how this state came to be left in the config file?
  nsds5BeginReplicaRefresh: start
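
For anyone trying to reproduce this, the change to the agreement described above can be written as an LDIF modify record. The following is a rough sketch only: build_refresh_modify is a hypothetical helper, the agreement DN in the usage note is a placeholder, and the two values accepted are just the "start" and "cancel" mentioned in this comment.

```python
def build_refresh_modify(agreement_dn, value):
    """Build an LDIF modify record that sets nsds5BeginReplicaRefresh.

    Hypothetical helper for illustration. The agreement DN must be supplied
    by the caller; only the values named in this bug are accepted.
    """
    if value not in ("start", "cancel"):
        raise ValueError("value must be 'start' or 'cancel'")
    return "\n".join([
        f"dn: {agreement_dn}",
        "changetype: modify",
        "replace: nsds5BeginReplicaRefresh",
        f"nsds5BeginReplicaRefresh: {value}",
        "",  # trailing newline terminates the record
    ])
```

Calling build_refresh_modify with a placeholder DN such as "cn=aaa,cn=replica,cn=config" and the value "cancel" produces text that could be fed to an LDIF-aware client such as ldapmodify; the real DN of the agreement entry depends on the deployment.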

Comment 3 Rich Megginson 2012-10-15 23:05:28 UTC
(In reply to comment #2)
> I cannot reproduce the problem...
> 
> To stop the consumer initialization, usually, "nsds5beginreplicarefresh:
> cancel" is set to the agreement. The attribute value is left in the
> agreement, but it does not do any harm.
> 
> I manually replace "cancel" with "start" and restarted the server. Then, the
> consumer initialization is automatically restarted and completed.
> .. - slapd started. Listening on All Interfaces port 389 for LDAP requests
> .. NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=aaa"
> (hostB:390)".
> .. NSMMReplicationPlugin - Finished total update of replica "agmt="cn=aaa"
> (hostB:390)". Sent 50002 entries.
> 
> Could you please tell us how to leave the state in the config file? 
>   nsds5BeginReplicaRefresh: start

Note in the original problem:
> [01/Mar/2011:15:30:06 +0000] NSMMReplicationPlugin - Total update aborted: Replication agreement for "agmt="cn=Rep2dir01" (dir01:389)" can not be updated while the replica is disabled

Maybe the consumer has to be in the disabled state?

Comment 4 Noriko Hosoi 2012-10-15 23:31:40 UTC
(In reply to comment #3)
> Maybe the consumer has to be in the disabled state?

1) the consumer is down; restarting the master logs:

[..] slapi_ldap_bind - Error: could not send bind request for id [cn=directory manager] mech [SIMPLE]: error -1 (Can't contact LDAP server) -5987 (Invalid function argument.) 107 (Transport endpoint is not connected)
[..] NSMMReplicationPlugin - agmt="cn=aaa" (hostB:390): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[..] - slapd started.  Listening on All Interfaces port 389 for LDAP requests

==> I think this is the expected behaviour.

2) the consumer replica is disabled; restarting the master logs:

[..] NSMMReplicationPlugin - agmt="cn=aaa" (hostB:390): Unable to acquire replica: there is no replicated area "dc=example,dc=com" on the consumer server. Replication is aborting.

==> I think this is the expected behaviour, too...

What else could we try?

Comment 5 Rich Megginson 2012-10-15 23:40:26 UTC
Hmm - don't know - maybe this is fixed in 1.2.11 and later?

Comment 6 Noriko Hosoi 2012-10-26 19:36:05 UTC
Hi Martin,

We are having a hard time reproducing this bug.  Could you please give us the steps to reproduce it?
Thanks!
--noriko

Comment 7 Noriko Hosoi 2012-11-08 17:33:12 UTC
Since no one on the team has been able to reproduce the problem and there is no new input, we are closing this bug for now.  Please feel free to reopen it if someone runs into the problem again.