Bug 442560

Summary:	Account Lockout Attributes replication is attempted despite no configuration allowing it
Product:	[Retired] 389	Reporter:	Aleksander Adamowski <bugs-redhat>
Component:	Replication - General	Assignee:	Rich Megginson <rmeggins>
Status:	CLOSED DUPLICATE	QA Contact:	Chandrasekar Kannan <ckannan>
Severity:	low	Docs Contact:
Priority:	low
Version:	1.1.0	CC:	benl
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2008-06-23 23:12:02 UTC	Type:	---
Bug Blocks:	249650

Description Aleksander Adamowski 2008-04-15 15:09:26 UTC

Description of problem:

I'm testing fedora-ds 1.1.0 on RHEL5 with multi-master replication (3 master nodes).

After activating account lockout features on all 3 nodes, I started getting
replication errors in the logs, and incremental replication got stuck.
Re-initializing the other masters from a chosen master temporarily solves
the problem, until another authentication failure occurs and triggers
replication of an account lockout attribute change.

The account lockout attributes are being replicated despite no configuration
being in place that would allow it (according to
http://www.redhat.com/docs/manuals/dir-server/ag/8.0/Managing_Replication-Replicating-Password-Attributes.html,
the "passwordIsGlobalPolicy" configuration attribute is needed to turn it on).

The errors I can see in the log are:

[15/Apr/2008:08:02:17 +0200] NSMMReplicationPlugin - agmt="cn=with_directory2"
(directory2:636): Consumer failed to replay change (uniqueid
a3ed6e40-088211dd-9122b219-421e6ead, CSN 48046118000000030000): DSA is unwilling
to perform. Will retry later.
[15/Apr/2008:08:02:33 +0200] NSMMReplicationPlugin - agmt="cn=with_directory2"
(directory2:636): Consumer failed to replay change (uniqueid
a3ed6e40-088211dd-9122b219-421e6ead, CSN 48046118000000030000): DSA is unwilling
to perform. Will retry later.
[15/Apr/2008:08:02:33 +0200] NSMMReplicationPlugin - agmt="cn=with_directory1"
(directory1:636): Consumer failed to replay change (uniqueid
a3ed6e40-088211dd-9122b219-421e6ead, CSN 48046118000000030000): DSA is unwilling
to perform. Will retry later.
[15/Apr/2008:08:07:34 +0200] - repl5_inc_waitfor_async_results timed out waiting
for responses: 8 9
[15/Apr/2008:08:07:34 +0200] NSMMReplicationPlugin - agmt="cn=with_directory2"
(directory2:636): Warning: unable to receive endReplication extended operation
response (Bad parameter to an ldap routine)
[15/Apr/2008:08:07:34 +0200] - repl5_inc_waitfor_async_results timed out waiting
for responses: 10 11
[15/Apr/2008:08:07:34 +0200] NSMMReplicationPlugin - agmt="cn=with_directory1"
(directory1:636): Warning: unable to receive endReplication extended operation
response (Bad parameter to an ldap routine)
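
To correlate these messages with other logs, the agreement name, uniqueid and
CSN can be pulled out of each "Consumer failed to replay change" line. The
following is a minimal sketch, not part of the original report; the function
names are hypothetical, and the CSN field layout used in split_csn (8 hex digits
of timestamp, then 4 each for sequence number, replica ID and subsequence
number) is an assumption about the 20-character CSN string format.

```python
import re

# Matches one "Consumer failed to replay change" message after whitespace has
# been normalized (the messages quoted in this report are wrapped across lines).
REPLAY_FAILURE = re.compile(
    r'agmt="(?P<agreement>[^"]+)"\s*'
    r'\((?P<host>[^:()]+):(?P<port>\d+)\):\s*'
    r'Consumer failed to replay change\s*'
    r'\(uniqueid\s+(?P<uniqueid>[0-9a-fA-F-]+),\s*'
    r'CSN\s+(?P<csn>[0-9a-fA-F]+)\)'
)


def parse_replay_failures(log_text):
    """Return one dict per replay failure found in the error log text."""
    # Collapse newlines and repeated spaces so wrapped messages match.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in REPLAY_FAILURE.finditer(flat)]


def split_csn(csn):
    """Split a 20-hex-digit CSN into integer fields.

    ASSUMPTION: layout is timestamp(8) + sequence(4) + replica_id(4) +
    subsequence(4), all hexadecimal.
    """
    if len(csn) != 20:
        raise ValueError("expected a 20-character CSN, got %r" % csn)
    return {
        "timestamp": int(csn[0:8], 16),
        "sequence": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subsequence": int(csn[16:20], 16),
    }
```

Under that assumed layout, the replica ID field of a failing CSN indicates which
master originated the change.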

From these messages alone it isn't obvious that account lockout attributes are
being replicated and are the cause, so I dumped the contents of the db4
transaction log to track it down:

cp -a /var/lib/dirsrv/slapd-INSTANCENAME/changelogdb ~/SOME_TEMP_DIR/
cd ~/SOME_TEMP_DIR/changelogdb
db_printlog > ../db_printlog_excerpt.txt
less -in ../db_printlog_excerpt.txt

In the db4 transaction log dump, I found the following entries, which
correspond directly to the replication error messages (based on the timestamp
of the first error and the uniqueid and CSN values):

[1][207753]__db_addrem: rec: 41 txnid 8000009d prevlsn [0][0]
	opcode: 1
	fileid: 0
	pgno: 16
	indx: 42
	nbytes: 24
	hdr: 
	dbt: 480461180000000300000 
	pagelsn: [1][207306]

[1][207838]__db_addrem: rec: 41 txnid 8000009d prevlsn [1][207753]
	opcode: 1
	fileid: 0
	pgno: 16
	indx: 43
	nbytes: 188
	hdr: 
	dbt: 0x5 0x8 H0x4 D0xe9 480461180000000300000
a3ed6e40-088211dd-9122b219-421e6ead0
uid=USERS_UID,l=LOCATION,ou=People,DIRECTORY_BASE_DN0 0 0 0 0x2 0x82
retryCountResetTime0 0 0 0 0x1 0 0 0 0xf 20080415061217Z0x82 passwordRetryCount0
0 0 0 0x1 0 0 0 0x1 1pass
	pagelsn: [1][207753]

[1][208085]__txn_regop: rec: 10 txnid 8000009d prevlsn [1][207838]
	opcode: 1
	timestamp: 1208239338 (Tue Apr 15 08:02:18 2008, 200804150802.18)
	locks: 

So it seems that Directory Server has attempted replicating the
retryCountResetTime and passwordRetryCount attributes, which contradicts the
documentation at
http://www.redhat.com/docs/manuals/dir-server/ag/8.0/Managing_Replication-Replicating-Password-Attributes.html.

Also, the documentation doesn't state that turning account lockout replication
on would cause problems with replication.




Version-Release number of selected component (if applicable):
fedora-ds-1.1.0-3.fc6.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Configure multi-master replication
2. Activate account lockout on all nodes
3. Supply an incorrect password during simple authentication
  
Actual results:
retryCountResetTime and passwordRetryCount are sent to other replicas, causing
replication errors

Expected results:
1) retryCountResetTime and passwordRetryCount attributes shouldn't be replicated
by default
2) When they are replicated, they should be accepted by replicas without causing
errors and desynchronization of replicas.

Comment 1 Aleksander Adamowski 2008-04-15 15:33:50 UTC

Additional details:

If I set passwordIsGlobalPolicy to "on" on the receiving replicas (the
documentation isn't correct WRT Fedora Directory Server 1.1: the value has to
be "on", not "1"), then they accept the change and everything works fine
(although this is not consistent with the documentation).

Here's the LDIF I use for this change:

dn: cn=config
changetype: modify
replace: passwordIsGlobalPolicy
passwordIsGlobalPolicy: on


If I try to turn passwordIsGlobalPolicy off on the sending replica (the server
to which the incorrect simple bind has been sent), it still tries to replicate
the passwordRetryCount change to other replicas.

So this behaviour cannot be turned off.

Here's the LDIF I use for this change on the sending replica:

dn: cn=config
changetype: modify
replace: passwordIsGlobalPolicy
passwordIsGlobalPolicy: off


The replicated change can also be seen on the receiving replicas, in their audit
logs (if audit logging is enabled):

time: 20080415172816
dn: uid=USER_UID,l=SOME_LOCATION,ou=people,o=DIRECTORY_BASE_DN
changetype: modify
replace: passwordRetryCount
passwordRetryCount: 3
-


So diagnosing this doesn't require analyzing the changelog's DB4 log dumps.

You just:

1) launch "tail -f /var/log/dirsrv/slapd-INSTANCENAME/audit" on one of the
receiving replicas
2) try to bind with a wrong password on the sending replica

And you'll see the change propagated on the receiving side.
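
The same check can be scripted against a saved copy of the audit log. The sketch
below is an illustration only, with a hypothetical function name; it assumes
audit records look like the excerpt quoted in this comment ("dn:" followed by
"replace:"/"add:"/"delete:" lines) and does not handle base64-encoded ("dn::")
values.

```python
# Account lockout attributes named in this report, lowercased for comparison.
LOCKOUT_ATTRS = {"passwordretrycount", "retrycountresettime"}


def lockout_changes(audit_text):
    """Return (dn, operation, attribute) for each lockout attribute change."""
    changes = []
    dn = None
    for line in audit_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            # "-" separators and blank lines carry no key/value pair.
            continue
        key = key.strip().lower()
        value = value.strip()
        if key == "dn":
            # Remember which entry the following modifications apply to.
            dn = value
        elif key in ("add", "replace", "delete") and value.lower() in LOCKOUT_ATTRS:
            changes.append((dn, key, value))
    return changes
```

A non-empty result on a receiving replica after a failed bind on the sending
replica indicates that the lockout attribute change was replicated.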

Comment 2 Rich Megginson 2008-06-23 23:12:02 UTC

*** This bug has been marked as a duplicate of 450973 ***