Bug 233643

Summary:	MMR breaks with time skew errors
Product:	[Retired] 389	Reporter:	Chris St. Pierre <cstpierr>
Component:	Replication - General	Assignee:	Rich Megginson <rmeggins>
Status:	CLOSED DUPLICATE	QA Contact:	Orla Hegarty <ohegarty>
Severity:	high	Docs Contact:
Priority:	medium
Version:	1.0.3
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2007-03-23 15:54:37 UTC	Type:	---

Description Chris St. Pierre 2007-03-23 15:41:53 UTC

Tuesday night (20 March), replication suddenly ceased between our four nodes, 
with errors similar to these on all nodes:

[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho (o=isp)"" (groucho:389): Unable to acquire replica: Excessive clock skew between the supplier and the consumer. Replication is aborting.
[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to groucho (o=isp)"" (groucho:389): Incremental update failed and requires administrator action
[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): Unable to acquire replica: Excessive clock skew between the supplier and the consumer. Replication is aborting.
[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zeppo.nebrwesleyan.edu (o=isp)"" (zeppo:389): Incremental update failed and requires administrator action
[20/Mar/2007:17:13:27 -0500] - csngen_adjust_time: adjustment limit exceeded; value - 86401, limit - 86400
[20/Mar/2007:17:13:27 -0500] NSMMReplicationPlugin - conn=1600790 op=4983 replica="o=isp": Unable to acquire replica: error: excessive clock skew
[20/Mar/2007:17:23:56 -0500] NSMMReplicationPlugin - agmt="cn="Replication to harpo.nebrwesleyan.edu (o=isp)"" (harpo:389): Unable to acquire replica: Excessive clock skew between the supplier and the consumer. Replication is aborting.
[20/Mar/2007:17:23:56 -0500] NSMMReplicationPlugin - agmt="cn="Replication to harpo.nebrwesleyan.edu (o=isp)"" (harpo:389): Incremental update failed and requires administrator action
[20/Mar/2007:17:58:27 -0500] - csngen_adjust_time: adjustment limit exceeded; value - 86401, limit - 86400
[20/Mar/2007:17:58:27 -0500] NSMMReplicationPlugin - conn=1615833 op=3276 replica="o=isp": Unable to acquire replica: error: excessive clock skew

Subsequent efforts to resume replication have come to naught.  (See this thread: 
https://www.redhat.com/archives/fedora-directory-users/2007-March/msg00100.html 
for more details on getting replication working again at all.)  Once I get 
replication working, the same error messages reappear within a few minutes of 
putting the cluster back into production, and replication ceases.  The messages 
appear only while the machines are actively serving queries.

In all cases, the 'value' on the csngen_adjust_time line is 86401 and the limit 
is 86400.  All four nodes have the same clock times, and are running NTP against 
a local NTP server.
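
For anyone wanting to check their own error logs for the same pattern, here is a minimal Python sketch that pulls the value and limit out of csngen_adjust_time lines and reports how far over the limit each one is. It assumes each log message sits on a single line and that the numbers are in seconds (86400 being the number of seconds in a day); the function name find_adjustment_overruns is made up for this example and is not part of the directory server.

```python
import re

# Matches the csngen_adjust_time message format seen in the error log.
ADJUST_RE = re.compile(
    r"csngen_adjust_time: adjustment limit exceeded; "
    r"value - (?P<value>\d+), limit - (?P<limit>\d+)"
)


def find_adjustment_overruns(lines):
    """Return a (value, limit, excess) tuple for each matching log line."""
    results = []
    for line in lines:
        match = ADJUST_RE.search(line)
        if match:
            value = int(match.group("value"))
            limit = int(match.group("limit"))
            results.append((value, limit, value - limit))
    return results


# The two csngen_adjust_time lines from this report, unwrapped.
sample = [
    "[20/Mar/2007:17:13:27 -0500] - csngen_adjust_time: adjustment limit "
    "exceeded; value - 86401, limit - 86400",
    "[20/Mar/2007:17:58:27 -0500] - csngen_adjust_time: adjustment limit "
    "exceeded; value - 86401, limit - 86400",
]

overruns = find_adjustment_overruns(sample)
for value, limit, excess in overruns:
    # Assumption: units are seconds, so limit // 3600 is the limit in hours.
    print(f"value={value} limit={limit} excess={excess} ({limit // 3600}h limit)")
```

On the two lines above, the excess is 1 in both cases, which matches what we see on every node.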

We are using four-way MMR with no read-only replicas.  We used the mmr.pl script 
to set up replication.  Each node replicates to all three other nodes, and all 
four nodes receive updates.
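
To make the topology concrete, here is a small Python sketch that enumerates the replication agreements in a full mesh like ours. The first three host names are taken from the log excerpt; "node4" is only a placeholder for the fourth node, which is not named in this report. This just illustrates the layout and is not what mmr.pl does internally.

```python
from itertools import permutations

# groucho, zeppo and harpo appear in the logs; "node4" is a placeholder
# for the fourth node, which this report does not name.
nodes = ["groucho", "zeppo", "harpo", "node4"]

# Every ordered (supplier, consumer) pair of distinct nodes is one agreement.
agreements = list(permutations(nodes, 2))

for supplier, consumer in agreements:
    print(f"{supplier} -> {consumer}")

print(f"{len(agreements)} agreements in total")
```

With four nodes that comes to 12 agreements, three outgoing per node.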

We have another database on all nodes that has continued to replicate without a 
problem.  It's only our "o=isp" base that has troubles.

This guy seems to have encountered a similar issue: 
http://www.mail-archive.com/fedora-directory-users/msg03614.html

Comment 1 Rich Megginson 2007-03-23 15:54:37 UTC


*** This bug has been marked as a duplicate of 233642 ***