Bug 233643

Summary: MMR breaks with time skew errors
Product: [Retired] 389 Reporter: Chris St. Pierre <cstpierr>
Component: Replication - GeneralAssignee: Rich Megginson <rmeggins>
Status: CLOSED DUPLICATE QA Contact: Orla Hegarty <ohegarty>
Severity: high Docs Contact:
Priority: medium    
Version: 1.0.3   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-03-23 15:54:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chris St. Pierre 2007-03-23 15:41:53 UTC
Tuesday night (20 March), replication suddenly ceased between our four nodes, 
with errors similar to these on all nodes:

[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to gro
ucho (o=isp)"" (groucho:389): Unable to acquire replica: Excessive clock skew bet
ween the supplier and the consumer. Replication is aborting.
[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to gro
ucho (o=isp)"" (groucho:389): Incremental update failed and requires administrato
r action
[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zep
po.nebrwesleyan.edu (o=isp)"" (zeppo:389): Unable to acquire replica: Excessive c
lock skew between the supplier and the consumer. Replication is aborting.
[20/Mar/2007:17:13:26 -0500] NSMMReplicationPlugin - agmt="cn="Replication to zep
po.nebrwesleyan.edu (o=isp)"" (zeppo:389): Incremental update failed and requires
 administrator action
[20/Mar/2007:17:13:27 -0500] - csngen_adjust_time: adjustment limit exceeded; val
ue - 86401, limit - 86400
[20/Mar/2007:17:13:27 -0500] NSMMReplicationPlugin - conn=1600790 op=4983 replica
="o=isp": Unable to acquire replica: error: excessive clock skew
[20/Mar/2007:17:23:56 -0500] NSMMReplicationPlugin - agmt="cn="Replication to har
po.nebrwesleyan.edu (o=isp)"" (harpo:389): Unable to acquire replica: Excessive c
lock skew between the supplier and the consumer. Replication is aborting.
[20/Mar/2007:17:23:56 -0500] NSMMReplicationPlugin - agmt="cn="Replication to har
po.nebrwesleyan.edu (o=isp)"" (harpo:389): Incremental update failed and requires
 administrator action
[20/Mar/2007:17:58:27 -0500] - csngen_adjust_time: adjustment limit exceeded; val
ue - 86401, limit - 86400
[20/Mar/2007:17:58:27 -0500] NSMMReplicationPlugin - conn=1615833 op=3276 replica
="o=isp": Unable to acquire replica: error: excessive clock skew

Subsequent efforts to resume replication have come to naught.  (See this thread: 
https://www.redhat.com/archives/fedora-directory-users/2007-March/msg00100.html 
for more details on getting replication working again at all.)  Once I get 
replication working, a few minutes after putting the cluster back into production 
we get the same error messages and replication ceases.  The messages do not occur 
when the machines are not actively getting queries.

In all cases, the 'value' on the csngen_adjust_time line is 86401 and the limit 
is 86400.  All four nodes have the same clock times, and are running NTP against 
a local NTP server.

We are using four-way MMR with no read-only replicas.  We used the mmr.pl script 
to set up replication.  Each node replicates to all three other nodes, and all 
four nodes receive updates.

We have another database on all nodes that has continued to replicate without a 
problem.  It's only our "o=isp" base that has troubles.

This guy seems to have encountered a similar issue: http://www.mail-archive.com/
fedora-directory-users/msg03614.html

Comment 1 Rich Megginson 2007-03-23 15:54:37 UTC

*** This bug has been marked as a duplicate of 233642 ***