Bug 1009122

Summary: replication stops with excessive clock skew
Product: Red Hat Enterprise Linux 6 Reporter: Rich Megginson <rmeggins>
Component: 389-ds-base Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Sankar Ramalingam <sramling>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4 CC: batkisso, ctrianta, jgalipea, msauton, nhosoi, tscherf, vashirov
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.11.15-34.el6 Doc Type: Bug Fix
Doc Text:
Cause: The multi-master replication protocol keeps a cumulative counter of the relative time offsets between servers. If the system time is adjusted by more than one day (for example, because of NTP or VM issues), the counter will be off by more than one day.
Consequence: A replication consumer refuses to accept changes from a master whose time offset exceeds one day, so replication from that supplier to the consumer breaks.
Fix: A new configuration attribute, nsslapd-ignore-time-skew, was added to cn=config. The default is "off". If this attribute is set to "on", a replication consumer allows replication to proceed despite excessive time skew. An error message is still logged to warn the administrator about the time skew.
Result: When nsslapd-ignore-time-skew is set to "on", replication proceeds despite excessive time skew.
Story Points: ---
Clone Of:
: 1009679 Environment:
Last Closed: 2014-10-14 07:50:19 UTC Type: ---
Bug Blocks: 1009679, 1061410    

Description Rich Megginson 2013-09-17 18:04:55 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47516

If the CSN generator clock skew is over 1 day, replication stops.  Users need to be able to continue replicating despite the high clock skew.  There should be a configuration attribute that allows replication to continue despite excessive clock skew.

This is becoming a much bigger problem now that many users are using VMs, which are notorious for having system clock/time/ntp issues.
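
To make the requested behavior concrete, here is a minimal sketch of the decision a consumer would make. It is not the 389-ds-base code: the function name, the integer-seconds time model, and the choice to treat skew in either direction as excessive are all assumptions made for illustration. Only the one-day limit and the "accept but still warn" override come from this bug.

```python
ONE_DAY_SECONDS = 24 * 60 * 60  # the one-day limit described in this bug


def accept_remote_csn(local_time, remote_csn_time, ignore_time_skew=False):
    """Return (accepted, warning) for a CSN timestamp received from a supplier.

    Hypothetical model: both times are integer seconds since the epoch,
    and skew is measured as an absolute difference (an assumption).
    """
    skew = abs(remote_csn_time - local_time)
    if skew <= ONE_DAY_SECONDS:
        return True, None
    warning = "excessive clock skew: %d seconds" % skew
    # With the override on, the change is accepted, but the warning is
    # still returned so the caller can log it.
    return ignore_time_skew, warning
```

With the override left off, a supplier two days ahead is rejected and a warning is produced. With it on, the same change is accepted and the warning is kept for the log.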

Comment 1 Rich Megginson 2013-09-17 18:06:25 UTC
Red Hat IT is requesting a hot fix, which means this bug will need to be officially fixed and supported in RHEL 6.6.

Comment 2 Rich Megginson 2013-09-18 21:29:59 UTC
external 389-ds-base-1.2.11 commit

commit 9dc7a4630cb13f1da074183208b1b34962fe8101
Author: Rich Megginson <rmeggins>
Date:   Wed Sep 18 12:32:23 2013 -0600

internal
To ssh://git.app.eng.bos.redhat.com/srv/git/389-ds-base.git
 * [new branch]      rhel-6.4-bug1009122 -> rhel-6.4-bug1009122
commit 9c657d5d72569af8c650170913328d3fc5f9b3d9
Author: Rich Megginson <rmeggins>
Date:   Wed Sep 18 12:32:23 2013 -0600
To ssh://git.app.eng.bos.redhat.com/srv/git/389-ds-base.git
 * [new tag]         389-ds-base-1.2.11.15-22.1-bug1009122 -> 389-ds-base-1.2.11.15-22.1-bug1009122

Comment 3 Rich Megginson 2013-09-18 21:59:05 UTC
added test to TET trunk
will need to cherry-pick (merge and ci) this change to the rhel7 and rhel6 branches when the fix is added to rhel7.0 and rhel6.6

r8122 | rmeggins | 2013-09-18 15:57:47 -0600 (Wed, 18 Sep 2013) | 5 lines

Bug 1009122 - replication stops with excessive clock skew
https://bugzilla.redhat.com/show_bug.cgi?id=1009122

added test bug1009122 to test the new nsslapd-ignore-time-skew attribute
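
For reference, one way to turn the new attribute on is an LDIF modify record against cn=config. The sketch below only builds that record as text. The helper name is hypothetical. The attribute name, its "on"/"off" values, and the cn=config entry are taken from this bug.

```python
def ignore_time_skew_ldif(enabled):
    """Build an LDIF modify record that sets nsslapd-ignore-time-skew on cn=config."""
    value = "on" if enabled else "off"
    lines = [
        "dn: cn=config",
        "changetype: modify",
        "replace: nsslapd-ignore-time-skew",
        "nsslapd-ignore-time-skew: " + value,
    ]
    return "\n".join(lines) + "\n"


# Print the record that enables the override.
print(ignore_time_skew_ldif(True), end="")
```

The printed record could be fed to an LDAP modify client such as ldapmodify. The bind DN, host, and port are site-specific and are not shown here.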

Comment 4 Rich Megginson 2014-01-16 18:48:28 UTC
The previous fix makes replication ignore time skew errors, but does not ensure that the CSN generator will continue to issue CSNs that exceed its built-in time skew limit. We need to make sure that the CSN generator will never issue duplicate CSNs or regress CSNs.
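
The "never duplicate, never regress" requirement can be illustrated with a toy generator. This is a sketch under stated assumptions, not the actual csngen code: a CSN is modeled as a (timestamp, sequence) pair, the class and method names are hypothetical, and the replica ID and the sequence number's size limit are left out.

```python
import time


class CsnGenerator:
    """Toy CSN generator: each CSN is a (timestamp, sequence) pair."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so a skewed clock can be simulated
        self._last_time = 0
        self._seq = 0

    def next_csn(self):
        now = int(self._clock())
        if now > self._last_time:
            # Clock moved forward: adopt the new time and restart the sequence.
            self._last_time = now
            self._seq = 0
        else:
            # Clock stalled or jumped backwards: keep the last time and bump
            # the sequence so the new CSN still sorts after the previous one.
            self._seq += 1
        return (self._last_time, self._seq)
```

Because a backwards clock jump never lowers the stored timestamp, every CSN returned compares strictly greater than the one before it, however the system time is adjusted.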

Comment 5 Rich Megginson 2014-01-20 18:28:56 UTC
New builds available:
http://download.devel.redhat.com/brewroot/packages/389-ds-base/1.2.11.15/31.2.el6_5.bug1009122/

Please upgrade to these new builds ASAP

Comment 6 Rich Megginson 2014-01-22 16:20:24 UTC
testcases/DS/6.0/mmrepl/accept/accept.sh
------------------------------------------------------------------------
r8283 | rmeggins | 2014-01-22 09:16:11 -0700 (Wed, 22 Jan 2014) | 3 lines

Bug 1009122
Additional debugging

Comment 7 Rich Megginson 2014-01-24 23:45:21 UTC
Customer in case 01023323 has been given a hotfix:
http://download.devel.redhat.com/brewroot/packages/389-ds-base/1.2.11.15/31.3.el6_5.citrix/x86_64
The customer reports that the hotfix packages are working fine.

Comment 11 Viktor Ashirov 2014-07-17 14:03:27 UTC
520|0 51 10030 1 2|----------------- Starting Test bug1009122 -------------------------
520|0 51 10030 1 3|Replication breaks when there is excessive clock skew.
520|0 51 10030 1 4|first, shutdown the masters
520|0 51 10030 1 5|-----------------StopSlapd: Called -----------------
520|0 51 10030 1 14|-----------------StopSlapd: Completed-----------------
520|0 51 10030 1 15|                                                      
520|0 51 10030 1 16|stopped slapd-s1
520|0 51 10030 1 17|-----------------StopSlapd: Called -----------------
520|0 51 10030 1 71|stopped slapd-s2
520|0 51 10030 1 72|next, grab the nsState value on S1 to save for later
520|0 51 10030 1 73|change the nsState value on S1 to be bogus
520|0 51 10030 1 74|changed nsstate
520|0 51 10030 1 75|start the servers
520|0 51 10030 1 76|-----------------StartSlapd: Called -----------------
520|0 51 10030 1 81|-----------------StartSlapd: Completed-----------------
520|0 51 10030 1 82|                                                      
520|0 51 10030 1 83|stopped slapd-s1
520|0 51 10030 1 84|-----------------StartSlapd: Called -----------------
520|0 51 10030 1 89|-----------------StartSlapd: Completed-----------------
520|0 51 10030 1 90|                                                      
520|0 51 10030 1 91|stopped slapd-s2
520|0 51 10030 1 92|do a change on S1
520|0 51 10030 1 93|verify that the change does not replicate to S2
520|0 51 10030 1 94|good S2 does not contain the change
520|0 51 10030 1 95|turn nsslapd-ignore-time-skew: on
520|0 51 10030 1 96|do a change on S2
520|0 51 10030 1 97|restart the servers
520|0 51 10030 1 98|-----------------StopSlapd: Called -----------------
520|0 51 10030 1 107|-----------------StopSlapd: Completed-----------------
520|0 51 10030 1 108|                                                      
520|0 51 10030 1 109|-----------------StartSlapd: Called -----------------
520|0 51 10030 1 114|-----------------StartSlapd: Completed-----------------
520|0 51 10030 1 115|                                                      
520|0 51 10030 1 116|stopped slapd-s1
520|0 51 10030 1 117|-----------------StopSlapd: Called -----------------
520|0 51 10030 1 171|-----------------StartSlapd: Called -----------------
520|0 51 10030 1 176|-----------------StartSlapd: Completed-----------------
520|0 51 10030 1 177|                                                      
520|0 51 10030 1 178|stopped slapd-s2
520|0 51 10030 1 179|do 3 changes on S1
520|0 51 10030 1 180|wait for changes to replicate to S2
520|0 51 10030 1 181|do 3 changes on S2
520|0 51 10030 1 182|verify that the changes replicate to S2
520|0 51 10030 1 183|good S2 contains change from S1
520|0 51 10030 1 184|reset and cleanup
520|0 51 10030 1 185|-----------------StopSlapd: Called -----------------
520|0 51 10030 1 194|-----------------StopSlapd: Completed-----------------
520|0 51 10030 1 195|                                                      
520|0 51 10030 1 196|stopped slapd-s1
520|0 51 10030 1 197|changed nsstate
520|0 51 10030 1 198|start the servers
520|0 51 10030 1 199|-----------------StartSlapd: Called -----------------
520|0 51 10030 1 204|-----------------StartSlapd: Completed-----------------
520|0 51 10030 1 205|                                                      
520|0 51 10030 1 206|stopped slapd-s1
520|0 51 10030 1 207|TestCase [bug1009122] result-> [PASS]

Testcase passes, hence marking as verified.

Comment 12 errata-xmlrpc 2014-10-14 07:50:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1385.html