Bug 1370229

Summary: openshift_clock_enabled should default to True
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Installer
Assignee: Scott Dodson <sdodson>
Status: CLOSED NOTABUG
QA Contact: Johnny Liu <jialiu>
Severity: medium
Priority: high
Version: 3.3.0
CC: aos-bugs, bleanhar, jokerman, mifiedle, mmccomas, tstclair
Target Milestone: ---   
Target Release: 3.3.1   
Hardware: x86_64   
OS: Linux   
Last Closed: 2016-10-04 01:07:54 UTC
Type: Bug
Attachments:
Inventory to reproduce this issue (flags: none)

Description Mike Fiedler 2016-08-25 16:17:45 UTC
Description of problem:

Any non-trivial cluster should have time synchronization turned on so that failover and other cluster operations work correctly. It also helps when debugging with system logs. The OCP installer should default the openshift_clock_enabled variable to True.


Version-Release number of selected component (if applicable): 3.3.0.23

Comment 1 Brenton Leanhardt 2016-08-25 19:10:40 UTC
I think the intention of the original PR was to have this enabled by default:

https://github.com/openshift/openshift-ansible/pull/1672#issuecomment-205493798

I think it's just a bug.

Comment 4 Scott Dodson 2016-09-28 17:28:25 UTC
It's actually enabled everywhere by default. Should we add a forced sync early in the process?

Comment 5 Timothy St. Clair 2016-09-28 22:34:59 UTC
On our 3.3 deployment, ntp was not properly installed and enabled, which caused clock skew, which in turn triggered a false failover condition.

Comment 6 Scott Dodson 2016-09-30 17:17:58 UTC
We'd need more information on that deployment to debug the issue. It's enabled by default and has been in all tags since early June. We need a reproducer.

Comment 7 Mike Fiedler 2016-10-03 14:49:39 UTC
Created attachment 1206909 [details]
Inventory to reproduce this issue

Comment 8 Mike Fiedler 2016-10-03 14:50:07 UTC
To reproduce (I used 3.4.0.5):

1. yum uninstall ntp
2. Run the ansible installer on an inventory with no openshift_clock variable (sample attached from my install)
3. After the install, run: yum list installed | grep ntp

ntp will not be installed.

Comment 9 Mike Fiedler 2016-10-03 14:50:31 UTC
Sorry, "yum remove ntp"
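
For anyone repeating these steps, the two checks involved (does the inventory set an openshift_clock variable, and which time-sync packages are present afterwards) can be sketched in shell. This is an illustration only: the function names are hypothetical and are not part of openshift-ansible, and the matching assumes package listings in the `rpm -qa` or `yum list installed` formats.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of openshift-ansible.

# Reads an Ansible inventory on stdin. Succeeds if any non-comment line
# assigns a variable whose name starts with openshift_clock.
inventory_sets_clock_var() {
    grep -v '^[[:space:]]*[#;]' | grep -Eq 'openshift_clock[A-Za-z_]*[[:space:]]*='
}

# Reads a package listing on stdin and prints which time-sync packages
# appear: "ntp", "chrony", "ntp chrony", or "none".
# The [-.] after the name keeps ntpdate from being counted as ntp.
time_packages_in() {
    local listing found=""
    listing=$(cat)
    if grep -Eq '^ntp[-.]' <<<"$listing"; then
        found="ntp"
    fi
    if grep -Eq '^chrony[-.]' <<<"$listing"; then
        found="${found:+$found }chrony"
    fi
    echo "${found:-none}"
}

# Possible use on a live host (not executed here):
#   inventory_sets_clock_var < /path/to/inventory || echo "no openshift_clock variable set"
#   rpm -qa | time_packages_in
```

Reading from stdin keeps both helpers usable against a saved inventory or a captured package list, which is handy when the original host is no longer available.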

Comment 10 Scott Dodson 2016-10-03 15:32:21 UTC
Presumably this only happens when the chrony rpm is pre-installed. From an affected host:

root@dhcp5-210: ~ # rpm -q chrony
chrony-2.1.1-3.el7.x86_64

root@dhcp5-210: ~ # timedatectl 
      Local time: Mon 2016-10-03 11:30:14 EDT
  Universal time: Mon 2016-10-03 15:30:14 UTC
        RTC time: Mon 2016-10-03 15:30:14
       Time zone: America/New_York (EDT, -0400)
     NTP enabled: yes
NTP synchronized: no
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2016-03-13 01:59:59 EST
                  Sun 2016-03-13 03:00:00 EDT
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2016-11-06 01:59:59 EDT
                  Sun 2016-11-06 01:00:00 EST
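
The state shown above (NTP enabled, but not synchronized) can be detected from a script. The following is a minimal sketch, assuming the `NTP enabled` and `NTP synchronized` field names exactly as they appear in this output; other timedatectl versions may label these fields differently, in which case the function reports "unknown". The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: ntp_state is a hypothetical helper name.
# Reads timedatectl output on stdin and prints one of:
#   synchronized, enabled-not-synchronized, disabled, unknown
ntp_state() {
    awk -F': *' '
        {
            key = $1; val = $2
            sub(/^[ \t]+/, "", key)     # drop the left padding timedatectl adds
            sub(/[ \t\r]+$/, "", val)   # drop trailing whitespace
        }
        key == "NTP enabled"      { enabled = val }
        key == "NTP synchronized" { synced = val }
        END {
            if (enabled == "" || synced == "")            print "unknown"
            else if (enabled == "yes" && synced == "yes") print "synchronized"
            else if (enabled == "yes")                    print "enabled-not-synchronized"
            else                                          print "disabled"
        }'
}

# Possible use on a live host (not executed here):
#   timedatectl | ntp_state
```

A result of enabled-not-synchronized means the service is turned on but has not yet synced with a time source, so the cause is more likely connectivity or configuration than a missing package.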

Comment 11 Mike Fiedler 2016-10-03 16:44:55 UTC
I will retest with no pre-installed chrony or ntp.   Unfortunately, the original cluster this was reported on is long gone, so we can't do any forensic analysis there.

Comment 12 Scott Dodson 2016-10-03 18:00:13 UTC
Comment #10 is for a different environment and in that case the host had a firewall blocking access to the configured nameservers.

Comment 13 Scott Dodson 2016-10-03 18:02:05 UTC
Timeservers, not nameservers. *sigh*

Comment 14 Mike Fiedler 2016-10-04 01:07:54 UTC
This is working as described in comment 10. If neither ntp nor chrony is installed, ntp gets installed and configured. I will watch for this on the next horizontal cluster install. As I said, we no longer have access to CNCF to try to reconstruct what happened there.