Bug 1296365 - ntp is not configured persistently on overcloud
Summary: ntp is not configured persistently on overcloud
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: y3
Target Release: 7.0 (Kilo)
Assignee: Ben Nemec
QA Contact: Ofer Blaut
URL:
Whiteboard:
Duplicates: 1300467
Depends On:
Blocks: 1299085 1299849
Reported: 2016-01-07 00:46 UTC by Graeme Gillies
Modified: 2019-10-10 10:50 UTC

Fixed In Version: python-rdomanager-oscplugin-0.0.10-27.el7ost
Doc Type: Known Issue
Doc Text:
Multiple services attempted NTP configuration on the Overcloud and the last service configured it incorrectly. This caused time synchronization issues across all Overcloud nodes. As a workaround, delete /usr/libexec/os-apply-config/templates/etc/ntp.conf from all Overcloud nodes and re-run the deployment command to re-apply the puppet configuration. This is required for users updating from an older version of Red Hat OpenStack Platform to 7.3. This fix is not necessary on new 7.3 deployments. NTP now configures correctly.
Clone Of:
Clones: 1299085
Environment:
Last Closed: 2016-02-18 16:48:37 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 271078 | 0 | 'None' | MERGED | Remove ntp element from overcloud images | 2020-03-14 16:58:33 UTC
Red Hat Product Errata | RHBA-2016:0264 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OSP 7 director Bug Fix Advisory | 2016-02-18 21:41:29 UTC

Description Graeme Gillies 2016-01-07 00:46:24 UTC
Description of problem:
It seems that while ntp is initially configured correr

Version-Release number of selected component (if applicable):
Director 7.2, RHOS 7.0.3


How reproducible:
Always

Steps to Reproduce:
1. deploy overcloud with --ntp-server parameter pointing to a timeserver

Actual results:
While ntpq -p shows ntp is running and pointing to the configured timeserver, if you look in /etc/ntp.conf you will see that ntp has somehow been reconfigured with no ntp servers at all. This means that as soon as you reboot the node, or even just restart ntp on the machine, it will no longer be syncing via ntp.
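
A minimal sketch of how the on-disk symptom could be checked, assuming the standard ntp.conf directives (server, pool, peer) are the only ways a time source is named. The function names and the 192.0.2.10 address are illustrative and do not come from this report.

```python
# Directives in ntp.conf that name an upstream time source.
SOURCE_DIRECTIVES = ("server", "pool", "peer")


def configured_sources(conf_text):
    """Return the hosts named by server/pool/peer lines, ignoring comments."""
    sources = []
    for raw in conf_text.splitlines():
        # Drop anything after '#', then split the directive into fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] in SOURCE_DIRECTIVES:
            sources.append(fields[1])
    return sources


def is_persistent(conf_text, expected_server):
    """True if the timeserver passed at deploy time is present on disk."""
    return expected_server in configured_sources(conf_text)


if __name__ == "__main__":
    # Hypothetical usage on an overcloud node; the address is a placeholder.
    with open("/etc/ntp.conf") as handle:
        print(is_persistent(handle.read(), "192.0.2.10"))
```

On an affected node the check would be expected to return False even though ntpq -p still lists the server, because the running daemon keeps the configuration it was started with.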


Expected results:
ntp configuration is applied and persisted to disk

Additional info:

Comment 2 Mike Burns 2016-01-20 23:25:24 UTC
*** Bug 1300467 has been marked as a duplicate of this bug. ***

Comment 3 Ben Nemec 2016-01-21 20:58:11 UTC
I think the problem is we have an os-apply-config /etc/ntp.conf too: https://github.com/openstack/tripleo-image-elements/blob/master/elements/ntp/os-apply-config/etc/ntp.conf

That looks like what is getting written to ntp.conf after puppet configures it correctly.  Given that puppet is managing ntp for us, we should probably just remove that element from the images.  I haven't tested it yet though.

Comment 4 Ben Nemec 2016-01-21 21:55:40 UTC
Okay, just ran a test with the upstream bits and removing the ntp element from the image does make this work again.  I'll post the patches.

Comment 5 Ben Nemec 2016-01-21 23:12:08 UTC
I've attached the two patches that should fix this for 7 and 8.

Comment 6 Vinny Valdez 2016-01-27 20:11:06 UTC
Using openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch here's how we worked around this issue. We changed the following lines by removing the extra {} around the server entry:

Changed from:
        ntp:
          servers:
            - {server: {get_param: NtpServer}}

Changed to:
        ntp:
          servers:
            - server: {get_param: NtpServer}

In the following files:
/usr/share/openstack-tripleo-heat-templates/controller.yaml
/usr/share/openstack-tripleo-heat-templates/compute.yaml
/usr/share/openstack-tripleo-heat-templates/undercloud-source.yaml

Comment 7 Ben Nemec 2016-01-27 20:39:30 UTC
So we actually had two bugs then - not only did we have two different tools trying to configure NTP, but one of them had bad configuration.  Probably just as well or we might not ever have noticed the collision.

Another workaround that will allow puppet to configure NTP as intended is to remove the /usr/libexec/os-apply-config/templates/etc/ntp.conf file from the overcloud nodes.  This can either be done with virt-customize on the overcloud-full image or with a firstboot script.
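
As a rough illustration of the file-removal workaround, the sketch below deletes the template if it is present. The function name and the root parameter are hypothetical; root exists only so the same code could be pointed at a mounted image or a test directory instead of the live filesystem. It is not the firstboot script used by anyone in this report.

```python
import os

# Template path taken from the comment above, relative to the filesystem root.
TEMPLATE_REL_PATH = "usr/libexec/os-apply-config/templates/etc/ntp.conf"


def remove_stale_ntp_template(root="/"):
    """Delete the os-apply-config ntp.conf template under root.

    Returns True if a file was removed and False if it was already absent,
    so running it a second time is harmless.
    """
    path = os.path.join(root, TEMPLATE_REL_PATH)
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True


if __name__ == "__main__":
    # Needs root privileges when run against "/" on an overcloud node.
    print(remove_stale_ntp_template())
```

Removing only the template leaves the rest of os-apply-config untouched, which matches the narrow scope of the workaround described above.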

Comment 8 Ben Nemec 2016-02-03 14:18:05 UTC
For 7.3, we just need to merge the backport of https://review.openstack.org/#/c/271048 .  It has passed upstream CI and merged there, so it should be reasonably safe downstream.

Comment 9 David Juran 2016-02-03 14:53:17 UTC
In regards to the doc-text, is it really sufficient to edit the controller-nodes? What about the other types?

Comment 10 Ben Nemec 2016-02-04 09:19:52 UTC
Good point.  The initial concern that was raised with this was the controllers getting out of sync, but I believe there could be issues with the other node types as well so I've updated the doc text to remove the controller-specific bits.

Comment 11 Felipe Alfaro Solana 2016-02-04 11:07:40 UTC
We think we've seen this in Ceph nodes in our Red Hat OpenStack 7.2 cluster.

Comment 13 Jeremy 2016-02-09 17:37:44 UTC
Hello,

Also seeing this issue with another customer 


Workaround they found is to use pssh and run: sudo puppet apply -e "include ::ntp" on all nodes to write the correct config, and restart the ntp service.

Upstream patch set: https://review.openstack.org/#/c/271048

Thanks,
Jeremy

Comment 14 Ben Nemec 2016-02-09 18:19:51 UTC
(In reply to Jeremy from comment #13)

I wouldn't recommend this as a standalone workaround though.  It will work once, but the next time they run an update on their overcloud the puppet config will be overwritten by the os-apply-config version again.  However, in combination with the deletion of the /usr/libexec file it would be a quicker way to re-apply the puppet ntp conf than a complete stack-update.
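
The combined sequence could be sketched as below: delete the template, re-apply the puppet ntp class, then restart the service. This only builds the command string to hand to a tool such as pssh. The helper names are hypothetical, and the service name ntpd and the use of systemctl are assumptions that are not stated in this report.

```python
import shlex

# Template path taken from the comments above.
TEMPLATE = "/usr/libexec/os-apply-config/templates/etc/ntp.conf"


def ntp_repair_steps(service="ntpd"):
    """Ordered steps: remove the template, re-apply puppet, restart ntp.

    The default service name is an assumption, not taken from the report.
    """
    return [
        ["sudo", "rm", "-f", TEMPLATE],
        ["sudo", "puppet", "apply", "-e", "include ::ntp"],
        ["sudo", "systemctl", "restart", service],
    ]


def remote_command(steps):
    """Join the steps with '&&' so that a failed step stops the sequence."""
    return " && ".join(
        " ".join(shlex.quote(arg) for arg in step) for step in steps
    )


if __name__ == "__main__":
    print(remote_command(ntp_repair_steps()))
```

Deleting the template first matters: without it, the next overcloud update would overwrite the puppet-written ntp.conf again.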

Comment 16 Ofer Blaut 2016-02-16 14:01:53 UTC
python-rdomanager-oscplugin-0.0.10-28.el7ost.noarch


Verified 


Tested ntpstats before and after reboot; /etc/ntp.conf is configured with the correct NTP IP used in the deployment.

Comment 18 errata-xmlrpc 2016-02-18 16:48:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html

Comment 19 Graeme Gillies 2016-02-24 02:45:26 UTC
Hi,

We have hit this issue again in all of our environments which have been upgraded to director 7.3 and RHOS 7.0.4. NTP is not configured persistently on the overcloud, causing clocks to skew when rebooted.

This is likely because the file

/usr/libexec/os-apply-config/templates/etc/ntp.conf

still exists. We need to add a documentation note to clean it out or, preferably, director should clean out os-apply-config correctly so that we fix this class of problems (of which this is not the first) once and for all.

Comment 20 Mike Burns 2016-02-24 12:25:34 UTC
This cleanup is being fixed in bug 1294098

Comment 21 Ben Nemec 2016-02-24 22:00:42 UTC
That bug won't help with this.  1294098 is about the undercloud and this is a bug on the overcloud.

Unfortunately, due to the way the os-*-config scripts are installed on the overcloud images we can't just yum update them during upgrades.  This limitation is being worked on for future releases, but for 7.x-7.3 upgrades the only solution is manual removal of the file.

The doctext on this bug correctly describes the situation, but it doesn't appear anywhere in the errata.  I'm not sure how that information is supposed to be communicated to users.

Comment 22 Graeme Gillies 2016-02-24 22:03:20 UTC
I think maybe a heads up for removing this file (or maybe all of the os-*-config scripts?) should go in section "9.3.4. Version Specific Notes"

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Version_Specific_Notes

