Description of problem:
It seems that while ntp is initially configured correctly, the configuration is not persisted to disk.

Version-Release number of selected component (if applicable):
Director 7.2, RHOS 7.0.3

How reproducible:
Always

Steps to Reproduce:
1. deploy overcloud with --ntp-server parameter pointing to a timeserver

Actual results:
While ntpq -p shows ntp is running and pointing to the timeserver configured, if you look in /etc/ntp.conf you will see that somehow ntp has been reconfigured with no ntp servers at all. This means that as soon as you restart the node, or even just restart ntp on the machine, it will no longer be syncing via ntp.

Expected results:
ntp configuration is applied and persisted to disk

Additional info:
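For anyone wanting to check a node for this symptom, here is a minimal sketch that looks for active time sources in the text of /etc/ntp.conf. The function names are made up for illustration, and it assumes only `server` and `pool` lines count as time sources.

```python
def configured_servers(conf_text):
    """Return the hosts named on active server/pool lines of ntp.conf text."""
    hosts = []
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        # Assumption: only "server" and "pool" directives define time sources.
        if len(fields) >= 2 and fields[0] in ("server", "pool"):
            hosts.append(fields[1])
    return hosts


def is_persisted(conf_text):
    """True when at least one time source would survive an ntp restart."""
    return bool(configured_servers(conf_text))
```

Feeding it the contents of /etc/ntp.conf from an affected node should give an empty list, even while ntpq -p still shows the running daemon talking to the timeserver.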
*** Bug 1300467 has been marked as a duplicate of this bug. ***
I think the problem is we have an os-apply-config /etc/ntp.conf too: https://github.com/openstack/tripleo-image-elements/blob/master/elements/ntp/os-apply-config/etc/ntp.conf That looks like what is getting written to ntp.conf after puppet configures it correctly. Given that puppet is managing ntp for us, we should probably just remove that element from the images. I haven't tested it yet though.
Okay, just ran a test with the upstream bits and removing the ntp element from the image does make this work again. I'll post the patches.
I've attached the two patches that should fix this for 7 and 8.
Using openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch, here's how we worked around this issue. We changed the following lines by removing the extra {} around the server entry:

Changed from:

    ntp:
      servers:
        - {server: {get_param: NtpServer}}

Changed to:

    ntp:
      servers:
        - server: {get_param: NtpServer}

In the following files:

/usr/share/openstack-tripleo-heat-templates/controller.yaml
/usr/share/openstack-tripleo-heat-templates/compute.yaml
/usr/share/openstack-tripleo-heat-templates/undercloud-source.yaml
So we actually had two bugs then - not only did we have two different tools trying to configure NTP, but one of them had bad configuration. Probably just as well or we might not ever have noticed the collision. Another workaround that will allow puppet to configure NTP as intended is to remove the /usr/libexec/os-apply-config/templates/etc/ntp.conf file from the overcloud nodes. This can either be done with virt-customize on the overcloud-full image or with a firstboot script.
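As a rough illustration of the firstboot variant of that workaround, the sketch below deletes the template if it is present. The path is the one given above; the function name and the True/False return convention are only illustrative, and how the script gets wired into firstboot is left out.

```python
import os

# Template path as given in this comment.
STALE_TEMPLATE = "/usr/libexec/os-apply-config/templates/etc/ntp.conf"


def remove_stale_template(path=STALE_TEMPLATE):
    """Delete the os-apply-config ntp.conf template.

    Returns True if a file was removed, False if it was already absent.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        return False  # Already gone, nothing to do.
    return True
```

Treating a missing file as success keeps the script safe to run more than once, for example on nodes that were already cleaned up by hand.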
For 7.3, we just need to merge the backport of https://review.openstack.org/#/c/271048 . It has passed upstream CI and merged there, so it should be reasonably safe downstream.
In regards to the doc-text, is it really sufficient to edit the controller-nodes? What about the other types?
Good point. The initial concern that was raised with this was the controllers getting out of sync, but I believe there could be issues with the other node types as well so I've updated the doc text to remove the controller-specific bits.
We think we've seen this in Ceph nodes in our Red Hat OpenStack 7.2 cluster.
Hello,

Also seeing this issue with another customer.

Workaround they found is to use pssh and run: sudo puppet apply -e "include ::ntp" on all nodes to write the correct config, and restart the ntp service.

Upstream patch set: https://review.openstack.org/#/c/271048

Thanks,
Jeremy
(In reply to Jeremy from comment #13)
> Hello,
>
> Also seeing this issue with another customer
>
> Workaround they found is to use pssh and run: sudo puppet apply -e "include
> ::ntp" on all nodes to write the correct config, and restart the ntp service.
>
> Upstream patch set: https://review.openstack.org/#/c/271048
>
> Thanks,
> Jeremy

I wouldn't recommend this as a standalone workaround though. It will work once, but the next time they run an update on their overcloud the puppet config will be overwritten by the os-apply-config version again. However, in combination with the deletion of the /usr/libexec file it would be a quicker way to re-apply the puppet ntp conf than a complete stack-update.
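To make the ordering of the combined workaround concrete, here is a small sketch that builds the per-node commands and only prints them by default. The helper names are hypothetical, and the `systemctl restart ntpd` step assumes the service unit is called ntpd, which should be checked on the node.

```python
import shlex
import subprocess

# Template path discussed in this bug.
TEMPLATE = "/usr/libexec/os-apply-config/templates/etc/ntp.conf"


def reapply_steps(template=TEMPLATE, service="ntpd"):
    """Return the ordered commands for one node as argv lists."""
    return [
        # 1. Remove the os-apply-config template so it stops overwriting ntp.conf.
        ["sudo", "rm", "-f", template],
        # 2. Let puppet write the correct ntp.conf again.
        ["sudo", "puppet", "apply", "-e", "include ::ntp"],
        # 3. Restart ntp (assumed unit name) so it picks up the new file.
        ["sudo", "systemctl", "restart", service],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling `run_steps(reapply_steps())` just prints the three commands, which can then be reviewed or pushed out to all nodes with whatever tool is already in use, such as pssh.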
python-rdomanager-oscplugin-0.0.10-28.el7ost.noarch

Verified. Tested ntpstats before and after reboot; /etc/ntp.conf is configured with the correct ntp IP used in the deployment.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0264.html
Hi,

We have hit this issue again in all of our environments that have been upgraded to director 7.3 and RHOS 7.0.4. NTP is not configured persistently on the overcloud, causing clocks to skew after the nodes are rebooted. This is likely because the file /usr/libexec/os-apply-config/templates/etc/ntp.conf still exists. We need to add a documentation note to clean it out or, preferably, director should clean out os-apply-config correctly so that we fix this class of problems (of which this is not the first) once and for all.
This cleanup is being fixed in bug 1294098
That bug won't help with this. 1294098 is about the undercloud and this is a bug on the overcloud. Unfortunately, due to the way the os-*-config scripts are installed on the overcloud images we can't just yum update them during upgrades. This limitation is being worked on for future releases, but for 7.x-7.3 upgrades the only solution is manual removal of the file. The doctext on this bug correctly describes the situation, but it doesn't appear anywhere in the errata. I'm not sure how that information is supposed to be communicated to users.
I think maybe a heads up for removing this file (or maybe all of the os-*-config scripts?) should go in section "9.3.4. Version Specific Notes" https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Version_Specific_Notes