983771 – ntpd configuration should provide better fault resiliency

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-packstack
Sub Component:
Version:	4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	4.0
Assignee:	Ivan Chavero
QA Contact:	Martin Magr
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-07-11 23:31 UTC by Alessandro Pilotti
Modified:	2016-04-26 14:34 UTC
CC List:	6 users
Fixed In Version:	openstack-packstack-2013.2.1-0.14.dev919.el6ost
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-12-19 23:54:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2013:1859	0	normal	SHIPPED_LIVE	Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory	2013-12-21 00:01:48 UTC

Description Alessandro Pilotti 2013-07-11 23:31:54 UTC

Description of problem:

ntpd.pp fails if the servers provided by CONFIG_NTP_SERVERS are temporarily unavailable, with the result that the entire installation fails.

When packstack is run in a multi-node environment, this happens quite frequently. A simple retry cycle would be enough to get past transient errors.

Here's an example of the error obtained during packstack execution, using pool.ntp.org.

10.73.75.111_ntpd.pp :                                               [ DONE ]
                                                                                        [ ERROR ]

ERROR : Error during puppet run : err: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate pool.ntp.org returned 1 instead of one of [0] at /var/tmp/packstack/97d4fbc5af96405892ab0ace62e11095/manifests/10.73.76.64_ntpd.pp:85

Comment 1 Alessandro Pilotti 2013-07-11 23:38:29 UTC

Workaround

Specifying multiple separate pools helps mitigate the issue, for example:

CONFIG_NTP_SERVERS=0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org

Comment 3 Daniel Korn 2013-08-12 06:50:46 UTC

Test comment - please ignore.

Comment 4 Alvaro Lopez Ortega 2013-11-13 18:34:09 UTC

- Replace the error with a warning
- Change the default value to a list of servers (so we increase the odds that at least one of them is working properly)

Comment 5 Ivan Chavero 2013-11-14 21:33:54 UTC

We can first check whether the server is available by querying it with /usr/sbin/ntpdate -q ntp.exampleee.com

In Puppet:

exec { 'ntpdate':
    command => '/usr/sbin/ntpdate ntp.exampleee.com',
    # Run the sync only when the query (-q) exits with status 0.
    onlyif  => '/usr/sbin/ntpdate -q ntp.exampleee.com',
}


The problem with this approach is that it doesn't show any warning when the sync is skipped.
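
One way to surface a message alongside the onlyif approach would be a second exec that fires only when the query fails. This is a minimal sketch, not what packstack ships: the resource title 'ntpdate-warning' and the message text are hypothetical, and it assumes /bin/echo is present on the node.

```puppet
# Sketch only: emits a message when the NTP query fails.
# 'ntpdate-warning' is a hypothetical resource title.
exec { 'ntpdate-warning':
  command   => '/bin/echo "WARNING: ntp.exampleee.com did not answer, skipping ntpdate"',
  # "unless" is the inverse of "onlyif": run the command only when the query exits non-zero.
  unless    => '/usr/sbin/ntpdate -q ntp.exampleee.com',
  # Log the echoed text as part of the Puppet run output.
  logoutput => true,
}
```

The trade-off is that the server gets queried twice per run, once for each exec.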

Comment 6 Alvaro Lopez Ortega 2013-11-22 21:33:21 UTC

It's important to run NTP on the servers, and so I'd say packstack should actually fail if the NTP server cannot be reached.

The error message may be a little bit confusing though. Also, we should add an extra server or two to CONFIG_NTP_SERVERS to increase the odds of finding a working server.

Comment 7 Alvaro Lopez Ortega 2013-11-22 22:05:57 UTC

https://review.openstack.org/#/c/58035/

Comment 8 Alvaro Lopez Ortega 2013-12-03 14:53:21 UTC

Merged

Comment 10 Scott Lewis 2013-12-09 15:30:45 UTC

Adding OtherQA for bugs in MODIFIED

Comment 13 Martin Magr 2013-12-12 15:52:17 UTC

ntpdate is run 3 times before failure is reported:
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 1/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 2/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 3/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Notice: /Stage[main]//Exec[ntpdate]/returns: 12 Dec 16:48:21 ntpdate[23698]: no server suitable for synchronization found
Error: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
Error: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
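
For reference, "Exec try N/3" debug lines like the ones above are what Puppet prints when an exec resource sets its tries attribute to 3. A minimal sketch of such a resource follows. It is not copied from the packstack template, and the try_sleep value is an assumption, since the log does not show the delay between attempts.

```puppet
# Sketch only: the manifest generated by packstack may differ.
exec { 'ntpdate':
  command   => '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org',
  # Retry the command up to three times before reporting failure.
  tries     => 3,
  # Assumed pause, in seconds, between attempts (not visible in the log).
  try_sleep => 10,
}
```

With this shape, a transient outage only fails the run if every attempt returns a non-zero status, which matches the final Error lines in the log.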

Comment 15 errata-xmlrpc 2013-12-19 23:54:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html
