Bug 983771

Summary:	ntpd configuration should provide better fault resiliency
Product:	Red Hat OpenStack	Reporter:	Alessandro Pilotti <apilotti>
Component:	openstack-packstack	Assignee:	Ivan Chavero <ichavero>
Status:	CLOSED ERRATA	QA Contact:	Martin Magr <mmagr>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	4.0	CC:	aortega, breeler, derekh, dkorn, ichavero, mmagr
Target Milestone:	rc	Keywords:	OtherQA, Triaged
Target Release:	4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-packstack-2013.2.1-0.14.dev919.el6ost	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-12-19 23:54:23 UTC	Type:	Bug

Description Alessandro Pilotti 2013-07-11 23:31:54 UTC

Description of problem:

ntpd.pp fails if the servers provided by CONFIG_NTP_SERVERS are temporarily unavailable, with the result that the entire installation fails.

When packstack is executed in a multi-node environment, this happens quite frequently. A simple retry cycle would be enough to get past transient errors.

Here's an example of the error obtained during packstack execution, using pool.ntp.org.

10.73.75.111_ntpd.pp :                                               [ DONE ]
                                                                                        [ ERROR ]

ERROR : Error during puppet run : err: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate pool.ntp.org returned 1 instead of one of [0] at /var/tmp/packstack/97d4fbc5af96405892ab0ace62e11095/manifests/10.73.76.64_ntpd.pp:85

Comment 1 Alessandro Pilotti 2013-07-11 23:38:29 UTC

Workaround

Specifying multiple separate pools helps mitigate the issue, for example:

CONFIG_NTP_SERVERS=0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org

Comment 3 Daniel Korn 2013-08-12 06:50:46 UTC

Test comment - please ignore.

Comment 4 Alvaro Lopez Ortega 2013-11-13 18:34:09 UTC

- Change the error to a warning
- Change the default value to a list of servers (to increase the odds of at least one of them working properly)

Comment 5 Ivan Chavero 2013-11-14 21:33:54 UTC

We can first check whether the server is available by querying it with /usr/sbin/ntpdate -q ntp.exampleee.com

In Puppet:

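exec {'ntpdate':
    # Set the clock only if the query-only check below succeeds.
    command => '/usr/sbin/ntpdate ntp.exampleee.com',
    # -q queries the server without setting the clock.
    onlyif  => '/usr/sbin/ntpdate -q ntp.exampleee.com',
}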


The problem with this approach is that it doesn't show any warning when the server is unreachable; the ntpdate run is silently skipped.
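
One possible way to surface a message in that case is a second exec guarded by the inverse check. This is only a sketch, not what was merged: the resource title 'ntpdate-warning' and the message text are made up for illustration, and it assumes /bin/echo is present on the node.

```puppet
# Hypothetical companion resource: runs only when the query-only check fails.
exec {'ntpdate-warning':
    # Message text is illustrative; no quoting is needed for a plain echo.
    command   => '/bin/echo WARNING: cannot reach ntp.exampleee.com, skipping ntpdate',
    # unless skips the command when the check exits 0 (server reachable).
    unless    => '/usr/sbin/ntpdate -q ntp.exampleee.com',
    # Log the echoed text so it appears in the puppet run output.
    logoutput => true,
}
```

The puppet run would still succeed in this variant, so the operator would have to notice the message in the output.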

Comment 6 Alvaro Lopez Ortega 2013-11-22 21:33:21 UTC

It's important to run NTP on the servers, so I'd say packstack should actually fail if the NTP server cannot be reached.

The error message may be a little confusing, though. Also, we should add an extra server or two to CONFIG_NTP_SERVERS to increase the odds of finding a working server.

Comment 7 Alvaro Lopez Ortega 2013-11-22 22:05:57 UTC

https://review.openstack.org/#/c/58035/

Comment 8 Alvaro Lopez Ortega 2013-12-03 14:53:21 UTC

Merged

Comment 10 Scott Lewis 2013-12-09 15:30:45 UTC

Adding OtherQA for bugs in MODIFIED

Comment 13 Martin Magr 2013-12-12 15:52:17 UTC

ntpdate is run 3 times before failure is reported:
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 1/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 2/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 3/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Notice: /Stage[main]//Exec[ntpdate]/returns: 12 Dec 16:48:21 ntpdate[23698]: no server suitable for synchronization found
Error: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
Error: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
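
The "Exec try N/3" lines above are consistent with an exec resource that uses Puppet's retry attributes. A minimal sketch of such a resource follows. It is an assumption about the shape of the manifest, not a copy of the merged change: the server list is taken from the log, while the try_sleep value is invented for illustration.

```puppet
# Sketch only: the real manifest is generated by packstack and may differ.
exec {'ntpdate':
    # ntpdate accepts several servers; this list is the one shown in the log.
    command   => '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org',
    # Run the command up to three times before reporting a failure.
    tries     => 3,
    # Assumed value: seconds to wait between attempts.
    try_sleep => 10,
}
```

With a resource like this, a transient outage only has to clear within the retry window, while a persistent one still fails the run, as the log shows.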

Comment 15 errata-xmlrpc 2013-12-19 23:54:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html