Bug 983771 - ntpd configuration should provide better fault resiliency
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-packstack
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 4.0
Assignee: Ivan Chavero
QA Contact: Martin Magr
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-07-11 23:31 UTC by Alessandro Pilotti
Modified: 2016-04-26 14:34 UTC

Fixed In Version: openstack-packstack-2013.2.1-0.14.dev919.el6ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-19 23:54:23 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHEA-2013:1859
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory
Last Updated: 2013-12-21 00:01:48 UTC

Description Alessandro Pilotti 2013-07-11 23:31:54 UTC
Description of problem:

ntpd.pp fails if the servers provided by CONFIG_NTP_SERVERS are temporarily unavailable, with the result that the entire installation fails.

When packstack is executed in a multi-node environment, this happens quite frequently. A simple retry cycle would be enough to get past transient errors.

Here's an example of the error obtained during packstack execution, using pool.ntp.org.

10.73.75.111_ntpd.pp :                                               [ DONE ]
                                                                                        [ ERROR ]

ERROR : Error during puppet run : err: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate pool.ntp.org returned 1 instead of one of [0] at /var/tmp/packstack/97d4fbc5af96405892ab0ace62e11095/manifests/10.73.76.64_ntpd.pp:85

Comment 1 Alessandro Pilotti 2013-07-11 23:38:29 UTC
Workaround

Specifying multiple separate pools helps mitigate the issue, for example:

CONFIG_NTP_SERVERS=0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org

Comment 3 Daniel Korn 2013-08-12 06:50:46 UTC
Test comment - please ignore.

Comment 4 Alvaro Lopez Ortega 2013-11-13 18:34:09 UTC
- Replace the error with a warning
- Change the default value to a list of servers (so we increase the odds of one of them working properly)

Comment 5 Ivan Chavero 2013-11-14 21:33:54 UTC
We can first check whether the server is available by querying it with /usr/sbin/ntpdate -q ntp.exampleee.com

in puppet:

exec { 'ntpdate':
    command => '/usr/sbin/ntpdate ntp.exampleee.com',
    # run the sync only if a query of the server succeeds
    onlyif  => '/usr/sbin/ntpdate -q ntp.exampleee.com',
}


The problem with this approach is that it doesn't show any warning: when the query fails, the ntpdate exec is skipped silently.
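One way to surface the missing warning, sketched under assumptions: a second exec whose only job is to print a message when the query fails. The resource title 'ntpdate-warning' and the message text are hypothetical, and the sketch assumes, as the snippet above does, that ntpdate -q exits non-zero when the server cannot be reached.

```puppet
# Hypothetical companion resource; not taken from packstack.
exec { 'ntpdate-warning':
    # Print a message so the skipped sync is visible in the puppet log.
    command   => '/bin/echo NTP server ntp.exampleee.com is unreachable, time was not synchronized',
    # 'unless' runs the command only when the query exits non-zero.
    unless    => '/usr/sbin/ntpdate -q ntp.exampleee.com',
    # Log the command output, at warning level instead of the default notice.
    logoutput => true,
    loglevel  => 'warning',
}
```

This keeps the run from failing, which is the trade-off under discussion: the installation continues with an unsynchronized clock and only a log line to show for it.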

Comment 6 Alvaro Lopez Ortega 2013-11-22 21:33:21 UTC
It's important to run NTP on the servers, so I'd say packstack should actually fail if the NTP server cannot be reached.

The error message may be a little confusing, though. Also, we should add an extra server or two to CONFIG_NTP_SERVERS to increase the odds of finding a working server.

Comment 7 Alvaro Lopez Ortega 2013-11-22 22:05:57 UTC
https://review.openstack.org/#/c/58035/

Comment 8 Alvaro Lopez Ortega 2013-12-03 14:53:21 UTC
Merged

Comment 10 Scott Lewis 2013-12-09 15:30:45 UTC
Adding OtherQA for bugs in MODIFIED

Comment 13 Martin Magr 2013-12-12 15:52:17 UTC
ntpdate is run 3 times before failure is reported:
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 1/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 2/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 3/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Notice: /Stage[main]//Exec[ntpdate]/returns: 12 Dec 16:48:21 ntpdate[23698]: no server suitable for synchronization found
Error: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
Error: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
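
The "Exec try N/3" lines are what Puppet logs when an exec resource sets the tries parameter. A minimal sketch of a resource that would behave like the log above, assuming three tries as shown; the try_sleep value is an assumption that does not come from this report, and the merged change itself is in the review linked in comment 7.

```puppet
# Illustrative sketch only; see the linked review for the actual manifest.
exec { 'ntpdate':
    # Several pool hosts in one call, so a single unreachable pool does not fail the sync.
    command   => '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org',
    # Re-run the command up to three times before reporting an error.
    tries     => 3,
    # Seconds to wait between tries (assumed value).
    try_sleep => 10,
}
```

With this shape the run still fails when no server answers after the last try, but a transient outage shorter than the retry window no longer aborts the installation.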

Comment 15 errata-xmlrpc 2013-12-19 23:54:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html

