Bug 983771 - ntpd configuration should provide better fault resiliency
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-packstack
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 4.0
Assigned To: Ivan Chavero
QA Contact: Martin Magr
Keywords: OtherQA, Triaged
Reported: 2013-07-11 19:31 EDT by Alessandro Pilotti
Modified: 2016-04-26 10:34 EDT
Fixed In Version: openstack-packstack-2013.2.1-0.14.dev919.el6ost
Doc Type: Bug Fix
Last Closed: 2013-12-19 18:54:23 EST
Type: Bug


Description Alessandro Pilotti 2013-07-11 19:31:54 EDT
Description of problem:

ntpd.pp fails if the servers provided by CONFIG_NTP_SERVERS are temporarily unavailable, with the result that the entire installation fails.

When packstack is run against a multi-node environment, this happens quite frequently. A simple retry loop would be enough to get past transient errors.

Here's an example of the error obtained during packstack execution, using pool.ntp.org.

10.73.75.111_ntpd.pp :                                               [ DONE ]
                                                                                        [ ERROR ]

ERROR : Error during puppet run : err: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate pool.ntp.org returned 1 instead of one of [0] at /var/tmp/packstack/97d4fbc5af96405892ab0ace62e11095/manifests/10.73.76.64_ntpd.pp:85
Comment 1 Alessandro Pilotti 2013-07-11 19:38:29 EDT
Workaround

Specifying several separate pool hostnames helps mitigate the issue, for example:

CONFIG_NTP_SERVERS=0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org
Comment 3 Daniel Korn 2013-08-12 02:50:46 EDT
Test comment - please ignore.
Comment 4 Alvaro Lopez Ortega 2013-11-13 13:34:09 EST
- Replace the error with a warning
- Change the default value to a list of servers (so we increase the odds of at least one of them working properly)
Comment 5 Ivan Chavero 2013-11-14 16:33:54 EST
We can first check whether the server is available by querying it with /usr/sbin/ntpdate -q ntp.exampleee.com

In Puppet:

# Run ntpdate only if the query (-q, which does not set the clock) succeeds.
exec { 'ntpdate':
  command => '/usr/sbin/ntpdate ntp.exampleee.com',
  onlyif  => '/usr/sbin/ntpdate -q ntp.exampleee.com',
}


The problem with this approach is that when the query fails, Puppet silently skips the exec, so no warning is shown.
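
One possible way to surface a message when the server does not answer, sketched under assumptions: a second exec resource that runs only when the query fails (`unless`) and prints a message, with `logoutput => true` so the text ends up in the Puppet log. The resource title and the message text are hypothetical, the message is logged at notice level rather than as a true Puppet warning, and the query is executed twice per Puppet run.

```puppet
# Hypothetical companion resource: prints a message only when the
# ntpdate query fails, i.e. when the main exec would be skipped.
exec { 'ntpdate-unreachable-notice':
  command   => '/bin/echo "NTP server ntp.exampleee.com did not answer; ntpdate skipped"',
  unless    => '/usr/sbin/ntpdate -q ntp.exampleee.com',
  logoutput => true,
}
```

This keeps the installation going while leaving a trace in the log; it does not address the later point in this thread that an unreachable NTP server should arguably be a hard failure.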
Comment 6 Alvaro Lopez Ortega 2013-11-22 16:33:21 EST
It's important to run NTP on the servers, so I'd say packstack should actually fail if the NTP server cannot be reached.

The error message may be a little confusing, though. Also, we should add an extra server or two to CONFIG_NTP_SERVERS to increase the odds of finding a working server.
Comment 7 Alvaro Lopez Ortega 2013-11-22 17:05:57 EST
https://review.openstack.org/#/c/58035/
Comment 8 Alvaro Lopez Ortega 2013-12-03 09:53:21 EST
Merged
Comment 10 Scott Lewis 2013-12-09 10:30:45 EST
Adding OtherQA for bugs in MODIFIED
Comment 13 Martin Magr 2013-12-12 10:52:17 EST
ntpdate is run 3 times before failure is reported:
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 1/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 2/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 3/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Notice: /Stage[main]//Exec[ntpdate]/returns: 12 Dec 16:48:21 ntpdate[23698]: no server suitable for synchronization found
Error: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
Error: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
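
The "Exec try N/3" lines above are what Puppet's exec resource logs when its `tries` attribute is set to 3. A minimal sketch of a resource that would produce this behaviour follows. The server list is taken from the log; the `try_sleep` value is an assumption, and the change actually merged in the review is not reproduced here.

```puppet
# Sketch only: retry ntpdate up to three times before reporting failure.
exec { 'ntpdate':
  command   => '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org',
  tries     => 3,   # matches "Exec try 1/3" .. "3/3" in the log
  try_sleep => 10,  # assumed pause in seconds between attempts
}
```

With this shape a transient outage is retried, while a server list that stays unreachable still fails the run, as requested earlier in the thread.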
Comment 15 errata-xmlrpc 2013-12-19 18:54:23 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html
