Bug 983771

Summary: ntpd configuration should provide better fault resiliency
Product: Red Hat OpenStack    Reporter: Alessandro Pilotti <apilotti>
Component: openstack-packstack    Assignee: Ivan Chavero <ichavero>
Status: CLOSED ERRATA    QA Contact: Martin Magr <mmagr>
Severity: medium
Priority: unspecified
Version: 4.0    CC: aortega, breeler, derekh, dkorn, ichavero, mmagr
Target Milestone: rc    Keywords: OtherQA, Triaged
Target Release: 4.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-packstack-2013.2.1-0.14.dev919.el6ost    Doc Type: Bug Fix
Last Closed: 2013-12-19 23:54:23 UTC    Type: Bug

Description Alessandro Pilotti 2013-07-11 23:31:54 UTC
Description of problem:

ntpd.pp fails if the servers provided by CONFIG_NTP_SERVERS are temporarily unavailable, with the result that the entire installation fails.

When packstack is run in a multi-node environment, this happens quite frequently. A simple retry loop would be enough to get past transient errors.
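A retry cycle like this can be sketched as a small POSIX shell wrapper that reruns a command a fixed number of times and pauses between failures. This is a minimal illustration under stated assumptions: the helper name retry, its argument order and the pause length are hypothetical and are not taken from packstack.

```shell
#!/bin/sh
# Hypothetical helper (not part of packstack): run a command up to
# TRIES times, sleeping DELAY seconds between failed attempts.
# Usage: retry TRIES DELAY command [args...]
# Variable names are prefixed because POSIX sh has no "local".
retry() {
    _retry_tries=$1
    _retry_delay=$2
    shift 2
    _retry_attempt=1
    while [ "$_retry_attempt" -le "$_retry_tries" ]; do
        echo "try $_retry_attempt/$_retry_tries: $*" >&2
        if "$@"; then
            return 0
        fi
        # No pause after the last failed attempt.
        if [ "$_retry_attempt" -lt "$_retry_tries" ]; then
            sleep "$_retry_delay"
        fi
        _retry_attempt=$((_retry_attempt + 1))
    done
    echo "error: '$*' failed after $_retry_tries attempts" >&2
    return 1
}
```

For example, retry 3 10 /usr/sbin/ntpdate pool.ntp.org would absorb up to two transient failures before giving up; three attempts and a 10-second pause are arbitrary values chosen for illustration.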

Here's an example of the error obtained during packstack execution, using pool.ntp.org.

10.73.75.111_ntpd.pp :                                               [ DONE ]
                                                                                        [ ERROR ]

ERROR : Error during puppet run : err: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate pool.ntp.org returned 1 instead of one of [0] at /var/tmp/packstack/97d4fbc5af96405892ab0ace62e11095/manifests/10.73.76.64_ntpd.pp:85

Comment 1 Alessandro Pilotti 2013-07-11 23:38:29 UTC
Workaround

Specifying multiple separate pools helps mitigate the issue, for example:

CONFIG_NTP_SERVERS=0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org
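The workaround helps because ntpdate accepts several servers in a single invocation and only fails when none of them is suitable for synchronization. A minimal POSIX shell sketch of turning the comma-separated CONFIG_NTP_SERVERS value into ntpdate arguments follows; the function name ntp_server_args is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a comma-separated server list, as used in
# CONFIG_NTP_SERVERS, into a space-separated argument list for ntpdate.
ntp_server_args() {
    printf '%s\n' "$1" | tr ',' ' '
}

# Sketch of use (deliberately unquoted so each server becomes one argument):
#   /usr/sbin/ntpdate $(ntp_server_args "$CONFIG_NTP_SERVERS")
```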

Comment 3 Daniel Korn 2013-08-12 06:50:46 UTC
Test comment - please ignore.

Comment 4 Alvaro Lopez Ortega 2013-11-13 18:34:09 UTC
- Replace the error with a warning
- Change the default value to a list of servers (to increase the odds that at least one of them works)

Comment 5 Ivan Chavero 2013-11-14 21:33:54 UTC
We can first check whether the server is available by querying it with /usr/sbin/ntpdate -q ntp.exampleee.com (the -q flag only queries the server and does not set the clock).

In Puppet:

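exec {'ntpdate':
    # Set the clock from the NTP server...
    command => '/usr/sbin/ntpdate ntp.exampleee.com',
    # ...but only if a query-only run (-q, does not set the clock) succeeds,
    # so an unreachable server skips the sync instead of failing the run.
    onlyif  => '/usr/sbin/ntpdate -q ntp.exampleee.com',
}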


The problem with this approach is that it doesn't show any warning: if the server is unreachable, the onlyif check fails and the sync is silently skipped.
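One way to keep the pre-check but still surface a warning is sketched below in POSIX shell. It is an illustration under assumptions, not packstack code: the helper name sync_clock_or_warn and the NTPDATE override variable are hypothetical, and whether to warn or fail is a policy choice.

```shell
#!/bin/sh
# NTPDATE is an assumption so the binary can be swapped out; it defaults
# to the path used in this report.
NTPDATE=${NTPDATE:-/usr/sbin/ntpdate}

# Hypothetical helper: query the given servers first (-q does not set the
# clock); sync only if the query succeeds, otherwise warn and carry on.
# A failure of the real sync after a successful query is still reported
# through the exit status.
sync_clock_or_warn() {
    if "$NTPDATE" -q "$@" >/dev/null 2>&1; then
        "$NTPDATE" "$@"
    else
        echo "WARNING: no reachable NTP server among: $*; clock left unsynchronized" >&2
        return 0
    fi
}
```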

Comment 6 Alvaro Lopez Ortega 2013-11-22 21:33:21 UTC
It's important to run NTP on the servers, so I'd say packstack should actually fail if the NTP server cannot be reached.

The error message may be a little bit confusing, though. Also, we should add an extra server or two to CONFIG_NTP_SERVERS to increase the odds of finding a working server.

Comment 7 Alvaro Lopez Ortega 2013-11-22 22:05:57 UTC
https://review.openstack.org/#/c/58035/

Comment 8 Alvaro Lopez Ortega 2013-12-03 14:53:21 UTC
Merged

Comment 10 Scott Lewis 2013-12-09 15:30:45 UTC
Adding OtherQA for bugs in MODIFIED

Comment 13 Martin Magr 2013-12-12 15:52:17 UTC
ntpdate is run 3 times before failure is reported:
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 1/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 2/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: /Stage[main]//Exec[ntpdate]/returns: Exec try 3/3
Debug: Exec[ntpdate](provider=posix): Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Debug: Executing '/usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
Notice: /Stage[main]//Exec[ntpdate]/returns: 12 Dec 16:48:21 ntpdate[23698]: no server suitable for synchronization found
Error: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]
Error: /Stage[main]//Exec[ntpdate]/returns: change from notrun to 0 failed: /usr/sbin/ntpdate 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org returned 1 instead of one of [0]

Comment 15 errata-xmlrpc 2013-12-19 23:54:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html