Bug 1698135 - Overcloud deployment ansible failed on TASK [Ensure system is NTP time synced]
Summary: Overcloud deployment ansible failed on TASK [Ensure system is NTP time synced]
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Adriano Petrich
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-09 16:30 UTC by Yuri Obshansky
Modified: 2019-10-03 20:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-03 20:39:07 UTC
Target Upstream Version:
Embargoed:


Attachments
ansible.log (4.09 MB, text/plain)
2019-04-09 16:35 UTC, Yuri Obshansky

Description Yuri Obshansky 2019-04-09 16:30:54 UTC
Description of problem:
Overcloud deployment ansible playbook failed on 
TASK [Ensure system is NTP time synced]
TASK [Ensure system is NTP time synced] ****************************************
Tuesday 09 April 2019  14:52:24 +0000 (0:00:00.620)       0:03:28.824 ********* 
skipping: [overcloud-compute2-0] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [overcloud-compute1-0] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [overcloud-compute0-0] => {"changed": false, "skip_reason": "Conditional result was False"}
fatal: [overcloud-controller0-2]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.061951", "end": "2019-04-09 10:55:34.273360", "msg": "non-zero return code", "rc": 1, "start": "2019-04-09 10:52:24.211409", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, 
correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]}

fatal: [overcloud-controller0-0]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.195753", "end": "2019-04-09 10:55:33.828381", "msg": "non-zero return code", "rc": 1, "start": "2019-04-09 10:52:23.632628", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, 
correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]}
fatal: [overcloud-controller0-1]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.191272", "end": "2019-04-09 10:55:34.021873", "msg": "non-zero return code", "rc": 1, "start": "2019-04-09 10:52:23.830601", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, 
correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
overcloud-compute0-0       : ok=60   changed=20   unreachable=0    failed=0   
overcloud-compute1-0       : ok=60   changed=20   unreachable=0    failed=0   
overcloud-compute2-0       : ok=60   changed=20   unreachable=0    failed=0   
overcloud-controller0-0    : ok=137  changed=83   unreachable=0    failed=1   
overcloud-controller0-1    : ok=137  changed=83   unreachable=0    failed=1   
overcloud-controller0-2    : ok=137  changed=83   unreachable=0    failed=1   
undercloud                 : ok=3    changed=0    unreachable=0    failed=0   

Tuesday 09 April 2019  14:55:34 +0000 (0:03:10.482)       0:06:39.307 ********* 
=============================================================================== 

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
    verbosity=self.app_args.verbose_level)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 327, in config_download
    raise exceptions.DeploymentError("Overcloud configuration failed.")
tripleoclient.exceptions.DeploymentError: Overcloud configuration failed.
Overcloud configuration failed.

Version-Release number of selected component (if applicable):
openstack-tripleo-common-containers-10.6.1-0.20190404000356.3398bec.el8ost.noarch
python3-tripleoclient-heat-installer-11.3.1-0.20190403170353.73cc438.el8ost.noarch
openstack-tripleo-image-elements-10.3.1-0.20190325204940.253fe88.el8ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20190220162047.f60ad6c.el8ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20190402220346.012209a.el8ost.noarch
python3-tripleoclient-11.3.1-0.20190403170353.73cc438.el8ost.noarch
python3-tripleo-common-10.6.1-0.20190404000356.3398bec.el8ost.noarch
openstack-tripleo-validations-10.3.1-0.20190403171315.a4c40f2.el8ost.noarch
openstack-tripleo-common-10.6.1-0.20190404000356.3398bec.el8ost.noarch
openstack-tripleo-puppet-elements-10.2.1-0.20190327211339.0f6cacb.el8ost.noarch
openstack-tripleo-heat-templates-10.4.1-0.20190403221322.0d98720.el8ost.noarch
puppet-tripleo-10.3.1-0.20190403180925.81d7714.el8ost.noarch

How reproducible:


Steps to Reproduce:
1. Start Overcloud deployment
(undercloud) [stack@site-undercloud-0 ~]$ cat overcloud_deploy.sh 
#!/bin/bash
source /home/stack/stackrc

export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates $THT/ \
--timeout 100 \
-e $THT/environments/podman.yaml \
-e $THT/environments/disable-telemetry.yaml \
-e $THT/environments/docker-ha.yaml \
-e $THT/environments/services/neutron-ovn-ha.yaml \
-e $THT/environments/network-isolation.yaml \
-e containers-prepare-parameters.yaml \
-e params.yaml \
-n /home/stack/virt/network/network_data_spine_leaf.yaml \
-r /home/stack/virt/roles/roles_data_spine_leaf.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/network/network-environment-overrides.yaml \
-e /home/stack/virt/nodes_data.yaml \
--log-file overcloud_deployment_90.log
where 
(undercloud) [stack@site-undercloud-0 ~]$ cat params.yaml 
resource_registry:
  OS::TripleO::Services::Docker: OS::Heat::None
parameter_defaults:
  DockerInsecureRegistryAddress:
    - had-04.ha.lab.eng.bos.redhat.com:5000
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
    - 192.168.24.2:8787
  SELinuxMode: permissive
  PythonInterpreter: /usr/bin/python3
  NovaComputeLibvirtType: qemu
  DnsServers: ['10.11.5.19', '10.5.30.160']

  ControllerCount: 3
  ComputeCount: 3
  NtpServer: ["clock.redhat.com","clock2.redhat.com"]


Actual results:
Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
    verbosity=self.app_args.verbose_level)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 327, in config_download
    raise exceptions.DeploymentError("Overcloud configuration failed.")
tripleoclient.exceptions.DeploymentError: Overcloud configuration failed.
Overcloud configuration failed.


Expected results:
No failures

Additional info:

Comment 1 Yuri Obshansky 2019-04-09 16:35:12 UTC
Created attachment 1553898 [details]
ansible.log

Comment 2 Alex Schultz 2019-04-09 18:57:25 UTC
This looks like the systems cannot reach the configured NTP servers. Please ensure connectivity to clock.redhat.com and clock2.redhat.com from the controllers, and provide the /etc/chrony.conf configuration file. A reproducer would also be helpful.
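
A minimal sketch of the name-resolution part of that check, to be run on a controller. The function name `resolve_ntp_servers` and the injectable `resolver` argument are illustrative assumptions, not part of TripleO; the only real API used is the standard library's `socket.getaddrinfo`. It tests resolution only, not whether the NTP server answers.

```python
import socket
from typing import Callable, Dict, List


def resolve_ntp_servers(
    servers: List[str],
    resolver: Callable[..., list] = socket.getaddrinfo,
) -> Dict[str, List[str]]:
    """Map each server name to its resolved addresses ([] if it does not resolve)."""
    results: Dict[str, List[str]] = {}
    for name in servers:
        try:
            # NTP uses UDP port 123; the port only shapes the lookup here.
            infos = resolver(name, 123, proto=socket.IPPROTO_UDP)
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # Same condition that ping reports as "Name or service not known".
            results[name] = []
    return results
```

Calling `resolve_ntp_servers(["clock.redhat.com", "clock2.redhat.com"])` on a node and finding an empty list for a name would point at DNS rather than at chrony itself.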

Comment 3 Yuri Obshansky 2019-04-09 20:01:15 UTC
The issue reproduced several times today with different arguments to the deployment command, such as
(--ntp-server 10.5.27.10 and --ntp-server clock.redhat.com)

The undercloud status:

(undercloud) [stack@site-undercloud-0 ~]$ ping clock.redhat.com
PING clock.corp.redhat.com (10.11.160.238) 56(84) bytes of data.
64 bytes from clock1.rdu2.redhat.com (10.11.160.238): icmp_seq=1 ttl=56 time=156 ms

(undercloud) [stack@site-undercloud-0 ~]$ ping clock2.redhat.com
PING clock.corp.redhat.com (10.16.255.1) 56(84) bytes of data.
64 bytes from clock.bos.redhat.com (10.16.255.1): icmp_seq=1 ttl=54 time=176 ms

(undercloud) [stack@site-undercloud-0 ~]$ cat /etc/chrony.conf
# Do not manually edit this file.
# Managed by ansible-role-chrony
server clock.redhat.com iburst minpoll 6 maxpoll 10
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
deny all
driftfile /var/lib/chrony/drift
logdir /var/log/chrony
rtcsync
makestep 1.0 3

The controller status:

[root@overcloud-controller0-0 ~]# ping clock.redhat.com
ping: clock.redhat.com: Name or service not known

[root@overcloud-controller0-0 ~]# ping clock2.redhat.com
ping: clock2.redhat.com: Name or service not known

[root@overcloud-controller0-0 ~]# ping 10.5.27.10
PING 10.5.27.10 (10.5.27.10) 56(84) bytes of data.
64 bytes from 10.5.27.10: icmp_seq=1 ttl=54 time=184 ms

[root@overcloud-controller0-0 ~]# cat /etc/chrony.conf 
# Do not manually edit this file.
# Managed by ansible-role-chrony
server clock.redhat.com iburst minpoll 6 maxpoll 10
server clock2.redhat.com iburst minpoll 6 maxpoll 10
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
deny all
driftfile /var/lib/chrony/drift
logdir /var/log/chrony
rtcsync
makestep 1.0 3

If you need a running environment, ping me on rhos-qe, rhos-dev, or edge - yobshans

Thank you

Comment 4 Alex Schultz 2019-04-09 20:16:24 UTC
Since the overcloud cannot resolve clock.redhat.com, it's failing. Check your DNS server configuration. Did you specify a DNS server for the ctlplane-subnet on the undercloud?

Comment 5 Alex Schultz 2019-04-09 20:26:54 UTC
The issue was that the nameservers were not properly configured for the overcloud nodes. To correct this, the undercloud should have been installed with undercloud_nameservers specified in undercloud.conf. If the environment is already installed, you can run 'openstack subnet set ctlplane-subnet --dns-nameserver <nameserver ip>', manually fix resolv.conf on the hosts, and rerun the deployment process.
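
After fixing resolv.conf by hand, it may be worth confirming that each host really lists a nameserver before rerunning the deployment. A small sketch under that assumption; `parse_nameservers` is a hypothetical helper, and the caller is expected to read /etc/resolv.conf and pass in its text.

```python
from typing import List


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses from 'nameserver' lines, in file order."""
    servers: List[str] = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

An empty result on an overcloud node would match the symptom seen here: nothing for the resolver to query, so clock.redhat.com cannot be looked up.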

Comment 6 Dan Sneddon 2019-04-09 22:31:49 UTC
To add some comments here, we discovered that none of the ifcfg files for the network interfaces (bridge or ethernet) had any DNS servers. These should be populated automatically if dns_nameserver is set on the subnet, but our documentation only mentions adding the nameservers to one subnet, the leaf0 ctlplane subnet.

Yuri is going to add nameservers to the other leaf subnets, but if that doesn't work then we probably have a bug in the auto-population of that parameter. Here is the code from puppet/role.role.j2.yaml in openstack-tripleo-heat-templates:



conditions:
  dnsservers_set:
    not:
      equals: [{get_param: DnsServers}, []]

[...]

resources:
  NetworkConfig:
    type: OS::TripleO::{{role.name}}::Net::SoftwareConfig
    properties:
      DnsServers:
        if:
          - dnsservers_set
          - {get_param: DnsServers}
          - {get_attr: [{{server_resource_name}}, addresses, ctlplane, 0, subnets, 0, dns_nameservers]}
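
As I read the template excerpt above, the selection amounts to: use the DnsServers parameter when it is non-empty, otherwise fall back to the dns_nameservers of the first ctlplane subnet. A sketch of that reading in Python; the function and argument names are illustrative only, and this is not code from tripleo-heat-templates.

```python
from typing import List, Sequence


def effective_dns_servers(
    dns_servers_param: Sequence[str],
    subnet_dns_nameservers: Sequence[str],
) -> List[str]:
    """Mirror the 'dnsservers_set' condition from the excerpt."""
    # dnsservers_set is true when DnsServers is not equal to [].
    if list(dns_servers_param) != []:
        return list(dns_servers_param)
    # Otherwise the value comes from the subnet's dns_nameservers attribute.
    return list(subnet_dns_nameservers)
```

Under this reading, a deployment that sets DnsServers explicitly would not pick up the subnet value at all, and a deployment with neither set ends up with an empty list.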

Comment 7 Alex Schultz 2019-04-09 22:46:26 UTC
The dns servers should be automatically populated on undercloud install with the value from undercloud_nameservers. I think this also gets populated for all leafs in the code as well, https://github.com/openstack/tripleo-heat-templates/blob/master/extraconfig/post_deploy/undercloud_ctlplane_network.py

That being said if leafs are being added outside of the undercloud install process, then manual nameserver configuration on the neutron subnets may be required.

Comment 8 Yuri Obshansky 2019-04-09 22:48:09 UTC
Manual configuration doesn't work
(undercloud) [stack@site-undercloud-0 ~]$ openstack subnet show leaf1 |grep dns
| dns_nameservers   | 10.0.10.1                                                                                                                                                                         |
(undercloud) [stack@site-undercloud-0 ~]$ openstack subnet show leaf0 |grep dns
| dns_nameservers   | 10.0.10.1                                                                                                                                                                         |
(undercloud) [stack@site-undercloud-0 ~]$ openstack subnet show leaf2 |grep dns
| dns_nameservers   | 10.0.10.1                                                                                                                                                                         |

Deployment still failed

Comment 9 Yuri Obshansky 2019-04-09 22:49:36 UTC
I'll retest deployment process with undercloud_nameservers specified in the undercloud.conf
and update the bug

Comment 10 Alex Schultz 2019-04-09 22:50:40 UTC
Is 10.0.10.1 a valid nameserver?  Also you need to manually fix any systems already provided. You won't get the dns post-provisioning from the subnets

Comment 11 Yuri Obshansky 2019-04-09 23:04:16 UTC
(In reply to Alex Schultz from comment #10)
> Is 10.0.10.1 a valid nameserver?  
Yes, 
[root@overcloud-controller0-0 network-scripts]# ping 10.0.10.1
PING 10.0.10.1 (10.0.10.1) 56(84) bytes of data.
64 bytes from 10.0.10.1: icmp_seq=1 ttl=64 time=0.389 ms
64 bytes from 10.0.10.1: icmp_seq=2 ttl=64 time=0.165 ms

This is a virt environment

[root@overcloud-controller0-0 network-scripts]# ip route
default via 10.0.10.1 dev br-ex 
10.0.10.0/24 dev br-ex proto kernel scope link src 10.0.10.110 
169.254.0.0/16 dev eth0 scope link metric 1002 
169.254.0.0/16 dev eth1 scope link metric 1003 
169.254.0.0/16 dev eth2 scope link metric 1004 
169.254.0.0/16 dev br-isolated scope link metric 1006 
169.254.0.0/16 dev br-ex scope link metric 1007 
169.254.0.0/16 dev vlan1188 scope link metric 1008 
169.254.0.0/16 dev vlan1185 scope link metric 1009 
169.254.0.0/16 dev vlan1189 scope link metric 1010 
169.254.0.0/16 dev vlan1183 scope link metric 1011 
169.254.169.254 via 192.168.24.3 dev eth0 
172.18.1.0/24 dev vlan1188 proto kernel scope link src 172.18.1.212 
172.18.2.0/24 via 172.18.1.254 dev vlan1188 
172.18.3.0/24 via 172.18.1.254 dev vlan1188 
172.19.1.0/24 dev vlan1189 proto kernel scope link src 172.19.1.174 
172.19.2.0/24 via 172.19.1.254 dev vlan1189 
172.19.3.0/24 via 172.19.1.254 dev vlan1189 
172.23.1.0/24 dev vlan1183 proto kernel scope link src 172.23.1.237 
172.23.2.0/24 via 172.23.1.254 dev vlan1183 
172.23.3.0/24 via 172.23.1.254 dev vlan1183 
172.25.1.0/24 dev vlan1185 proto kernel scope link src 172.25.1.117 
172.25.2.0/24 via 172.25.1.254 dev vlan1185 
172.25.3.0/24 via 172.25.1.254 dev vlan1185 
192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.12 
192.168.34.0/24 via 192.168.24.254 dev eth0 
192.168.44.0/24 via 192.168.24.254 dev eth0 


> Also you need to manually fix any systems
> already provided. You won't get the dns post-provisioning from the subnets

I also ran the deployment with node provisioning, with the same result.

Comment 12 Alex Schultz 2019-04-09 23:09:31 UTC
Just because it's pingable doesn't mean it's a valid nameserver. You'd have to query dns with it to verify this.  The issue isn't the ntp sync but rather the supplied environment network configurations.
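
One way to query DNS against a specific address without extra tooling is to send it a single A-record question over UDP. The sketch below uses only the Python standard library; the function names and the fixed query ID are assumptions made for illustration. The packet layout follows the standard DNS message format (12-byte header, then QNAME, QTYPE, QCLASS).

```python
import socket
import struct
from typing import Optional, Tuple


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: one question, type A, class IN, recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def query_nameserver(
    server_ip: str, hostname: str, timeout: float = 3.0
) -> Optional[Tuple[int, int]]:
    """Return (rcode, answer_count) from server_ip, or None if no usable reply."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and ICMP port-unreachable errors
            return None
    if len(reply) < 12 or reply[:2] != query[:2]:
        return None
    _id, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    return flags & 0x000F, ancount  # rcode 0 = NOERROR, 3 = NXDOMAIN
```

For example, `query_nameserver("10.0.10.1", "clock.redhat.com")` returning `None`, or a non-zero rcode, would suggest the address answers ping but does not serve DNS for that name.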

Comment 14 Harald Jensås 2019-04-11 13:06:33 UTC
The reporter was able to get past this issue with configuration changes.

Comment 15 Carlos Goncalves 2019-06-24 06:08:58 UTC
I'm seeing the same failure across multiple deployment attempts on different machines, either on the undercloud or later during overcloud deployment, with between 1 and 3 controllers failing at time sync.

NTP server clock.redhat.com is set in both undercloud and overcloud.

[stack@undercloud-0 ~]$ grep clock.redhat.com overcloud_deploy.sh 
--ntp-server clock.redhat.com \
[stack@undercloud-0 ~]$ grep clock.redhat.com virt/config_heat.yaml -B 1
parameter_defaults:
    NtpServer: clock.redhat.com
[stack@undercloud-0 ~]$ grep -r clock.redhat.com tripleo-config-generated-env-files/ -B 1
tripleo-config-generated-env-files/undercloud_parameters.yaml-  NtpServer:
tripleo-config-generated-env-files/undercloud_parameters.yaml:  - clock.redhat.co


2019-06-23 20:15:46,641 p=543 u=mistral |  TASK [Ensure system is NTP time synced] ****************************************
2019-06-23 20:15:46,641 p=543 u=mistral |  Sunday 23 June 2019  20:15:46 -0400 (0:00:00.832)       0:03:42.915 *********** 
2019-06-23 20:15:46,910 p=543 u=mistral |  skipping: [compute-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-06-23 20:15:46,956 p=543 u=mistral |  skipping: [compute-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-06-23 20:15:46,969 p=543 u=mistral |  skipping: [compute-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-06-23 20:15:56,919 p=543 u=mistral |  changed: [controller-0] => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.015852", "end": "2019-06-23 20:15:56.897128", "rc": 0, "start": "2019-06-23 20:15:46.881276", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 0A0BA0EE, correction: 0.000004039, skew: 214.940", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 0A0BA0EE, correction: 0.000004039, skew: 214.940"]}
2019-06-23 20:15:57,080 p=543 u=mistral |  changed: [controller-2] => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.014671", "end": "2019-06-23 20:15:57.056399", "rc": 0, "start": "2019-06-23 20:15:47.041728", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 0A051B0A, correction: 0.000000882, skew: 18.576", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 0A051B0A, correction: 0.000000882, skew: 18.576"]}
2019-06-23 20:18:57,176 p=543 u=mistral |  fatal: [controller-1]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.198191", "end": "2019-06-23 20:18:57.153933", "msg": "non-zero return code", "rc": 1, "start": "2019-06-23 20:15:46.955742", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", 
"try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]}
2019-06-23 20:18:57,177 p=543 u=mistral |  NO MORE HOSTS LEFT *************************************************************
2019-06-23 20:18:57,179 p=543 u=mistral |  PLAY RECAP *********************************************************************
2019-06-23 20:18:57,179 p=543 u=mistral |  compute-0                  : ok=74   changed=28   unreachable=0    failed=0    skipped=256  rescued=0    ignored=0   
2019-06-23 20:18:57,180 p=543 u=mistral |  compute-1                  : ok=74   changed=28   unreachable=0    failed=0    skipped=256  rescued=0    ignored=0   
2019-06-23 20:18:57,180 p=543 u=mistral |  compute-2                  : ok=74   changed=28   unreachable=0    failed=0    skipped=256  rescued=0    ignored=0   
2019-06-23 20:18:57,180 p=543 u=mistral |  controller-0               : ok=167  changed=102  unreachable=0    failed=0    skipped=171  rescued=0    ignored=1   
2019-06-23 20:18:57,180 p=543 u=mistral |  controller-1               : ok=166  changed=101  unreachable=0    failed=1    skipped=171  rescued=0    ignored=1   
2019-06-23 20:18:57,180 p=543 u=mistral |  controller-2               : ok=167  changed=102  unreachable=0    failed=0    skipped=171  rescued=0    ignored=1   
2019-06-23 20:18:57,180 p=543 u=mistral |  undercloud                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
2019-06-23 20:18:57,181 p=543 u=mistral |  Sunday 23 June 2019  20:18:57 -0400 (0:03:10.539)       0:06:53.454 *********** 
2019-06-23 20:18:57,181 p=543 u=mistral |  ===============================================================================

Comment 17 Carlos Goncalves 2019-06-24 06:12:53 UTC
Bad copy-paste in my comment #15. The line

tripleo-config-generated-env-files/undercloud_parameters.yaml:  - clock.redhat.co

should have been

tripleo-config-generated-env-files/undercloud_parameters.yaml:  - clock.redhat.com

Comment 18 Carlos Goncalves 2019-06-24 06:15:49 UTC
[root@controller-1 heat-admin]# cat /etc/chrony.conf 
# Do not manually edit this file.
# Managed by ansible-role-chrony
server clock.redhat.com iburst minpoll 6 maxpoll 10
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
deny all
driftfile /var/lib/chrony/drift
logdir /var/log/chrony
rtcsync
makestep 1.0 3

Comment 19 Alex Schultz 2019-06-24 13:54:05 UTC
Please ensure connectivity to the NTP server. It's also best practice to configure multiple servers, because relying on just one can lead to issues if it's down or unavailable. The reported failure is what happens when chrony cannot sync to the configured time source.
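
To make the pasted `chronyc waitsync 20` output easier to read, here is a rough parser for the "try: N, refid: ..." lines. The names are hypothetical, and treating an all-zero refid as "no source selected" is my interpretation of the logs in this bug, not something taken from the chrony sources. Decoding the refid as an IPv4 address only makes sense when the selected source is an IPv4 NTP server.

```python
import ipaddress
import re
from typing import List, NamedTuple


class WaitsyncTry(NamedTuple):
    attempt: int
    refid: str
    correction: float
    skew: float


_LINE = re.compile(
    r"try:\s*(\d+), refid: ([0-9A-Fa-f]{8}), correction: ([-\d.]+), skew: ([\d.]+)"
)


def parse_waitsync(stdout: str) -> List[WaitsyncTry]:
    """Parse every 'try:' line found in chronyc waitsync output."""
    return [
        WaitsyncTry(int(m.group(1)), m.group(2).upper(), float(m.group(3)), float(m.group(4)))
        for m in _LINE.finditer(stdout)
    ]


def is_synced(tries: List[WaitsyncTry]) -> bool:
    """Assumption: the last try has a non-zero refid once a source is selected."""
    return bool(tries) and tries[-1].refid != "00000000"


def refid_to_ipv4(refid: str) -> str:
    """Render an 8-digit hex refid as a dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(int(refid, 16)))
```

Applied to the successful controllers in comment 15, `refid_to_ipv4("0A0BA0EE")` gives 10.11.160.238 and `refid_to_ipv4("0A051B0A")` gives 10.5.27.10, both of which appear earlier in this bug as NTP server addresses. The failing hosts never leave refid 00000000, and their roughly 3m10s runtime is consistent with 20 tries spaced 10 seconds apart, assuming the default interval applies.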

Comment 20 Edu Alcaniz 2019-10-03 18:50:55 UTC
We are getting the same issue

2019-10-03 12:44:34,065 p=6823 u=mistral |  TASK [Ensure system is NTP time synced] ****************************************
2019-10-03 12:44:34,065 p=6823 u=mistral |  Thursday 03 October 2019  12:44:34 +0200 (0:00:00.987)       0:03:50.988 ****** 
2019-10-03 12:44:44,468 p=6823 u=mistral |  changed: [overcloud-controller-0] => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.014213", "end": "2019-10-03 06:44:44.440795", "rc": 0, "start": "2019-10-03 06:44:34.426582", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 939C071A, correction: 0.000000000, skew: 2.638", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 939C071A, correction: 0.000000000, skew: 2.638"]}
2019-10-03 12:44:44,571 p=6823 u=mistral |  changed: [overcloud-controller-2] => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.013857", "end": "2019-10-03 06:44:44.547643", "rc": 0, "start": "2019-10-03 06:44:34.533786", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 939C0712, correction: 0.000024599, skew: 4.918", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 939C0712, correction: 0.000024599, skew: 4.918"]}
2019-10-03 12:47:44,760 p=6823 u=mistral |  fatal: [overcloud-controller-1]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.197515", "end": "2019-10-03 06:47:44.737817", "msg": "non-zero return code", "rc": 1, "start": "2019-10-03 06:44:34.540302", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, 
skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]}



Our deployment script:

#!/bin/bash
openstack overcloud deploy \
--templates \
--validation-errors-nonfatal \
-r ~/templates/roles_data.yaml \
-n /home/stack/templates/network/network_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e ~/templates/network/network-environment.yaml \
-e ~/templates/node-info.yaml \
-e ~/templates/storage/storage-environment.yaml \
-e ~/templates/network/service-netmap.yaml \
-e ~/templates/containers-prepare-parameter.yaml \
--ntp-server 0.rhel.pool.ntp.org \
--timeout 180


We can reach 0.rhel.pool.ntp.org.

Comment 21 Edu Alcaniz 2019-10-03 18:52:01 UTC
It only happens on one node, at random, in every deployment we run.

Any idea?

Comment 22 Carlos Goncalves 2019-10-03 19:18:33 UTC
For me, it was caused by a combination of unreliable Red Hat DNS servers and Red Hat NTP servers. I resolved it by configuring 2-3 NTP servers on all servers.

Comment 24 Alex Schultz 2019-10-03 20:39:07 UTC
Please use multiple NTP servers. It's the same problem as previously mentioned. Chrony does not try to re-resolve the hostname, so when you use pool.ntp.org hosts you may end up with a bad host. Since chrony doesn't perform another lookup between syncs, it's best to use multiple servers. Additionally, if you're just using a *.pool.ntp.org server, we already use multiple servers by default, so you shouldn't need to specify one.
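
As a quick way to spot the single-source setup on a node, here is a minimal sketch that lists the `server` and `pool` entries in the text of /etc/chrony.conf. The function name is hypothetical, and the threshold of two sources is only an assumption taken from the advice in this thread; note that a `pool` entry may expand to several servers, so a lone `pool` line is not necessarily a problem.

```python
def time_sources(chrony_conf):
    """List hostnames from `server` and `pool` directives in chrony.conf text."""
    sources = []
    for raw in chrony_conf.splitlines():
        line = raw.strip()
        # Skip blank lines and chrony.conf comment lines.
        if not line or line.startswith(("#", "%", ";", "!")):
            continue
        fields = line.split()
        if fields[0].lower() in ("server", "pool") and len(fields) > 1:
            sources.append(fields[1])
    return sources


# The configuration pasted in comment 18, shortened to the relevant lines.
single_source_conf = """\
# Managed by ansible-role-chrony
server clock.redhat.com iburst minpoll 6 maxpoll 10
deny all
makestep 1.0 3
"""

sources = time_sources(single_source_conf)
print(sources)
if len(sources) < 2:  # assumed threshold, per the advice above
    print("only one time source configured; consider adding more")
```

On a deployed node the same function could be fed the real file contents, for example `time_sources(open("/etc/chrony.conf").read())`.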

