Description of problem: Overcloud deployment ansible playbook failed on TASK [Ensure system is NTP time synced] TASK [Ensure system is NTP time synced] **************************************** Tuesday 09 April 2019 14:52:24 +0000 (0:00:00.620) 0:03:28.824 ********* skipping: [overcloud-compute2-0] => {"changed": false, "skip_reason": "Conditional result was False"} skipping: [overcloud-compute1-0] => {"changed": false, "skip_reason": "Conditional result was False"} skipping: [overcloud-compute0-0] => {"changed": false, "skip_reason": "Conditional result was False"} fatal: [overcloud-controller0-2]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.061951", "end": "2019-04-09 10:55:34.273360", "msg": "non-zero return code", "rc": 1, "start": "2019-04-09 10:52:24.211409", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 
0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]} fatal: [overcloud-controller0-0]: FAILED! 
=> {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.195753", "end": "2019-04-09 10:55:33.828381", "msg": "non-zero return code", "rc": 1, "start": "2019-04-09 10:52:23.632628", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, 
refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]} fatal: [overcloud-controller0-1]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.191272", "end": "2019-04-09 10:55:34.021873", "msg": "non-zero return code", "rc": 1, "start": "2019-04-09 10:52:23.830601", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 
0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]} NO MORE HOSTS LEFT ************************************************************* PLAY RECAP ********************************************************************* overcloud-compute0-0 : ok=60 changed=20 unreachable=0 failed=0 overcloud-compute1-0 : ok=60 
changed=20 unreachable=0 failed=0 overcloud-compute2-0 : ok=60 changed=20 unreachable=0 failed=0 overcloud-controller0-0 : ok=137 changed=83 unreachable=0 failed=1 overcloud-controller0-1 : ok=137 changed=83 unreachable=0 failed=1 overcloud-controller0-2 : ok=137 changed=83 unreachable=0 failed=1 undercloud : ok=3 changed=0 unreachable=0 failed=0 Tuesday 09 April 2019 14:55:34 +0000 (0:03:10.482) 0:06:39.307 ********* =============================================================================== Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log. Exception occured while running the command Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run super(Command, self).run(parsed_args) File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run return super(Command, self).run(parsed_args) File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run return_code = self.take_action(parsed_args) or 0 File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action verbosity=self.app_args.verbose_level) File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 327, in config_download raise exceptions.DeploymentError("Overcloud configuration failed.") tripleoclient.exceptions.DeploymentError: Overcloud configuration failed. Overcloud configuration failed. 
Version-Release number of selected component (if applicable):
openstack-tripleo-common-containers-10.6.1-0.20190404000356.3398bec.el8ost.noarch
python3-tripleoclient-heat-installer-11.3.1-0.20190403170353.73cc438.el8ost.noarch
openstack-tripleo-image-elements-10.3.1-0.20190325204940.253fe88.el8ost.noarch
ansible-tripleo-ipsec-9.0.1-0.20190220162047.f60ad6c.el8ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20190402220346.012209a.el8ost.noarch
python3-tripleoclient-11.3.1-0.20190403170353.73cc438.el8ost.noarch
python3-tripleo-common-10.6.1-0.20190404000356.3398bec.el8ost.noarch
openstack-tripleo-validations-10.3.1-0.20190403171315.a4c40f2.el8ost.noarch
openstack-tripleo-common-10.6.1-0.20190404000356.3398bec.el8ost.noarch
openstack-tripleo-puppet-elements-10.2.1-0.20190327211339.0f6cacb.el8ost.noarch
openstack-tripleo-heat-templates-10.4.1-0.20190403221322.0d98720.el8ost.noarch
puppet-tripleo-10.3.1-0.20190403180925.81d7714.el8ost.noarch

How reproducible:

Steps to Reproduce:
1. Start Overcloud deployment

(undercloud) [stack@site-undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash
source /home/stack/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates $THT/ \
  --timeout 100 \
  -e $THT/environments/podman.yaml \
  -e $THT/environments/disable-telemetry.yaml \
  -e $THT/environments/docker-ha.yaml \
  -e $THT/environments/services/neutron-ovn-ha.yaml \
  -e $THT/environments/network-isolation.yaml \
  -e containers-prepare-parameters.yaml \
  -e params.yaml \
  -n /home/stack/virt/network/network_data_spine_leaf.yaml \
  -r /home/stack/virt/roles/roles_data_spine_leaf.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/network/network-environment-overrides.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  --log-file overcloud_deployment_90.log

where

(undercloud) [stack@site-undercloud-0 ~]$ cat params.yaml
resource_registry:
  OS::TripleO::Services::Docker: OS::Heat::None
parameter_defaults:
  DockerInsecureRegistryAddress:
    - had-04.ha.lab.eng.bos.redhat.com:5000
    - brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
    - 192.168.24.2:8787
  SELinuxMode: permissive
  PythonInterpreter: /usr/bin/python3
  NovaComputeLibvirtType: qemu
  DnsServers: ['10.11.5.19', '10.5.30.160']
  ControllerCount: 3
  ComputeCount: 3
  NtpServer: ["clock.redhat.com","clock2.redhat.com"]

Actual results:
Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
    verbosity=self.app_args.verbose_level)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 327, in config_download
    raise exceptions.DeploymentError("Overcloud configuration failed.")
tripleoclient.exceptions.DeploymentError: Overcloud configuration failed.
Overcloud configuration failed.

Expected results:
No failures

Additional info:
Created attachment 1553898: ansible.log
It looks like the systems cannot reach the configured NTP servers. Please ensure connectivity to clock.redhat.com and clock2.redhat.com from the controllers, and provide the /etc/chrony.conf configuration file. A reproducer would also be helpful.
The issue reproduced several times today with different arguments to the deployment command (--ntp-server 10.5.27.10 and --ntp-server clock.redhat.com).

The undercloud status:

(undercloud) [stack@site-undercloud-0 ~]$ ping clock.redhat.com
PING clock.corp.redhat.com (10.11.160.238) 56(84) bytes of data.
64 bytes from clock1.rdu2.redhat.com (10.11.160.238): icmp_seq=1 ttl=56 time=156 ms

(undercloud) [stack@site-undercloud-0 ~]$ ping clock2.redhat.com
PING clock.corp.redhat.com (10.16.255.1) 56(84) bytes of data.
64 bytes from clock.bos.redhat.com (10.16.255.1): icmp_seq=1 ttl=54 time=176 ms

(undercloud) [stack@site-undercloud-0 ~]$ cat /etc/chrony.conf
# Do not manually edit this file.
# Managed by ansible-role-chrony
server clock.redhat.com iburst minpoll 6 maxpoll 10
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
deny all
driftfile /var/lib/chrony/drift
logdir /var/log/chrony
rtcsync
makestep 1.0 3

The controller status:

[root@overcloud-controller0-0 ~]# ping clock.redhat.com
ping: clock.redhat.com: Name or service not known
[root@overcloud-controller0-0 ~]# ping clock2.redhat.com
ping: clock2.redhat.com: Name or service not known
[root@overcloud-controller0-0 ~]# ping 10.5.27.10
PING 10.5.27.10 (10.5.27.10) 56(84) bytes of data.
64 bytes from 10.5.27.10: icmp_seq=1 ttl=54 time=184 ms

[root@overcloud-controller0-0 ~]# cat /etc/chrony.conf
# Do not manually edit this file.
# Managed by ansible-role-chrony
server clock.redhat.com iburst minpoll 6 maxpoll 10
server clock2.redhat.com iburst minpoll 6 maxpoll 10
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
deny all
driftfile /var/lib/chrony/drift
logdir /var/log/chrony
rtcsync
makestep 1.0 3

If you need a running environment, ping me (yobshans) on rhos-qe, rhos-dev, or edge. Thank you.
The deployment is failing because the overcloud nodes cannot resolve clock.redhat.com. Check your DNS server configuration. Did you specify a DNS server for the ctlplane-subnet on the undercloud?
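The name-resolution check above can be scripted and run on an overcloud node. This is a minimal sketch, assuming Python 3 is available on the node; `check_resolution` and its injectable `resolver` argument are illustrative names, not part of TripleO.

```python
import socket


def check_resolution(hosts, resolver=socket.getaddrinfo):
    """Map each host to its sorted addresses, or None if the name does not resolve."""
    results = {}
    for host in hosts:
        try:
            # Port 123 is NTP; only the name lookup matters for this check.
            infos = resolver(host, 123, proto=socket.IPPROTO_UDP)
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # The same condition ping reports as "Name or service not known".
            results[host] = None
    return results
```

Calling `check_resolution(["clock.redhat.com", "clock2.redhat.com"])` on a controller returns `None` for every name the node's resolver cannot look up, which is the situation shown in the ping output earlier in this thread.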
The issue was that the nameservers were not properly configured for the overcloud nodes. To avoid this, the undercloud should be installed with undercloud_nameservers specified in undercloud.conf. If the environment is already installed, you can run 'openstack subnet set ctlplane-subnet --dns-nameserver <nameserver ip>', manually fix resolv.conf on the hosts, and rerun the deployment.
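After fixing resolv.conf by hand, it can help to confirm which nameservers a host will actually use before rerunning the deployment. A minimal sketch, assuming Python 3 on the node; `resolv_conf_nameservers` is a hypothetical helper, not part of TripleO.

```python
def resolv_conf_nameservers(text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in text.splitlines():
        # Drop anything after '#', then split the rest into fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers
```

For example, `resolv_conf_nameservers(open("/etc/resolv.conf").read())` returning an empty list means the host has no nameserver configured at all.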
To add some comments here, we discovered that none of the ifcfg files for the network interfaces (bridge or ethernet) had any DNS servers. These should be populated automatically if dns_nameserver is set on the subnet, but our documentation only mentions adding the nameservers to one subnet, the leaf0 ctlplane subnet. Yuri is going to add nameservers to the other leaf subnets, but if that doesn't work then we probably have a bug in the auto-population of that parameter.

Here is the code from puppet/role.role.j2.yaml in openstack-tripleo-heat-templates:

conditions:
  dnsservers_set:
    not:
      equals: [{get_param: DnsServers}, []]
[...]
resources:
  NetworkConfig:
    type: OS::TripleO::{{role.name}}::Net::SoftwareConfig
    properties:
      DnsServers:
        if:
          - dnsservers_set
          - {get_param: DnsServers}
          - {get_attr: [{{server_resource_name}}, addresses, ctlplane, 0, subnets, 0, dns_nameservers]}
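The selection logic in the template excerpt above can be restated in a few lines. This is only an illustration of the if/else in that excerpt, not the Heat implementation; the function and argument names are hypothetical.

```python
def effective_dns_servers(dns_servers_param, ctlplane_subnet_dns):
    """Mirror the template's condition: an explicit DnsServers list wins,
    otherwise fall back to dns_nameservers of the first ctlplane subnet."""
    dnsservers_set = dns_servers_param != []
    return dns_servers_param if dnsservers_set else ctlplane_subnet_dns
```

Note that the fallback only consults the subnet of the server's first ctlplane address, so an empty dns_nameservers on that subnet together with an empty DnsServers parameter yields no DNS servers at all.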
The DNS servers should be populated automatically on undercloud install with the value of undercloud_nameservers. I think the code also populates this for all leafs: https://github.com/openstack/tripleo-heat-templates/blob/master/extraconfig/post_deploy/undercloud_ctlplane_network.py That said, if leafs are added outside of the undercloud install process, manual nameserver configuration on the neutron subnets may be required.
Manual configuration doesn't work:

(undercloud) [stack@site-undercloud-0 ~]$ openstack subnet show leaf1 |grep dns
| dns_nameservers | 10.0.10.1 |
(undercloud) [stack@site-undercloud-0 ~]$ openstack subnet show leaf0 |grep dns
| dns_nameservers | 10.0.10.1 |
(undercloud) [stack@site-undercloud-0 ~]$ openstack subnet show leaf2 |grep dns
| dns_nameservers | 10.0.10.1 |

The deployment still failed.
I'll retest the deployment with undercloud_nameservers specified in undercloud.conf and update the bug.
Is 10.0.10.1 a valid nameserver? Also you need to manually fix any systems already provided. You won't get the dns post-provisioning from the subnets
(In reply to Alex Schultz from comment #10)
> Is 10.0.10.1 a valid nameserver?

Yes,

[root@overcloud-controller0-0 network-scripts]# ping 10.0.10.1
PING 10.0.10.1 (10.0.10.1) 56(84) bytes of data.
64 bytes from 10.0.10.1: icmp_seq=1 ttl=64 time=0.389 ms
64 bytes from 10.0.10.1: icmp_seq=2 ttl=64 time=0.165 ms

This is a virt environment.

[root@overcloud-controller0-0 network-scripts]# ip route
default via 10.0.10.1 dev br-ex
10.0.10.0/24 dev br-ex proto kernel scope link src 10.0.10.110
169.254.0.0/16 dev eth0 scope link metric 1002
169.254.0.0/16 dev eth1 scope link metric 1003
169.254.0.0/16 dev eth2 scope link metric 1004
169.254.0.0/16 dev br-isolated scope link metric 1006
169.254.0.0/16 dev br-ex scope link metric 1007
169.254.0.0/16 dev vlan1188 scope link metric 1008
169.254.0.0/16 dev vlan1185 scope link metric 1009
169.254.0.0/16 dev vlan1189 scope link metric 1010
169.254.0.0/16 dev vlan1183 scope link metric 1011
169.254.169.254 via 192.168.24.3 dev eth0
172.18.1.0/24 dev vlan1188 proto kernel scope link src 172.18.1.212
172.18.2.0/24 via 172.18.1.254 dev vlan1188
172.18.3.0/24 via 172.18.1.254 dev vlan1188
172.19.1.0/24 dev vlan1189 proto kernel scope link src 172.19.1.174
172.19.2.0/24 via 172.19.1.254 dev vlan1189
172.19.3.0/24 via 172.19.1.254 dev vlan1189
172.23.1.0/24 dev vlan1183 proto kernel scope link src 172.23.1.237
172.23.2.0/24 via 172.23.1.254 dev vlan1183
172.23.3.0/24 via 172.23.1.254 dev vlan1183
172.25.1.0/24 dev vlan1185 proto kernel scope link src 172.25.1.117
172.25.2.0/24 via 172.25.1.254 dev vlan1185
172.25.3.0/24 via 172.25.1.254 dev vlan1185
192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.12
192.168.34.0/24 via 192.168.24.254 dev eth0
192.168.44.0/24 via 192.168.24.254 dev eth0

> Also you need to manually fix any systems
> already provided. You won't get the dns post-provisioning from the subnets

I also ran the deployment including node provisioning, with the same result.
Just because it's pingable doesn't mean it's a valid nameserver. You'd have to send it a DNS query to verify this. The issue isn't the NTP sync itself but the network configuration supplied for the environment.
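To verify a nameserver rather than just ping it, one can send it an actual DNS query. The sketch below uses only the Python standard library and a hand-built RFC 1035 query for an A record over UDP port 53; the function names, the fixed query ID, and the 3-second timeout are assumptions made for illustration.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID, echoed back by the server


def build_query(name, query_id=QUERY_ID):
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(response):
    """Return (query_id, rcode, answer_count) from a DNS response."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return query_id, flags & 0x000F, ancount


def nameserver_answers(server, name, timeout=3.0):
    """True if `server` replies NOERROR with at least one answer for `name`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            response, _addr = sock.recvfrom(512)
        except OSError:
            # Timeout or ICMP unreachable: nothing answers DNS on that address.
            return False
    if len(response) < 12:
        return False
    query_id, rcode, ancount = parse_header(response)
    return query_id == QUERY_ID and rcode == 0 and ancount > 0
```

Running `nameserver_answers("10.0.10.1", "clock.redhat.com")` from a controller would distinguish a host that merely replies to ICMP from one that actually serves DNS for the NTP hostname.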
The reporter was able to get past this issue with configuration changes.
I'm seeing the same on multiple deployment tries, on different machines, and on either the undercloud or later at overcloud deployment and varying between 1 to 3 controllers failing at time sync. NTP server clock.redhat.com is set in both undercloud and overcloud. [stack@undercloud-0 ~]$ grep clock.redhat.com overcloud_deploy.sh --ntp-server clock.redhat.com \ [stack@undercloud-0 ~]$ grep clock.redhat.com virt/config_heat.yaml -B 1 parameter_defaults: NtpServer: clock.redhat.com [stack@undercloud-0 ~]$ grep -r clock.redhat.com tripleo-config-generated-env-files/ -B 1 tripleo-config-generated-env-files/undercloud_parameters.yaml- NtpServer: tripleo-config-generated-env-files/undercloud_parameters.yaml: - clock.redhat.co 2019-06-23 20:15:46,641 p=543 u=mistral | TASK [Ensure system is NTP time synced] **************************************** 2019-06-23 20:15:46,641 p=543 u=mistral | Sunday 23 June 2019 20:15:46 -0400 (0:00:00.832) 0:03:42.915 *********** 2019-06-23 20:15:46,910 p=543 u=mistral | skipping: [compute-0] => {"changed": false, "skip_reason": "Conditional result was False"} 2019-06-23 20:15:46,956 p=543 u=mistral | skipping: [compute-1] => {"changed": false, "skip_reason": "Conditional result was False"} 2019-06-23 20:15:46,969 p=543 u=mistral | skipping: [compute-2] => {"changed": false, "skip_reason": "Conditional result was False"} 2019-06-23 20:15:56,919 p=543 u=mistral | changed: [controller-0] => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.015852", "end": "2019-06-23 20:15:56.897128", "rc": 0, "start": "2019-06-23 20:15:46.881276", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 0A0BA0EE, correction: 0.000004039, skew: 214.940", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 0A0BA0EE, correction: 0.000004039, skew: 214.940"]} 2019-06-23 20:15:57,080 p=543 u=mistral | changed: [controller-2] => 
{"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.014671", "end": "2019-06-23 20:15:57.056399", "rc": 0, "start": "2019-06-23 20:15:47.041728", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 0A051B0A, correction: 0.000000882, skew: 18.576", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 0A051B0A, correction: 0.000000882, skew: 18.576"]} 2019-06-23 20:18:57,176 p=543 u=mistral | fatal: [controller-1]: FAILED! => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.198191", "end": "2019-06-23 20:18:57.153933", "msg": "non-zero return code", "rc": 1, "start": "2019-06-23 20:15:46.955742", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 
0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]} 2019-06-23 20:18:57,177 p=543 u=mistral | NO MORE HOSTS LEFT ************************************************************* 2019-06-23 20:18:57,179 p=543 u=mistral | PLAY RECAP ********************************************************************* 2019-06-23 20:18:57,179 p=543 u=mistral | compute-0 : ok=74 changed=28 unreachable=0 failed=0 skipped=256 rescued=0 ignored=0 2019-06-23 20:18:57,180 p=543 u=mistral | compute-1 : ok=74 changed=28 unreachable=0 failed=0 skipped=256 rescued=0 ignored=0 2019-06-23 20:18:57,180 p=543 u=mistral | compute-2 : ok=74 changed=28 
unreachable=0 failed=0 skipped=256 rescued=0 ignored=0 2019-06-23 20:18:57,180 p=543 u=mistral | controller-0 : ok=167 changed=102 unreachable=0 failed=0 skipped=171 rescued=0 ignored=1 2019-06-23 20:18:57,180 p=543 u=mistral | controller-1 : ok=166 changed=101 unreachable=0 failed=1 skipped=171 rescued=0 ignored=1 2019-06-23 20:18:57,180 p=543 u=mistral | controller-2 : ok=167 changed=102 unreachable=0 failed=0 skipped=171 rescued=0 ignored=1 2019-06-23 20:18:57,180 p=543 u=mistral | undercloud : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 2019-06-23 20:18:57,181 p=543 u=mistral | Sunday 23 June 2019 20:18:57 -0400 (0:03:10.539) 0:06:53.454 *********** 2019-06-23 20:18:57,181 p=543 u=mistral | ===============================================================================
Bad copy-paste in my comment #15. The line tripleo-config-generated-env-files/undercloud_parameters.yaml: - clock.redhat.co should have been tripleo-config-generated-env-files/undercloud_parameters.yaml: - clock.redhat.com
[root@controller-1 heat-admin]# cat /etc/chrony.conf
# Do not manually edit this file.
# Managed by ansible-role-chrony
server clock.redhat.com iburst minpoll 6 maxpoll 10
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
deny all
driftfile /var/lib/chrony/drift
logdir /var/log/chrony
rtcsync
makestep 1.0 3
Please ensure connectivity to the NTP server. It's also best practice to configure multiple servers, because relying on a single one causes failures whenever it is down or unreachable. The reported failure is what happens when chrony cannot sync to the configured time source.
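The `chronyc waitsync` output quoted in the logs above can be checked mechanically. The sketch below parses the `try:` lines and decodes the refid; it assumes the refid of an IPv4 NTP source is that source's address in hexadecimal (other source types use different refid encodings), and all function names are illustrative.

```python
import ipaddress
import re

TRY_RE = re.compile(
    r"try:\s*(\d+), refid: ([0-9A-Fa-f]{8}), correction: (-?[\d.]+), skew: (-?[\d.]+)"
)


def parse_waitsync(stdout):
    """Parse `chronyc waitsync` output into a list of dicts, one per try."""
    tries = []
    for match in TRY_RE.finditer(stdout):
        number, refid, correction, skew = match.groups()
        tries.append({
            "try": int(number),
            "refid": refid.upper(),
            "correction": float(correction),
            "skew": float(skew),
        })
    return tries


def refid_to_ipv4(refid):
    """Read an 8-hex-digit refid as an IPv4 address; None when unset (00000000)."""
    value = int(refid, 16)
    return None if value == 0 else str(ipaddress.IPv4Address(value))


def synced_source(stdout):
    """Return the source address from the last try, or None if never synchronised."""
    tries = parse_waitsync(stdout)
    return refid_to_ipv4(tries[-1]["refid"]) if tries else None
```

Applied to the logs in this thread, refid 0A0BA0EE decodes to 10.11.160.238, the address clock.redhat.com resolved to in the earlier ping output, while the failing hosts stay at 00000000 for all 20 tries. The 3 minute 10 second duration of each failure appears consistent with 20 tries spaced at chronyc's default 10-second interval.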
We are getting same issue 2019-10-03 12:44:34,065 p=6823 u=mistral | TASK [Ensure system is NTP time synced] **************************************** 2019-10-03 12:44:34,065 p=6823 u=mistral | Thursday 03 October 2019 12:44:34 +0200 (0:00:00.987) 0:03:50.988 ****** 2019-10-03 12:44:44,468 p=6823 u=mistral | changed: [overcloud-controller-0] => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.014213", "end": "2019-10-03 06:44:44.440795", "rc": 0, "start": "2019-10-03 06:44:34.426582", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 939C071A, correction: 0.000000000, skew: 2.638", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 939C071A, correction: 0.000000000, skew: 2.638"]} 2019-10-03 12:44:44,571 p=6823 u=mistral | changed: [overcloud-controller-2] => {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:00:10.013857", "end": "2019-10-03 06:44:44.547643", "rc": 0, "start": "2019-10-03 06:44:34.533786", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 939C0712, correction: 0.000024599, skew: 4.918", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 939C0712, correction: 0.000024599, skew: 4.918"]} 2019-10-03 12:47:44,760 p=6823 u=mistral | fatal: [overcloud-controller-1]: FAILED! 
=> {"changed": true, "cmd": ["chronyc", "waitsync", "20"], "delta": "0:03:10.197515", "end": "2019-10-03 06:47:44.737817", "msg": "non-zero return code", "rc": 1, "start": "2019-10-03 06:44:34.540302", "stderr": "", "stderr_lines": [], "stdout": "try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 2, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 3, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 4, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 5, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 6, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 7, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 8, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 9, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 10, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 11, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 12, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 13, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 14, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 15, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 16, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 17, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 18, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 19, refid: 00000000, correction: 0.000000000, skew: 0.000\ntry: 20, refid: 00000000, correction: 0.000000000, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 8, 
refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000000, skew: 0.000"]}

Our templates:

#!/bin/bash
openstack overcloud deploy \
  --templates \
  --validation-errors-nonfatal \
  -r ~/templates/roles_data.yaml \
  -n /home/stack/templates/network/network_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e ~/templates/network/network-environment.yaml \
  -e ~/templates/node-info.yaml \
  -e ~/templates/storage/storage-environment.yaml \
  -e ~/templates/network/service-netmap.yaml \
  -e ~/templates/containers-prepare-parameter.yaml \
  --ntp-server 0.rhel.pool.ntp.org \
  --timeout 180

We can reach 0.rhel.pool.ntp.org.
It happens on only one node, seemingly at random, in every deployment we run. Any ideas?
For me, it was caused by a combination of unreliable Red Hat DNS servers and Red Hat NTP servers. I resolved it by configuring 2-3 NTP servers on all servers.
Please use multiple NTP servers. It's the same problem as previously mentioned. Chrony does not re-resolve the hostname, so when you use a pool.ntp.org name you may end up pinned to a bad host. Because chrony doesn't perform another lookup between syncs, it's best to configure multiple servers. Also, if you just want *.pool.ntp.org, we already configure multiple servers by default, so you shouldn't need to specify one.
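The advice above amounts to rendering more than one `server` line into chrony.conf. A minimal sketch of that rendering follows; `chrony_server_lines` is a hypothetical helper, the default options are copied from the chrony.conf files quoted in this thread, and the two-server minimum is an assumption that encodes the recommendation, not a chrony requirement.

```python
def chrony_server_lines(servers, options="iburst minpoll 6 maxpoll 10"):
    """Render one `server` directive per NTP host, de-duplicated, order preserved."""
    unique = list(dict.fromkeys(s.strip() for s in servers if s.strip()))
    if len(unique) < 2:
        # A single source leaves chrony with no fallback if that host is bad.
        raise ValueError("configure at least two NTP servers")
    return [f"server {host} {options}" for host in unique]
```

For instance, `chrony_server_lines(["clock.redhat.com", "clock2.redhat.com"])` yields the two `server` lines seen in the controller's chrony.conf earlier in the thread, and a single-entry list raises `ValueError`.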