Description of problem:
ansible_facts.default_ipv4.gateway appears not to be set when scaling up a multi-cell (cells v2) TLS-e environment, which causes the ping command to fail during deployment:

2022-05-17 13:34:43.567462 | 525400f9-7a8a-d346-8ffd-000000000b9e | TIMING | tripleo_nodes_validation : Check Default IPv4 Gateway availability | cell1-compute-1 | 0:01:35.711596 | 0.35s
2022-05-17 13:34:43.627570 | 525400f9-7a8a-d346-8ffd-000000000b9f | TASK | Check all networks Gateway availability
2022-05-17 13:34:43.787038 | 525400f9-7a8a-d346-8ffd-000000000b9f | FATAL | Check all networks Gateway availability | cell1-compute-0 | error={"ansible_loop_var": "gateway_ip", "changed": false, "cmd": ["ping", "-w", "10", "-c", "1"], "delta": "0:00:00.004407", "end": "2022-05-17 09:34:43.657259", "gateway_ip": "", "msg": "non-zero return code", "rc": 2, "start": "2022-05-17 09:34:43.652852", "stderr": "Usage: ping [-aAbBdDfhLnOqrRUvV64] [-c count] [-i interval] [-I interface]\n [-m mark] [-M pmtudisc_option] [-l preload] [-p pattern] [-Q tos]\n [-s packetsize] [-S sndbuf] [-t ttl] [-T timestamp_option]\n [-w deadline] [-W timeout] [hop1 ...] destination\nUsage: ping -6 [-aAbBdDfhLnOqrRUvV] [-c count] [-i interval] [-I interface]\n [-l preload] [-m mark] [-M pmtudisc_option]\n [-N nodeinfo_option] [-p pattern] [-Q tclass] [-s packetsize]\n [-S sndbuf] [-t ttl] [-T timestamp_option] [-w deadline]\n [-W timeout] destination", "stderr_lines": ["Usage: ping [-aAbBdDfhLnOqrRUvV64] [-c count] [-i interval] [-I interface]", " [-m mark] [-M pmtudisc_option] [-l preload] [-p pattern] [-Q tos]", " [-s packetsize] [-S sndbuf] [-t ttl] [-T timestamp_option]", " [-w deadline] [-W timeout] [hop1 ...] destination", "Usage: ping -6 [-aAbBdDfhLnOqrRUvV] [-c count] [-i interval] [-I interface]", " [-l preload] [-m mark] [-M pmtudisc_option]", " [-N nodeinfo_option] [-p pattern] [-Q tclass] [-s packetsize]", " [-S sndbuf] [-t ttl] [-T timestamp_option] [-w deadline]", " [-W timeout] destination"], "stdout": "", "stdout_lines": []}

Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20220513.n.2

How reproducible:
Only tried once, in phase3 CI.

Steps to Reproduce:
1. Deploy a multi-cell v2 environment with TLS-e using the above puddle.

Actual results:
The deployment fails because the ping command is run without a target IP address.

Expected results:
The connectivity check succeeds.

Additional info:
Build: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-compute-nova-16.2_director-rhel-virthost-1cont_1comp_1cellcont_2cellcomp_1ipa-ipv4-geneve-multi-cell-tls-everywhere-phase3/64
Failure: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-compute-nova-16.2_director-rhel-virthost-1cont_1comp_1cellcont_2cellcomp_1ipa-ipv4-geneve-multi-cell-tls-everywhere-phase3/64/undercloud-0/home/stack/overcloud_cell_deployment.log.gz
For the overcloud deployment, /var/lib/mistral/overcloud/global_vars.yaml contains:

  ping_test_gateway_ips:
    BlockStorage: []
    CephStorage: []
    Compute: []
    Controller:
    - 10.0.0.1
    ObjectStorage: []

For the cell deployment, /var/lib/mistral/cell1/global_vars.yaml contains:

  ping_test_gateway_ips:
    CellController:
    - ''
    - ''
    - ''
    - ''
    - 10.0.0.1
    Compute:
    - ''
    - ''
    - ''

Instead of empty lists, we end up with empty-string values. The empty string is passed as the argument to ping, so ping is invoked with no destination address and exits with a usage error (rc 2).
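To illustrate the failure mechanism: the task loops over the per-role list with `gateway_ip` as the loop variable (matching `ansible_loop_var` in the log) and templates it into the ping command, so an empty entry produces `ping -w 10 -c 1` with no destination, exactly the `cmd` shown above. The task below is a hypothetical, simplified sketch of a defensive guard on the Ansible side, not the actual tripleo_nodes_validation task and not the THT fix proposed in review 842274. It assumes the per-role list is looked up as `ping_test_gateway_ips[tripleo_role_name]`.

```yaml
# Hypothetical sketch only; not the real tripleo_nodes_validation task.
# Assumption: per-role gateways are read from ping_test_gateway_ips[tripleo_role_name].
- name: Check all networks Gateway availability
  command: "ping -w 10 -c 1 {{ gateway_ip }}"
  # select with no test keeps only truthy items, so '' entries are dropped
  # and ping always receives a destination address
  loop: "{{ ping_test_gateway_ips[tripleo_role_name] | default([]) | select | list }}"
  loop_control:
    loop_var: gateway_ip
  changed_when: false
```

With the cell1 values above, the CellController loop would then run once (for 10.0.0.1) and the Compute loop would not run at all.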
I proposed a fix: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/842274

A workaround is to set the Ansible variable `tripleo_nodes_validation_validate_gateway_icmp` to `false` using the `ExtraAnsibleHostVars` THT parameter. This disables the "Check all networks Gateway availability" task[1].

[1] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_nodes_validation/tasks/main.yml#L47
(In reply to Harald Jensås from comment #2)
> I proposed a fix:
> https://review.opendev.org/c/openstack/tripleo-heat-templates/+/842274
>
> A workaround is to set the ansible var
> `tripleo_nodes_validation_validate_gateway_icmp` to `false` using the
> `ExtraAnsibleHostVars` THT parameter.
> This will disable the "Check all networks Gateway availability" task[1].
>
> [1]
> https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_nodes_validation/tasks/main.yml#L47

Can't we just set ValidateGatewaysIcmp: false [1]?

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/common/deploy-steps.j2#L127
(In reply to Martin Schuppert from comment #11)
> can't we just set ValidateGatewaysIcmp: false [1]?
>
> [1]
> https://github.com/openstack/tripleo-heat-templates/blob/stable/train/common/deploy-steps.j2#L127

Oh, yes! Indeed, that would be the easier way. Thanks, Martin!

@jparker, as Martin points out, the better workaround is to set the THT parameter `ValidateGatewaysIcmp: false`.
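A minimal environment file for that workaround might look like the following. The file name `validate-gateways-icmp.yaml` is hypothetical; the assumption is that it is passed with `-e` to the `openstack overcloud deploy` command that deploys the affected cell stack, alongside that stack's existing environment files.

```yaml
# validate-gateways-icmp.yaml (hypothetical file name)
# Workaround: disable the gateway ICMP validation for this stack
parameter_defaults:
  ValidateGatewaysIcmp: false
```

Note that this only skips the check; it does not remove the empty entries from ping_test_gateway_ips.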
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:8794