Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1622660

Summary: Ping sometimes fails to controller nodes.
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: z3
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Reporter: Darin Sorrentino <dsorrent>
Assignee: Bob Fournier <bfournie>
QA Contact: mlammon
CC: aschultz, bfournie, bjacot, harsh.kotak, lmarsh, mburns
Keywords: Triaged, ZStream
Flags: lmarsh: needinfo-, lmarsh: needinfo-
Fixed In Version: openstack-tripleo-heat-templates-8.0.7-2.el7ost
Type: Bug
Clones: 1635423
Last Closed: 2018-11-13 22:28:47 UTC
Bug Blocks: 1635423

Description Darin Sorrentino 2018-08-27 17:10:52 UTC
Description of problem:

While deploying an Overcloud that uses IPv6 addressing, pings to the controllers sometimes fail during deployment validation, which then results in a failed deploy.  Through troubleshooting, we've determined that pings to the gateways do not fail, and that pinging the gateway appears to "fix" the ability to ping the Controller Nodes.

Within our team we've tried to figure out why this would be the case. One of our team members pointed out that IPv6 does not have ARP; it has neighbor solicitation and router solicitation, and sometimes neighbor solicitation isn't on or doesn't work, in which case you -have- to establish connectivity to the router before any other host communication works.  He said this only applies to non-EUI-64 setups (EUI-64 meaning the MAC address is embedded in the IPv6 address).

We're not sure if that's what is happening here; however, we do know that if we continually run an Ansible playbook that connects to the overcloud nodes and has them ping their default gateways during deployment, the deployment succeeds 100% of the time.
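For illustration only, here is a rough shell sketch of the per-node step such a workaround might run. It is not the playbook we used; the function name default_gateways is made up for this example, and it assumes the usual "default via <addr> dev <ifname> ..." line format printed by `ip -6 route show default`. Link-local gateways get a %<ifname> scope suffix so that ping6 can reach them.

```shell
#!/bin/sh
# Hypothetical helper (not part of tripleo-heat-templates).
# Reads `ip -6 route show default` style lines on stdin and prints one
# gateway address per line. Assumption: each default route line contains
# "via <addr>" and "dev <ifname>".
default_gateways() {
  awk '$1 == "default" {
    gw = ""; dev = ""
    for (i = 2; i < NF; i++) {
      if ($i == "via") gw = $(i + 1)
      if ($i == "dev") dev = $(i + 1)
    }
    if (gw == "") next
    # Link-local next hops need an interface scope to be pingable.
    if (gw ~ /^fe80:/ && dev != "") gw = gw "%" dev
    print gw
  }'
}
```

On a node, it could be used along the lines of: ip -6 route show default | default_gateways | while read -r gw; do ping6 -c 1 -w 5 "$gw"; done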

In looking at the all-nodes.sh validation script:

/usr/share/openstack-tripleo-heat-templates/validation-scripts/all-nodes.sh

We noticed that it performs two tests, ping_controller_ips and ping_default_gateways, in that order.  We believe swapping them will correct this issue.

This BZ requests that those two tests be swapped so that the gateways are pinged before the controllers.
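
To make the requested ordering concrete, here is a minimal sketch. It is not the code from all-nodes.sh: ping_retry, ping_all and validate_connectivity, along with the retry count and timeout, are illustrative names and values chosen for this example. The only point it shows is that the gateway check runs, and must pass, before any controller is pinged.

```shell
#!/bin/sh
# Sketch only: names and values here are illustrative and are not copied
# from the shipped all-nodes.sh.

# Ping one address, retrying a few times. Uses ping6 for IPv6 literals.
ping_retry() {
  addr=$1
  tries=${2:-3}
  case $addr in
    *:*) ping_cmd=ping6 ;;
    *)   ping_cmd=ping ;;
  esac
  n=0
  while [ "$n" -lt "$tries" ]; do
    if "$ping_cmd" -c 1 -w 10 "$addr" >/dev/null 2>&1; then
      return 0
    fi
    n=$((n + 1))
  done
  return 1
}

# Ping every address given after the label; stop at the first failure.
ping_all() {
  label=$1
  shift
  for target in "$@"; do
    if ! ping_retry "$target"; then
      echo "FAILURE: cannot ping $label $target"
      return 1
    fi
  done
}

# Gateways first, controllers second: the order this report asks for.
# Both arguments are space-separated address lists.
validate_connectivity() {
  gateways=$1
  controllers=$2
  # Unquoted on purpose so each list is split into separate addresses.
  ping_all "default gateway" $gateways || return 1
  ping_all "controller" $controllers || return 1
  echo "SUCCESS"
}
```

For example, validate_connectivity "fd00::1" "fd00::10 fd00::11" pings fd00::1 first and only then the two controller addresses.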

Comment 6 bjacot 2018-10-31 16:40:49 UTC
Verified on OSP13; the recommended change is included.

[stack@undercloud-0 ~]$ cat core_puddle_version 
2018-10-24.1
[stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-8.0.7-4.el7ost.noarch

Comment 10 errata-xmlrpc 2018-11-13 22:28:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3587