Bug 1301360 - [RFE][UX] validate that the nodes are pingable [NEEDINFO]
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 8.0 (Liberty)
Hardware: Unspecified Unspecified
Priority: high Severity: high
Target Milestone: Upstream M3
Target Release: 14.0 (Rocky)
Assigned To: Maros Zatko
Keywords: FutureFeature, Triaged
Depends On:
Blocks: 1442136
Reported: 2016-01-24 08:15 EST by Udi
Modified: 2017-10-15 22:13 EDT
CC List: 12 users

See Also:
Fixed In Version: openstack-heat-templates-0-0.5.1e6015dgit.el7ost
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
jrist: needinfo? (mzatko)

Description Udi 2016-01-24 08:15:43 EST
Description of problem:
When deploying the overcloud, if the nodes are not pingable for any reason, the deployment hangs until it times out after 4 hours. Nodes may not be pingable if their nic-configs are wrong, or the NIC order changed, or asymmetric routing was not enabled, or for a million other reasons...

As soon as the deployment is at a state where the nodes *should* be pingable, and before the deployment proceeds any further and tries to connect to them or receive any call-backs from them, the director should test that the nodes can be pinged. If the ping fails the deployment should stop immediately, and print a descriptive error message so the user will know exactly what to troubleshoot.
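
A minimal sketch of the fail-fast check requested above, assuming a Linux `ping` binary that accepts `-c` and `-W`. The names `ping_once`, `validate_pingable` and `NodesUnreachable` are hypothetical and are not part of the director or the Heat templates; the addresses in the usage note are documentation-range placeholders.

```python
import subprocess
from typing import Callable, Iterable


class NodesUnreachable(RuntimeError):
    """Raised when one or more nodes do not answer a ping (hypothetical name)."""


def ping_once(address: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo request; True if the node replied.

    Assumes Linux iputils `ping`, where -c is the packet count and
    -W is the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def validate_pingable(
    addresses: Iterable[str],
    ping: Callable[[str], bool] = ping_once,
) -> None:
    """Stop immediately with a descriptive error if any node is unreachable."""
    failed = [address for address in addresses if not ping(address)]
    if failed:
        raise NodesUnreachable(
            "Cannot ping: " + ", ".join(failed)
            + ". Check the nic-configs, the NIC order and the routing."
        )
```

Calling `validate_pingable(["192.0.2.10", "192.0.2.11"])` returns silently when every node replies and raises otherwise, so a caller can abort long before the 4-hour timeout. The `ping` parameter exists only so the check can be exercised without a network.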

Version-Release number of selected component (if applicable):
7.x and 8.0 beta

How reproducible:
Comment 3 Mike Burns 2016-04-07 17:03:37 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 5 Udi 2016-09-05 08:16:09 EDT
This seems to be implemented already. The code is here:
It is called from the templates here:

Ok to close the bug or is there anything else that needs to be implemented?
Comment 9 Udi 2016-11-08 03:51:21 EST
To test this fix, I ssh'ed to several of the nodes after the deployment finished, and ran the command 'sudo journalctl -u os-collect-config'. You can find lines like this:

Trying to ping for local network
Ping to succeeded.
Trying to ping default gateway to succeeded.
Trying to ping default gateway to succeeded.
Trying to ping default gateway to succeeded.

However, I couldn't find evidence that the nodes are pinging each other, or that the undercloud is pinging the nodes (how do I even check what the undercloud pinged?). It seems like the only pings are from the node to itself, and from the node to the undercloud. This is not the validation we wanted.
Comment 11 Jason E. Rist 2016-11-29 15:43:59 EST
Tomas, how does Udi test this more thoroughly to make sure it's not FailedQA?
Comment 12 Tomas Sedovic 2016-12-01 08:44:34 EST
It's not about more thorough testing. Rather, it seems that the checks that are in the Heat templates don't actually implement the RFE even though we initially thought they did.

Can we expect the nodes in general to be able to ping each other? I think we should only check that the nodes can reach the controller and vice versa. I'm not aware of any need for two compute nodes to talk to each other directly, and with isolated networks, nodes from different roles wouldn't be able to reach one another by design.

So what should this check entail? Controller pinging every node? Anything else?
Comment 13 Udi 2016-12-01 08:50:32 EST
The most important checks are:
1) That the undercloud can ping all nodes
2) That the nodes can ping the controller and vice versa
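
The two checks above can be read as a list of (source, target) pairs to ping. The following is only a sketch under that reading: `required_ping_pairs` and the node names are hypothetical, and nothing here reflects how the templates actually enumerate nodes.

```python
from typing import List, Sequence, Tuple


def required_ping_pairs(
    undercloud: str,
    controllers: Sequence[str],
    other_nodes: Sequence[str],
) -> List[Tuple[str, str]]:
    """Return the (source, target) pairs that the two checks imply."""
    pairs: List[Tuple[str, str]] = []

    # Check 1: the undercloud can ping all nodes.
    for node in list(controllers) + list(other_nodes):
        pairs.append((undercloud, node))

    # Check 2: every non-controller node can ping each controller, and vice versa.
    for node in other_nodes:
        for controller in controllers:
            pairs.append((node, controller))
            pairs.append((controller, node))

    return pairs
```

With one controller and two compute nodes this gives three undercloud pairs plus four node/controller pairs. Compute-to-compute pairs are deliberately absent, which matches the point in Comment 12 that nodes from different roles may be isolated by design.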
Comment 16 Anandeep Pannu 2016-12-07 11:30:28 EST
We should implement this as Udi noted in Comment #13:
1) That the undercloud can ping all nodes
2) That the nodes can ping the controller and vice versa
Comment 20 Jason E. Rist 2017-05-03 10:26:42 EDT
Maros - is this still being worked on - do you need anything from DFG:UI?
