Bug 1301360 - [RFE][UX] validate that the nodes are pingable [NEEDINFO]
Status: ASSIGNED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 8.0 (Liberty)
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: Upstream M3
Target Release: 13.0 (Queens)
Assigned To: Maros Zatko
QA Contact: Udi
Keywords: FutureFeature, Triaged
Blocks: 1442136
Reported: 2016-01-24 08:15 EST by Udi
Modified: 2017-09-15 22:14 EDT

Fixed In Version: openstack-heat-templates-0-0.5.1e6015dgit.el7ost
Doc Type: Enhancement
Type: Bug
jrist: needinfo? (mzatko)


Description Udi 2016-01-24 08:15:43 EST
Description of problem:
When deploying the overcloud, if the nodes are not pingable for some reason, the deployment hangs until it times out after 4 hours. Nodes may not be pingable if their nic-configs are wrong, or the NIC order changed, or asymmetric routing was not enabled, or for a million other reasons...

As soon as the deployment is at a state where the nodes *should* be pingable, and before the deployment proceeds any further and tries to connect to them or receive any call-backs from them, the director should test that the nodes can be pinged. If the ping fails the deployment should stop immediately, and print a descriptive error message so the user will know exactly what to troubleshoot.
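
The fail-fast behaviour requested above could look roughly like the sketch below. This is only an illustration, not the director's actual code: the names `ping_once`, `validate_pingable` and `NodesUnreachable` are hypothetical, the retry count is an arbitrary assumption, and the `ping` flags assume the Linux iputils `ping` command.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(address: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request using the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class NodesUnreachable(RuntimeError):
    """Raised when at least one node does not answer any ping attempt."""


def validate_pingable(
    addresses: Iterable[str],
    ping: Callable[[str], bool] = ping_once,
    retries: int = 3,
) -> None:
    """Ping every address up to `retries` times; raise at once if any stays silent."""
    unreachable = [
        address
        for address in addresses
        # any() stops at the first successful attempt for this address.
        if not any(ping(address) for _ in range(retries))
    ]
    if unreachable:
        raise NodesUnreachable(
            "Deployment stopped: cannot ping "
            + ", ".join(unreachable)
            + ". Check nic-configs, NIC order and routing on these nodes."
        )
```

Passing the `ping` function as a parameter keeps the check testable without a real network, and raising an exception with the list of silent addresses gives the user a concrete starting point instead of a 4 hour timeout.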


Version-Release number of selected component (if applicable):
7.x and 8.0 beta


How reproducible:
100%
Comment 3 Mike Burns 2016-04-07 17:03:37 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 5 Udi 2016-09-05 08:16:09 EDT
This seems to be implemented already. The code is here:
https://github.com/openstack/tripleo-heat-templates/blob/master/validation-scripts/all-nodes.sh
It is called from the templates here:
https://github.com/openstack/tripleo-heat-templates/blob/b8f154be31c5847dc376a72cf9c0835aa0001afd/overcloud.yaml#L923-L961

Ok to close the bug or is there anything else that needs to be implemented?
Comment 9 Udi 2016-11-08 03:51:21 EST
To test this fix, I ssh'ed to several of the nodes after the deployment finished, and ran the command 'sudo journalctl -u os-collect-config'. You can find lines like this:

Trying to ping 172.16.0.26 for local network 172.16.0.0/24.
Ping to 172.16.0.26 succeeded.
Trying to ping default gateway 10.35.163.254...Ping to 10.35.163.254 succeeded.
Trying to ping default gateway 10.35.190.254...Ping to 10.35.190.254 succeeded.
Trying to ping default gateway 172.16.0.1...Ping to 172.16.0.1 succeeded.

However, I couldn't find evidence that the nodes are pinging each other, or that the undercloud is pinging the nodes (how do I even check what the undercloud pinged?). It seems like the only pings are from the node to itself, and from the node to the undercloud. This is not the validation we wanted.
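
To make this kind of check less manual, the journal output can be summarised with a short script. The sketch below is only an illustration: it assumes the two message formats quoted above ("Trying to ping ..." and "Ping to ... succeeded.") and nothing else about the log, and the names `summarize_pings`, `ATTEMPT` and `SUCCESS` are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
# Matches both "Trying to ping <ip> for local network ..." and
# "Trying to ping default gateway <ip>...".
ATTEMPT = re.compile(rf"Trying to ping (?:default gateway )?({IPV4})")
SUCCESS = re.compile(rf"Ping to ({IPV4}) succeeded")


def summarize_pings(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (attempted, succeeded) sets of IP addresses found in the log lines."""
    attempted: Set[str] = set()
    succeeded: Set[str] = set()
    for line in lines:
        attempted.update(ATTEMPT.findall(line))
        succeeded.update(SUCCESS.findall(line))
    return attempted, succeeded
```

Feeding it the output of 'sudo journalctl -u os-collect-config' from one node and subtracting `attempted` from the set of addresses that node was expected to reach shows which peers were never pinged at all.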
Comment 11 Jason E. Rist 2016-11-29 15:43:59 EST
Tomas, how does Udi test this more thoroughly to make sure it's not FailedQA?
Comment 12 Tomas Sedovic 2016-12-01 08:44:34 EST
It's not about more thorough testing. Rather, it seems that the checks that are in the Heat templates don't actually implement the RFE even though we initially thought they did.

Can we expect nodes in general to be able to ping each other? I think we should only check that the nodes can reach the controller and vice versa. I'm not aware of any need for two compute nodes to talk to each other directly, and with isolated networks, nodes from different roles wouldn't be able to reach one another by design.

So what should this check entail? Controller pinging every node? Anything else?
Comment 13 Udi 2016-12-01 08:50:32 EST
The most important checks are:
1) That the undercloud can ping all nodes
2) That the nodes can ping the controller and vice versa
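
The two checks above can be written down as a set of (source, target) pairs that a validation would have to cover. The sketch below is one possible reading of them, not an agreed design: the function name `required_pings`, the role name "controller" and the host names in the usage example are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def required_pings(
    undercloud: str,
    nodes_by_role: Dict[str, List[str]],
    controller_role: str = "controller",
) -> Set[Tuple[str, str]]:
    """Build the (source, target) pairs implied by checks 1) and 2)."""
    all_nodes = [node for nodes in nodes_by_role.values() for node in nodes]
    controllers = nodes_by_role.get(controller_role, [])

    # Check 1: the undercloud can ping every node.
    pairs = {(undercloud, node) for node in all_nodes}

    # Check 2: every node can ping each controller, and vice versa.
    for controller in controllers:
        for node in all_nodes:
            if node != controller:
                pairs.add((node, controller))
                pairs.add((controller, node))
    return pairs
```

For example, `required_pings("undercloud", {"controller": ["ctrl-0"], "compute": ["comp-0", "comp-1"]})` includes ("comp-0", "ctrl-0") and ("ctrl-0", "comp-0") but not ("comp-0", "comp-1"), which matches the point that compute nodes do not need to reach each other directly.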
Comment 16 Anandeep Pannu 2016-12-07 11:30:28 EST
We should implement this as Udi noted in Comment #13:
1) That the undercloud can ping all nodes
2) That the nodes can ping the controller and vice versa
Comment 20 Jason E. Rist 2017-05-03 10:26:42 EDT
Maros - is this still being worked on - do you need anything from DFG:UI?
