Bug 1550167 - validation-scripts/all-nodes.sh wait time verification
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z5
Target Release: 11.0 (Ocata)
Assigned To: Emilien Macchi
QA Contact: Gurenko Alex
Keywords: Triaged, ZStream
Depends On:
Blocks: 1545666
Reported: 2018-02-28 12:02 EST by Alex Schultz
Modified: 2018-05-18 13:04 EDT
Fixed In Version: openstack-tripleo-heat-templates-6.2.7-7.el7ost
Clone Of: 1545666
Last Closed: 2018-05-18 13:03:18 EDT
Type: Bug

External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 548663 | None | None | None | 2018-02-28 12:02 EST
Red Hat Product Errata | RHSA-2018:1627 | None | None | None | 2018-05-18 13:04 EDT

Description Alex Schultz 2018-02-28 12:02:03 EST
osp11

+++ This bug was initially created as a clone of Bug #1545666 +++

Description of problem:
I would like to report a possible fault in the validation script:

/usr/share/openstack-tripleo-heat-templates/validation-scripts/all-nodes.sh

It contains the following function:

function ping_retry() {
  local IP_ADDR=$1
  local TIMES=${2:-'10'}
  local COUNT=0
  local PING_CMD=ping
  if [[ $IP_ADDR =~ ":" ]]; then
    PING_CMD=ping6
  fi
  until [ $COUNT -ge $TIMES ]; do
    if $PING_CMD -w 300 -c 1 $IP_ADDR &> /dev/null; then
      echo "Ping to $IP_ADDR succeeded."
      return 0
    fi
    echo "Ping to $IP_ADDR failed. Retrying..."
    COUNT=$(($COUNT + 1))
  done
  return 1
}

The problematic line is:

$PING_CMD -w 300 -c 1 $IP_ADDR

According to man ping:
       -w deadline
              Specify  a  timeout,  in seconds, before ping exits regardless of how many packets have been sent or received. In this case ping does not stop after count packet are sent, it waits either for deadline expire or until count probes are answered or for some error notification from network.

So "-w 300" sets a 300-second deadline, and the loop repeats the ping up to 10 times, as set by the TIMES variable.
In the worst case, this gives the ping a total timeout of 3000 seconds, or 50 minutes.
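
For reference, the worst-case figure can be checked with a few lines of shell. This is only a sketch of the arithmetic: the variable names are illustrative and do not come from all-nodes.sh, and the values are the "-w" deadline and the default TIMES from the function quoted above.

```shell
# Worst case: every attempt runs into the full -w deadline
# (no reply and no ICMP error ever arrives).
deadline=300   # seconds, from "ping -w 300"
retries=10     # default value of TIMES in ping_retry

worst_case=$((deadline * retries))
echo "${worst_case} seconds ($((worst_case / 60)) minutes)"
# prints: 3000 seconds (50 minutes)
```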


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.3.3-1.el7ost.noarch.rpm



Actual results:
However, if "some error notification from network" is received, ping does not wait out the 300-second deadline set by "-w 300"; it exits quickly.

Here is a test in which I ping a non-existent IP address in an existing subnet:

[VNF11 VPOD3 stack@director validation-scripts]$ time ping -w 300 -c 1 10.33.110.150
PING 10.33.110.150 (10.33.110.150) 56(84) bytes of data.
From 10.33.110.133 icmp_seq=1 Destination Host Unreachable
From 10.33.110.133 icmp_seq=2 Destination Host Unreachable
From 10.33.110.133 icmp_seq=3 Destination Host Unreachable
From 10.33.110.133 icmp_seq=4 Destination Host Unreachable

--- 10.33.110.150 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 2999ms
pipe 4

real    0m3.008s
user    0m0.000s
sys     0m0.002s

The ping exited in 3 seconds. If this happens during deployment, the effective timeout is only 3 * 10 = 30 seconds.

30 seconds is too short if this ping must go over a bonded interface with LACP.



Expected results:

It takes anywhere between 30 and 60 seconds for LACP to become functional.

It does not matter whether slow or fast LACP mode is used on the switches: 30 seconds is a borderline minimum and is not enough.
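
The mismatch can be put into numbers. The sketch below assumes roughly 3 seconds per failed attempt (as in the test run above) and takes 60 seconds as the upper bound for LACP to come up; the variable names are illustrative only.

```shell
# Fast-failure case: ping exits early on "Destination Host Unreachable".
fail_time=3    # seconds per failed attempt, from the test run
retries=10     # default value of TIMES in ping_retry
lacp_max=60    # seconds, upper bound quoted for LACP to become functional

budget=$((fail_time * retries))
if [ "$budget" -lt "$lacp_max" ]; then
  echo "Retry budget of ${budget}s is shorter than the ${lacp_max}s LACP may need."
fi
```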

--- Additional comment from Dariusz Wojewódzki on 2018-02-15 08:33:13 EST ---

I noticed this has been changed in OSP12, openstack-tripleo-heat-templates-7.0.3-22.el7ost.noarch.rpm:


    if $PING_CMD -w 10 -c 1 $IP_ADDR &> /dev/null; then


Is it possible to backport this to OSP11 and OSP10?
Comment 9 Gurenko Alex 2018-05-13 09:07:32 EDT
Verified on puddle 2018-05-03.2

[stack@undercloud-0 ~]$ sed -n 13,19p /usr/share/openstack-tripleo-heat-templates/validation-scripts/all-nodes.sh
    if $PING_CMD -w 10 -c 1 $IP_ADDR &> /dev/null; then
      echo "Ping to $IP_ADDR succeeded."
      return 0
    fi
    echo "Ping to $IP_ADDR failed. Retrying..."
    COUNT=$(($COUNT + 1))
    sleep 60

[stack@undercloud-0 ~]$ rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-6.2.12-2.el7ost.noarch
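
Putting the excerpt above back into the function from the description, the fixed ping_retry presumably looks like the sketch below. This is a reconstruction, not a copy of the shipped file. PING_BIN and RETRY_SLEEP are hypothetical overrides added here only so the loop can be exercised without a network, and the address check uses a portable "case" instead of the original "[[ ... =~ ]]" test.

```shell
# Reconstruction of ping_retry after the fix: shorter ping deadline plus a
# fixed sleep between attempts. Not the shipped code.
ping_retry() {
  local IP_ADDR="$1"
  local TIMES="${2:-10}"
  local COUNT=0
  local PING_CMD=ping
  # Use ping6 for IPv6 addresses (anything containing a colon).
  case "$IP_ADDR" in
    *:*) PING_CMD=ping6 ;;
  esac
  # Hypothetical override for testing; not part of all-nodes.sh.
  PING_CMD="${PING_BIN:-$PING_CMD}"
  until [ "$COUNT" -ge "$TIMES" ]; do
    if "$PING_CMD" -w 10 -c 1 "$IP_ADDR" > /dev/null 2>&1; then
      echo "Ping to $IP_ADDR succeeded."
      return 0
    fi
    echo "Ping to $IP_ADDR failed. Retrying..."
    COUNT=$((COUNT + 1))
    # Hypothetical override for testing; the excerpt uses a fixed 60 seconds.
    sleep "${RETRY_SLEEP:-60}"
  done
  return 1
}
```

With the defaults, a host that never answers is retried for at least 10 * 60 = 600 seconds of sleep, plus up to 10 seconds of ping deadline per attempt, so the total wait no longer depends on how quickly a single ping fails.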
Comment 12 errata-xmlrpc 2018-05-18 13:03:18 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1627
