Bug 1301549 - [RFE][UX] validate that there are no duplicate IPs as a result of bad templates in nic-configs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-validations
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M3
Target Release: 12.0 (Pike)
Assignee: Florian Fuchs
QA Contact: Ola Pavlenko
URL:
Whiteboard:
Depends On: 1509640
Blocks: 1442136 1467895 1469882 1513624
Reported: 2016-01-25 11:24 UTC by Udi Kalifon
Modified: 2018-02-05 19:02 UTC (History)
CC List: 14 users

Fixed In Version: openstack-tripleo-validations-7.4.1-0.20171007010758.2e43f1a.el7ost
Doc Type: Enhancement
Doc Text:
The update adds a new validation to check the overcloud's network environment. This helps avoid any conflicts with IP addresses, VLANs, and allocation pool when deploying your overcloud.
Clone Of:
Clones: 1513624
Environment:
Last Closed: 2017-12-13 20:37:32 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 341586 0 None None None 2017-05-31 10:58:34 UTC
OpenStack gerrit 504430 0 None None None 2017-10-04 10:14:26 UTC
OpenStack gerrit 509472 0 None None None 2017-10-04 14:15:30 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Udi Kalifon 2016-01-25 11:24:08 UTC
Description of problem:
I edited my network-environment.yaml and changed all storage-mgmt ports and networks to noop. I also mapped storage-mgmt to storage and tried to deploy. The deployment hung and timed out after 4 hours without giving me a clue as to the cause. It turned out that I still had references to the storage-mgmt VLAN and IP in the nic-configs, and the ceph node got a duplicate IP for the ctlplane on 2 different NICs.


Version-Release number of selected component (if applicable):
8.0 beta


How reproducible:
100%


Steps to Reproduce:
1. Set some ports and networks to noop, and then still define them on a nic which is not the one that's supposed to be for the ctlplane


Actual results:
Deployment hangs for 4 hours and fails without a clear error message


Expected results:
This situation should be detectable. Any config that will result in duplicate IPs should be stopped before the deployment starts, and the user should get an informative error message.
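
As an illustration of the kind of pre-deployment check requested here, the following is a minimal sketch, not the validation that was eventually shipped. It assumes the rendered nic-configs have already been flattened into (node, nic, ip) tuples; that input shape, the function name find_duplicate_ips, and the sample addresses are all hypothetical.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(nic_assignments):
    """Return {ip: [(node, nic), ...]} for every IP assigned more than once.

    nic_assignments is a hypothetical, already-flattened view of the
    rendered nic-configs: an iterable of (node, nic, ip) tuples, where
    ip may be written with or without a prefix length.
    """
    seen = defaultdict(list)
    for node, nic, ip in nic_assignments:
        # Normalize "192.0.2.10" and "192.0.2.10/24" to the same key.
        address = str(ipaddress.ip_interface(ip).ip)
        seen[address].append((node, nic))
    return {ip: places for ip, places in seen.items() if len(places) > 1}


# Made-up sample data: the same address lands on two NICs of one node.
assignments = [
    ("ceph-0", "nic1", "192.0.2.10/24"),
    ("ceph-0", "nic3", "192.0.2.10"),
    ("controller-0", "nic1", "192.0.2.11/24"),
]
duplicates = find_duplicate_ips(assignments)
for ip, places in duplicates.items():
    print("Duplicate IP %s assigned to: %s" % (ip, places))
```

A check like this could fail fast with a message naming the node and NICs involved, instead of letting the deployment time out.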

Comment 3 Mike Burns 2016-04-07 21:03:37 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 5 Ola Pavlenko 2016-12-12 14:37:01 UTC
Potential candidate for M-3

Comment 10 Jason E. Rist 2017-07-06 22:51:37 UTC
Requesting M3Approved, this just needs a +A from cores.  Low impact, high need.

Comment 11 Jason E. Rist 2017-07-06 23:11:19 UTC
Actually it was just merged.

Comment 13 Udi Kalifon 2017-09-13 07:15:28 UTC
This validator crashes when you run it on a plan other than the default 'overcloud' plan... I can't test any configuration changes :(. It complains about a utf8 error but this is happening also with the default network-environment.yaml.


TRIPLEO_PLAN_NAME=default ansible-playbook -vvv -i /usr/bin/tripleo-ansible-inventory /usr/share/openstack-tripleo-validations/validations/network-environment.yaml
Using /etc/ansible/ansible.cfg as config file

PLAYBOOK: network-environment.yaml *************************************************************************************************************************************************************************
1 plays in /usr/share/openstack-tripleo-validations/validations/network-environment.yaml

PLAY [undercloud] ******************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/setup.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: stack
<localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152 `" && echo ansible-tmp-1505286244.94-72836329127152="` echo /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152 `" ) && sleep 0'
<localhost> PUT /tmp/tmp0OqJZo TO /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/ /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py; rm -rf "/home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/" > /dev/null 2>&1 && sleep 0'
ok: [localhost]
META: ran handlers

TASK [Validate the network environment files] **************************************************************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/validations/network-environment.yaml:19
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 125, in run
    res = self._execute()
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 522, in _execute
    result = self._handler.run(task_vars=variables)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 45, in run
    results = merge_hash(results, self._execute_module(tmp=tmp, task_vars=task_vars, wrap_async=wrap_async))
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 632, in _execute_module
    (module_style, shebang, module_data, module_path) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 157, in _configure_module
    task_vars=task_vars, module_compression=self._play_context.module_compression)
  File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 796, in modify_module
    (b_module_data, module_style, shebang) = _find_module_utils(module_name, b_module_data, module_path, module_args, task_vars, module_compression)
  File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 629, in _find_module_utils
    python_repred_params = repr(json.dumps(params))
  File "/usr/lib64/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 1: invalid continuation byte

fatal: [localhost]: FAILED! => {
    "failed": true, 
    "msg": "Unexpected failure during module execution.", 
    "stdout": ""
}
 [WARNING]: Could not create retry file '/usr/share/openstack-tripleo-validations/validations/network-environment.retry'. [Errno 13] Permission denied: u'/usr/share/openstack-tripleo-validations/validations/network-environment.retry'


PLAY RECAP *************************************************************************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1

Comment 14 Florian Fuchs 2017-09-15 15:22:27 UTC
(In reply to Udi from comment #13)
> This validator crashes when you run it on a plan other than the default
> 'overcloud' plan... I can't test any configuration changes :(. It complains
> about a utf8 error but this is happening also with the default
> network-environment.yaml.

I tested the validations with other non-default plans (using current tripleo-heat-templates from master as well as variations of the default plan). This worked fine.

This is a problem with the specific plan used here rather than the validation itself: The plan contains a couple of Python bytecode files (.pyc, .pyo) which are not supposed to be there and make the template lookup plugin break.

While a plan should not contain these kinds of files, this is something that could easily be caught within the template lookup plugin. I added a patch to safeguard against this in the future:

  https://review.openstack.org/#/c/504430/
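
For readers following along, here is a minimal sketch of the kind of safeguard described above. It is not the content of the linked patch; the function name readable_templates and the dict-of-bytes input are hypothetical. The sample .pyc payload uses the CPython 2.7 bytecode magic bytes (03 f3 0d 0a), which is consistent with the "byte 0xf3 in position 1" error in the traceback from comment 13.

```python
BYTECODE_SUFFIXES = (".pyc", ".pyo")


def readable_templates(files):
    """Yield (name, text) for plan files that can be treated as templates.

    files is a hypothetical mapping of file name to raw bytes. Bytecode
    files are skipped by suffix, and anything that is not valid UTF-8 is
    skipped as well instead of crashing the lookup.
    """
    for name, raw in sorted(files.items()):
        if name.endswith(BYTECODE_SUFFIXES):
            continue
        try:
            yield name, raw.decode("utf-8")
        except UnicodeDecodeError:
            continue


# Made-up plan contents: one template and one stray bytecode file.
plan_files = {
    "network-environment.yaml": b"resource_registry: {}\n",
    "hooks/helper.pyc": b"\x03\xf3\r\n\x00\x00\x00\x00",
}
templates = dict(readable_templates(plan_files))

# Decoding the bytecode header directly fails at byte 0xf3, position 1.
try:
    b"\x03\xf3\r\n".decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(decode_error)
```

Filtering by suffix handles the known case, while the decode guard covers other binary files that might end up in a plan.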

Comment 16 Florian Fuchs 2017-10-04 14:15:30 UTC
Added Pike-backport patch for byte files issue:

  https://review.openstack.org/#/c/509472/

Comment 20 Udi Kalifon 2017-11-15 15:38:28 UTC
The new network-environment validator doesn't cover the specific error that is described in this bug report (it covers a lot of other things).

Comment 21 Florian Fuchs 2017-11-15 16:59:03 UTC
Since the validation in question is already part of the pike/12 branches and covers a lot of other network topics, we might want to move this RFE to 13, as an addition to the network-environment validation.

Comment 22 Beth White 2017-11-15 18:20:52 UTC
The network-environment validation features:
- NIC config schema validation
- Check certain bond/interface combinations
- Network overlaps
- IP pool allocation validation
- Static IP pool/range collisions
- VLAN ID check
- Duplicate IPs

This validator doesn't cover the specific error that is described in this bug report. The feature has therefore been marked as verified for the items above, and the bug has been cloned for OSP14 to complete the validations that failed QA.
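
Two of the checks listed above, network overlaps and allocation pool validation, can be sketched with Python's standard ipaddress module. This is an illustrative sketch only and does not reproduce the code in openstack-tripleo-validations; the function names, the network names, and the CIDRs are made up, and only IPv4 is assumed.

```python
import ipaddress
from itertools import combinations


def overlapping_networks(cidrs):
    """Return sorted (name_a, name_b) pairs whose CIDRs overlap.

    cidrs is a hypothetical mapping of network name to CIDR string.
    """
    nets = sorted(
        (name, ipaddress.ip_network(cidr)) for name, cidr in cidrs.items()
    )
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(nets, 2)
        if net_a.overlaps(net_b)
    ]


def pool_is_valid(cidr, start, end):
    """Check that an allocation pool is ordered and lies inside its network."""
    net = ipaddress.ip_network(cidr)
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return first <= last and first in net and last in net


# Made-up example: storage_mgmt was carved out of the storage range.
networks = {
    "storage": "172.16.1.0/24",
    "storage_mgmt": "172.16.1.128/25",
    "tenant": "172.16.2.0/24",
}
overlaps = overlapping_networks(networks)
good_pool = pool_is_valid("172.16.2.0/24", "172.16.2.10", "172.16.2.200")
bad_pool = pool_is_valid("172.16.2.0/24", "172.16.2.10", "172.16.3.5")
print(overlaps, good_pool, bad_pool)
```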

Comment 25 errata-xmlrpc 2017-12-13 20:37:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

