Bug 1301549 - [RFE][UX] validate that there are no duplicate IPs as a result of bad templates in nic-configs
Status: VERIFIED
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-validations
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Upstream M3
Target Release: 12.0 (Pike)
Assigned To: Florian Fuchs
QA Contact: Ola Pavlenko
Keywords: FutureFeature, Triaged
Depends On: 1509640
Blocks: 1442136 1467895 1469882 1513624
Reported: 2016-01-25 06:24 EST by Udi
Modified: 2017-11-15 13:20 EST
Fixed In Version: openstack-tripleo-validations-7.4.1-0.20171007010758.2e43f1a.el7ost
Doc Type: Enhancement
Doc Text:
The update adds a new validation to check the overcloud's network environment. This helps avoid any conflicts with IP addresses, VLANs, and allocation pool when deploying your overcloud.
Clones: 1513624
Type: Bug

External Trackers
Tracker ID Priority Status Summary Last Updated
OpenStack gerrit 341586 None None None 2017-05-31 06:58 EDT
OpenStack gerrit 504430 None None None 2017-10-04 06:14 EDT
OpenStack gerrit 509472 None None None 2017-10-04 10:15 EDT

Description Udi 2016-01-25 06:24:08 EST
Description of problem:
I edited my network-environment.yaml and changed all storage-mgmt ports and networks to noop. I also mapped storage-mgmt to storage and tried to deploy. The deployment hung and timed out after 4 hours without giving me a clue as to the cause. It turned out that I still had references to the storage-mgmt VLAN and IP in the nic-configs, and the ceph node got a duplicate IP for the ctlplane on 2 different NICs.


Version-Release number of selected component (if applicable):
8.0 beta


How reproducible:
100%


Steps to Reproduce:
1. Set some ports and networks to noop, and then still define them on a nic which is not the one that's supposed to be for the ctlplane


Actual results:
Deployment hangs for 4 hours and fails without a clear error message


Expected results:
This situation should be detectable. Any config that will result in duplicate IPs should be stopped before the deployment starts, and the user should get an informative error message.
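
To make the expected behaviour concrete, here is a minimal sketch of the kind of pre-deployment check being requested. It is not the tripleo-validations implementation. The function name find_duplicate_ips and the flattened NIC-to-addresses mapping are hypothetical; a real check would first have to render the nic-config templates to get this data. The addresses are made-up examples.

```python
from collections import defaultdict
from ipaddress import ip_interface


def find_duplicate_ips(nic_addresses):
    """Return {ip: [nic, ...]} for every IP configured on more than one NIC.

    nic_addresses is a hypothetical, already-flattened mapping of
    NIC name -> list of "address/prefix" strings for a single node.
    """
    seen = defaultdict(list)
    for nic, addresses in nic_addresses.items():
        for address in addresses:
            # Compare on the host address only, so 192.0.2.10/24 and
            # 192.0.2.10/32 count as the same IP.
            seen[str(ip_interface(address).ip)].append(nic)
    return {ip: sorted(nics) for ip, nics in seen.items() if len(set(nics)) > 1}


# Example data only: a node where a stale entry reuses the ctlplane IP.
ceph_node = {
    "nic1": ["192.0.2.10/24"],   # ctlplane
    "nic2": ["192.0.2.10/24"],   # leftover storage-mgmt definition
    "nic3": ["172.16.1.20/24"],  # storage
}

duplicates = find_duplicate_ips(ceph_node)
for ip, nics in duplicates.items():
    print("Duplicate IP %s on NICs: %s" % (ip, ", ".join(nics)))
```

Failing the run whenever the returned mapping is non-empty would stop the deployment before it starts and name the offending NICs in the error message.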
Comment 3 Mike Burns 2016-04-07 17:03:37 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 5 Ola Pavlenko 2016-12-12 09:37:01 EST
Potential candidate for M-3
Comment 10 Jason E. Rist 2017-07-06 18:51:37 EDT
Requesting M3Approved, this just needs a +A from cores.  Low impact, high need.
Comment 11 Jason E. Rist 2017-07-06 19:11:19 EDT
Actually it was just merged.
Comment 13 Udi 2017-09-13 03:15:28 EDT
This validator crashes when you run it on a plan other than the default 'overcloud' plan... I can't test any configuration changes :(. It complains about a utf8 error but this is happening also with the default network-environment.yaml.


TRIPLEO_PLAN_NAME=default ansible-playbook -vvv -i /usr/bin/tripleo-ansible-inventory /usr/share/openstack-tripleo-validations/validations/network-environment.yaml
Using /etc/ansible/ansible.cfg as config file

PLAYBOOK: network-environment.yaml *************************************************************************************************************************************************************************
1 plays in /usr/share/openstack-tripleo-validations/validations/network-environment.yaml

PLAY [undercloud] ******************************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/setup.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: stack
<localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152 `" && echo ansible-tmp-1505286244.94-72836329127152="` echo /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152 `" ) && sleep 0'
<localhost> PUT /tmp/tmp0OqJZo TO /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/ /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py; rm -rf "/home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/" > /dev/null 2>&1 && sleep 0'
ok: [localhost]
META: ran handlers

TASK [Validate the network environment files] **************************************************************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/validations/network-environment.yaml:19
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 125, in run
    res = self._execute()
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 522, in _execute
    result = self._handler.run(task_vars=variables)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 45, in run
    results = merge_hash(results, self._execute_module(tmp=tmp, task_vars=task_vars, wrap_async=wrap_async))
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 632, in _execute_module
    (module_style, shebang, module_data, module_path) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 157, in _configure_module
    task_vars=task_vars, module_compression=self._play_context.module_compression)
  File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 796, in modify_module
    (b_module_data, module_style, shebang) = _find_module_utils(module_name, b_module_data, module_path, module_args, task_vars, module_compression)
  File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 629, in _find_module_utils
    python_repred_params = repr(json.dumps(params))
  File "/usr/lib64/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 1: invalid continuation byte

fatal: [localhost]: FAILED! => {
    "failed": true, 
    "msg": "Unexpected failure during module execution.", 
    "stdout": ""
}
 [WARNING]: Could not create retry file '/usr/share/openstack-tripleo-validations/validations/network-environment.retry'.         [Errno 13] Permission denied: u'/usr/share/openstack-tripleo-validations/validations/network-environment.retry'


PLAY RECAP *************************************************************************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1
Comment 14 Florian Fuchs 2017-09-15 11:22:27 EDT
(In reply to Udi from comment #13)
> This validator crashes when you run it on a plan other than the default
> 'overcloud' plan... I can't test any configuration changes :(. It complains
> about a utf8 error but this is happening also with the default
> network-environment.yaml.

I tested the validations with other non-default plans (using current tripleo-heat-templates from master as well as variations of the default plan). This worked fine.

This is a problem with the specific plan used here rather than the validation itself: The plan contains a couple of Python bytecode files (.pyc, .pyo) which are not supposed to be there and make the template lookup plugin break.

While a plan should not contain these kinds of files, this is something that could easily be caught within the template lookup plugin. I added a patch to safeguard against this in the future:

  https://review.openstack.org/#/c/504430/
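
For readers who do not want to open the review, the following is a rough sketch of this kind of safeguard. It is not the code from the patch above. The function name read_plan_text_files and the directory-walking approach are assumptions made for illustration, and it is written for Python 3 even though the traceback comes from Python 2.7. One observation that fits the traceback: a Python 2.7 .pyc file starts with the bytes 03 f3 0d 0a, and decoding those as UTF-8 fails on byte 0xf3 in position 1.

```python
import os
import tempfile

BYTECODE_SUFFIXES = (".pyc", ".pyo")
PY27_PYC_MAGIC = b"\x03\xf3\r\n"  # first four bytes of a Python 2.7 .pyc file


def read_plan_text_files(plan_dir):
    """Return ({relative_path: text}, [skipped paths]) for a plan directory.

    Bytecode files, and any other file that is not valid UTF-8, are
    skipped and reported instead of raising UnicodeDecodeError.
    """
    contents, skipped = {}, []
    for root, _dirs, files in os.walk(plan_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, plan_dir)
            if name.endswith(BYTECODE_SUFFIXES):
                skipped.append(rel)
                continue
            with open(path, "rb") as handle:
                raw = handle.read()
            try:
                contents[rel] = raw.decode("utf-8")
            except UnicodeDecodeError:
                skipped.append(rel)
    return contents, sorted(skipped)


# Demo with a throwaway directory standing in for a plan.
with tempfile.TemporaryDirectory() as plan_dir:
    with open(os.path.join(plan_dir, "network-environment.yaml"), "wb") as f:
        f.write(b"resource_registry: {}\n")
    with open(os.path.join(plan_dir, "hook.pyc"), "wb") as f:
        f.write(PY27_PYC_MAGIC + b"\x00\x00\x00\x00")
    with open(os.path.join(plan_dir, "blob.bin"), "wb") as f:
        f.write(PY27_PYC_MAGIC)
    contents, skipped = read_plan_text_files(plan_dir)

print("loaded:", sorted(contents))
print("skipped:", skipped)
```

Checking the suffix first avoids reading bytecode at all, while the decode fallback also covers binary files that carry some other extension.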
Comment 16 Florian Fuchs 2017-10-04 10:15:30 EDT
Added Pike-backport patch for the bytecode files issue:

  https://review.openstack.org/#/c/509472/
Comment 20 Udi 2017-11-15 10:38:28 EST
The new network-environment validator doesn't cover the specific error that is described in this bug report (it covers a lot of other things).
Comment 21 Florian Fuchs 2017-11-15 11:59:03 EST
Since the validation in question is already part of the pike/12 branches and covers a lot of other network topics, we might want to move this RFE to 13, as an addition to the network-environment validation.
Comment 22 Beth Elwell 2017-11-15 13:20:52 EST
The network-environment validation features:
- NIC config schema validation
- Check certain bond/interface combinations
- Network overlaps
- IP pool allocation validation
- Static IP pool/range collisions
- VLAN ID check
- Duplicate IPs

This validator doesn't cover the specific error described in this bug report. The feature has therefore been marked as verified for the checks listed above, and the bug has been cloned for OSP14 to complete the validations that failed QA.
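
As an illustration of two of the checks listed above (network overlaps and IP pool allocation validation), here is a small sketch using Python's standard ipaddress module. It is not the validator's actual code. The function names, the input shapes, and the example networks are all hypothetical.

```python
from ipaddress import ip_address, ip_network
from itertools import combinations


def find_overlapping_networks(networks):
    """Return sorted pairs of network names whose CIDRs overlap."""
    parsed = {name: ip_network(cidr) for name, cidr in networks.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if parsed[a].overlaps(parsed[b])
    )


def pool_errors(cidr, pools):
    """Return messages for pools that are reversed or fall outside cidr."""
    network = ip_network(cidr)
    errors = []
    for pool in pools:
        start, end = ip_address(pool["start"]), ip_address(pool["end"])
        if start > end:
            errors.append("pool %s-%s: start is after end" % (start, end))
        if start not in network or end not in network:
            errors.append("pool %s-%s: outside %s" % (start, end, network))
    return errors


# Example data only.
networks = {
    "storage": "172.16.1.0/24",
    "storage_mgmt": "172.16.1.128/25",  # sits inside the storage network
    "tenant": "172.16.2.0/24",
}
overlaps = find_overlapping_networks(networks)
bad_pools = pool_errors(
    "172.16.1.0/24",
    [{"start": "172.16.1.4", "end": "172.16.2.250"}],  # end is in another subnet
)

print(overlaps)
print(bad_pools)
```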
