Description of problem:
I edited my network-environment.yaml and changed all storage-mgmt ports and networks to noop. I also mapped storage-mgmt to storage and tried to deploy. The deployment hung and timed out after 4 hours without giving me a clue as to the cause. It turned out that I still had references to the storage-mgmt VLAN and IP in the nic-configs, and the ceph node got a duplicate IP for the ctlplane on 2 different NICs.

Version-Release number of selected component (if applicable):
8.0 beta

How reproducible:
100%

Steps to Reproduce:
1. Set some ports and networks to noop, and then still define them on a NIC which is not the one that's supposed to be for the ctlplane

Actual results:
Deployment hangs for 4 hours and fails without a clear error message

Expected results:
This situation should be detectable. Any config that will result in duplicate IPs should be stopped before the deployment starts, and the user should get an informative error message.
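To illustrate the kind of pre-deployment check requested above, here is a minimal sketch in Python. It is not the tripleo-validations implementation. The function name `find_duplicate_ips` and the input shape (a dict mapping a NIC name to its configured addresses) are assumptions made for this example; a real check would first have to extract those addresses from the rendered nic-configs.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(nic_addresses):
    """Return {ip: [nic, ...]} for every IP configured on more than one NIC.

    nic_addresses is a hypothetical structure for this sketch:
    a dict mapping a NIC name to a list of "ip/prefix" strings.
    """
    seen = defaultdict(set)
    for nic, addrs in nic_addresses.items():
        for addr in addrs:
            # Normalise the address so "192.0.2.10/24" and "192.0.2.10"
            # are recognised as the same host IP.
            ip = ipaddress.ip_interface(addr).ip
            seen[str(ip)].add(nic)
    return {ip: sorted(nics) for ip, nics in seen.items() if len(nics) > 1}


if __name__ == "__main__":
    config = {
        "nic1": ["192.0.2.10/24"],
        "nic2": ["192.0.2.10/24", "172.16.3.5/24"],
    }
    for ip, nics in find_duplicate_ips(config).items():
        print("Duplicate IP %s on NICs: %s" % (ip, ", ".join(nics)))
```

Failing the deployment as soon as this returns a non-empty result would give the user the informative error asked for here, instead of a 4 hour timeout.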
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Potential candidate for M-3
Requesting M3Approved, this just needs a +A from cores. Low impact, high need.
Actually it was just merged.
This validator crashes when you run it on a plan other than the default 'overcloud' plan... I can't test any configuration changes :(. It complains about a utf8 error but this is happening also with the default network-environment.yaml.

TRIPLEO_PLAN_NAME=default ansible-playbook -vvv -i /usr/bin/tripleo-ansible-inventory /usr/share/openstack-tripleo-validations/validations/network-environment.yaml
Using /etc/ansible/ansible.cfg as config file

PLAYBOOK: network-environment.yaml ********************
1 plays in /usr/share/openstack-tripleo-validations/validations/network-environment.yaml

PLAY [undercloud] ********************

TASK [Gathering Facts] ********************
Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/setup.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: stack
<localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152 `" && echo ansible-tmp-1505286244.94-72836329127152="` echo /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152 `" ) && sleep 0'
<localhost> PUT /tmp/tmp0OqJZo TO /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/ /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/setup.py; rm -rf "/home/stack/.ansible/tmp/ansible-tmp-1505286244.94-72836329127152/" > /dev/null 2>&1 && sleep 0'
ok: [localhost]
META: ran handlers

TASK [Validate the network environment files] ********************
task path: /usr/share/openstack-tripleo-validations/validations/network-environment.yaml:19
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 125, in run
    res = self._execute()
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 522, in _execute
    result = self._handler.run(task_vars=variables)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 45, in run
    results = merge_hash(results, self._execute_module(tmp=tmp, task_vars=task_vars, wrap_async=wrap_async))
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 632, in _execute_module
    (module_style, shebang, module_data, module_path) = self._configure_module(module_name=module_name, module_args=module_args, task_vars=task_vars)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/__init__.py", line 157, in _configure_module
    task_vars=task_vars, module_compression=self._play_context.module_compression)
  File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 796, in modify_module
    (b_module_data, module_style, shebang) = _find_module_utils(module_name, b_module_data, module_path, module_args, task_vars, module_compression)
  File "/usr/lib/python2.7/site-packages/ansible/executor/module_common.py", line 629, in _find_module_utils
    python_repred_params = repr(json.dumps(params))
  File "/usr/lib64/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 1: invalid continuation byte

fatal: [localhost]: FAILED! => {
    "failed": true,
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}
 [WARNING]: Could not create retry file '/usr/share/openstack-tripleo-validations/validations/network-environment.retry'. [Errno 13] Permission denied: u'/usr/share/openstack-tripleo-validations/validations/network-environment.retry'

PLAY RECAP ********************
localhost : ok=1 changed=0 unreachable=0 failed=1
(In reply to Udi from comment #13)
> This validator crashes when you run it on a plan other than the default
> 'overcloud' plan... I can't test any configuration changes :(. It complains
> about a utf8 error but this is happening also with the default
> network-environment.yaml.

I tested the validations with other non-default plans (using current tripleo-heat-templates from master as well as variations of the default plan). This worked fine.

This is a problem with the specific plan used here rather than with the validation itself: the plan contains a couple of Python bytecode files (.pyc, .pyo) which are not supposed to be there and which break the template lookup plugin.

While a plan should not contain these kinds of files, this is something that could easily be caught within the template lookup plugin. I added a patch to safeguard against this in the future: https://review.openstack.org/#/c/504430/
Added Pike-backport patch for byte files issue: https://review.openstack.org/#/c/509472/
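For readers who want to see what a safeguard against bytecode files in a plan could look like, here is a minimal Python sketch. It does not reproduce the linked patches. The function name `readable_plan_files` and the input shape (a dict mapping a file name to its raw bytes) are assumptions made for illustration only.

```python
# Hypothetical sketch, not the code from the linked reviews.
BYTECODE_SUFFIXES = (".pyc", ".pyo")


def readable_plan_files(files):
    """Return {name: text} for plan files that are safe to treat as text.

    files is assumed to map a file name to its raw bytes. Bytecode files
    are skipped by extension, and any other file that is not valid UTF-8
    is skipped instead of raising UnicodeDecodeError.
    """
    result = {}
    for name, raw in files.items():
        if name.endswith(BYTECODE_SUFFIXES):
            continue
        try:
            result[name] = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Binary content under an unexpected name: ignore it.
            continue
    return result


if __name__ == "__main__":
    plan = {
        "network-environment.yaml": b"resource_registry: {}\n",
        "helper.pyc": b"\x03\xf3\r\n",
    }
    print(sorted(readable_plan_files(plan)))
```

The byte 0xf3 at position 1 reported in the traceback in comment #13 is consistent with the start of a Python 2.7 .pyc file, whose header begins with the bytes 03 f3 0d 0a.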
The new network-environment validator doesn't cover the specific error that is described in this bug report (it covers a lot of other things).
Since the validation in question is already part of the pike/12 branches and covers a lot of other network topics, we might want to move this RFE to 13, as an addition to the network-environment validation.
The network-environment validation features:
- NIC config schema validation
- Check certain bond/interface combinations
- Network overlaps
- IP pool allocation validation
- Static IP pool/range collisions
- VLAN ID check
- Duplicate IPs

This validator doesn't cover the specific error that is described in this bug report. This feature has therefore been marked as verified for the checks listed above, and the bug has been cloned for OSP14 to complete the validations that failed QA.
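As an illustration of one item in the list above, the network overlap check, here is a minimal sketch using Python's standard `ipaddress` module. It is not the validator's actual code. The function name `find_overlapping_networks` and the input shape (a dict mapping a network name to a CIDR string) are assumptions made for this example.

```python
import ipaddress
from itertools import combinations


def find_overlapping_networks(networks):
    """Return a list of (name_a, name_b) pairs whose CIDR ranges overlap.

    networks is a hypothetical structure for this sketch:
    a dict mapping a network name to a CIDR string.
    """
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    overlaps = []
    # Compare every pair of networks once, in a stable (sorted) order.
    for (name_a, net_a), (name_b, net_b) in combinations(sorted(parsed.items()), 2):
        if net_a.overlaps(net_b):
            overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    nets = {
        "storage": "172.16.1.0/24",
        "storage_mgmt": "172.16.1.128/25",
        "internal_api": "172.16.2.0/24",
    }
    for a, b in find_overlapping_networks(nets):
        print("Networks %s and %s overlap" % (a, b))
```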
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462