Bug 2123322 - os-net-config failing on controllers randomly: /sys/class/net/bonding_masters
Summary: os-net-config failing on controllers randomly: /sys/class/net/bonding_masters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.1
Assignee: Vijayalakshmi Candappa
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-09-01 12:01 UTC by Miguel Angel Nieto
Modified: 2023-12-15 04:25 UTC (History)

Fixed In Version: tripleo-ansible-3.3.1-1.20230518201533.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:12:07 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 848411 | 0 | None | MERGED | TripleO os_net_config playbooks should allow re-run | 2022-09-13 11:33:46 UTC
OpenStack gerrit | 857346 | 0 | None | MERGED | TripleO os_net_config playbooks should allow re-run | 2022-09-15 15:45:37 UTC
Red Hat Issue Tracker | NFV-2627 | 0 | None | None | None | 2022-09-01 12:21:56 UTC
Red Hat Issue Tracker | OSP-18504 | 0 | None | None | None | 2022-09-01 12:11:08 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:12:31 UTC

Description Miguel Angel Nieto 2022-09-01 12:01:42 UTC
Description of problem:

Provisioning sometimes fails on the controllers because of this issue. It expects /sys/class/net/bonding_masters to be a directory, but it is a file.

2022-09-01 11:27:38,155 p=192601 u=stack n=ansible | 2022-09-01 11:27:38.154631 | 52540044-7c68-fc94-f2d5-00000000018e |      FATAL | Run tripleo_os_net_config_module with network_config | controller-0 | error={"ansible_job_id": "778829864973.2281", "changed": false, "cmd": "/home/heat-admin/.ansible/tmp/ansible-tmp-1662031603.4446394-193232-183408918947545/AnsiballZ_tripleo_os_net_config.py", "data": "Device br-dpdk0 has generated MAC, skipping.\nDevice vlan173 has generated MAC, skipping.\nDevice br-sriov1 has generated MAC, skipping.\nDevice vlan171 has generated MAC, skipping.\n", "finished": 1, "msg": "Traceback (most recent call last):\n  File \"/tmp/ansible_ansible.legacy.async_wrapper_payload_ksy5zwyy/ansible_ansible.legacy.async_wrapper_payload.zip/ansible/modules/async_wrapper.py\", line 182, in _run_module\n  File \"/tmp/ansible_ansible.legacy.async_wrapper_payload_ksy5zwyy/ansible_ansible.legacy.async_wrapper_payload.zip/ansible/modules/async_wrapper.py\", line 99, in _filter_non_json_lines\nValueError: No start of json char found\n", "results_file": "/root/.ansible_async/778829864973.2281", "started": 1, "stderr": "Traceback (most recent call last):\n  File \"/home/heat-admin/.ansible/tmp/ansible-tmp-1662031603.4446394-193232-183408918947545/AnsiballZ_tripleo_os_net_config.py\", line 107, in <module>\n    _ansiballz_main()\n  File \"/home/heat-admin/.ansible/tmp/ansible-tmp-1662031603.4446394-193232-183408918947545/AnsiballZ_tripleo_os_net_config.py\", line 99, in _ansiballz_main\n    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n  File \"/home/heat-admin/.ansible/tmp/ansible-tmp-1662031603.4446394-193232-183408918947545/AnsiballZ_tripleo_os_net_config.py\", line 47, in invoke_module\n    runpy.run_module(mod_name='ansible.modules.tripleo_os_net_config', init_globals=dict(_module_fqn='ansible.modules.tripleo_os_net_config', _modlib_path=modlib_path),\n  File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module\n    return 
_run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code\n    _run_code(code, mod_globals, init_globals,\n  File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 236, in <module>\n  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 219, in main\n  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 120, in _apply_safe_defaults\n  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 135, in _generate_default_cfg\nNotADirectoryError: [Errno 20] Not a directory: '/sys/class/net/bonding_masters/addr_assign_type'\n", "stderr_lines": ["Traceback (most recent call last):", "  File \"/home/heat-admin/.ansible/tmp/ansible-tmp-1662031603.4446394-193232-183408918947545/AnsiballZ_tripleo_os_net_config.py\", line 107, in <module>", "    _ansiballz_main()", "  File \"/home/heat-admin/.ansible/tmp/ansible-tmp-1662031603.4446394-193232-183408918947545/AnsiballZ_tripleo_os_net_config.py\", line 99, in _ansiballz_main", "    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)", "  File \"/home/heat-admin/.ansible/tmp/ansible-tmp-1662031603.4446394-193232-183408918947545/AnsiballZ_tripleo_os_net_config.py\", line 47, in invoke_module", "    runpy.run_module(mod_name='ansible.modules.tripleo_os_net_config', init_globals=dict(_module_fqn='ansible.modules.tripleo_os_net_config', _modlib_path=modlib_path),", "  File \"/usr/lib64/python3.9/runpy.py\", line 210, in run_module", "    return _run_module_code(code, init_globals, run_name, 
mod_spec)", "  File \"/usr/lib64/python3.9/runpy.py\", line 97, in _run_module_code", "    _run_code(code, mod_globals, init_globals,", "  File \"/usr/lib64/python3.9/runpy.py\", line 87, in _run_code", "    exec(code, run_globals)", "  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 236, in <module>", "  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 219, in main", "  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 120, in _apply_safe_defaults", "  File \"/tmp/ansible_tripleo_os_net_config_payload_sktcllx1/ansible_tripleo_os_net_config_payload.zip/ansible/modules/tripleo_os_net_config.py\", line 135, in _generate_default_cfg", "NotADirectoryError: [Errno 20] Not a directory: '/sys/class/net/bonding_masters/addr_assign_type'"], "stdout": "", "stdout_lines": []}



Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220830.n.1


How reproducible:
It happens during provisioning, but not in every execution. I have 3 controllers and it fails on only one of them.



Actual results:
Provisioning failure


Expected results:
Provisioning executed successfully


Additional info:

Comment 1 Vijayalakshmi Candappa 2022-09-01 13:31:00 UTC
This is not a regression. Ideally, os_net_config is run only once during node provisioning.
However, if the nic-config has bond interfaces and either safe_defaults are applied due to an error or node provisioning is re-run, this error is seen.
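
To illustrate the failure mode: the traceback shows _generate_default_cfg trying to open /sys/class/net/bonding_masters/addr_assign_type. Once the bonding driver is loaded, /sys/class/net contains bonding_masters as a regular file next to the interface entries, so a loop that treats every entry as an interface directory hits NotADirectoryError. Below is a minimal sketch of a scan that skips non-directory entries. It is not the merged fix (the linked gerrit changes make the os_net_config playbooks re-runnable), and the function name, the sys_class_net parameter and the "non-zero means generated MAC" rule are assumptions made for illustration only.

```python
import os


def interfaces_with_generated_mac(sys_class_net="/sys/class/net"):
    """Return sorted names of interfaces whose addr_assign_type is non-zero.

    Hypothetical helper, not part of tripleo-ansible or os-net-config.
    """
    generated = []
    for name in sorted(os.listdir(sys_class_net)):
        entry = os.path.join(sys_class_net, name)
        # bonding_masters is a regular file, not an interface directory,
        # so anything that is not a directory is skipped.
        if not os.path.isdir(entry):
            continue
        attr = os.path.join(entry, "addr_assign_type")
        try:
            with open(attr) as f:
                value = int(f.read().strip())
        except (OSError, ValueError):
            # Attribute missing or unreadable: skip this entry.
            continue
        # Assumption: 0 means a permanent address; anything else is
        # treated here as a generated MAC.
        if value != 0:
            generated.append(name)
    return generated
```

Taking the sysfs root as a parameter lets the scan be exercised against a temporary directory instead of a live host.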

Comment 9 Vijayalakshmi Candappa 2023-04-28 05:16:20 UTC
Patch is available in latest compose

Comment 11 Lukas Svaty 2023-06-08 11:05:33 UTC
Raising severity due to AutomationBlocker keyword

Comment 13 Miguel Angel Nieto 2023-06-16 10:26:15 UTC
This bug can no longer be reproduced.

Checked with the following version: 
RHOS-17.1-RHEL-9-20230607.n.2
tripleo-ansible-3.3.1-1.20230518201533.el9ost.noarch

Comment 14 Vijayalakshmi Candappa 2023-06-16 10:28:23 UTC
The BZ can be closed, as QE has verified it.

Comment 21 errata-xmlrpc 2023-08-16 01:12:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

Comment 22 Red Hat Bugzilla 2023-12-15 04:25:39 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

