Description of problem:
The playbook 'control-plane.yaml' for provisioning the master nodes in 4.5 and 4.6 UPI fails when the underlying OSP is 16.1. It works fine on OSP 13.

Version-Release number of selected component (if applicable):
OCP 4.5.0-0.nightly-2020-10-23-050031
OSP RHOS-16.1-RHEL-8-20201021.n.0

The playbooks are executed from a bastion host with:
ansible 2.9.14
python3-openstacksdk-0.36.3
python3-openstackclient-4.0.0

How reproducible: always

Steps to Reproduce:
1. Install OSP 16.1 and create a bastion host (not strictly required to reproduce the issue; the playbooks can be run from the undercloud as well).
2. Run the provisioning playbooks for UPI as described in [1]:
   ansible-playbook -i "/home/cloud-user/ostest/inventory.yaml" "/home/cloud-user/ostest/control-plane.yaml"

Actual results:

TASK [Create the Control Plane servers] ****************************************
failed: [localhost] (item=[0, 'ostest-vpwdz-master']) => {"ansible_loop_var": "item", "changed": false, "item": [0, "ostest-vpwdz-master"], "module_stderr": "/usr/lib/python3.6/site-packages/openstack/config/cloud_region.py:432: UserWarning: You have a configured API_VERSION with 'latest' in it. In the context of openstacksdk this doesn't make any sense.\n \"You have a configured API_VERSION with 'latest' in\"\n Traceback (most recent call last):\n File \"/home/cloud-user/.ansible/tmp/ansible-tmp-1603265045.369726-22447-253278130374381/AnsiballZ_os_server.py\", line 102, in <module>\n _ansiballz_main()\n File \"/home/cloud-user/.ansible/tmp/ansible-tmp-1603265045.369726-22447-253278130374381/AnsiballZ_os_server.py\", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File \"/home/cloud-user/.ansible/tmp/ansible-tmp-1603265045.369726-22447-253278130374381/AnsiballZ_os_server.py\", line 40, in invoke_module\n runpy.run_module(mod_name='ansible.modules.cloud.openstack.os_server', init_globals=None, run_name='__main__', alter_sys=True)\n File \"/usr/lib64/python3.6/runpy.py\", line 205, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/usr/lib64/python3.6/runpy.py\", line 96, in _run_module_code\n mod_name, mod_spec, pkg_name, script_name)\n File \"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_os_server_payload_r89pe0c8/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py\", line 759, in <module>\n File \"/tmp/ansible_os_server_payload_r89pe0c8/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py\", line 750, in main\n File \"/tmp/ansible_os_server_payload_r89pe0c8/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py\", line 547, in _create_server\n File \"/tmp/ansible_os_server_payload_r89pe0c8/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py\", line 417, in _exit_hostvars\n File \"/usr/lib/python3.6/site-packages/openstack/cloud/_compute.py\", line 1832, in get_openstack_vars\n return meta.get_hostvars_from_server(self, server)\n File \"/usr/lib/python3.6/site-packages/openstack/cloud/meta.py\", line 499, in get_hostvars_from_server\n expand_server_security_groups(cloud, server)\n File \"/usr/lib/python3.6/site-packages/openstack/cloud/meta.py\", line 471, in expand_server_security_groups\n groups = cloud.list_server_security_groups(server)\n File \"/usr/lib/python3.6/site-packages/openstack/cloud/_compute.py\", line 198, in list_server_security_groups\n server = self.compute.get_server(server)\n File \"/usr/lib/python3.6/site-packages/openstack/compute/v2/_proxy.py\", line 482, in get_server\n return self._get(_server.Server, server)\n File \"/usr/lib/python3.6/site-packages/openstack/proxy.py\", line 46, in check\n return method(self, expected, actual, *args, **kwargs)\n File \"/usr/lib/python3.6/site-packages/openstack/proxy.py\", line 447, in _get\n resource_type=resource_type.__name__, value=value))\n File \"/usr/lib/python3.6/site-packages/openstack/resource.py\", line 1321, in fetch\n self._translate_response(response, **kwargs)\n File \"/usr/lib/python3.6/site-packages/openstack/resource.py\", line 1134, in _translate_response\n dict.update(self, self.to_dict())\n File \"/usr/lib/python3.6/site-packages/openstack/resource.py\", line 969, in to_dict\n value = getattr(self, attr, None)\n File \"/usr/lib/python3.6/site-packages/openstack/resource.py\", line 580, in __getattribute__\n return object.__getattribute__(self, name)\n File \"/usr/lib/python3.6/site-packages/openstack/resource.py\", line 166, in __get__\n return _convert_type(value, self.type, self.list_type)\n File \"/usr/lib/python3.6/site-packages/openstack/resource.py\", line 66, in _convert_type\n ret.append(_convert_type(raw, list_type))\n File \"/usr/lib/python3.6/site-packages/openstack/resource.py\", line 82, in _convert_type\n return data_type(value)\n ValueError: dictionary update sequence element #0 has length 1; 2 is required\n ", "module_stdout": "", "msg": "MODULE FAILURE\n See stdout/stderr for the exact error", "rc": 1}
failed: [localhost] (item=[1, 'ostest-vpwdz-master']) => same error
failed: [localhost] (item=[2, 'ostest-vpwdz-master']) => same error

The task fails, but the VMs are deployed successfully.

Expected results:
No errors.

Additional info:
Commenting out the lines in [2] makes the task complete successfully. Tried with ansible 2.8 and 2.9, with the same result. Tried with python3-openstacksdk-0.36.3 (from the bastion host [3]) and python3-openstacksdk-0.36.4 (from the undercloud [4]), with no difference.

Workaround: add 'ignore_errors: yes' to the 'Create the Control Plane servers' task (see the sketch after the references).

[1] https://docs.openshift.com/container-platform/4.5/installing/installing_openstack/installing-openstack-user.html
[2] https://github.com/openshift/installer/blob/release-4.5/upi/openstack/control-plane.yaml#L85-L86
[3] http://pulp.dist.prod.ext.phx2.redhat.com/content/dist/layered/rhel8/$basearch/openstack-tools/16/os/
[4] http://rhos-qe-mirror-tlv.usersys.redhat.com/rcm-guest/puddles/OpenStack/16.1-RHEL-8/RHOS-16.1-RHEL-8-20201021.n.0/compose/OpenStack/$basearch/os
This seems to be an openstacksdk bug. Here is a similar report: https://storyboard.openstack.org/#!/story/2007710 (bug 39843). It is addressed by this patch to openstacksdk: https://review.opendev.org/#/c/749381/ , which merged to master on Sep 16, 2020 and is included in branch master and tag 0.51.0.
@Jon Uriarte, could you test with openstacksdk 0.51.0 or later (master)? It seems the problem might be fixed there.
This may have been fixed upstream. Can you please verify whether the bug still exists now that we recommend [1] `ansible-galaxy` to fetch the dependencies?

[1]: https://github.com/openshift/installer/pull/4379
Possibly, we'll need to implement a temporary solution like https://github.com/openshift/installer/pull/4375 until the openstacksdk package containing the fix is more widespread.
I think https://github.com/openshift/installer/pull/4375 may not be so temporary. If we make the version a variable, it would probably be good to keep, since this type of defect is likely to pop up again whenever a newer API version appears and brings problems. Pinning the playbooks to a particular API version seems like a good idea (one way this could look is sketched below).
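For illustration, such a pin could be a compute_api_version entry in clouds.yaml, so openstacksdk stops negotiating 'latest' against a newer Nova. This is only an assumption about the mechanism (the actual PR may pin the version elsewhere), and every value below is a placeholder:

# Hypothetical clouds.yaml with the compute API microversion pinned.
# Cloud name, auth values and the chosen microversion are illustrative only.
clouds:
  shiftstack:
    auth:
      auth_url: https://overcloud.example.com:13000/v3
      project_name: shiftstack
      username: shiftstack-user
      password: REDACTED
    identity_api_version: "3"
    compute_api_version: "2.1"   # pin instead of 'latest'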
Status: Emilien posted a backport of the upstream fix here: https://review.opendev.org/c/openstack/openstacksdk/+/763121/ , which has a +2 and +W. Unfortunately it failed to merge due to timeout errors in various tests, all of which seem unrelated to the backport. I have resubmitted and hit the same issue again. I will spend some time trying to improve the timeout situation, as otherwise this seems unlikely to ever land.
The openstacksdk backport has now landed.
(In reply to rlobillo from comment #5)
> We also tried to run the control-plane.yaml playbook installing the
> collection through ansible-galaxy as mentioned in the documentation

We have reverted that change; that was my mistake. ansible-galaxy is not supported. If there's any reference to ansible-galaxy left in code or docs, then you can report it as a bug. Thanks!
Checked with python3-openstacksdk-0.36.4-1.20201113235938.el8ost.noarch and cannot reproduce this issue; moving to Verified.

TASK [Create the Control Plane servers] ****************************************
task path: /root/jenkins/workspace/Launch Environment Flexy/private-templates/functionality-testing/aos-4_7/hosts/upi_on_openstack-scripts/04_control-plane.yaml:72
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1611570657.568383-4107914-230337991073578 `" && echo ansible-tmp-1611570657.568383-4107914-230337991073578="` echo /root/.ansible/tmp/ansible-tmp-1611570657.568383-4107914-230337991073578 `" ) && sleep 0'
Using module file /usr/lib/python3.6/site-packages/ansible/modules/cloud/openstack/os_server.py
<localhost> PUT /root/.ansible/tmp/ansible-local-410775579l6z6kd/tmpzm2x60rx TO /root/.ansible/tmp/ansible-tmp-1611570657.568383-4107914-230337991073578/AnsiballZ_os_server.py
<localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1611570657.568383-4107914-230337991073578/ /root/.ansible/tmp/ansible-tmp-1611570657.568383-4107914-230337991073578/AnsiballZ_os_server.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1611570657.568383-4107914-230337991073578/AnsiballZ_os_server.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1611570657.568383-4107914-230337991073578/ > /dev/null 2>&1 && sleep 0'
<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
changed: [localhost] => (item=[0, 'wj47uos125ag-kgxfx-master']) => {
    "ansible_loop_var": "item",
    "changed": true,
    "id": "d4570999-a84a-4470-85da-e8f8f801482b",
    ......
    "server_groups": null,
    "status": "ACTIVE",
    "tags": [],
    "task_state": null,
    "tenant_id": "542c6ebd48bf40fa857fc245c7572e30",
    "terminated_at": null,
    "trusted_image_certificates": null,
    "updated": "2021-01-25T10:35:47Z",
    "user_data": null,
    "user_id": "b414646065ab99780ef1bbcba52c07d2033a6f99fd0b10a3b1b12fcb5e5275e1",
    "vm_state": "active",
    "volumes": []
    }
}
META: ran handlers
META: ran handlers

PLAY RECAP *********************************************************************
localhost                  : ok=7    changed=5    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633