Description of problem:
Deploying the stack fails in "Deploy network-scripts required for deprecated network service" because the yum repositories are not accessible. This happens when:
- some yum repositories have been added at start-up (e.g. through cloud-init, or in images newly built with dib), and
- the ctlplane network is not routed.

Version-Release number of selected component (if applicable):
tripleo-ansible-0.7.1-2.20210603175844.el8ost.9.noarch

How reproducible:
Always.

Steps to Reproduce:
1. Use a ctlplane network that is not routed (the server cannot reach the yum repository using its IP on the ctlplane network).
2. Add some yum repositories to the base image.
3. Try to deploy.

Actual results:
The deployment fails.

Expected results:
It should not fail when network-scripts is already installed.

Additional info:
The code looks like this:
~~
- name: Deploy and enable network service
  become: true
  when:
    - (tripleo_bootstrap_legacy_network_packages | length) > 0
  block:
    - name: Deploy network-scripts required for deprecated network service
      package:
        name: "{{ tripleo_bootstrap_legacy_network_packages }}"
        state: present
~~
tripleo_bootstrap_legacy_network_packages is non-empty only on fedora and redhat-8.

The package task with state: present triggers a dnf metadata update even though the network is not configured yet. (Setting skip_if_unavailable=true in the repos can be done as a workaround, but it then needs to be set back to false once the network is set up, and this workaround does not work if the repos use mirrors and these mirrors cannot be resolved.)

Can it be made so that either:
- the package task is removed completely (tripleo would assume the package is already part of the base image), or
- tripleo makes an extra effort to avoid calling the package task.
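For example, by running something like:
~~~
# Query the rpm database only; this needs no repository access.
- shell: "rpm -q {{ tripleo_bootstrap_legacy_network_packages[0] }}"
  failed_when: no
  changed_when: no
  register: res
- package:
    name: "{{ tripleo_bootstrap_legacy_network_packages }}"
    state: present
  when: res.rc != 0
~~~
so that the package task is not called when the package is already installed, in a "best effort" mode.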
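The same "best effort" idea can be sketched outside of Ansible as a small shell helper: ask the local rpm database first, and only invoke the package manager for packages that are actually missing. This is an illustration under assumptions, not the tripleo-ansible implementation. The function names and the RPM_QUERY / PKG_INSTALL override variables are hypothetical and exist only so the logic can be exercised without touching a real system.

```shell
#!/bin/sh
# Sketch: install packages only when rpm does not already know them.
# RPM_QUERY and PKG_INSTALL are hypothetical hooks (assumed names) that
# default to the real commands and can be overridden for dry runs.
: "${RPM_QUERY:=rpm -q}"
: "${PKG_INSTALL:=dnf install -y}"

# Print, one per line, the packages that the rpm database does not contain.
missing_packages() {
    for pkg in "$@"; do
        $RPM_QUERY "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Call the package manager only if at least one package is missing.
ensure_packages() {
    missing=$(missing_packages "$@")
    if [ -z "$missing" ]; then
        echo "all packages already installed, skipping package manager"
        return 0
    fi
    # Unquoted on purpose: one argument per missing package name.
    $PKG_INSTALL $missing
}

# Example (commented out so the file can be sourced safely):
#   ensure_packages network-scripts
```

Because the rpm query never contacts a repository, an isolated node with network-scripts preinstalled would not reach the dnf call at all in this sketch.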
I tried to reproduce this, but `package` did not fail for me with the node isolated. See the details of the reproducer I attempted below.

I proposed a patch: https://review.opendev.org/c/openstack/tripleo-ansible/+/827383

@François, can you test the patch in your environment to verify that it solves your issue? Thank you!

---------------------------------------------------------------

$ ip route del default
$ dnf info network-scripts
delorean-openstack-ironic-python-agent-builder-a08dcb4c36ee464ffc0825f770686a4  0.0  B/s |   0  B     00:00
Errors during downloading metadata for repository 'delorean-component-baremetal':
  - Curl error (6): Couldn't resolve host name for https://trunk.rdoproject.org/centos8/component/baremetal/a0/8d/a08dcb4c36ee464ffc0825f770686a47ed86c570_a36069e8/repodata/repomd.xml [Could not resolve host: trunk.rdoproject.org]
Error: Failed to download metadata for repo 'delorean-component-baremetal': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
$ rpm -q network-scripts
network-scripts-10.00.15-1.el8.x86_64

Ran this playbook:

---
- name: Reproduce RHBZ#2048134
  hosts: localhost
  gather_facts: false
  vars:
    tripleo_bootstrap_legacy_network_packages:
      - network-scripts
  tasks:
    - name: Install package
      become: true
      package:
        name: "{{ tripleo_bootstrap_legacy_network_packages }}"
        state: present

$ ansible-playbook reproducer.yaml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [Reproduce RHBZ#2048134] ****************************************************************************************

TASK [Install package] *****************************************************************
ok: [localhost]

PLAY RECAP *****************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Thanks. I will try harder, please don't merge yet :)

I am curious about why it did not reproduce for you. I can consistently reproduce the failure of the package task (originally on 8-stream; I tried 9-stream and Fedora too). By any chance, if you run "ip route" at the end, is the default route still missing? The only idea I have is that a lease was renewed between the "dnf info" and the "ansible-playbook" above.

Depending on the tasks included and the version of tripleo, there are other Ansible package tasks run here and there.

--
[cloud-user@stream ~]$ sudo dnf clean all
0 files removed
[cloud-user@stream ~]$ sudo ip route del default
[cloud-user@stream ~]$ ansible-playbook play.yaml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [Reproduce RHBZ#2048134] ***************************************************************************************************************

TASK [Install package] **********************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'baseos': Cannot prepare internal mirrorlist: Curl error (7): Couldn't connect to server for https://mirrors.centos.org/metalink?repo=centos-baseos-9-stream&arch=x86_64&protocol=https,http []", "rc": 1, "results": []}

PLAY RECAP **********************************************************************************************************************************
localhost : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
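Since the open question is whether the default route was really still absent when the playbook ran, a check like the following sketch could be run right before and right after the reproducer. It only inspects IPv4-style `ip route` output passed on stdin. The helper name has_default_route is hypothetical and not part of iproute2.

```shell
#!/bin/sh
# Sketch: report whether `ip route` output (read from stdin) contains a
# default route. has_default_route is an assumed helper name.
has_default_route() {
    # `ip route show` prints default routes as lines starting with "default ".
    grep -q '^default '
}

# Example on a real host (commented out so the file can be sourced safely):
#   if ip route show | has_default_route; then
#       echo "default route present: node is NOT isolated"
#   else
#       echo "no default route: node is isolated"
#   fi
```

If the check reports a default route after the playbook run, a DHCP lease renewal in between would be one possible explanation for a reproducer that unexpectedly passes.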
Thanks for this patch.

After further investigation, one reliable way to make it work is to run the following in cloud-init:

runcmd:
  - ip route del default
  - sed -i 's/^skip_if_unavailable.*/skip_if_unavailable=true/' /etc/dnf/dnf.conf

As a way to avoid skip_if_unavailable=true, the patch works for my minimal CentOS 8-stream image (I modified it to match the installed version of tripleo).

For RHEL it is _not_ working, because:
- the "openvswitch2.15" package is present on the system while the one installed is called "openvswitch", and there is some specific logic around this case. I think it works by chance :) The playbook does not fail for openvswitch because of a workaround for ceph deployments.
- when the OS::TripleO::Services::Tuned service is defined, tuned/tasks/tuned_install.yml fails due to similar logic.

I still don't understand why you weren't able to reproduce the issue.

I initially thought skip_if_unavailable was not reliable. CentOS comes with a default list of repositories, and I thought I had added skip_if_unavailable to each of them, but that was not the case, which is why I suspected the repositories that use mirrors.

By making sure that dnf.conf contains skip_if_unavailable=true, the deployment just works. I think it is a good enough workaround, and tripleo works for me without any patch.
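One caveat with the sed command above is that it only rewrites an existing skip_if_unavailable line and silently does nothing when the key is absent. The sketch below sets the key if it exists and appends it otherwise. It assumes GNU sed (for `sed -i`) and a dnf.conf that contains only a [main] section, so that an appended line still lands in [main]. The function name set_dnf_option is hypothetical.

```shell
#!/bin/sh
# Sketch: make sure KEY=VALUE is present in a dnf-style config file.
# Assumptions: GNU sed, and the file has a single [main] section.
# set_dnf_option is an assumed helper name, not a dnf command.
set_dnf_option() {
    conf=$1
    key=$2
    value=$3
    if grep -q "^${key}[[:space:]]*=" "$conf"; then
        # Key already present: rewrite the existing line in place.
        sed -i "s/^${key}[[:space:]]*=.*/${key}=${value}/" "$conf"
    else
        # Key absent: append it at the end of the file.
        printf '%s=%s\n' "$key" "$value" >> "$conf"
    fi
}

# Example (commented out; needs root on a real host):
#   set_dnf_option /etc/dnf/dnf.conf skip_if_unavailable true
```

Calling the same helper with the value false after the network is configured would restore the stricter behaviour mentioned in the original report.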
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543