Bug 1954108

Summary: Curl error (6): Couldn't resolve host name for http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm
Product: Red Hat OpenStack
Reporter: wes hayutin <whayutin>
Component: tripleo-ansible
Assignee: wes hayutin <whayutin>
Status: CLOSED CURRENTRELEASE
QA Contact: Joe H. Rahme <jhakimra>
Severity: urgent
Docs Contact:
Priority: high
Version: 17.0 (Wallaby)
CC: aschultz, jschluet, sandyada
Target Milestone: ---
Keywords: Tracking, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-15.2.0-0.20210513080802.8ed7631.el8osttrunk
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-06-28 12:47:03 UTC
Type: Bug

Description wes hayutin 2021-04-27 15:31:15 UTC
Description of problem:

  File "/usr/lib/python3.6/site-packages/dnf/util.py", line 97, in _urlopen_progress
    libdnf.repo.PackageTarget.downloadPackages(libdnf.repo.VectorPPackageTarget(targets), True)
RuntimeError: Curl error (6): Couldn't resolve host name for http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm [Could not resolve host: download.devel.redhat.com]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/dnf/cli/main.py", line 122, in cli_run
    cli.run()
  File "/usr/lib/python3.6/site-packages/dnf/cli/cli.py", line 1067, in run
    return self.command.run()
  File "/usr/lib/python3.6/site-packages/dnf/cli/commands/install.py", line 105, in run
    err_pkgs = self._install_files()
  File "/usr/lib/python3.6/site-packages/dnf/cli/commands/install.py", line 143, in _install_files
    progress=self.base.output.progress):
  File "/usr/lib/python3.6/site-packages/dnf/base.py", line 1245, in add_remote_rpms
    path = dnf.util._urlopen_progress(path, self.conf, progress)
  File "/usr/lib/python3.6/site-packages/dnf/util.py", line 100, in _urlopen_progress
    raise IOError(str(e))
OSError: Curl error (6): Couldn't resolve host name for http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm [Could not resolve host: download.devel.redhat.com]
2021-04-27T05:28:18-0400 CRITICAL Curl error (6): Couldn't resolve host name for http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm [Could not resolve host: download.devel.redhat.com]


https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-17/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-bm_envA-3ctlr_1comp-featureset035-rhos-17/4ff4377/logs/overcloud-controller-0/var/log/dnf.log



2021-04-27 10:36:42 | 2021-04-27 10:36:42.179536 | 000c523b-c9aa-2e5d-582e-000000000227 |       TASK | Deploy release version package
2021-04-27 10:36:44 | 2021-04-27 10:36:44.523902 | 000c523b-c9aa-2e5d-582e-000000000227 |      FATAL | Deploy release version package | overcloud-controller-0 | error={"changed": false, "failures": ["No package rhosp-release available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}
2021-04-27 10:36:44 | 2021-04-27 10:36:44.528251 | 000c523b-c9aa-2e5d-582e-000000000227 |     TIMING | tripleo_bootstrap : Deploy release version package | overcloud-controller-0 | 0:00:12.993597 | 2.32s
2021-04-27 10:36:44 | 2021-04-27 10:36:44.531190 | 000c523b-c9aa-2e5d-582e-000000000227 |      FATAL | Deploy release version package | overcloud-novacompute-0 | error={"changed": false, "failures": ["No package rhosp-release available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}
2021-04-27 10:36:44 | 2021-04-27 10:36:44.532925 | 000c523b-c9aa-2e5d-582e-000000000227 |     TIMING | tripleo_bootstrap : Deploy release version package | overcloud-novacompute-0 | 0:00:12.998300 | 2.35s
2021-04-27 10:36:44 | 2021-04-27 10:36:44.544547 | 000c523b-c9aa-2e5d-582e-000000000227 |      FATAL | Deploy release version package | overcloud-controller-1 | error={"changed": false, "failures": ["No package rhosp-release available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}
2021-04-27 10:36:44 | 2021-04-27 10:36:44.546827 | 000c523b-c9aa-2e5d-582e-000000000227 |     TIMING | tripleo_bootstrap : Deploy release version package | overcloud-controller-1 | 0:00:13.012197 | 2.31s
2021-04-27 10:36:44 | 2021-04-27 10:36:44.549530 | 000c523b-c9aa-2e5d-582e-000000000227 |      FATAL | Deploy release version package | overcloud-controller-2 | error={"changed": false, "failures": ["No package rhosp-release available."], "msg": "Failed to install some of the specified packages", "rc": 1, "results": []}
2021-04-27 10:36:44 | 2021-04-27 10:36:44.551637 | 000c523b-c9aa-2e5d-582e-000000000227 |     TIMING | tripleo_bootstrap : Deploy release version package | overcloud-controller-2 | 0:00:13.017013 | 2.29s
2021-04-27 10:36:44 | 
2021-04-27 10:36:44 | NO MORE HOSTS LEFT *************************************************************


https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-17/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-bm_envA-3ctlr_1comp-featureset035-rhos-17/4ff4377/logs/undercloud/home/zuul/overcloud_deploy.log

Comment 1 wes hayutin 2021-04-27 15:33:25 UTC
The overcloud networks have not fully come up by the time bootstrap runs.

From the undercloud, after the deploy fails, the network works fine:

$ curl -o rhos-release-latest.noarch.rpm http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36248  100 36248    0     0   258k      0 --:--:-- --:--:-- --:--:--  258k


Suggest we add retries here:
https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_bootstrap/tasks/main.yml#L36-L43
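
To illustrate the retry idea, here is a minimal Python sketch (Python only because the dnf traceback above is Python). It is not the proposed tripleo-ansible change, which would live in the Ansible task linked above. The helper names `retry` and `resolve_host`, and the attempt count and delay, are hypothetical and chosen only for illustration. The sketch retries an operation that raises OSError, which is the exception type dnf raised for the DNS failure.

```python
import socket
import time


def retry(func, attempts=5, delay=2.0, exceptions=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Sleeps `delay` seconds between tries and re-raises the last error
    once the final attempt has failed. (Hypothetical helper.)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay)


def resolve_host(host, port=80):
    """Resolve host via DNS; raises socket.gaierror (an OSError) on failure."""
    return socket.getaddrinfo(host, port)
```

A caller could wrap the lookup as retry(lambda: resolve_host("download.devel.redhat.com"), attempts=5, delay=10), so that a network which comes up a few seconds late no longer fails the run on the first attempt. Retrying only hides a slow network; it does not help if the host is never reachable from the node.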

Comment 2 wes hayutin 2021-04-27 15:50:24 UTC
Potential fix:
https://review.opendev.org/c/openstack/tripleo-ansible/+/788318

Comment 3 Alex Schultz 2021-04-27 16:21:30 UTC
This is more of an infra problem than a code problem. We can improve reliability in tripleo-ansible, but the RCA would point to infra problems.

Comment 4 Sandeep Yadav 2021-05-04 13:08:09 UTC
In the 17 line, task [1] fails because the overcloud nodes have no repos configured. The overcloud nodes are expected to get everything they need from the image build.

We have proposed [2] as a fix.
[1] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_bootstrap/tasks/main.yml#L39
[2] https://review.opendev.org/c/openstack/tripleo-common/+/789563