Bug 2242605

Summary: During upgrade 16.2-17.1 with no internet on UC overcloud_upgrade_prepare.sh fails pulling registry.access.redhat.com/ubi8/pause
Product: Red Hat OpenStack Reporter: Joaquín Veira <jveiraca>
Component: openstack-tripleo-heat-templates Assignee: Sergii Golovatiuk <sgolovat>
Status: CLOSED ERRATA QA Contact: Joe H. Rahme <jhakimra>
Severity: high Docs Contact:
Priority: medium    
Version: 17.1 (Wallaby) CC: aruffin, bshephar, coldford, dhill, jpretori, kgilliga, lsvaty, mariel, mbollo, mburns, mlaniel, sgolovat, svigan
Target Milestone: z2 Keywords: Triaged
Target Release: 17.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20231023220849.7ab5bf8.el9ost Doc Type: Bug Fix
Doc Text:
Before this update, an upgrade from RHOSP 16.2 to 17.1 failed on environments that were not connected to the internet because the `infra_image` value was not defined. The `overcloud_upgrade_prepare.sh` script tried to pull `registry.access.redhat.com/ubi8/pause` instead, which caused an error. The issue is resolved in RHOSP 17.1.2.
Story Points: ---
Clone Of:
: 2259891 Environment:
Last Closed: 2024-01-16 14:31:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2259891    

Description Joaquín Veira 2023-10-07 11:02:01 UTC
Description of problem:
During an upgrade from 16.2 to 17.1 with an undercloud that is not connected to the internet, running overcloud_upgrade_prepare.sh tries to pull registry.access.redhat.com/ubi8/pause, because that image is defined as infra_image in /usr/share/containers/containers.conf.

$ grep infra_image /usr/share/containers/containers.conf && cat /etc/{redhat,rhosp}-release
# infra_image = "k8s.gcr.io/pause:3.2"
infra_image = "registry.access.redhat.com/ubi8/pause"
Red Hat Enterprise Linux release 8.4 (Ootpa)
Red Hat OpenStack Platform release 16.2.4 (Train)

This file comes from containers-common-1.3.1-7.module+el8.4.0+15741+47bb6bfe.x86_64.rpm.

RHEL 9 doesn't have anything defined under infra_image.

$ grep infra_image /usr/share/containers/containers.conf && cat /etc/{redhat,rhosp}-release
#infra_image = ""
Red Hat Enterprise Linux release 9.2 (Plow)
Red Hat OpenStack Platform release 17.1.1 (Wallaby)

Comes from containers-common-1-52.el9_2.x86_64.rpm.
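
The grep output above mixes commented and active lines, so it can help to print only the setting that is actually in effect in a given file. The following is a minimal sketch, not something taken from the TripleO tooling: the function name active_infra_image is hypothetical, and it only inspects the single file it is given (podman can also read other containers.conf locations, such as /etc/containers/containers.conf).

```shell
# Hypothetical helper (not part of TripleO or podman).
# Prints the active (uncommented, non-empty) infra_image value from a
# containers.conf-style file. Prints nothing and returns 1 if none is set.
active_infra_image() {
    conf_file="$1"
    # Only lines that start with optional whitespace followed by
    # 'infra_image = "..."' match; lines starting with '#' are skipped.
    value=$(sed -n 's/^[[:space:]]*infra_image[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' \
        "$conf_file" 2>/dev/null | tail -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

Called as `active_infra_image /usr/share/containers/containers.conf`, this should print registry.access.redhat.com/ubi8/pause for the RHEL 8.4 file content shown above and print nothing for the RHEL 9.2 one.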

Version-Release number of selected component (if applicable):
RHOSP 16.2 (during upgrade to 17.1)

How reproducible:

Try to upgrade from RHOSP 16.2 to 17.1 without a connection to our CDN.

Actual results:

WARN[0000] failed, retrying in 1s ... (1/3). Error: Error initializing source docker://registry.access.redhat.com/ubi8/pause:latest: error pinging docker registry registry.access.redhat.com: Get "https://registry.access.redhat.com/v2/": read tcp 10.75.74.84:50232->23.50.113.175:443: read: connection reset by peer
WARN[0001] failed, retrying in 1s ... (2/3). Error: Error initializing source docker://registry.access.redhat.com/ubi8/pause:latest: error pinging docker registry registry.access.redhat.com: Get "https://registry.access.redhat.com/v2/": read tcp 10.75.74.84:37608->23.50.113.184:443: read: connection reset by peer
WARN[0002] failed, retrying in 1s ... (3/3). Error: Error initializing source docker://registry.access.redhat.com/ubi8/pause:latest: error pinging docker registry registry.access.redhat.com: Get "https://registry.access.redhat.com/v2/": read tcp 10.75.74.84:37620->23.50.113.184:443: read: connection reset by peer
ERRO[0003] Error freeing pod lock after failed creation: no such file or directory
Error: error adding Infra Container: error pulling infra-container image: Error initializing source docker://registry.access.redhat.com/ubi8/pause:latest: error pinging docker registry registry.access.redhat.com: Get "https://registry.access.redhat.com/v2/": read tcp 10.75.74.84:37634->23.50.113.184:443: read: connection reset by peer

Expected results:
overcloud_upgrade_prepare.sh completes successfully

Comment 1 Brendan Shephard 2023-10-08 06:49:29 UTC
I reverted the change that added the pause image long ago:
https://github.com/openstack/tripleo-ansible/commit/e6c6295c7470bd3c966fd0ff8fa8c3451bb870d9

Which version of openstack-tripleo-ansible are you running?

Comment 2 Joaquín Veira 2023-10-08 07:54:42 UTC
tripleo-ansible-3.3.1-1.20230521003959.el8ost.noarch
I see it is later than the version referenced in the BZ mentioned for that patch.

Comment 4 Brendan Shephard 2023-10-08 22:37:45 UTC
I checked the version of the RPM you shared, and it does indeed contain my revert. I don't think tripleo has any other task that would add infra_image outside of that role, so my assumption is that either 1) infra_image was already defined and the upgraded podman version made it an issue, or 2) the podman upgrade is adding the infra_image parameter.

In any case, simply removing it should solve your immediate problem. 

To address it in the long term, we need to find out where it gets added. I assume there are some environments that haven't been upgraded yet; do they already have infra_image defined? We might just need to add a task in the upgrade playbooks to check for this parameter and ensure it is absent. There's not really any reason to have it defined at all, even in connected environments, unless a user wants to use their own pause image for $reasons. So it might be fine for us to just remove this parameter automatically when we find it during the upgrade process.

Adding in the upgrades DFG for comment and collaboration on that topic.
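
The check-and-remove step proposed in this comment can be sketched in shell as follows. This is only an illustration under stated assumptions: the function name ensure_infra_image_absent is hypothetical, it is not the change that shipped in the erratum, and it comments the line out (keeping a .bak backup) instead of deleting it. It relies on GNU sed, which is what RHEL provides.

```shell
# Hypothetical helper (not the shipped fix).
# Ensures that no active infra_image setting remains in a
# containers.conf-style file. Idempotent: when nothing is set, the file
# is left untouched and "unchanged" is reported.
ensure_infra_image_absent() {
    conf_file="$1"
    # Nothing to do if the file is missing or has no uncommented infra_image line.
    if ! grep -Eq '^[[:space:]]*infra_image[[:space:]]*=' "$conf_file" 2>/dev/null; then
        echo "unchanged: $conf_file"
        return 0
    fi
    # Comment the setting out and keep the original as <file>.bak for rollback.
    sed -i.bak -E 's/^([[:space:]]*)(infra_image[[:space:]]*=.*)$/\1# \2/' "$conf_file"
    echo "changed: $conf_file"
}
```

On an affected undercloud this would be run with root privileges against the file named in the description, for example `ensure_infra_image_absent /usr/share/containers/containers.conf`. Because that file is owned by the containers-common package, a later package update may restore the setting, so treat this as a workaround sketch and not a permanent fix.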

Comment 38 errata-xmlrpc 2024-01-16 14:31:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0209