Bug 1768879

Summary: 4.1 to 4.2 upgrades get stuck when insecureRegistries is configured
Product: OpenShift Container Platform Reporter: Jessica Forrester <jforrest>
Component: Node    Assignee: Colin Walters <walters>
Status: CLOSED ERRATA QA Contact: Sunil Choudhary <schoudha>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2.z    CC: aos-bugs, bbaude, dornelas, jerzhang, jokerman, mgugino, mitr, palshure, pthomas, rphillips, scuppett, umohnani, walters, wking, xtian
Target Milestone: ---    Keywords: NeedsTestCase
Target Release: 4.1.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-21 09:17:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186913    

Description Jessica Forrester 2019-11-05 13:14:25 UTC
1) Install a 4.1.21 cluster
2) In the config.openshift.io/images global config, set an insecureRegistry. Example spec:

spec:
  registrySources:
    insecureRegistries:
      - quay-enterprise.example.openshift.com

3) Wait for the registry config change to roll out to all of the nodes.

4) Switch the cluster to the fast-4.2 channel and upgrade to 4.2.2
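
For reference, steps 2 and 4 roughly correspond to the following commands; a minimal sketch assuming cluster-admin access via oc, with the patch payloads assumed from the spec and versions quoted above:

oc patch image.config.openshift.io/cluster --type merge \
  -p '{"spec":{"registrySources":{"insecureRegistries":["quay-enterprise.example.openshift.com"]}}}'

# switch the update channel and request the 4.2.2 upgrade
oc patch clusterversion version --type merge -p '{"spec":{"channel":"fast-4.2"}}'
oc adm upgrade --to 4.2.2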

The cluster upgrade gets stuck at 88%, failing to update the first node in each MachineConfigPool. The MCP is marked Degraded with the message "failed to run pivot: failed to start pivot.service: exit status 1"

From the MCD logs we can see:

Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c2f247fa92936266fc771eb946e0cb84d95279060954fe7e21f5d3f08ec43f3: error loading registries: invalid URL: cannot be empty
2019-11-03T09:39:23.730784095Z W1103 09:39:23.529578   98790 run.go:40] podman failed: exit status 125; retrying...
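
(For reference, the daemon log can be pulled from the machine-config-daemon pod running on the affected node; a sketch, with the pod name as a placeholder:)

oc -n openshift-machine-config-operator get pods -o wide | grep machine-config-daemon
oc -n openshift-machine-config-operator logs <machine-config-daemon-pod> -c machine-config-daemon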

Looking at the MachineConfig for registries I can see it has already been updated to the new format for 4.2:

unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]

[[registry]]
  location = "quay-enterprise.example.openshift.com"
  insecure = true
  blocked = false
  mirror-by-digest-only = false
  prefix = ""


However, the node has not yet pivoted to the new version of RHCOS, so podman on that node is still the version shipped in 4.1.

As I understand from another BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1737043, that registries.conf format is incompatible with that version of podman.
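
One way to confirm which podman the not-yet-pivoted node is still carrying; a sketch, with the node name as a placeholder:

oc debug node/<node-name> -- chroot /host podman --version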


I suspect we have the same problem when allowedRegistries or blockedRegistries are configured as well, but I did not confirm those.
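
For completeness, those fields would look roughly like this in the same spec (hypothetical placeholder values; only one of blockedRegistries or allowedRegistries may be set at a time):

spec:
  registrySources:
    blockedRegistries:
      - untrusted-registry.example.com
    # or, alternatively:
    # allowedRegistries:
    #   - quay.io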

Comment 1 Miloslav Trmač 2019-11-05 15:01:05 UTC
FWIW:

- ~All versions of podman support the old V1 format:
  > [registries.insecure]
  > registries = ['...']
- Some versions of podman support the pre-release version of V2, which requires "URL", not "Location".
- podman ≥ 1.4.0 (1.4.1 is better) supports the released version of V2, which requires "Location", not "URL".

- machine-config-operator was generating V1 before 58db21b6e360404c52bf66b9345609bd2c40eca9, and has been generating V2 since. It never generated the pre-release version of V2.
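
To make the difference concrete, here is the same insecure-registry setting in each released format; a sketch based on the descriptions above (the V2 snippet mirrors the MachineConfig content quoted in the description):

# V1 registries.conf, parsed by essentially all podman versions
[registries.insecure]
registries = ['quay-enterprise.example.openshift.com']

# released V2 registries.conf, requires podman >= 1.4.0
[[registry]]
  location = "quay-enterprise.example.openshift.com"
  insecure = true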

Comment 2 Brent Baude 2019-11-05 15:52:31 UTC
Not much to add here from the Podman side. Miloslav is spot on.

Comment 16 W. Trevor King 2019-11-12 23:08:17 UTC
Moving back to target 4.1.z, because this bug is already ON_QA. I don't know what 4.2.z and 4.3.0 parents for an "under these conditions 4.1.z->4.2.z upgrades break, so we need to fix 4.1.z" issue would look like anyway.

Comment 20 errata-xmlrpc 2019-11-21 09:17:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3875

Comment 23 Colin Walters 2020-01-31 19:27:34 UTC
*** Bug 1760484 has been marked as a duplicate of this bug. ***