Description
Jessica Forrester 2019-11-05 13:14:25 UTC
1) Install a 4.1.21 cluster
2) Modify the config.openshift.io/images Global Config to set an insecure registry. Example spec:
   spec:
     registrySources:
       insecureRegistries:
       - quay-enterprise.example.openshift.com
3) Wait for the registry config change to roll out to all of the nodes.
4) Switch the cluster to the fast-4.2 channel and upgrade to 4.2.2
The cluster upgrade gets stuck at 88%, failing to update the first node in each MachineConfigPool. Each MCP is marked Degraded with the message "failed to run pivot: failed to start pivot.service: exit status 1"
From the MCD logs we can see:
Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c2f247fa92936266fc771eb946e0cb84d95279060954fe7e21f5d3f08ec43f3: error loading registries: invalid URL: cannot be empty
2019-11-03T09:39:23.730784095Z W1103 09:39:23.529578 98790 run.go:40] podman failed: exit status 125; retrying...
Looking at the MachineConfig for registries I can see it has already been updated to the new format for 4.2:
unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
[[registry]]
location = "quay-enterprise.example.openshift.com"
insecure = true
blocked = false
mirror-by-digest-only = false
prefix = ""
However, the node has not pivoted into the new version of RHCOS yet, so podman on that node is still the version we used in 4.1.
As I understand from another BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1737043, that registries.conf format is incompatible with that version of podman.
I suspect we have the same problem when allowedRegistries or blockedRegistries are configured, but I didn't confirm those.
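To check which registries.conf layout a node has actually been handed, something like the following could be used. It is only a rough sketch: the function name registries_conf_format is made up for illustration, and it classifies by looking for the [[registry]] table array (or the unqualified-search-registries key) of the new format versus the [registries.*] tables of the old one, without doing a full TOML parse.

```python
def registries_conf_format(text: str) -> str:
    """Roughly classify registries.conf content as "v1", "v2" or "unknown".

    Hypothetical helper, not part of podman or machine-config-operator.
    It scans line by line instead of parsing TOML, so it is only a heuristic.
    """
    saw_v1_table = False
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # The new format uses a [[registry]] array of tables and a
        # top-level unqualified-search-registries key.
        if line == "[[registry]]" or line.startswith("unqualified-search-registries"):
            return "v2"
        # The old format uses tables such as [registries.insecure].
        if line.startswith("[registries.") and line.endswith("]"):
            saw_v1_table = True
    return "v1" if saw_v1_table else "unknown"


if __name__ == "__main__":
    new_style = '[[registry]]\nlocation = "quay-enterprise.example.openshift.com"\ninsecure = true\n'
    old_style = "[registries.insecure]\nregistries = ['quay-enterprise.example.openshift.com']\n"
    print(registries_conf_format(new_style))
    print(registries_conf_format(old_style))
```

Running this against the file rendered by the MachineConfig would show whether a node that is still on the 4.1 podman has been given the new layout.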
FWIW:
- ~All versions of podman can support the old V1 format:
> [registries.insecure]
> registries = ['...']
- Some versions of podman support the pre-release version of V2, which requires "URL", not "Location".
- podman ≥ 1.4.0 (preferably 1.4.1) supports the released version of V2, which requires "Location", not "URL".
- machine-config-operator was generating V1 before 58db21b6e360404c52bf66b9345609bd2c40eca9, and has been generating V2 since. It never generated the pre-release version of V2.
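The compatibility rules in the list above could be sketched as a small format selector. This is an illustration only, not how machine-config-operator works: the helper names are hypothetical, the version strings in the usage lines are sample values (the podman version shipped in 4.1 is not stated here), and anything older than 1.4.0 simply falls back to V1, so the pre-release V2 variant is never emitted.

```python
import re

# First podman version that reads the released V2 format, per the list above.
V2_MIN_VERSION = (1, 4, 0)


def parse_version(version: str) -> tuple:
    """Turn a string such as "1.4.1" into a comparable tuple (1, 4, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError("unrecognized version: %r" % version)
    return tuple(int(part or 0) for part in match.groups())


def supports_released_v2(podman_version: str) -> bool:
    """True if this podman should understand "location"-style V2 entries."""
    return parse_version(podman_version) >= V2_MIN_VERSION


def render_insecure_registries(registries, podman_version: str) -> str:
    """Render insecure registries as V2 if supported, otherwise as V1."""
    if supports_released_v2(podman_version):
        blocks = [
            '[[registry]]\nlocation = "%s"\ninsecure = true' % name
            for name in registries
        ]
        return "\n\n".join(blocks) + "\n"
    quoted = ", ".join("'%s'" % name for name in registries)
    return "[registries.insecure]\nregistries = [%s]\n" % quoted


if __name__ == "__main__":
    hosts = ["quay-enterprise.example.openshift.com"]
    print(render_insecure_registries(hosts, "1.3.2"))  # sample old version
    print(render_insecure_registries(hosts, "1.4.1"))  # sample new version
```

Comparing parsed tuples rather than raw strings matters here, since a string comparison would sort "1.10.0" before "1.4.0".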
Moving back to target 4.1.z, because this bug is already ON_QA. I dunno what 4.2.z and 4.3.0 parents for a "under these conditions 4.1.z->4.2.z upgrades break, so we need to fix 4.1.z" issue would look like anyway.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3875