Description of problem:
After the 3.7 OpenShift upgrade, the registry failed to communicate with the pods, leaving builds in Error or ImagePullBackOff state
After clearing the registry DC's old env variables, it became clear which token the registry actually picks up and uses
Version-Release number of selected component (if applicable):
features: Basic-Auth GSSAPI Kerberos SPNEGO
Steps to Reproduce:
Builds are failing because a mismatch between env vars and secrets causes incorrect variables to be picked up by the registry.
the openshift ansible installer for OCP needs to be updated to properly migrate existing registry deployments by doing the following steps (in addition, the ops installer needs to start using the OCP installer). The fix will need to be backported to v3.7+.
1) VERIFY: docker-registry is using the ‘registry’ service account
RUN: oc get dc -n default docker-registry -o json | jq ".spec.template.spec.serviceAccount"
IF NOT, ensure the registry service account exists:
oc create serviceaccount -n default registry
AND set the service account for the docker-registry deployment to the registry service account.
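Step 1 can be done non-interactively; a minimal sketch, assuming `oc set serviceaccount` is available in this oc release (the guard lets the script run where no cluster is reachable):

```shell
# Sketch of the step-1 remediation: ensure the 'registry' SA exists
# and point the docker-registry DC at it.
if command -v oc >/dev/null 2>&1; then
  # Create the SA if missing (ignore "already exists" errors).
  oc create serviceaccount -n default registry 2>/dev/null || true
  # Set the DC's service account to 'registry'.
  oc set serviceaccount -n default dc/docker-registry registry
  STEP1_STATUS="applied"
else
  STEP1_STATUS="skipped"  # no oc binary here; commands shown for reference
fi
echo "$STEP1_STATUS"
```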
2) VERIFY registry SA can update imagestreams in all namespaces
RUN: oc policy who-can update imagestreams --all-namespaces
VERIFY: system:serviceaccount:default:registry appears in list of allowed users:
# oc policy who-can update imagestreams --all-namespaces | grep system:serviceaccount:default:registry
IF NOT, grant the system:registry role and reverify:
oc create clusterrolebinding system:registry \
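The command above is truncated in the report; a plausible full form, assuming the standard `oc create clusterrolebinding` flags, would be:

```shell
# Hedged reconstruction of the truncated command: bind the
# system:registry cluster role to the registry service account.
# The exact flags are an assumption based on standard oc syntax.
if command -v oc >/dev/null 2>&1; then
  oc create clusterrolebinding system:registry \
    --clusterrole=system:registry \
    --serviceaccount=default:registry
  STEP2_STATUS="applied"
else
  STEP2_STATUS="skipped"  # no oc binary here; shown for reference
fi
echo "$STEP2_STATUS"
```

After granting, re-run the `oc policy who-can update imagestreams` check from step 2 to verify the binding took effect.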
3) remove legacy env variables from the registry deploymentconfig if present:
`oc edit -n default dc docker-registry`
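A non-interactive alternative to `oc edit` is `oc set env`, where a trailing `-` removes a variable. The variable names below are an assumption about which OPENSHIFT_* credential vars are the legacy ones; check the DC for the actual list before removing:

```shell
# Drop legacy env vars from the registry DC with `oc set env`
# (the trailing "-" removes a variable). Variable names below are
# an assumption -- verify against your DC first.
if command -v oc >/dev/null 2>&1; then
  oc set env -n default dc/docker-registry \
    OPENSHIFT_MASTER- OPENSHIFT_CA_DATA- OPENSHIFT_CERT_DATA- OPENSHIFT_KEY_DATA-
  STEP3_STATUS="applied"
else
  STEP3_STATUS="skipped"  # no oc binary here; shown for reference
fi
echo "$STEP3_STATUS"
```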
Jordan made the sensible suggestion that we should be able to just create/update all these things w/o checking anything:
1) create the registry SA
2) grant it all the right permissions
3) delete the env vars from the DC (the upgrade playbook already edits the DC to update the image tag to the new version).
So hopefully this isn't *that* terrible to implement.
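The unconditional approach can be sketched as one idempotent script; role names and the legacy env var names are assumptions where the comments above left them unstated:

```shell
# Sketch of the check-free migration: create/update everything
# unconditionally, relying on idempotent oc commands.
if command -v oc >/dev/null 2>&1; then
  # 1) create the registry SA (ignore "already exists")
  oc create serviceaccount -n default registry 2>/dev/null || true
  # 2) grant it the registry cluster role
  oc adm policy add-cluster-role-to-user system:registry \
    system:serviceaccount:default:registry
  # 3) delete the (assumed) legacy env vars from the DC
  oc set env -n default dc/docker-registry \
    OPENSHIFT_MASTER- OPENSHIFT_CA_DATA- OPENSHIFT_CERT_DATA- OPENSHIFT_KEY_DATA-
  MIGRATE_STATUS="applied"
else
  MIGRATE_STATUS="skipped"  # no oc binary here; shown for reference
fi
echo "$MIGRATE_STATUS"
```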
The fix has merged:
This fix will be applied when upgrading to v3.10. openshift-ansible does not have upgrade scripts for v3.8. Should I try to create them?
(In reply to Ben Parees from comment #2)
> the openshift ansible installer for OCP needs to be updated to properly
> migrate existing registry deployments by doing the following steps (in
> addition, the ops installer needs to start using the OCP installer).
I see the fix was integrated into the upgrade_control_plane.yml playbook, which the ops installer already calls. So Operations will pick this up as it's backported. Thanks!
Builds complete successfully after upgrading the cluster from 3.9 to 3.10, and the registry pod has no unnecessary env vars.
# oc describe po/docker-registry-2-8m52l
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.