Description of problem:
After the upgrade to OpenShift 3.7, the registry failed to communicate with the pods, leaving builds in Error or ImagePullBackOff state. After clearing the old env variables from the registry dc, it became clear which token the registry actually picks up and uses.

Version-Release number of selected component (if applicable):
oc version
oc v3.7.23
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.7.23
kubernetes v1.7.6+a08f5eeb62

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Builds are failing on the registry because a mismatch between env vars and secrets caused the registry to pick up incorrect variables.

Expected results:

Additional info:
The openshift-ansible installer for OCP needs to be updated to properly migrate existing registry deployments by doing the following steps (in addition, the ops installer needs to start using the OCP installer). The fix will need to be backported to v3.7+.

1) VERIFY: docker-registry is using the ‘registry’ service account.
   RUN:
     oc get dc -n default docker-registry -o json | jq ".spec.template.spec.serviceAccount"
   IF NOT, ensure the registry service account exists:
     oc create serviceaccount -n default registry
   AND set the service account for the docker-registry deployment to the registry service account.

2) VERIFY: the registry SA can update imagestreams in all namespaces.
   RUN:
     oc policy who-can update imagestreams --all-namespaces
   VERIFY: system:serviceaccount:default:registry appears in the list of allowed users:
     # oc policy who-can update imagestreams --all-namespaces | grep system:serviceaccount:default:registry
   IF NOT, grant the system:registry role and reverify:
     oc create clusterrolebinding system:registry \
       --clusterrole=system:registry \
       --serviceaccount=default:registry

3) Remove legacy env variables from the registry deploymentconfig if present (`oc edit -n default dc docker-registry`):
     OPENSHIFT_MASTER
     OPENSHIFT_CA_DATA
     OPENSHIFT_CERT_DATA
     OPENSHIFT_INSECURE
     KUBERNETES_MASTER
     OPENSHIFT_CERT_FILE
     OPENSHIFT_CA_FILE
     BEARER_TOKEN
     BEARER_TOKEN_FILE
     OPENSHIFT_KEY_FILE
     OPENSHIFT_KEY_DATA
   (from https://github.com/openshift/origin/blob/master/pkg/client/cmd/clientcmd.go#L161-L202)
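For anyone who wants to script the VERIFY parts of steps 1 and 2, here is a rough sketch. It is not what the installer does. Assumptions: the function names are made up for illustration, `OC` is a hypothetical override so the checks can be pointed at a stub instead of a live cluster, and `-o jsonpath` is used in place of the `jq` pipeline above to avoid the extra dependency.

```shell
#!/bin/sh
# Read-only checks for steps 1 and 2; nothing here modifies the cluster.
# OC is a hypothetical override for the oc binary (defaults to "oc").

# Step 1: succeeds only if the docker-registry dc runs as the "registry" SA.
registry_uses_registry_sa() {
    sa=$(${OC:-oc} get dc -n default docker-registry \
        -o jsonpath='{.spec.template.spec.serviceAccount}') || return 1
    [ "$sa" = "registry" ]
}

# Step 2: succeeds only if the registry SA appears in the who-can output.
registry_sa_can_update_imagestreams() {
    ${OC:-oc} policy who-can update imagestreams --all-namespaces \
        | grep -q 'system:serviceaccount:default:registry'
}
```

Each function returns non-zero when the corresponding IF NOT branch applies, e.g. `registry_uses_registry_sa || echo "step 1 needs fixing"`.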
Jordan made the sensible suggestion that we should be able to just create/update all these things w/o checking anything:

1) create the registry SA
2) grant it all the right permissions
3) delete the env vars from the DC (the upgrade playbook already edits the DC to update the image tag to the new version).

So hopefully this isn't *that* terrible to implement.
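To make the "just apply everything" idea concrete, a minimal shell sketch of the three steps follows. This is not the playbook change, only an illustration. Assumptions: `migrate_registry` and the `OC` override are hypothetical names, `oc adm policy add-cluster-role-to-user` is swapped in for the `oc create clusterrolebinding` command from the earlier comment because it can be rerun safely, and `oc set env` with a trailing `-` is assumed to be a no-op for variables that are not set.

```shell
#!/bin/sh
# Legacy client env vars, taken from the list in the earlier comment.
LEGACY_REGISTRY_ENV="OPENSHIFT_MASTER OPENSHIFT_CA_DATA OPENSHIFT_CERT_DATA
OPENSHIFT_INSECURE KUBERNETES_MASTER OPENSHIFT_CERT_FILE OPENSHIFT_CA_FILE
BEARER_TOKEN BEARER_TOKEN_FILE OPENSHIFT_KEY_FILE OPENSHIFT_KEY_DATA"

# Applies all three steps without checking current state first.
# OC is a hypothetical override for the oc binary (defaults to "oc").
migrate_registry() {
    oc_cmd=${OC:-oc}

    # 1) Create the SA; an "already exists" failure is expected on reruns.
    $oc_cmd create serviceaccount -n default registry || true

    # 2) Grant system:registry cluster-wide to the SA.
    $oc_cmd adm policy add-cluster-role-to-user system:registry \
        system:serviceaccount:default:registry || return 1

    #    Point the dc at the SA (both the current and the deprecated field).
    $oc_cmd patch dc docker-registry -n default \
        -p '{"spec":{"template":{"spec":{"serviceAccountName":"registry","serviceAccount":"registry"}}}}' \
        || return 1

    # 3) Build "NAME-" arguments; a trailing dash asks oc set env to unset NAME.
    removals=""
    for name in $LEGACY_REGISTRY_ENV; do
        removals="$removals $name-"
    done
    # Left unquoted on purpose so each removal becomes its own argument.
    $oc_cmd set env dc/docker-registry -n default $removals
}
```

Running `OC=echo migrate_registry` prints the four commands instead of executing them, which is a cheap way to review what would be run.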
The fix has merged: https://github.com/openshift/openshift-ansible/pull/8020
This fix will be applied when upgrading to v3.10. openshift-ansible does not have upgrade scripts for v3.8. Should I try to create them?
(In reply to Ben Parees from comment #2)
> the openshift ansible installer for OCP needs to be updated to properly
> migrate existing registry deployments by doing the following steps (in
> addition, the ops installer needs to start using the OCP installer).

I see the fix was integrated into the upgrade_control_plane.yml playbook, which the ops installer already calls [1]. So Operations will pick this up as it's backported. Thanks!

[1] https://github.com/openshift/openshift-ansible-ops/blob/prod/playbooks/release/bin/cicd_operations.sh#L356
Verified on:
openshift v3.10.0-0.50.0
kubernetes v1.10.0+b81c8f8

Builds succeed after upgrading the cluster from 3.9 to 3.10, and the registry pod has no unnecessary env vars:

# oc describe po/docker-registry-2-8m52l
Environment:
  REGISTRY_HTTP_ADDR: :5000
  REGISTRY_HTTP_NET: tcp
  REGISTRY_HTTP_SECRET: IFu5DeOwZxq5jQ75kjYqYKZhD4kXiZaK+UZ1poEAa+o=
  REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA: false
  REGISTRY_HTTP_TLS_KEY: /etc/secrets/registry.key
  REGISTRY_OPENSHIFT_SERVER_ADDR: docker-registry.default.svc:5000
  REGISTRY_CONFIGURATION_PATH: /etc/registry/config.yml
  REGISTRY_HTTP_TLS_CERTIFICATE: /etc/secrets/registry.crt
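The same check can be scripted rather than read off by eye. A small sketch, assuming the input is one NAME=value pair per line (for example from `oc set env pod/docker-registry-2-8m52l -n default --list`); the function name is hypothetical and the pattern only covers the legacy variables listed earlier in this bug.

```shell
#!/bin/sh
# Reads NAME=value lines on stdin and prints any legacy client env vars found.
# Returns 0 when the environment is clean, 1 when a legacy var is present.
find_legacy_registry_env() {
    # Anchored at line start so REGISTRY_OPENSHIFT_* settings do not match.
    if grep -E '^(OPENSHIFT_(MASTER|INSECURE|(CA|CERT|KEY)_(DATA|FILE))|KUBERNETES_MASTER|BEARER_TOKEN(_FILE)?)='; then
        return 1
    fi
    return 0
}
```

Usage would look like `oc set env pod/docker-registry-2-8m52l -n default --list | find_legacy_registry_env && echo clean`.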
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816