Description of problem:
A TAM customer has an external, stand-alone Quay deployment and wants to disable the 4.1 internal registry, as it is not needed. They cannot disable the internal registry, however, because doing so causes the OCP cluster to fail upgrades. The OCP 4 documentation shows 'Unmanaged' and 'Removed' options for the image registry operator, but when either of those options is used, the cluster fails to upgrade.

Version-Release number of selected component (if applicable):
OpenShift 4.1

How reproducible:
Repeatedly

Steps to Reproduce:
1. On 4.1, set the ManagementState of the internal registry operator to 'Unmanaged' or 'Removed'
2. Try upgrading the 4.1 cluster
3. Note that the cluster fails to complete the upgrade

Actual results:
The 4.1 cluster fails to upgrade

Expected results:
The cluster should be able to upgrade even if the internal registry operator is not used.

Additional info:
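The reproduction step can be sketched as the following `oc patch` call, assuming a logged-in cluster-admin session. This is illustrative only: it changes cluster configuration and requires a live cluster, so it cannot be run stand-alone.

```shell
# Set the image registry operator's management state to Removed
# (use "Unmanaged" instead to reproduce the other variant)
oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge --patch '{"spec":{"managementState":"Removed"}}'

# Then trigger the upgrade, e.g.:
oc adm upgrade --to-latest
```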
Noting a short-term work-around from Oleg in case others encounter this issue in 4.1:

1) Set the management state back to Managed.
2) Set the storage config to emptyDir (assuming they don't want to configure real storage for the registry / don't already have a valid storage config).

This will allow the registry operator to achieve the new version of the cluster and report it. You can then set the registry back to Removed (until the next time you want to upgrade, anyway).

They should *not* allow the registry to continue running with emptyDir storage, however, as it will be a problem if images are pushed to that registry and it then restarts for any reason: the storage will be lost, and you'll have metadata in etcd (imagestream objects) that does not match the blobs that exist in the registry storage, which is a real mess.

Also note that not having an internal registry means the imagestreams and templates that the samples operator installs in the openshift namespace are not going to work, as they rely on pullthrough via the internal registry to handle authentication to registry.redhat.io without requiring every user to provide their own credentials.
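The two work-around steps above can be sketched as merge patches against the registry operator config, again assuming a logged-in cluster-admin session (cluster-side configuration changes, shown for illustration only):

```shell
# 1) Put the registry back under operator management
oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge --patch '{"spec":{"managementState":"Managed"}}'

# 2) Give it throwaway emptyDir storage so the operator can roll out and
#    report the new version (NOT for ongoing use: emptyDir contents are
#    lost whenever the registry pod restarts)
oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'

# After the upgrade completes, set the registry back to Removed:
oc patch configs.imageregistry.operator.openshift.io cluster \
  --type merge --patch '{"spec":{"managementState":"Removed"}}'
```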
Upgraded 4.3.0-0.nightly-2019-10-17-202206 to 4.3.0-0.nightly-2019-10-18-004604.

When imageregistry is set to Unmanaged, the image-registry co reports:

  Last Transition Time:  2019-10-18T01:45:26Z
  Message:               The registry configuration is set to unmanaged mode
  Reason:                Unmanaged
  Status:                True
  Type:                  Available
  Last Transition Time:  2019-10-18T01:45:32Z
  Message:               The registry configuration is set to unmanaged mode
  Reason:                Unmanaged
  Status:                False
  Type:                  Progressing
  Last Transition Time:  2019-10-18T01:43:47Z
  Message:               The registry configuration is set to unmanaged mode
  Reason:                Unmanaged
  Status:                False
  Type:                  Degraded
  Extension:             <nil>

When set to Removed:

  Last Transition Time:  2019-10-18T02:50:17Z
  Message:               The registry is removed
  Reason:                Removed
  Status:                True
  Type:                  Available
  Last Transition Time:  2019-10-18T03:20:27Z
  Message:               All registry resources are removed
  Reason:                Removed
  Status:                False
  Type:                  Progressing
  Last Transition Time:  2019-10-18T01:32:38Z
  Status:                False
  Type:                  Degraded
  Extension:             <nil>

These work as designed, but the upgrade failed with both Unmanaged and Removed.
$ oc get clusterversion -o json | jq -r '.items[].status.conditions[]'
{
  "lastTransitionTime": "2019-10-18T01:39:57Z",
  "message": "Done applying 4.3.0-0.nightly-2019-10-17-202206",
  "status": "True",
  "type": "Available"
}
{
  "lastTransitionTime": "2019-10-18T03:14:08Z",
  "message": "Cluster operator image-registry is still updating",
  "reason": "ClusterOperatorNotAvailable",
  "status": "True",
  "type": "Failing"
}
{
  "lastTransitionTime": "2019-10-18T02:55:12Z",
  "message": "Unable to apply 4.3.0-0.nightly-2019-10-18-004604: the cluster operator image-registry has not yet successfully rolled out",
  "reason": "ClusterOperatorNotAvailable",
  "status": "True",
  "type": "Progressing"
}
{
  "lastTransitionTime": "2019-10-18T02:54:06Z",
  "status": "True",
  "type": "RetrievedUpdates"
}

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2019-10-18-004604   True        False         False      109m
cloud-credential                           4.3.0-0.nightly-2019-10-18-004604   True        False         False      126m
cluster-autoscaler                         4.3.0-0.nightly-2019-10-18-004604   True        False         False      115m
console                                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      112m
dns                                        4.3.0-0.nightly-2019-10-17-202206   True        False         False      125m
image-registry                             4.3.0-0.nightly-2019-10-17-202206   True        False         False      39m
ingress                                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      116m
insights                                   4.3.0-0.nightly-2019-10-18-004604   True        False         False      126m
kube-apiserver                             4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
kube-controller-manager                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
kube-scheduler                             4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
machine-api                                4.3.0-0.nightly-2019-10-18-004604   True        False         False      124m
machine-config                             4.3.0-0.nightly-2019-10-17-202206   True        False         False      124m
marketplace                                4.3.0-0.nightly-2019-10-18-004604   True        False         False      27m
monitoring                                 4.3.0-0.nightly-2019-10-18-004604   True        False         False      113m
network                                    4.3.0-0.nightly-2019-10-17-202206   True        False         False      125m
node-tuning                                4.3.0-0.nightly-2019-10-18-004604   True        False         False      28m
openshift-apiserver                        4.3.0-0.nightly-2019-10-18-004604   True        False         False      122m
openshift-controller-manager               4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
openshift-samples                          4.3.0-0.nightly-2019-10-18-004604   True        False         False      28m
operator-lifecycle-manager                 4.3.0-0.nightly-2019-10-18-004604   True        False         False      125m
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2019-10-18-004604   True        False         False      125m
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2019-10-18-004604   True        False         False      26m
service-ca                                 4.3.0-0.nightly-2019-10-18-004604   True        False         False      126m
service-catalog-apiserver                  4.3.0-0.nightly-2019-10-18-004604   True        False         False      121m
service-catalog-controller-manager         4.3.0-0.nightly-2019-10-18-004604   True        False         False      121m
storage                                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      28m

$ oc logs -f cluster-version-operator-6457b678f-txjqb -n openshift-cluster-version | grep "image-registry"
I1018 03:24:45.501861 1 sync_worker.go:592] Done syncing for deployment "openshift-image-registry/cluster-image-registry-operator" (171 of 448)
I1018 03:24:45.501883 1 sync_worker.go:579] Running sync for clusteroperator "image-registry" (172 of 448)
E1018 03:30:12.297744 1 task.go:77] error running apply for clusteroperator "image-registry" (172 of 448): Cluster operator image-registry is still updating
I1018 03:30:12.297952 1 task_graph.go:611] Result of work: [Cluster operator image-registry is still updating]
I1018 03:30:12.297971 1 sync_worker.go:745] Update error 172 of 448: ClusterOperatorNotAvailable Cluster operator image-registry is still updating (*errors.errorString: cluster operator image-registry is still updating)
E1018 03:30:12.297991 1 sync_worker.go:311] unable to synchronize image (waiting 2m52.525702462s): Cluster operator image-registry is still updating
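The `jq` filter used above dumps every clusterversion condition; narrowing it to the Failing condition makes the stuck operator easier to spot. A minimal runnable sketch, using a hand-abridged sample of the JSON from this report in place of a live `oc get clusterversion` call:

```shell
# Filter clusterversion conditions down to the Failing message.
# Against a live cluster this would be:
#   oc get clusterversion -o json | jq -r '.items[].status.conditions[] | select(.type == "Failing") | .message'
# Here the same filter runs on an abridged sample of the output shown above.
cat <<'EOF' | jq -r '.items[].status.conditions[] | select(.type == "Failing") | .message'
{"items": [{"status": {"conditions": [
  {"lastTransitionTime": "2019-10-18T01:39:57Z", "message": "Done applying 4.3.0-0.nightly-2019-10-17-202206", "status": "True", "type": "Available"},
  {"lastTransitionTime": "2019-10-18T03:14:08Z", "message": "Cluster operator image-registry is still updating", "reason": "ClusterOperatorNotAvailable", "status": "True", "type": "Failing"}
]}}]}
EOF
# prints: Cluster operator image-registry is still updating
```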
@Xiujan We have no guarantees for Unmanaged when it comes to upgrades, so please do not include it in your test matrix. Can you please try the test again and attach the must-gather data if the upgrade fails again?
Assigning to Ricardo: the current upgrade issue may overlap with or duplicate https://bugzilla.redhat.com/show_bug.cgi?id=1768357
@Ricardo @Adam The upgrade also failed with the image-registry operator set to Removed; here is the must-gather log: http://virt-openshift-05.lab.eng.nay.redhat.com/xiuwang/1753778/
The upgrade now succeeds with image-registry set to Removed. Upgraded from 4.3.0-0.nightly-2019-11-06-184828 to 4.3.0-0.nightly-2019-11-07-010532.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062