Bug 1753778 - OpenShift fails to upgrade when image-registry operator is unmanaged or removed
Summary: OpenShift fails to upgrade when image-registry operator is unmanaged or removed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Ricardo Maraschini
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks: 1769690 1770658
Reported: 2019-09-19 20:29 UTC by Paul Gozart
Modified: 2023-09-07 20:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The image registry operator did not report itself as Available and at the correct version if its management state was set to `Removed`.
Consequence: Upgrades failed if the image registry operator was set to `Removed`.
Fix: The image registry operator reports itself as Available and at the correct version if it is set to `Removed`.
Result: Upgrades can complete if the image registry is removed from the cluster.
Clone Of:
Clones: 1769690 1769691
Environment:
Last Closed: 2020-01-23 11:06:22 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-image-registry-operator pull 387 | 0 | 'None' | closed | Bug 1753778: report Available when registry is explicitly Removed | 2021-02-08 05:41:33 UTC
Github | openshift cluster-image-registry-operator pull 406 | 0 | 'None' | closed | Bug 1753778: Set Version on Operator status even if Removed. | 2021-02-08 05:41:33 UTC
Red Hat Bugzilla | 1768357 | 0 | high | CLOSED | [BM]Installation return failed when default image-registry is Removed | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2020:0062 | 0 | None | None | None | 2020-01-23 11:06:48 UTC

Internal Links: 1768357 1791934

Description Paul Gozart 2019-09-19 20:29:10 UTC
Description of problem:

TAM customer has external, stand-alone Quay deployed and wants to disable 4.1 internal registry as it is not needed.  They cannot disable the internal registry, however, because it causes the OCP cluster to fail upgrades.  OCP 4 documentation shows 'unmanaged' and 'removed' options for the image registry operator, but when either of those options are used, the cluster fails to upgrade.


Version-Release number of selected component (if applicable):

OpenShift 4.1


How reproducible:

Repeatedly


Steps to Reproduce:
1.  On 4.1, set the ManagementState of the Internal Registry Operator to 'Unmanaged' or 'Removed' 
2.  Try upgrading the 4.1 cluster 
3.  Note the cluster fails to completely upgrade because the 


Actual results:

The 4.1 cluster fails to upgrade


Expected results:

The cluster should be able to upgrade even if the Internal Registry Operator is not used.


Additional info:

Comment 2 Adam Kaplan 2019-09-20 17:28:29 UTC
Noting short-term work-around from Oleg in case others encounter this issue in 4.1:


1) set the management state back to managed
2) set the storage config to emptydir (assuming they don't want to configure real storage for the registry/don't already have a valid storage config)

This will allow the registry operator to achieve the new version of the cluster and report it.

You can then set the registry back to removed (until the next time you want to upgrade, anyway).


(They should *not* allow the registry to continue running w/ emptydir storage, however. If images are pushed to that registry and it then restarts for any reason, the storage will be lost and you'll have metadata in etcd (imagestream objects) that does not match any blobs in the registry storage, which is a real mess.)
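
The work-around steps above can be sketched as two `oc patch` invocations. This is a minimal sketch, not a tested procedure: it only builds the command lines and does not run them. It assumes the registry operator config is the `cluster` object of `configs.imageregistry.operator.openshift.io` and that a JSON merge patch is acceptable; the helper names `patch_command` and `workaround_commands` are made up for illustration.

```python
import json
import shlex

# Assumption: the registry operator config is the "cluster" object of this resource.
RESOURCE = "configs.imageregistry.operator.openshift.io"
NAME = "cluster"


def patch_command(spec):
    """Return the argv for an `oc patch` merge patch against the registry config."""
    body = json.dumps({"spec": spec}, separators=(",", ":"))
    return ["oc", "patch", RESOURCE, NAME, "--type", "merge", "--patch", body]


def workaround_commands():
    """Return the two commands in order: before the upgrade, then after it."""
    # Steps 1 and 2: back to Managed, with emptyDir storage so the operator can roll out.
    before_upgrade = patch_command(
        {"managementState": "Managed", "storage": {"emptyDir": {}}}
    )
    # After the upgrade has completed: set the registry back to Removed.
    after_upgrade = patch_command({"managementState": "Removed"})
    return [before_upgrade, after_upgrade]


if __name__ == "__main__":
    for argv in workaround_commands():
        # shlex.join quotes the JSON body so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; review the output before applying it to a cluster.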


Also note that not having an internal registry means the imagestreams+templates that the samples operator installs in the openshift namespace are not going to work as they rely on pullthrough via the internal registry in order to handle authentication to registry.redhat.io w/o requiring every user provide their own creds.

Comment 5 XiuJuan Wang 2019-10-18 03:31:30 UTC
Upgrade 4.3.0-0.nightly-2019-10-17-202206 to 4.3.0-0.nightly-2019-10-18-004604.
With the image registry set to Unmanaged, the image-registry co reports:
    Last Transition Time:  2019-10-18T01:45:26Z
    Message:               The registry configuration is set to unmanaged mode
    Reason:                Unmanaged
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-10-18T01:45:32Z
    Message:               The registry configuration is set to unmanaged mode
    Reason:                Unmanaged
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-10-18T01:43:47Z
    Message:               The registry configuration is set to unmanaged mode
    Reason:                Unmanaged
    Status:                False
    Type:                  Degraded
  Extension:               <nil>
When set to Removed, 
    Last Transition Time:  2019-10-18T02:50:17Z
    Message:               The registry is removed
    Reason:                Removed
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-10-18T03:20:27Z
    Message:               All registry resources are removed
    Reason:                Removed
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-10-18T01:32:38Z
    Status:                False
    Type:                  Degraded
  Extension:               <nil>
These work as designed.
But the upgrade failed with the operator set to either Unmanaged or Removed.

$ oc get clusterversion   -o json   |  jq -r '.items[].status.conditions[]' 
{
  "lastTransitionTime": "2019-10-18T01:39:57Z",
  "message": "Done applying 4.3.0-0.nightly-2019-10-17-202206",
  "status": "True",
  "type": "Available"
}
{
  "lastTransitionTime": "2019-10-18T03:14:08Z",
  "message": "Cluster operator image-registry is still updating",
  "reason": "ClusterOperatorNotAvailable",
  "status": "True",
  "type": "Failing"
}
{
  "lastTransitionTime": "2019-10-18T02:55:12Z",
  "message": "Unable to apply 4.3.0-0.nightly-2019-10-18-004604: the cluster operator image-registry has not yet successfully rolled out",
  "reason": "ClusterOperatorNotAvailable",
  "status": "True",
  "type": "Progressing"
}
{
  "lastTransitionTime": "2019-10-18T02:54:06Z",
  "status": "True",
  "type": "RetrievedUpdates"
}
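
For anyone scripting this check, the same conditions can be filtered without `jq`. The following is a rough sketch that assumes the input is the full list document printed by `oc get clusterversion -o json` (an `items` array whose entries carry `status.conditions`); the function name `failing_conditions` and the trimmed sample are illustrative only.

```python
import json


def failing_conditions(clusterversion_json):
    """Return (reason, message) for every Failing condition whose status is True."""
    doc = json.loads(clusterversion_json)
    found = []
    for item in doc.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Failing" and cond.get("status") == "True":
                found.append((cond.get("reason"), cond.get("message")))
    return found


# Sample trimmed from the conditions shown in this comment.
SAMPLE_CLUSTERVERSION = json.dumps({
    "items": [{
        "status": {
            "conditions": [
                {"type": "Available", "status": "True",
                 "message": "Done applying 4.3.0-0.nightly-2019-10-17-202206"},
                {"type": "Failing", "status": "True",
                 "reason": "ClusterOperatorNotAvailable",
                 "message": "Cluster operator image-registry is still updating"},
            ]
        }
    }]
})

if __name__ == "__main__":
    for reason, message in failing_conditions(SAMPLE_CLUSTERVERSION):
        print(f"{reason}: {message}")
```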

$ oc get  co 
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2019-10-18-004604   True        False         False      109m
cloud-credential                           4.3.0-0.nightly-2019-10-18-004604   True        False         False      126m
cluster-autoscaler                         4.3.0-0.nightly-2019-10-18-004604   True        False         False      115m
console                                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      112m
dns                                        4.3.0-0.nightly-2019-10-17-202206   True        False         False      125m
image-registry                             4.3.0-0.nightly-2019-10-17-202206   True        False         False      39m
ingress                                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      116m
insights                                   4.3.0-0.nightly-2019-10-18-004604   True        False         False      126m
kube-apiserver                             4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
kube-controller-manager                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
kube-scheduler                             4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
machine-api                                4.3.0-0.nightly-2019-10-18-004604   True        False         False      124m
machine-config                             4.3.0-0.nightly-2019-10-17-202206   True        False         False      124m
marketplace                                4.3.0-0.nightly-2019-10-18-004604   True        False         False      27m
monitoring                                 4.3.0-0.nightly-2019-10-18-004604   True        False         False      113m
network                                    4.3.0-0.nightly-2019-10-17-202206   True        False         False      125m
node-tuning                                4.3.0-0.nightly-2019-10-18-004604   True        False         False      28m
openshift-apiserver                        4.3.0-0.nightly-2019-10-18-004604   True        False         False      122m
openshift-controller-manager               4.3.0-0.nightly-2019-10-18-004604   True        False         False      123m
openshift-samples                          4.3.0-0.nightly-2019-10-18-004604   True        False         False      28m
operator-lifecycle-manager                 4.3.0-0.nightly-2019-10-18-004604   True        False         False      125m
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2019-10-18-004604   True        False         False      125m
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2019-10-18-004604   True        False         False      26m
service-ca                                 4.3.0-0.nightly-2019-10-18-004604   True        False         False      126m
service-catalog-apiserver                  4.3.0-0.nightly-2019-10-18-004604   True        False         False      121m
service-catalog-controller-manager         4.3.0-0.nightly-2019-10-18-004604   True        False         False      121m
storage                                    4.3.0-0.nightly-2019-10-18-004604   True        False         False      28m
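
The table shows image-registry as Available but still at the older nightly version. A small sketch for picking such rows out of `oc get co` text output follows. It assumes whitespace-separated columns with a `NAME` and `VERSION` header and that every row has a non-empty version (a row with an empty VERSION cell would be misread); `operators_not_at` is a hypothetical helper name.

```python
def operators_not_at(table_text, target_version):
    """Return (name, version) for rows whose VERSION differs from target_version."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    version_col = header.index("VERSION")
    lagging = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(name_col, version_col):
            continue  # Skip rows too short to carry both columns.
        name, version = fields[name_col], fields[version_col]
        if version != target_version:
            lagging.append((name, version))
    return lagging


# Three rows copied from the output above.
SAMPLE_TABLE = """\
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.3.0-0.nightly-2019-10-18-004604   True        False         False      109m
dns              4.3.0-0.nightly-2019-10-17-202206   True        False         False      125m
image-registry   4.3.0-0.nightly-2019-10-17-202206   True        False         False      39m
"""

if __name__ == "__main__":
    for name, version in operators_not_at(SAMPLE_TABLE, "4.3.0-0.nightly-2019-10-18-004604"):
        print(name, version)
```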

$ oc logs -f cluster-version-operator-6457b678f-txjqb -n openshift-cluster-version   | grep "image-registry"
I1018 03:24:45.501861       1 sync_worker.go:592] Done syncing for deployment "openshift-image-registry/cluster-image-registry-operator" (171 of 448)
I1018 03:24:45.501883       1 sync_worker.go:579] Running sync for clusteroperator "image-registry" (172 of 448)
E1018 03:30:12.297744       1 task.go:77] error running apply for clusteroperator "image-registry" (172 of 448): Cluster operator image-registry is still updating
I1018 03:30:12.297952       1 task_graph.go:611] Result of work: [Cluster operator image-registry is still updating]
I1018 03:30:12.297971       1 sync_worker.go:745] Update error 172 of 448: ClusterOperatorNotAvailable Cluster operator image-registry is still updating (*errors.errorString: cluster operator image-registry is still updating)
E1018 03:30:12.297991       1 sync_worker.go:311] unable to synchronize image (waiting 2m52.525702462s): Cluster operator image-registry is still updating
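
The output in this comment is consistent with the upgrade waiting on two things at once: the operator being Available and the operator reporting the target version. The sketch below is a deliberately simplified model of that idea, not the cluster-version operator's actual code, and the real check may consider more than these two inputs. The function name `has_rolled_out` is hypothetical.

```python
def has_rolled_out(conditions, reported_version, target_version):
    """Simplified model: Available must be True and the versions must match."""
    return conditions.get("Available") == "True" and reported_version == target_version


OLD_VERSION = "4.3.0-0.nightly-2019-10-17-202206"
NEW_VERSION = "4.3.0-0.nightly-2019-10-18-004604"

# Condition statuses reported by image-registry while set to Removed (from this comment).
REMOVED_CONDITIONS = {"Available": "True", "Progressing": "False", "Degraded": "False"}

if __name__ == "__main__":
    # Available is True, but the version is still the old one, so the model says "not done".
    print(has_rolled_out(REMOVED_CONDITIONS, OLD_VERSION, NEW_VERSION))
```

Under this model, an operator that is Available but never updates its reported version keeps the upgrade waiting, which matches the "still updating" messages in the log.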

Comment 6 Adam Kaplan 2019-11-04 15:23:46 UTC
@XiuJuan We have no guarantees for Unmanaged when it comes to upgrades, so please do not include this in your test matrix.

Can you please try the test again and attach the data from must-gather if upgrade fails again?

Comment 7 Adam Kaplan 2019-11-05 14:36:37 UTC
Assigning to Ricardo - there's potential that the current upgrade issue overlaps/duplicates https://bugzilla.redhat.com/show_bug.cgi?id=1768357

Comment 8 XiuJuan Wang 2019-11-06 05:27:17 UTC
@Ricardo @Adam 
The upgrade also failed with the image-registry operator set to Removed. Here is the must-gather log: http://virt-openshift-05.lab.eng.nay.redhat.com/xiuwang/1753778/

Comment 11 XiuJuan Wang 2019-11-07 05:13:40 UTC
The upgrade succeeded with image-registry set to Removed.
Upgraded from 4.3.0-0.nightly-2019-11-06-184828 to 4.3.0-0.nightly-2019-11-07-010532.

Comment 13 errata-xmlrpc 2020-01-23 11:06:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

