Bug 1791934 - Cluster fails to upgrade if image-registry operator is Unmanaged
Summary: Cluster fails to upgrade if image-registry operator is Unmanaged
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.4.0
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks: 1816656
Reported: 2020-01-16 18:29 UTC by Adam Kaplan
Modified: 2020-05-04 11:25 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The image-registry operator did not report the new version while it was Unmanaged.
Consequence: Upgrades were blocked in that state.
Fix: In the Unmanaged state, the operator always reports its actual version.
Result: Upgrades succeed.
Clone Of:
Clones: 1816656
Environment:
Last Closed: 2020-05-04 11:24:47 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-image-registry-operator pull 446 | 0 | None | closed | Bug 1791934: independent controller for clusteroperators | 2021-01-12 12:34:26 UTC
Red Hat Bugzilla | 1753778 | 0 | high | CLOSED | OpenShift fails to upgrade when image-registry operator is unmanaged or removed | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2020:0581 | 0 | None | None | None | 2020-05-04 11:25:19 UTC

Description Adam Kaplan 2020-01-16 18:29:37 UTC
Description of problem:

Customer had their image-registry operator set to `Unmanaged`. Upgrade from 4.2.13 to 4.2.14 failed to progress because the image registry did not update itself with the new version.


Version-Release number of selected component (if applicable): 4.2.13


How reproducible: Always


Steps to Reproduce:
1. Launch a 4.2.13 cluster
2. Set the image-registry operator to `Unmanaged`
3. Attempt to upgrade the cluster to 4.2.14
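
Steps 2 and 3 can be scripted roughly as follows. This is only a sketch: it builds the `oc` command lines and prints them without running anything. It assumes the registry configuration is the `configs.imageregistry.operator.openshift.io` object named `cluster` and that `spec.managementState` is the field being changed. The helper names are made up for illustration.

```python
import json
import shlex

# Assumed resource type and object name for the registry configuration.
REGISTRY_RESOURCE = "configs.imageregistry.operator.openshift.io"
REGISTRY_NAME = "cluster"


def set_management_state_cmd(state):
    """Build the `oc patch` argv that sets spec.managementState."""
    patch = json.dumps({"spec": {"managementState": state}})
    return ["oc", "patch", REGISTRY_RESOURCE, REGISTRY_NAME,
            "--type", "merge", "--patch", patch]


def upgrade_cmd(version):
    """Build the `oc adm upgrade` argv for a target version."""
    return ["oc", "adm", "upgrade", "--to=" + version]


# Print the commands instead of executing them, so the sketch is safe to run.
for argv in (set_management_state_cmd("Unmanaged"), upgrade_cmd("4.2.14")):
    print(" ".join(shlex.quote(arg) for arg in argv))
```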

Actual results: Upgrade fails to progress. Image registry ClusterOperator reports the following conditions:

```
{
  "conditions": [
    {
      "type": "Available",
      "status": "True",
      "lastTransitionTime": "2020-01-07T08:19:28Z",
      "reason": "Unmanaged",
      "message": "The registry configuration is set to unmanaged mode"
    },
    {
      "type": "Progressing",
      "status": "False",
      "lastTransitionTime": "2020-01-07T08:19:28Z",
      "reason": "Unmanaged",
      "message": "The registry configuration is set to unmanaged mode"
    },
    {
      "type": "Degraded",
      "status": "False",
      "lastTransitionTime": "2019-08-13T08:54:27Z",
      "reason": "Unmanaged",
      "message": "The registry configuration is set to unmanaged mode"
    }
  ],
  "versions": [
    {
      "name": "operator",
      "version": "4.2.13"
    }
  ],
  ...
```


Expected results: Image registry operator reports itself Available and at the correct version OR upgrade is blocked with the `Upgradeable = False` condition.
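
The mismatch between actual and expected results can be checked mechanically. The sketch below is illustrative, not part of any OpenShift tooling: it takes a trimmed copy of the status shown above (only the fields it needs) and flags the case where the conditions look healthy but the reported `operator` version is not the target version. The function names are hypothetical.

```python
import json

# Trimmed copy of the reported status; only the fields used below are kept.
STATUS_JSON = """
{
  "conditions": [
    {"type": "Available", "status": "True", "reason": "Unmanaged"},
    {"type": "Progressing", "status": "False", "reason": "Unmanaged"},
    {"type": "Degraded", "status": "False", "reason": "Unmanaged"}
  ],
  "versions": [
    {"name": "operator", "version": "4.2.13"}
  ]
}
"""


def operator_version(status):
    """Return the version reported under the name "operator", or None."""
    for entry in status.get("versions", []):
        if entry.get("name") == "operator":
            return entry.get("version")
    return None


def condition_status(status, cond_type):
    """Return the status string of the named condition, or None."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def is_stuck(status, target_version):
    """True when conditions look healthy but the version is not the target."""
    healthy = (condition_status(status, "Available") == "True"
               and condition_status(status, "Degraded") == "False")
    return healthy and operator_version(status) != target_version


status = json.loads(STATUS_JSON)
print(operator_version(status))
print(is_stuck(status, "4.2.14"))
```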

Additional Info:

Related to bug #1753778, back-ported to 4.2.10.

Comment 1 Adam Kaplan 2020-01-16 19:49:22 UTC
In https://bugzilla.redhat.com/show_bug.cgi?id=1753778#c6 we stated:

> We have no guarantees for Unmanaged when it comes to upgrades,

Perhaps we need to revisit this. Blocking patch upgrades prevents critical CVE fixes from rolling out.

> upgrade is blocked with the `Upgradeable = False` condition.

This only applies to y-stream releases. z-stream upgrades are not blocked by this flag.

Comment 2 W. Trevor King 2020-01-16 19:51:38 UTC
> z-stream upgrades are not blocked by this flag.

All updates are blocked by this today, but [1] is open to make it non-blocking for patch-level updates. So, to stay future-proof against that change, you don't want to rely on Upgradeable=False blocking patch updates.

[1]: https://github.com/openshift/cluster-version-operator/pull/291
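
To make the two behaviors concrete, here is a small model of the gating described in this comment. It is not cluster-version-operator code; the function names and the `exempt_patch_updates` switch are invented to contrast "blocks every update" with "blocks only non-patch updates", and versions are assumed to be plain `major.minor.patch` strings.

```python
def minor_of(version):
    """Return (major, minor) for a plain "major.minor.patch" version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def blocked_by_upgradeable(current, target, upgradeable, exempt_patch_updates):
    """Model whether Upgradeable=False stops an update.

    exempt_patch_updates=False models "all updates are blocked";
    exempt_patch_updates=True models the proposed patch-level exemption.
    """
    if upgradeable:
        return False
    if exempt_patch_updates and minor_of(current) == minor_of(target):
        return False
    return True


# A patch update with Upgradeable=False under both models.
print(blocked_by_upgradeable("4.2.13", "4.2.14", False, False))
print(blocked_by_upgradeable("4.2.13", "4.2.14", False, True))
```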

Comment 3 Oleg Bulatov 2020-01-17 13:15:58 UTC
As for me, everything works as expected. In the Unmanaged state the operator will not roll out CVE fixes, so lying about the deployed version will make it even worse.

Adam, how do you want to deploy CVE fixes when the operator is unmanaged?

Comment 4 W. Trevor King 2020-01-18 00:38:38 UTC
> In the Unmanaged state the operator will not roll out CVE fixes, so lying about the deployed version will make it even worse.

We're asking for you to correctly report the operator version, not lie about the operand version (which you do not even have to monitor when you've been told not to manage anything).

> Adam, how do you want to deploy CVE fixes when the operator is unmanaged?

I think it's up to the cluster admin who set you unmanaged to pick up your operand and apply any CVE-fixing or other patches.  If it's possible to do so, you can continue to monitor the operand without actively managing it, so you can say things like "whoo, boy.  You told me you'd take care of the registry, but I am successfully able to execute all of these exploits" or even "but when I try to do something very basic like pull this well-known image, you 500 me".  But that's all nice-to-have, not something that an operator that's been told to leave its operand unmanaged is expected to do.

Comment 5 Oleg Bulatov 2020-01-19 23:52:39 UTC
Today we report the operator version only when the operand is successfully deployed. Is that no longer the case? Can we report the operator version immediately?

Comment 6 W. Trevor King 2020-01-20 02:00:50 UTC
> Today we report the operator version only when the operand is successfully deployed. Is that no longer the case?

That is appropriate when you have an operand.  When you have been configured to leave the registry unmanaged, you no longer have an operand, so there's no reason to delay bumping the operator version.
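
The rule being proposed can be written down as a small decision function. This is a sketch of the reasoning in this thread, not the operator's actual code (the operator is written in Go); the function and parameter names are hypothetical.

```python
def version_to_report(management_state, operand_deployed,
                      release_version, last_reported):
    """Pick the operator version to publish in the ClusterOperator status.

    management_state: value of the registry config's management state.
    operand_deployed: whether the operand rollout has completed.
    release_version:  version of the currently running operator.
    last_reported:    version currently published in the status.
    """
    if management_state == "Unmanaged":
        # No operand to wait for, so report the running version right away.
        return release_version
    if operand_deployed:
        # With an operand, bump the version only after a successful rollout.
        return release_version
    # Rollout still pending: keep the previously reported version.
    return last_reported


print(version_to_report("Unmanaged", False, "4.2.14", "4.2.13"))
print(version_to_report("Managed", False, "4.2.14", "4.2.13"))
```

Before the fix, the Unmanaged case effectively fell through to the last branch, which is why the reported version stayed at 4.2.13.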

Comment 9 Wenjing Zheng 2020-02-18 10:38:17 UTC
@Oleg, this bug is about setting the image registry to Removed (not Unmanaged) and then upgrading, right?

Comment 10 Oleg Bulatov 2020-02-18 13:48:09 UTC
This bug is about Unmanaged. But it might affect Removed as well.

But we have a regression right now: https://bugzilla.redhat.com/show_bug.cgi?id=1803970

Comment 11 Wenjing Zheng 2020-02-19 03:42:11 UTC
Thanks for your reply, Oleg! I cannot upgrade when the image registry is set to Unmanaged:
zhengwenjings-MacBook-Pro:upgrade wzheng$ oc get clusteroperators
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2020-02-17-205936   True        False         False      49m
cloud-credential                           4.3.0-0.nightly-2020-02-17-205936   True        False         False      66m
cluster-autoscaler                         4.3.0-0.nightly-2020-02-17-205936   True        False         False      59m
console                                    4.3.0-0.nightly-2020-02-17-205936   True        False         False      34m
dns                                        4.2.0-0.nightly-2020-02-17-195403   True        False         False      65m
image-registry                             4.2.0-0.nightly-2020-02-17-195403   True        False         False      57m
ingress                                    4.3.0-0.nightly-2020-02-17-205936   True        False         False      37m
insights                                   4.3.0-0.nightly-2020-02-17-205936   True        False         False      65m
kube-apiserver                             4.3.0-0.nightly-2020-02-17-205936   True        False         False      63m
kube-controller-manager                    4.3.0-0.nightly-2020-02-17-205936   True        False         False      63m
kube-scheduler                             4.3.0-0.nightly-2020-02-17-205936   True        False         False      63m
machine-api                                4.3.0-0.nightly-2020-02-17-205936   True        False         False      66m
machine-config                             4.2.0-0.nightly-2020-02-17-195403   True        False         False      65m
marketplace                                4.3.0-0.nightly-2020-02-17-205936   True        False         False      36m
monitoring                                 4.3.0-0.nightly-2020-02-17-205936   True        False         False      34m
network                                    4.2.0-0.nightly-2020-02-17-195403   True        False         False      64m
node-tuning                                4.3.0-0.nightly-2020-02-17-205936   True        False         False      38m
openshift-apiserver                        4.3.0-0.nightly-2020-02-17-205936   True        False         False      62m
openshift-controller-manager               4.3.0-0.nightly-2020-02-17-205936   True        False         False      65m
openshift-samples                          4.3.0-0.nightly-2020-02-17-205936   True        False         False      29m
operator-lifecycle-manager                 4.3.0-0.nightly-2020-02-17-205936   True        False         False      64m
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2020-02-17-205936   True        False         False      64m
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2020-02-17-205936   True        False         False      37m
service-ca                                 4.3.0-0.nightly-2020-02-17-205936   True        False         False      65m
service-catalog-apiserver                  4.3.0-0.nightly-2020-02-17-205936   True        False         False      62m
service-catalog-controller-manager         4.3.0-0.nightly-2020-02-17-205936   True        False         False      62m
storage                                    4.3.0-0.nightly-2020-02-17-205936   True        False         False      38m
zhengwenjings-MacBook-Pro:upgrade wzheng$ oc describe co image-registry
Name:         image-registry
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-02-19T02:35:47Z
  Generation:          1
  Resource Version:    17553
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/image-registry
  UID:                 88e59352-52c0-11ea-b095-0ab71bd22ab4
Spec:
Status:
  Conditions:
    Last Transition Time:  2020-02-19T02:39:23Z
    Message:               The registry configuration is set to unmanaged mode
    Reason:                Unmanaged
    Status:                True
    Type:                  Available
    Last Transition Time:  2020-02-19T02:39:28Z
    Message:               The registry configuration is set to unmanaged mode
    Reason:                Unmanaged
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2020-02-19T02:35:49Z
    Message:               The registry configuration is set to unmanaged mode
    Reason:                Unmanaged
    Status:                False
    Type:                  Degraded
  Extension:               <nil>
  Related Objects:
    Group:     imageregistry.operator.openshift.io
    Name:      cluster
    Resource:  configs
    Group:     
    Name:      openshift-image-registry
    Resource:  namespaces
  Versions:
    Name:     operator
    Version:  4.2.0-0.nightly-2020-02-17-195403
Events:       <none>
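
A table like the `oc get clusteroperators` output above can be scanned for operators that have not yet reached the target version. The sketch below works on a three-row excerpt of that output and assumes every row has a non-empty VERSION column; the helper names are made up for illustration.

```python
from collections import defaultdict

# Three rows excerpted from the table in this comment.
SAMPLE = """\
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.3.0-0.nightly-2020-02-17-205936   True        False         False      49m
dns              4.2.0-0.nightly-2020-02-17-195403   True        False         False      65m
image-registry   4.2.0-0.nightly-2020-02-17-195403   True        False         False      57m
"""


def operators_by_version(table_text):
    """Group operator names by the version shown in the VERSION column."""
    groups = defaultdict(list)
    lines = table_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        groups[fields[1]].append(fields[0])
    return dict(groups)


def lagging(table_text, target_version):
    """Return the sorted names of operators not at target_version."""
    return sorted(
        name
        for version, names in operators_by_version(table_text).items()
        if version != target_version
        for name in names
    )


print(lagging(SAMPLE, "4.3.0-0.nightly-2020-02-17-205936"))
```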

Comment 14 W. Trevor King 2020-03-24 17:09:38 UTC
You can't go straight from ASSIGNED to ON_QA.  Need to stop at MODIFIED and let ART's Errata sweeper add you to an errata and sweep you into ON_QA automatically.

Comment 15 Oleg Bulatov 2020-03-24 17:22:48 UTC
Trevor, the fix was merged a month ago and the bug already went through the MODIFIED state once. Do I really have to wait for this Errata sweeper again?

Comment 16 W. Trevor King 2020-03-24 17:43:52 UTC
ah, yeah, seems like it is still linked to the original Errata.  Sorry for the noise.

Comment 17 Wenjing Zheng 2020-03-25 07:17:06 UTC
Verified on 4.4.0-0.nightly-2020-03-24-225110. This means the cluster can be upgraded when the image registry is set to Unmanaged/Removed.

Comment 19 errata-xmlrpc 2020-05-04 11:24:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

