Bug 1746342 - openshift-samples clusteroperator still reports Available=true when the samples operator is set to Removed
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Samples
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
medium
low
Target Milestone: ---
Target Release: 4.2.0
Assignee: Gabe Montero
QA Contact: XiuJuan Wang
Reported: 2019-08-28 08:38 UTC by XiuJuan Wang
Modified: 2019-10-16 06:38 UTC

Last Closed: 2019-10-16 06:38:11 UTC


Links
- GitHub: openshift cluster-samples-operator pull 179 (closed): "Bug 1746342: clean up available/progressing/degraded, including new msg/reason, wh…" (last updated 2020-09-17 04:08:13 UTC)
- Red Hat Product Errata: RHBA-2019:2922 (last updated 2019-10-16 06:38:24 UTC)

Description XiuJuan Wang 2019-08-28 08:38:54 UTC
Description of problem:
The openshift-samples clusteroperator still reports Available=true when the samples operator is set to Removed.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-27-105356

How reproducible:
always

Steps to Reproduce:
1. Set the samples operator to Removed.
2. Check that the imagestreams and templates managed by the samples operator have been removed.
3. Check the openshift-samples clusteroperator status.

Actual results:
The openshift-samples clusteroperator still reports Available=true even though the samples have been removed.
$ oc get is  -l samples.operator.openshift.io/managed=true -n openshift 
No resources found.

$ oc get co openshift-samples -o json  | jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2019-08-28T08:14:10Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-08-28T08:25:34Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-08-28T01:10:36Z",
      "status": "False",
      "type": "Degraded"
    }
  ],



Expected results:
The clusteroperator should report Available=false.

Additional info:

Comment 1 Gabe Montero 2019-08-28 13:22:38 UTC
Let me pat myself on the back for putting comments in code :-)

This behavior is explicitly intended.

See https://github.com/openshift/cluster-samples-operator/blob/master/pkg/apis/samples/v1/types.go#L348-L353

"
// after online starter upgrade attempts while this operator was not set to managed,
// group arch discussion has decided that we report the Available=true if removed/unmanaged

"

I'll turn to Ben (and have cc'ed Adam): given how upgrade and the ClusterOperator conditions have evolved since the current samples operator approach was implemented, do we want to pivot here? Perhaps a broader discussion is needed?

Perhaps upgrade is focused more on the degraded condition vs. the failing one?

Comment 2 Ben Parees 2019-08-28 13:34:26 UTC
> Perhaps a broader discussion is needed?

Yeah, broader discussion. Personally I'm still comfortable with the direction we chose (but it doesn't look like the reason/message reflects that the operator is removed/unmanaged? I would have expected it to say something about that, since essentially the operator is "available" because the operand is removed/unmanaged), but if we are going to consider pivoting, it needs to be done org-wide. This should not be changed without an agreement across all cluster operator teams about how we are going to handle removed/unmanaged in terms of condition reporting.


> Perhaps upgrade is focused more on the degraded condition vs. the failing one?

I'm not sure what this refers to. I'm also not sure what the "failing" condition means: we have Available, Degraded, and Progressing conditions; there is no "Failing" condition any more.

Comment 3 Gabe Montero 2019-08-28 13:41:01 UTC
Sorry, I meant "Degraded" when I said "failing condition".

And to try to clarify my "upgrade is focused more..." comment:
- when we made the code change to report available==true when unmanaged/removed, I believe it was because a CVO operator reporting available==false blocked/failed the upgrade
- I'm wondering (I thought I might have heard) whether upgrade now ignores the Available setting and checks the Degraded one instead

But I would assume the details on that second point would be included in the broader discussion.

Comment 4 Ben Parees 2019-08-28 13:56:36 UTC
I went through and sorted out what the various conditions will block/cause-to-fail here:

https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions-and-installupgrade


For an upgrade to succeed/complete, the operator must:
1) be available
2) not be degraded
3) not be progressing
4) report itself as being at the new version

So no, it does not ignore the Available setting, but it does also look at Degraded.
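The four criteria above can be sketched as a single predicate over a ClusterOperator status. This is a minimal illustration, not the cluster-version-operator's actual code: the function name `upgrade_gate_passes` is hypothetical, the input is assumed to have the shape of the `oc get co openshift-samples -o json | jq .status` output in the description plus a `versions` list of `name`/`version` entries (the `"operator"` entry name is an assumption), and a missing or `Unknown` condition is conservatively treated as failing the gate.

```python
from typing import Any


def upgrade_gate_passes(status: dict[str, Any], target_version: str) -> bool:
    """Hypothetical sketch of the upgrade criteria listed in Comment 4.

    Assumes `status` mirrors a ClusterOperator `.status` object: a
    "conditions" list of {"type", "status"} entries and a "versions" list of
    {"name", "version"} entries. Absent or "Unknown" conditions fail the gate.
    """
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    versions = {v["name"]: v["version"] for v in status.get("versions", [])}
    return (
        conditions.get("Available") == "True"           # 1) available
        and conditions.get("Degraded") == "False"       # 2) not degraded
        and conditions.get("Progressing") == "False"    # 3) not progressing
        # 4) at the new version; the "operator" entry name is assumed
        and versions.get("operator") == target_version
    )


# Usage: a status shaped like the one in the description (Available forced
# to "True" while Removed), with an assumed versions entry added.
removed_status = {
    "conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Progressing", "status": "False"},
        {"type": "Degraded", "status": "False"},
    ],
    "versions": [{"name": "operator", "version": "4.2.0"}],
}
print(upgrade_gate_passes(removed_status, "4.2.0"))  # True
```

Flipping `Available` to `"False"` in that input makes the predicate return `False`, which illustrates why reporting Available=false for a Removed operator would block the upgrade under these rules.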

Comment 5 Gabe Montero 2019-08-28 14:55:56 UTC
OK, at a minimum I'll put the following on my to-do list for this bug:
- a PR that updates the setting of available/progressing/degraded with a reason/message explaining that we are forcing certain values to true/false to enable upgrade
- wrt this bug, available=true and progressing/degraded=false when unmanaged/removed
- I will stay in sync with Ben on the broader discussion and either submit the PR noted in the first two bullets, or make additional changes based on that discussion and then craft/submit the PR accordingly.

Comment 6 Gabe Montero 2019-08-29 21:38:43 UTC
OK XiuJuan, for now the Available==true behavior when removed/unmanaged is staying, but I've added a new reason/message to Available/Progressing/Degraded explaining that the operator is unmanaged/removed. This follows the decision made during 4.1 development that Available should stay true so as not to block upgrade.
As noted above, Ben has confirmed that Available must be true for an upgrade to complete.

If Ben gets a clarification from the broader discussion in time for 4.2 that a change should occur, we'll open a new bug for you to verify.

Comment 8 XiuJuan Wang 2019-09-02 08:46:32 UTC
Waiting for a newer payload that includes the fix.

Comment 10 XiuJuan Wang 2019-09-03 02:47:16 UTC
When the samples operator is set to Unmanaged or Removed, the clusteroperator now shows reasons for Available, Progressing, and Degraded.

    Last Transition Time:  2019-09-03T02:18:11Z
    Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
    Reason:                CurrentlyRemoved
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-09-03T02:18:11Z
    Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
    Reason:                CurrentlyRemoved
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-09-03T02:18:11Z
    Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
    Reason:                CurrentlyRemoved
    Status:                False
    Type:                  Degraded

Tested with payload 4.2.0-0.nightly-2019-09-02-172410.
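The condition values verified above can be sketched as follows. This is not the code from pull 179; it is a minimal Python illustration in which `Condition` and `removed_or_unmanaged_conditions` are hypothetical names. The `CurrentlyRemoved` reason and the message text are copied from the verified output; the `CurrentlyUnmanaged` reason for the Unmanaged case is an assumption extrapolated from that pattern.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    type: str
    status: str
    reason: str
    message: str


def removed_or_unmanaged_conditions(state: str, last_version: str) -> list[Condition]:
    """Hypothetical sketch of the conditions reported once the operator is not Managed.

    Available is forced to "True" so upgrade is not blocked; Progressing and
    Degraded are forced to "False". The reason "Currently" + state matches the
    verified "CurrentlyRemoved"; "CurrentlyUnmanaged" is an assumption.
    """
    if state not in ("Removed", "Unmanaged"):
        raise ValueError(f"unexpected management state: {state}")
    reason = f"Currently{state}"
    message = (
        f"Samples installation was previously successful at {last_version} "
        f"but the samples operator is now {state}"
    )
    forced = {"Available": "True", "Progressing": "False", "Degraded": "False"}
    return [Condition(t, s, reason, message) for t, s in forced.items()]


for cond in removed_or_unmanaged_conditions("Removed", "4.2.0-0.nightly-2019-09-02-172410"):
    print(cond.type, cond.status, cond.reason)
# Available True CurrentlyRemoved
# Progressing False CurrentlyRemoved
# Degraded False CurrentlyRemoved
```

All three conditions share one reason/message, which is the point Ben raised in Comment 2: the forced Available=true is now explained by the removed/unmanaged state rather than by a stale "installation successful" message.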

Comment 11 errata-xmlrpc 2019-10-16 06:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

