Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1746342

Summary: Openshift-samples clusteroperator still reports available true when sampleoperator is set to Removed
Product: OpenShift Container Platform
Component: Samples
Version: 4.2.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: medium
Reporter: XiuJuan Wang <xiuwang>
Assignee: Gabe Montero <gmontero>
QA Contact: XiuJuan Wang <xiuwang>
CC: adam.kaplan, bparees
Last Closed: 2019-10-16 06:38:11 UTC
Type: Bug

Description XiuJuan Wang 2019-08-28 08:38:54 UTC
Description of problem:
The openshift-samples clusteroperator still reports available true when the samples operator is set to Removed.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-27-105356

How reproducible:
always

Steps to Reproduce:
1. Set the samples operator to Removed.
2. Check that the imagestreams and templates managed by the samples operator are removed.
3. Check the clusteroperator.

Actual results:
The openshift-samples clusteroperator still reports available true after the samples are removed.
$ oc get is  -l samples.operator.openshift.io/managed=true -n openshift 
No resources found.

$ oc get co openshift-samples -o json  | jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2019-08-28T08:14:10Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-08-28T08:25:34Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-08-28T01:10:36Z",
      "status": "False",
      "type": "Degraded"
    }
  ],



Expected results:
The clusteroperator should report available false.
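
To make the actual-versus-expected comparison concrete, the condition list from the output above can be reduced to a type-to-status map. This is only a sketch for reading the report: the helper name `condition_statuses` is hypothetical, and the embedded JSON reproduces just the `conditions` array shown above, since the rest of the status output is truncated in this report.

```python
import json

# Only the "conditions" array from the `jq .status` output above;
# the remainder of the status object was cut off in the report.
STATUS_JSON = """
{
  "conditions": [
    {
      "lastTransitionTime": "2019-08-28T08:14:10Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-08-28T08:25:34Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-08-28T01:10:36Z",
      "status": "False",
      "type": "Degraded"
    }
  ]
}
"""


def condition_statuses(status: dict) -> dict:
    """Map each condition type to its status string (hypothetical helper)."""
    return {c["type"]: c["status"] for c in status.get("conditions", [])}


statuses = condition_statuses(json.loads(STATUS_JSON))
print(statuses)
```

With the output above, the Available entry is "True", which is the value this report questions.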

Additional info:

Comment 1 Gabe Montero 2019-08-28 13:22:38 UTC
Let me pat myself on the back for putting comments in code :-)

This behavior is explicitly intended.

See https://github.com/openshift/cluster-samples-operator/blob/master/pkg/apis/samples/v1/types.go#L348-L353

"
// after online starter upgrade attempts while this operator was not set to managed,
// group arch discussion has decided that we report the Available=true if removed/unmanaged

"

I'll turn to Ben (and have cc:ed Adam) ... given the evolution of upgrade and the ClusterOperator conditions
since the current approach for samples operator was implemented, do we want to pivot here?  Perhaps 
a broader discussion is needed?

Perhaps upgrade is focused more on the degraded condition vs. the failing one?

Comment 2 Ben Parees 2019-08-28 13:34:26 UTC
> Perhaps a broader discussion is needed?

yeah, broader discussion.  Personally I'm still comfortable with the direction we chose (though it doesn't look like the reason/message reflects that the operator is removed/unmanaged? I would have expected it to say something about that, since essentially the reason the operator is "available" is that the operand is removed/unmanaged), but if we are going to consider pivoting, it needs to be done org-wide.  This should not be changed without an agreement across all cluster operator teams about how we are going to handle removed/unmanaged in terms of condition reporting.


> Perhaps upgrade is focused more on the degraded condition vs. the failing one?

I'm not sure what this is in reference to.  Also not sure what the "failing" condition means?  We have available, degraded, and progressing conditions; there is no "failing" condition any more.

Comment 3 Gabe Montero 2019-08-28 13:41:01 UTC
Sorry I meant "Degraded" when I said failing condition.

And to try to clarify my "upgrade is focused more..." comment:
- when we made the code change to report available==true when unmanaged/removed, I believe it was because a CVO operator reporting available==false blocked/failed the upgrade
- I'm wondering whether (I thought I might have heard this) the upgrade now ignores the available setting and interrogates the degraded one instead

But I would assume the details on that second point would be included in the broader discussion.

Comment 4 Ben Parees 2019-08-28 13:56:36 UTC
I went through and sorted out what the various conditions will block/cause-to-fail here:

https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions-and-installupgrade


for an upgrade to succeed/complete, the operator must:
1) be available
2) not be degraded
3) not be progressing
4) report itself as being at the new version

So no, it does not ignore the available setting, but it does also look at degraded.
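
As a rough illustration of the four requirements listed above, here is a minimal sketch of the check for a single operator. It is not the cluster-version-operator's actual code; `OperatorStatus` and `upgrade_can_complete` are hypothetical names, and the version strings are placeholders.

```python
from dataclasses import dataclass


@dataclass
class OperatorStatus:
    """Hypothetical, simplified view of one ClusterOperator's status."""
    available: bool
    degraded: bool
    progressing: bool
    version: str


def upgrade_can_complete(op: OperatorStatus, target_version: str) -> bool:
    """Apply the four requirements listed above to a single operator."""
    return (
        op.available
        and not op.degraded
        and not op.progressing
        and op.version == target_version
    )


# A Removed samples operator that keeps reporting available=true passes the check.
removed_reporting_true = OperatorStatus(True, False, False, "4.2.0")
# The same operator reporting available=false would fail it.
removed_reporting_false = OperatorStatus(False, False, False, "4.2.0")

print(upgrade_can_complete(removed_reporting_true, "4.2.0"))
print(upgrade_can_complete(removed_reporting_false, "4.2.0"))
```

Under this reading, an operator that reports available=false while Removed would keep the upgrade from completing, which is the concern behind the current behavior.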

Comment 5 Gabe Montero 2019-08-28 14:55:56 UTC
OK minimally I'll put this on my to-do for this bug:
- a PR that updates the setting of available/progressing/degraded with a reason/message explaining we are forcing certain values as true/false to enable upgrade
- wrt this bug, available=true, progressing/degraded=false when unmanaged/removed
- will stay in sync with Ben re: the broader discussion and either submit the PR noted in the first 2 bullets, or make additional changes based on the discussion, and then craft / submit the PR accordingly.

Comment 6 Gabe Montero 2019-08-29 21:38:43 UTC
OK XiuJuan, for now, the behavior of Available==true when removed/unmanaged is staying, but I've added new reason/message to available/progressing/degraded explaining the operator is unmanaged/removed, per the decision during 4.1 dev that available should stay true so as to not block upgrade.
Ben per above has confirmed that available must be true for upgrade to complete.

If Ben gets a clarification from the broader discussion in time for 4.2 that a change should occur, we'll open a new bug for you to verify.

Comment 8 XiuJuan Wang 2019-09-02 08:46:32 UTC
Waiting for a newer payload that includes the fix.

Comment 10 XiuJuan Wang 2019-09-03 02:47:16 UTC
When the samples operator is set to Unmanaged or Removed, the clusteroperator now shows reasons for the Available, Progressing and Degraded conditions.

    Last Transition Time:  2019-09-03T02:18:11Z
    Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
    Reason:                CurrentlyRemoved
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-09-03T02:18:11Z
    Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
    Reason:                CurrentlyRemoved
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-09-03T02:18:11Z
    Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
    Reason:                CurrentlyRemoved
    Status:                False
    Type:                  Degraded

Tested with payload 4.2.0-0.nightly-2019-09-02-172410
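
The verified output above could come from logic along the following lines. This is a sketch in Python, not the operator's Go implementation. The function name is hypothetical, and the reason for the Unmanaged state is extrapolated from the "CurrentlyRemoved" reason shown above, not confirmed by this bug.

```python
def conditions_when_not_managed(management_state: str, last_version: str) -> list:
    """Build the forced conditions for a Removed or Unmanaged samples operator.

    Hypothetical helper: Available is forced to True and Progressing/Degraded
    to False, each with a reason and message explaining the state.
    """
    if management_state not in ("Removed", "Unmanaged"):
        raise ValueError(f"unexpected management state: {management_state}")

    # "CurrentlyRemoved" matches the output above; "CurrentlyUnmanaged" is an assumption.
    reason = "Currently" + management_state
    message = (
        f"Samples installation was previously successful at {last_version} "
        f"but the samples operator is now {management_state}"
    )
    forced = {"Available": "True", "Progressing": "False", "Degraded": "False"}
    return [
        {"type": cond_type, "status": status, "reason": reason, "message": message}
        for cond_type, status in forced.items()
    ]


conds = conditions_when_not_managed("Removed", "4.2.0-0.nightly-2019-09-02-172410")
for cond in conds:
    print(cond["type"], cond["status"], cond["reason"])
```

All three conditions share one reason and message, so a reader of the clusteroperator can see why Available stays true even though the samples are gone.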

Comment 11 errata-xmlrpc 2019-10-16 06:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922