Description of problem:
The openshift-samples clusteroperator still reports Available=True when the samples operator is set to Removed.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-27-105356

How reproducible:
Always

Steps to Reproduce:
1. Set the samples operator to Removed.
2. Check whether the imagestreams and templates managed by the samples operator have been removed.
3. Check the clusteroperator.

Actual results:
The openshift-samples clusteroperator still reports Available=True after the samples are removed.

$ oc get is -l samples.operator.openshift.io/managed=true -n openshift
No resources found.

$ oc get co openshift-samples -o json | jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2019-08-28T08:14:10Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-08-28T08:25:34Z",
      "message": "Samples installation successful at 4.2.0-0.nightly-2019-08-27-105356",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-08-28T01:10:36Z",
      "status": "False",
      "type": "Degraded"
    }
  ],

Expected results:
Should report Available=False.

Additional info:
Let me pat myself on the back for putting comments in code :-)

This behavior is explicitly intended. See https://github.com/openshift/cluster-samples-operator/blob/master/pkg/apis/samples/v1/types.go#L348-L353

"
// after online starter upgrade attempts while this operator was not set to managed,
// group arch discussion has decided that we report the Available=true if removed/unmanaged
"

I'll turn to Ben (and have cc'ed Adam) ... given the evolution of upgrade and the ClusterOperator conditions since the current approach for samples operator was implemented, do we want to pivot here? Perhaps a broader discussion is needed? Perhaps upgrade is focused more on the degraded condition vs. the failing one?
> Perhaps a broader discussion is needed?

Yeah, broader discussion. Personally I'm still comfortable with the direction we chose (though it doesn't look like the reason/message reflects that the operator is removed/unmanaged? I would have expected it to say something about that, since essentially the reason the operator is "available" is that its operand is removed/unmanaged), but if we are going to consider pivoting, it needs to be done org-wide. This should not be changed without an agreement across all cluster operator teams about how we are going to handle removed/unmanaged in terms of condition reporting.

> Perhaps upgrade is focused more on the degraded condition vs. the failing one?

I'm not sure what this is in reference to. Also not sure what the "failing" condition means? We have Available, Degraded, and Progressing conditions; there is no "Failing" condition any more.
Sorry, I meant "Degraded" when I said failing condition.

And to try to clarify my "upgrade is focused more..." comment:
- when we made the code change to report available==true when unmanaged/removed, I believe it was because a CVO operator reporting available==false blocked/failed the upgrade
- I'm wondering whether (I thought I might have heard this) the upgrade now ignores the available setting and interrogates the degraded one instead

But I would assume the details on that second point would be included in the broader discussion.
I went through and sorted out what the various conditions will block/cause-to-fail here:
https://github.com/openshift/cluster-version-operator/blob/master/docs/dev/clusteroperator.md#conditions-and-installupgrade

For an upgrade to succeed/complete, the operator must:
1) be available
2) not be degraded
3) not be progressing
4) report itself as being at the new version

So no, it does not ignore the available setting, but it does also look at degraded.
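To make the four requirements above concrete, here is a minimal sketch of the check in Python. It is not the cluster-version-operator's actual code: the function name `upgrade_blockers` and its parameters are hypothetical, and treating a missing condition as blocking is an assumption made for the sketch.

```python
from typing import Dict, List


def upgrade_blockers(conditions: List[Dict[str, str]],
                     reported_version: str,
                     target_version: str) -> List[str]:
    """Return the reasons an operator would keep an upgrade from completing.

    Hypothetical helper: a simplified reading of the four rules listed in
    the comment above, not the real CVO logic.
    """
    # Index condition status by type, e.g. {"Available": "True", ...}.
    status = {c["type"]: c.get("status", "Unknown") for c in conditions}

    blockers = []
    # Assumption: a condition that is absent or "Unknown" counts as blocking.
    if status.get("Available", "Unknown") != "True":
        blockers.append("not available")
    if status.get("Degraded", "Unknown") != "False":
        blockers.append("degraded")
    if status.get("Progressing", "Unknown") != "False":
        blockers.append("progressing")
    if reported_version != target_version:
        blockers.append("version mismatch")
    return blockers
```

With the conditions from the original report (Available=True, Progressing=False, Degraded=False) and matching versions, the sketch returns an empty list. Flipping Available to False yields "not available", which is the case the 4.1 decision was meant to avoid.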
OK, minimally I'll put this on my to-do list for this bug:
- a PR that updates the setting of available/progressing/degraded with a reason/message explaining that we are forcing certain values to true/false to enable upgrade
- wrt this bug, that means available=true and progressing/degraded=false when unmanaged/removed
- I will stay in sync with Ben re: the broader discussion, and either submit the PR noted in the first 2 bullets or make additional changes based on the discussion, and then craft/submit the PR accordingly.
OK XiuJuan, for now the behavior of Available==true when removed/unmanaged is staying, but I've added a new reason/message to available/progressing/degraded explaining that the operator is unmanaged/removed, per the decision during 4.1 dev that available should stay true so as not to block upgrade. Ben, per above, has confirmed that available must be true for upgrade to complete. If Ben gets a clarification from the broader discussion in time for 4.2 that a change should occur, we'll open a new bug for you to verify.
Waiting for a newer payload that includes the fix.
When the samples operator is set to Unmanaged|Removed, the clusteroperator now shows reasons for Available, Progressing and Degraded.

  Last Transition Time:  2019-09-03T02:18:11Z
  Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
  Reason:                CurrentlyRemoved
  Status:                True
  Type:                  Available

  Last Transition Time:  2019-09-03T02:18:11Z
  Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
  Reason:                CurrentlyRemoved
  Status:                False
  Type:                  Progressing

  Last Transition Time:  2019-09-03T02:18:11Z
  Message:               Samples installation was previously successful at 4.2.0-0.nightly-2019-09-02-172410 but the samples operator is now Removed
  Reason:                CurrentlyRemoved
  Status:                False
  Type:                  Degraded

Tested with payload 4.2.0-0.nightly-2019-09-02-172410.
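For readers who want to see how the three conditions in the output above relate to each other, the following Python sketch rebuilds them from a management state. It is illustrative only and is not the operator's Go implementation: the function name `conditions_for_state` and the dictionary keys are hypothetical, and the `CurrentlyUnmanaged` reason is an assumption by analogy, since only `CurrentlyRemoved` appears in this report.

```python
from typing import Dict, List


def conditions_for_state(management_state: str,
                         version: str,
                         transition_time: str) -> List[Dict[str, str]]:
    """Build the forced conditions for a Removed or Unmanaged samples operator.

    Hypothetical helper mirroring the verified output in this bug.
    """
    if management_state not in ("Removed", "Unmanaged"):
        raise ValueError(f"unexpected management state: {management_state}")

    # Message text copied from the verified output above.
    message = (f"Samples installation was previously successful at {version} "
               f"but the samples operator is now {management_state}")
    # "CurrentlyRemoved" is from the report; "CurrentlyUnmanaged" is assumed.
    reason = f"Currently{management_state}"

    # Values forced so that the operator does not block an upgrade.
    forced = {"Available": "True", "Progressing": "False", "Degraded": "False"}
    return [
        {
            "lastTransitionTime": transition_time,
            "message": message,
            "reason": reason,
            "status": status,
            "type": condition_type,
        }
        for condition_type, status in forced.items()
    ]
```

All three conditions share one reason and message, so anyone inspecting the clusteroperator can tell that Available=True reflects the Removed/Unmanaged state and not a fresh successful installation.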
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922