Description of problem:
When setting openshiftcontrollermanagers/cluster to unmanaged, the Progressing and Degraded conditions should not change; they should remain False.

Version-Release number of selected component (if applicable):
4.1.0-0.nightly-2019-06-06-122735

How reproducible:
always

Steps to Reproduce:
1. Update openshiftcontrollermanager/cluster to unmanaged:
     managementState: Unmanaged
2. Check the openshift-controller-manager cluster operator:
     [wewang@Desktop ~]$ oc get clusteroperator | grep openshift-controller-manager
     NAME                           AVAILABLE   PROGRESSING   DEGRADED   SINCE
     openshift-controller-manager   Unknown     Unknown       Unknown    92s

Actual results:
PROGRESSING and DEGRADED are changed to Unknown status

Expected results:
PROGRESSING and DEGRADED should keep False status

Additional info:
older bug: https://bugzilla.redhat.com/show_bug.cgi?id=1667376
https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/operator.go#L92
https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/operator.go#L98
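For step 1, one common way to flip that field (the exact command is not in the original report; this patch is just one way the reporter could have set it):

    oc patch openshiftcontrollermanager cluster --type merge -p '{"spec":{"managementState":"Unmanaged"}}'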
And what about the Upgradeable status? I see the code does not handle it when the operator is set to unmanaged. Refer to: https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/operator.go#L81
@wewang per recent discussions, Upgradeable should be `True` if the operator is Unmanaged or Removed. We should not block upgrades for Unmanaged; however, customers then risk the upgrade failing.
So this is a weird one. The operator code, when unmanaged, is trying to set:

- Available to Unknown
- Progressing to False
- Degraded to False

if you look at https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/operator.go#L84-L101

I've added debug output that confirms those are the settings prior to the UpdateStatus call at https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/operator.go#L104

But I then see this event coming across immediately after the UpdateStatus code, where in fact Progressing and Degraded are also Unknown (which is also what I see when running oc get clusteroperator openshift-controller-manager -o yaml). The two pasted tails, A (timestamps ...T19:20:07Z) and B (timestamps ...T19:20:23Z), share the common prefix shown first:

I0822 19:20:23.116996       1 event.go:209] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"7b9429af-c4f4-11e9-9995-0201d59f5a8e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for operator openshift-controller-manager changed: {"conditions":[{"type":"Available","status":"Unknown","lastTransitionTime":"2019-08-22T19:20:

A: 07Z","reason":"Unmanaged"},{"type":"Progressing","status":"Unknown","lastTransitionTime":"2019-08-22T19:20:07Z","reason":"Unmanaged"},{"type":"Degraded","status":"Unknown","lastTransitionTime":"2019-08-22T19:20:07Z","reason":"Unmanaged"},{"type":"Upgradeable","status":"Unknown","lastTransitionTime":"2019-08-22T19:20:07Z","reason":"Unmanaged"}],"extension":null}

B: 23Z","reason":"Unmanaged"},{"type":"Progressing","status":"Unknown","lastTransitionTime":"2019-08-22T19:20:23Z","reason":"Unmanaged"},{"type":"Degraded","status":"Unknown","lastTransitionTime":"2019-08-22T19:20:23Z","reason":"Unmanaged"},{"type":"Upgradeable","status":"Unknown","lastTransitionTime":"2019-08-22T19:20:23Z","reason":"Unmanaged"}],"extension":null}

That would imply the UpdateStatus call at https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/pkg/operator/operator.go#L104 is misbehaving somehow. Some vendoring incompatibility/mismatch perhaps?

Continuing to debug, but posting this in case folks on the CC list have seen this before.
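For context, a minimal sketch (not the exact upstream source) of what the unmanaged branch around operator.go#L84-L101 does to the openshiftcontrollermanager operator status, using library-go's v1helpers:

    package main

    import (
        operatorv1 "github.com/openshift/api/operator/v1"
        "github.com/openshift/library-go/pkg/operator/v1helpers"
    )

    // setUnmanagedConditions mirrors the debug observation above: Available is
    // forced to Unknown while Progressing and Degraded are set to False.
    func setUnmanagedConditions(status *operatorv1.OperatorStatus) {
        for _, c := range []operatorv1.OperatorCondition{
            {Type: "Available", Status: operatorv1.ConditionUnknown, Reason: "Unmanaged"},
            {Type: "Progressing", Status: operatorv1.ConditionFalse, Reason: "Unmanaged"},
            {Type: "Degraded", Status: operatorv1.ConditionFalse, Reason: "Unmanaged"},
        } {
            // SetOperatorCondition bumps lastTransitionTime only when Status changes.
            v1helpers.SetOperatorCondition(&status.Conditions, c)
        }
    }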
Ah ... some confusion in my last comment ... those lines I noted update the openshiftcontrollermanager type, which happens to inline OperatorStatus. The oc get openshiftcontrollermanager -o yaml output does in fact reflect unmanaged resulting in Available Unknown and Progressing/Degraded False on that object. I still need to find the code that updates what I see with oc get clusteroperator openshift-controller-manager -o yaml.
Believe I found it. It is this code in the common openshift/library-go package:

https://github.com/openshift/library-go/blob/master/pkg/operator/status/status_controller.go#L127-L143

The fact that it is done by this common code path potentially implies that other operators are also setting all the conditions to Unknown when unmanaged, at least when management.IsOperatorAlwaysManaged() is false.

Also see https://github.com/openshift/library-go/blob/master/pkg/operator/management/management_state.go#L36-L39 and https://github.com/openshift/library-go/blob/master/pkg/operator/management/management_state.go#L22-L27

// SetOperatorAlwaysManaged is one time choice when an operator want to opt-out from supporting the "unmanaged" state.
// This is a case of control plane operators or operators that are required to always run otherwise the cluster will
// get into unstable state or critical components will stop working.

So it looks like the implication is that OCM does NOT support going unmanaged. Though I'm still trying to reconcile all the booleans and negations between the management_state.go code and https://github.com/openshift/library-go/blob/master/pkg/operator/status/status_controller.go#L127
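Paraphrasing the linked status_controller.go lines (a sketch, not the exact source; helper names approximate the library-go packages involved), the common code path does roughly this:

    package main

    import (
        configv1 "github.com/openshift/api/config/v1"
        operatorv1 "github.com/openshift/api/operator/v1"
        configv1helpers "github.com/openshift/library-go/pkg/config/clusteroperator/v1helpers"
        "github.com/openshift/library-go/pkg/operator/management"
    )

    // syncUnmanaged paraphrases the unmanaged branch in status_controller.go:
    // every clusteroperator condition is overwritten to Unknown/"Unmanaged".
    func syncUnmanaged(state operatorv1.ManagementState, co *configv1.ClusterOperator) bool {
        if state != operatorv1.Unmanaged || management.IsOperatorAlwaysManaged() {
            return false
        }
        for _, condType := range []configv1.ClusterStatusConditionType{
            configv1.OperatorAvailable, configv1.OperatorProgressing,
            configv1.OperatorDegraded, configv1.OperatorUpgradeable,
        } {
            configv1helpers.SetStatusCondition(&co.Status.Conditions, configv1.ClusterOperatorStatusCondition{
                Type: condType, Status: configv1.ConditionUnknown, Reason: "Unmanaged",
            })
        }
        return true // caller updates the clusteroperator and returns early from sync
    }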
Yep, so Michal's https://github.com/openshift/library-go/pull/297 is what dropped this stuff in back in March.

The description of that PR at least says:

UPDATED: management.SetOperatorAlwaysManaged() and management.SetOperatorNotRemovable() will make the operator ignore the "unmanaged" and "removed" state, so you can opt-in.

But the comment in the code says:

// SetOperatorAlwaysManaged is one time choice when an operator want to opt-out from supporting the "unmanaged" state.
// This is a case of control plane operators or operators that are required to always run otherwise the cluster will
// get into unstable state or critical components will stop working.

If, per the comment in management_state.go, IsOperatorAlwaysManaged means the operator can't be set to unmanaged state, then the line that results in setting the cluster operator conditions to Unknown:

if detailedSpec.ManagementState == operatorv1.Unmanaged && !management.IsOperatorAlwaysManaged() {

meaning "the management state is set to unmanaged and we are *NOT* always managed, i.e. we allow unmanaged", should instead be:

if detailedSpec.ManagementState == operatorv1.Unmanaged && management.IsOperatorAlwaysManaged() {

meaning "the management state is set to unmanaged but we must always be managed, i.e. can't be set to unmanaged", as all the logic and comments in management_state.go suggest.

When I made that change in status_controller.go, switching to

if detailedSpec.ManagementState == operatorv1.Unmanaged && management.IsOperatorAlwaysManaged() {

I no longer see the clusteroperator conditions for openshift-controller-manager set to Unknown.

Now, what about the operators that do NOT want to be unmanaged? Right now, assuming they are not calling SetOperatorAlwaysManaged(), they are getting the unmanaged protection by accident. So we have a chicken/egg situation between updating all those operators to call SetOperatorAlwaysManaged() as we update openshift/library-go to fix https://github.com/openshift/library-go/blob/master/pkg/operator/status/status_controller.go#L127

Or we leave the behavior as is, simply add comments to clarify things, and have the OCM operator call SetOperatorAlwaysManaged() so that it can be set to unmanaged. I left https://github.com/openshift/library-go/blob/master/pkg/operator/status/status_controller.go#L127 as is and added that call to SetOperatorAlwaysManaged() in the OCM operator, and that worked too.

Before crafting any openshift/library-go or openshift/cluster-openshift-controller-manager-operator PRs, I'd like to see if Michal has the cycles to digest this comment and comment #5, and get his take on which way we should go wrt the chicken/egg scenario I mentioned here, my interpretation of the names/comments/logic, etc.
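To make the boolean semantics concrete, here is a paraphrased sketch of the management package's state flag (modeled on the linked management_state.go lines and the quoted comments, not the exact source). Note the double negative: the package stores "allow unmanaged" but exposes IsOperatorAlwaysManaged, which is where the opt-in/opt-out confusion arises:

    package management

    var allowOperatorUnmanagedState = true

    // SetOperatorAlwaysManaged is a one-time choice to opt OUT of supporting
    // the "unmanaged" state.
    func SetOperatorAlwaysManaged() {
        allowOperatorUnmanagedState = false
    }

    // IsOperatorAlwaysManaged means the operator can't be set to unmanaged state.
    func IsOperatorAlwaysManaged() bool {
        return !allowOperatorUnmanagedState
    }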
Opened https://github.com/openshift/library-go/pull/507 with one of the proposed solutions. We can continue the which-way-to-go discussion there. Removing needinfo.
Reminder - even if the library-go PR merges, we have to bump library-go in the cluster-openshift-controller-manager-operator repo to pick up the fix
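For reference, picking up the fix amounts to a vendor bump in the operator repo; assuming the repo uses Go modules (the commit hash below is a placeholder, not a real reference):

    go get github.com/openshift/library-go@<merge-commit-sha>
    go mod vendor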
Feedback provided in https://github.com/openshift/library-go/pull/507; we are going with the other approach.
We are still discussing internally, but the belief / understanding is growing on our end that unmanaged/removed should not be supported for ocm-o. In that sense, unknown on the clusteroperator conditions would be acceptable. That said, there is some cleanup on the ocm-o and openshiftcontrollermanager object sides that we are entertaining.
OK, where we landed on this:

1) ocm-o declares to library-go that it only supports the managed management state, NOT unmanaged or removed
2) if you set ocm-o to unmanaged or removed, a reason/message is posted on all the conditions saying this is NOT supported, but otherwise all conditions are left unchanged (see the sketch after this list)
3) with unmanaged/removed being IGNORED, the rest of the sync loop is still processed (vs. the abort that used to occur for unmanaged) ...
4) ... if the rest of the sync loop makes any additional changes to the conditions based on the other inputs to the system, and if errors are detected, the reason/message for the condition(s) in question are updated accordingly, overwriting the "unmanaged not supported" messages if present
5) if you switch the operator from unmanaged/removed back to managed, the "unmanaged/removed not supported" reason/message is removed from the conditions
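A minimal sketch of point 2 (the helper name is hypothetical and the message wording is modeled on the verification output in the next comment, not the actual ocm-o source):

    package main

    import (
        "fmt"

        operatorv1 "github.com/openshift/api/operator/v1"
    )

    const unsupportedMsg = "the controller manager spec was set to Unmanaged state, " +
        "but that is unsupported, and has no effect on this condition"

    // annotateUnsupportedState (hypothetical helper): when the spec asks for
    // Unmanaged/Removed, annotate each condition's message to say the state is
    // unsupported, but leave Status, Reason, and lastTransitionTime untouched so
    // Available/Progressing/Degraded keep their values.
    func annotateUnsupportedState(conditions []operatorv1.OperatorCondition) {
        for i := range conditions {
            conditions[i].Message = fmt.Sprintf("%s: %s", conditions[i].Type, unsupportedMsg)
        }
    }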
tested in version: 4.2.0-0.nightly-2019-08-26-163938

1. Set ocm-o to Unmanaged; this does not affect the status of Available, Degraded, and Progressing:

$ oc get co openshift-controller-manager -o yaml
  conditions:
  - lastTransitionTime: "2019-08-27T01:05:55Z"
    message: |-
      ConfigObservationDegraded: the controller manager spec was set to Unmanaged state, but that is unsupported, and has no effect on this condition
      WorkloadDegraded: the controller manager spec was set to Unmanaged state, but that is unsupported, and has no effect on this condition
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-08-27T08:15:27Z"
    message: 'Progressing: the controller manager spec was set to Unmanaged state, but that is unsupported, and has no effect on this condition'
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-08-27T01:07:16Z"
    message: 'Available: the controller manager spec was set to Unmanaged state, but that is unsupported, and has no effect on this condition'
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2019-08-27T01:05:55Z"
    reason: NoData
    status: Unknown
    type: Upgradeable

$ oc get co | grep openshift-controller-manager
openshift-controller-manager   4.2.0-0.nightly-2019-08-26-163938   True   False   False   7h17m
Going to live with this duplication, @Wen. There are other cases where the messages could vary, and filtering on this specific message feels kludgy. It is on par with us duplicating the message across Available/Progressing/etc. ... we are making sure the situation is reported.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922