Cause: The marketplace operator reported that the services it offered were degraded whenever its pod exited gracefully, which happens during routine cluster upgrades.
Consequence: The marketplace operator reported a degraded status during normal upgrades. This status was ultimately surfaced in Telemetry and confused both cluster admins and customer experience teams.
Fix: The marketplace operator no longer reports that it is degraded when it exits gracefully.
Result: The marketplace operator is no longer flagged by Telemeter as degraded, reducing confusion for customers and customer experience teams.
The PR closing this bug [1] only teaches the marketplace operator not to set OperatorExited on graceful shutdowns. That's good: it's the cluster-version operator's job to complain if the marketplace operator fails to come back up. But it does not address why the marketplace operator was unable to come back up (i.e. the "stuck" portion of this bug). If anyone has a cluster where the marketplace operator is not coming back up or is sticking an update, regardless of OperatorExited conditions, please file a new bug, point us at a must-gather, and we'll dig in.
[1]: https://github.com/operator-framework/operator-marketplace/pull/354
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633
Seen in the marketplace ClusterOperator early in a 4.3.18 -> 4.3.19 update:

  version: operator 4.3.18
  conditions:
    2020-05-05T23:31:20Z Progressing=False OperatorExited: The operator has exited
    2020-05-19T06:39:39Z Available=False OperatorExited: The operator has exited
    2020-05-05T23:31:00Z Upgradeable=True OperatorExited: Marketplace is upgradeable
    2019-12-05T18:28:36Z Degraded=False OperatorExited: The operator has exited

Checking out the operator pod:

  $ jq -r '.status | .startTime + " " + .phase + "\n" + ([.conditions[] | .lastTransitionTime + " " + (.lastProbeTime // "-") + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")] | join("\n"))' config/pod/openshift-marketplace/marketplace-operator-794975cff-h7m5f
  2020-05-05T23:30:12Z Pending
  2020-05-05T23:30:12Z - Initialized=True -: -
  2020-05-19T06:39:42Z - Ready=False ContainersNotReady: containers with unready status: [marketplace-operator]
  2020-05-19T06:39:42Z - ContainersReady=False ContainersNotReady: containers with unready status: [marketplace-operator]
  2020-05-05T23:30:12Z - PodScheduled=True -: -

So it has been Pending with an unready container (and no restarts) for two weeks.
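The ClusterOperator conditions quoted in this report can be listed in the same one-line-per-condition shape with a jq filter much like the pod query. A minimal sketch, assuming jq is installed; the `summarize_conditions` helper and the trimmed `SAMPLE` document are hypothetical, with the sample built from the 4.3.18 conditions quoted in this report. On a live cluster the input would come from `oc get clusteroperator marketplace -o json` instead.

```shell
#!/bin/sh
# Hypothetical helper: print each ClusterOperator condition as
# "lastTransitionTime type=status reason: message", substituting "-"
# for a missing reason or message (same convention as the pod query).
# Assumes jq is installed and the ClusterOperator JSON arrives on stdin.
summarize_conditions() {
  jq -r '.status.conditions[]
    | .lastTransitionTime + " " + .type + "=" + .status + " "
      + (.reason // "-") + ": " + (.message // "-")'
}

# Trimmed, hypothetical sample built from the conditions quoted in this report.
# On a live cluster, pipe in: oc get clusteroperator marketplace -o json
SAMPLE='{"status": {"conditions": [
  {"type": "Progressing", "status": "False", "lastTransitionTime": "2020-05-05T23:31:20Z",
   "reason": "OperatorExited", "message": "The operator has exited"},
  {"type": "Available", "status": "False", "lastTransitionTime": "2020-05-19T06:39:39Z",
   "reason": "OperatorExited", "message": "The operator has exited"},
  {"type": "Upgradeable", "status": "True", "lastTransitionTime": "2020-05-05T23:31:00Z",
   "reason": "OperatorExited", "message": "Marketplace is upgradeable"},
  {"type": "Degraded", "status": "False", "lastTransitionTime": "2019-12-05T18:28:36Z",
   "reason": "OperatorExited", "message": "The operator has exited"}
]}}'

# Print the summarized conditions for the sample document.
printf '%s\n' "$SAMPLE" | summarize_conditions
```

Keeping the formatting in a single jq filter mirrors the pod query, so ClusterOperator and pod output can be compared side by side when attaching evidence to a new bug.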