Bug 1838352

Summary: OperatorExited, Pending marketplace-operator-... pod for several weeks
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: OLM Assignee: Alexander Greene <agreene>
OLM sub component: OLM QA Contact: Tom Buskey <tbuskey>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: agreene, bluddy, ecordell, jiazha, krizza, nhale, tbuskey, vdinh
Version: 4.3.z Flags: agreene: needinfo-
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The marketplace operator was written to report that the services it offered were degraded whenever the pod exited gracefully. This would happen during routine cluster upgrades.
Consequence: The marketplace pod reported a degraded status during normal upgrades. This information was ultimately surfaced in Telemetry and caused confusion for both cluster admins and customer experience teams.
Fix: The marketplace operator no longer reports that it is degraded when it exits gracefully.
Result: The marketplace operator is no longer flagged by Telemeter as degraded, reducing confusion for customers and customer experience teams.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:12:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1892382    

Description W. Trevor King 2020-05-21 01:17:54 UTC
Seen in the marketplace ClusterOperator early in a 4.3.18 -> 4.3.19 update:

version: operator 4.3.18
  2020-05-05T23:31:20Z Progressing=False OperatorExited: The operator has exited
  2020-05-19T06:39:39Z Available=False OperatorExited: The operator has exited
  2020-05-05T23:31:00Z Upgradeable=True OperatorExited: Marketplace is upgradeable
  2019-12-05T18:28:36Z Degraded=False OperatorExited: The operator has exited
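
For anyone triaging summaries like the one above in bulk, here is a minimal Python sketch that parses those condition lines and picks out the ones that conventionally signal trouble for a ClusterOperator (Available=False or Degraded=True). The function names and the regular expression are illustrative assumptions, not part of any OpenShift tooling.

```python
import re

# Matches summary lines such as:
#   2020-05-19T06:39:39Z Available=False OperatorExited: The operator has exited
CONDITION_LINE = re.compile(
    r"^(?P<time>\S+) (?P<type>\w+)=(?P<status>\w+) (?P<reason>\S+): (?P<message>.*)$"
)


def parse_conditions(text):
    """Parse condition summary lines into dicts, skipping lines that do not match."""
    conditions = []
    for line in text.splitlines():
        match = CONDITION_LINE.match(line.strip())
        if match:
            conditions.append(match.groupdict())
    return conditions


def unhealthy(conditions):
    """Return the conditions that signal trouble: Available=False or Degraded=True."""
    bad = {("Available", "False"), ("Degraded", "True")}
    return [c for c in conditions if (c["type"], c["status"]) in bad]


# The summary quoted in this bug description.
SUMMARY = """\
version: operator 4.3.18
  2020-05-05T23:31:20Z Progressing=False OperatorExited: The operator has exited
  2020-05-19T06:39:39Z Available=False OperatorExited: The operator has exited
  2020-05-05T23:31:00Z Upgradeable=True OperatorExited: Marketplace is upgradeable
  2019-12-05T18:28:36Z Degraded=False OperatorExited: The operator has exited
"""

if __name__ == "__main__":
    for cond in unhealthy(parse_conditions(SUMMARY)):
        print(cond["time"], cond["type"], cond["reason"])
```

With the summary from this report, only the Available condition is selected; Degraded stays False even though the operator has exited.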

Checking out the operator pod:

$ jq -r '.status | .startTime + " " + .phase + "\n" + ([.conditions[] | .lastTransitionTime + " " + (.lastProbeTime // "-") + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")] | join("\n"))' config/pod/openshift-marketplace/marketplace-operator-794975cff-h7m5f
2020-05-05T23:30:12Z Pending
2020-05-05T23:30:12Z - Initialized=True -: -
2020-05-19T06:39:42Z - Ready=False ContainersNotReady: containers with unready status: [marketplace-operator]
2020-05-19T06:39:42Z - ContainersReady=False ContainersNotReady: containers with unready status: [marketplace-operator]
2020-05-05T23:30:12Z - PodScheduled=True -: -

so it has been Pending with an unready container (and no restarts) for two weeks.
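
The jq filter above can be approximated in Python where jq is not available. This is a rough sketch, assuming the pod JSON has already been loaded into a dict; the function name and the abbreviated sample pod are illustrative, with values copied from the output above. One small difference: `or "-"` also replaces empty strings, whereas jq's `//` only replaces null and false.

```python
def summarize_pod_status(pod):
    """Render startTime and phase, then one line per condition, like the jq filter."""
    status = pod["status"]
    lines = ["{} {}".format(status["startTime"], status["phase"])]
    for cond in status["conditions"]:
        lines.append(
            "{} {} {}={} {}: {}".format(
                cond["lastTransitionTime"],
                cond.get("lastProbeTime") or "-",  # missing or null becomes "-"
                cond["type"],
                cond["status"],
                cond.get("reason") or "-",
                cond.get("message") or "-",
            )
        )
    return "\n".join(lines)


# Abbreviated sample built from the output in this report (two of four conditions).
SAMPLE_POD = {
    "status": {
        "startTime": "2020-05-05T23:30:12Z",
        "phase": "Pending",
        "conditions": [
            {
                "type": "Initialized",
                "status": "True",
                "lastProbeTime": None,
                "lastTransitionTime": "2020-05-05T23:30:12Z",
            },
            {
                "type": "Ready",
                "status": "False",
                "lastProbeTime": None,
                "lastTransitionTime": "2020-05-19T06:39:42Z",
                "reason": "ContainersNotReady",
                "message": "containers with unready status: [marketplace-operator]",
            },
        ],
    }
}

if __name__ == "__main__":
    print(summarize_pod_status(SAMPLE_POD))
```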

Comment 11 Evan Cordell 2020-10-26 13:49:12 UTC
*** Bug 1888383 has been marked as a duplicate of this bug. ***

Comment 19 W. Trevor King 2020-12-11 20:19:57 UTC
The PR closing this bug [1] just teaches the marketplace operator to not set OperatorExited on graceful shutdowns.  That's good; it's the cluster-version operator's job to complain if the marketplace operator fails to come back up.  But it does not address why the marketplace operator was unable to come back up (i.e. the "stuck" portion of this bug).  If anyone has a cluster where the marketplace operator is not coming back up or is sticking an update, regardless of OperatorExited conditions, please file a new bug and point us at a must-gather, and we'll dig in.

[1]: https://github.com/operator-framework/operator-marketplace/pull/354
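
The behavior change described in this comment can be illustrated with a toy Python model. This is not the operator's actual code: the function name, the `report_exit` flag, and the choice to touch only the Available condition are all simplifying assumptions for illustration.

```python
import copy


def conditions_after_graceful_exit(conditions, report_exit):
    """Toy model of what a graceful shutdown does to the reported conditions.

    report_exit=True  models the assumed pre-fix behavior: Available is set to
                      False with reason OperatorExited.
    report_exit=False models the post-fix behavior: conditions are left untouched,
                      so noticing an operator that never returns is left to the
                      cluster-version operator.
    """
    result = copy.deepcopy(conditions)  # never mutate the caller's list
    if not report_exit:
        return result
    for cond in result:
        if cond["type"] == "Available":
            cond["status"] = "False"
            cond["reason"] = "OperatorExited"
            cond["message"] = "The operator has exited"
    return result
```

In this model, a routine upgrade that restarts the pod leaves Available=True after the fix, while the pre-fix path flips it to False until the replacement pod reports in.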

Comment 27 errata-xmlrpc 2021-02-24 15:12:13 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.