Bug 1899258

Summary: marketplace operator stuck on install
Product: OpenShift Container Platform Reporter: Mangirdas Judeikis <mjudeiki>
Component: OLM Assignee: Over the Air Updates <aos-team-ota>
OLM sub component: OperatorHub QA Contact: Johnny Liu <jialiu>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aos-bugs, jokerman, krizza, wking
Version: 4.5   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-19 16:29:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mangirdas Judeikis 2020-11-18 18:38:41 UTC
Marketplace operator stuck during upgrade 

Description of problem:

Installs sometimes fail non-deterministically: the install process does not fully complete and the CVO does not progress to completion, because the marketplace operator hangs and does not update its ClusterOperator status. From the CVO's perspective the operator is therefore "not rolled-out".

Meanwhile, the marketplace operator logs and behaves as if it were a healthy component.

How reproducible:

Steps to Reproduce:
Unknown. The issue is non-deterministic.

Actual results:
Marketplace operator looks healthy when in fact it is not.

Expected results:
If the marketplace operator is not in a healthy state, it should indicate so and be restarted by the Kubernetes layer.

Comment 5 W. Trevor King 2020-11-18 21:43:40 UTC
Cluster-version operator is successfully acting on the ClusterOperator that marketplace is feeding us.  Moving to the marketplace folks so they can look into fixing what they write to their ClusterOperator.  Also moving the reported version to 4.5.  From the must-gather in comment 1:

$ yaml2json <cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml | jq -r '.status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .state + " " + .version + " " + (.verified | tostring)'
2020-11-18T12:47:58Z - Partial 4.5.16 false
$ yaml2json <cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")' | sort
2020-11-18T12:47:58Z Available=False -: -
2020-11-18T12:47:58Z Progressing=True ClusterOperatorNotAvailable: Unable to apply 4.5.16: the cluster operator marketplace has not yet successfully rolled out
2020-11-18T12:47:59Z RetrievedUpdates=True -: -
2020-11-18T13:27:39Z Failing=True ClusterOperatorNotAvailable: Cluster operator marketplace is still updating
$ yaml2json <cluster-scoped-resources/config.openshift.io/clusteroperators/marketplace.yaml | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")' | sort
2020-11-18T12:57:57Z Degraded=False OperandTransitionsSucceeding: Current CR sync ratio (1) meets the expected success ratio (0.3)
2020-11-18T12:57:57Z Upgradeable=True OperatorAvailable: Marketplace is upgradeable
$ yaml2json <cluster-scoped-resources/config.openshift.io/clusteroperators/marketplace.yaml | jq -r '.status | keys | sort[]'
conditions
extension
relatedObjects

So marketplace has not yet claimed Available=True or set an operator version in status.versions.
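
For anyone without yaml2json and jq at hand, the same check can be sketched in Python. This is only an illustration of the reasoning above, not the cluster-version operator's actual logic. The function names format_conditions and rollout_gaps are made up for this example, and the sample status is reconstructed from the output quoted above; the real values of extension and relatedObjects are not shown in this bug, so placeholders stand in for them.

```python
def _or_dash(value):
    """Mimic jq's `// "-"` for missing or null fields."""
    return "-" if value is None else value


def format_conditions(status):
    """Render conditions in the same shape as the jq one-liners, sorted."""
    lines = []
    for cond in status.get("conditions", []):
        lines.append(
            "{} {}={} {}: {}".format(
                cond.get("lastTransitionTime", ""),
                cond.get("type", ""),
                cond.get("status", ""),
                _or_dash(cond.get("reason")),
                _or_dash(cond.get("message")),
            )
        )
    return sorted(lines)


def rollout_gaps(status):
    """List what is missing for the operator to look rolled out.

    Simplified assumption: an operator counts as rolled out once it
    reports Available=True and lists at least one entry in versions.
    """
    gaps = []
    by_type = {c.get("type"): c.get("status") for c in status.get("conditions", [])}
    if by_type.get("Available") != "True":
        gaps.append("Available=True not set")
    if not status.get("versions"):
        gaps.append("status.versions is empty or missing")
    return gaps


# Sample status rebuilt from the must-gather output quoted in this comment.
# `extension` and `relatedObjects` are placeholders (their values are unknown).
SAMPLE_STATUS = {
    "conditions": [
        {
            "lastTransitionTime": "2020-11-18T12:57:57Z",
            "type": "Upgradeable",
            "status": "True",
            "reason": "OperatorAvailable",
            "message": "Marketplace is upgradeable",
        },
        {
            "lastTransitionTime": "2020-11-18T12:57:57Z",
            "type": "Degraded",
            "status": "False",
            "reason": "OperandTransitionsSucceeding",
            "message": "Current CR sync ratio (1) meets the expected success ratio (0.3)",
        },
    ],
    "extension": None,
    "relatedObjects": [],
}

if __name__ == "__main__":
    for line in format_conditions(SAMPLE_STATUS):
        print(line)
    for gap in rollout_gaps(SAMPLE_STATUS):
        print("missing:", gap)
```

Run against the sample, both gaps are reported, which matches the conclusion above: no Available condition and no status.versions.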

Comment 6 Mangirdas Judeikis 2020-11-19 11:15:40 UTC
Thanks, Trevor! 

Yes, we noticed this today in our CI, but sadly the cluster was cleaned up by a purger. All indications in the logs say that this is the same issue.

Comment 7 Kevin Rizza 2020-11-19 16:29:20 UTC
This appears to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1881542, the fix for which should merge into 4.5 soon.

*** This bug has been marked as a duplicate of bug 1881542 ***