Bug 1700416

Summary: [marketplace] Cluster Operator Status incorrectly reporting: "Cluster operator marketplace is still updating"
Product: OpenShift Container Platform Reporter: Kevin Rizza <krizza>
Component: OLM Assignee: Alexander Greene <agreene>
Status: CLOSED ERRATA QA Contact: Fan Jia <jfan>
Severity: low Docs Contact:
Priority: low    
Version: 4.1.0 CC: aravindh, wking
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: When the marketplace-operator starts up, it starts a cluster status monitoring routine. In some cases, the marketplace-operator will incorrectly mark the status as failing when a large enough number of operands fail to reconcile successfully (for example, OperatorSources that are pointing to an invalid app registry path).
Consequence: OpenShift install and upgrade tests are failing.
Fix: Update marketplace-operator to report availability based on whether or not it is able to reconcile CatalogSourceConfigs and OperatorSources. Report degraded if OperatorSources or CatalogSourceConfigs fall below golden ratio, but keep available condition set to true.
Result: OpenShift install and upgrade tests are passing again.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:47:37 UTC Type: Bug
Attachments:
Description Flags
Instances of "Cluster operator marketplace is still updating" none

Description Kevin Rizza 2019-04-16 13:52:44 UTC
Description of problem:
When the marketplace-operator starts up, it starts a cluster status monitoring routine. In some cases, the marketplace-operator will incorrectly mark the status as failing when a large enough number of operands fail to reconcile successfully (for example, OperatorSources that are pointing to an invalid app registry path).

In the case we saw this morning, our default OperatorSources are pointing to Quay.io appregistries and Quay.io services were all down. In that case, the marketplace-operator will always mark its OperatorStatus as OperatorFailing, and installs and upgrades will fail to complete.



How reproducible:
Always


Steps to Reproduce:
1. Add several OperatorSources pointing to invalid URLs/namespaces
2. Restart the marketplace operator

Actual results:
OperatorStatus for marketplace operator is set to OperatorFailing


Expected results:
OperatorStatus should be set to OperatorAvailable, since the operator itself is happily handling events.

Comment 1 W. Trevor King 2019-04-16 16:17:25 UTC
Created attachment 1555602 [details]
Instances of "Cluster operator marketplace is still updating"

Example run [1]:

level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-hkwt7dyg-2249a.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: Cluster operator marketplace is still updating: timed out waiting for the condition"

This seems to have been mostly resolved, but it showed up again in the recent run [2].

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_origin-aggregated-logging/1598/pull-ci-openshift-origin-aggregated-logging-master-e2e-aws/457
[2]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/302/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws/731

Comment 2 Aravindh Puthiyaparambil 2019-04-18 20:19:48 UTC
https://github.com/operator-framework/operator-marketplace/pull/165

Comment 3 Alexander Greene 2019-05-01 14:22:58 UTC
https://github.com/operator-framework/operator-marketplace/pull/165 has been merged and is ready for QA review.

Comment 4 Alexander Greene 2019-05-01 14:23:52 UTC
Solution:
Update marketplace-operator to report availability based on whether or not it
is able to reconcile CatalogSourceConfigs and OperatorSources. Report degraded
if OperatorSources or CatalogSourceConfigs fall below golden ratio, but keep
available condition set to true.
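
The reporting rule described above could be sketched roughly as follows. This is an illustration only, not the code from the linked pull request: the `syncCounts` type, the `conditions` function, and the 0.3 threshold are hypothetical names and values chosen for the example, since this bug does not state the actual ratio. Treating "no reconcile events at all" as unavailable is likewise an assumption of the sketch.

```go
package main

import "fmt"

// syncCounts is a hypothetical tally of reconcile attempts seen in one
// monitoring window. It is not a type from operator-marketplace.
type syncCounts struct {
	Total  int // reconcile attempts the operator processed
	Failed int // attempts that ended in an error
}

// threshold is an assumed stand-in for the "golden ratio" mentioned
// above; the real value is not given in this bug.
const threshold = 0.3

// conditions derives the Available and Degraded conditions from the tally.
// Available depends only on whether the operator is processing events;
// Degraded depends on the share of events that reconciled successfully.
func conditions(c syncCounts) (available, degraded bool) {
	if c.Total == 0 {
		// Assumption: an operator that handled no events is not available.
		return false, false
	}
	success := float64(c.Total-c.Failed) / float64(c.Total)
	return true, success < threshold
}

func main() {
	// Six OperatorSources with invalid registry namespaces: every sync
	// fails, yet the operator is still handling events.
	a, d := conditions(syncCounts{Total: 6, Failed: 6})
	fmt.Printf("available=%t degraded=%t\n", a, d)
}
```

Under these assumptions, operand failures alone can set Degraded but never clear Available, which is the property that keeps installs and upgrades from blocking on unreachable app registries.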

Comment 5 Fan Jia 2019-05-05 02:26:55 UTC
test env:
cv:4.1.0-0.nightly-2019-05-04-070249
marketplace commit:4dbca9c55b80a7763f2afe392157c8843d5882ea

test result:
Added 6 OperatorSources pointing to invalid URLs/namespaces and restarted the marketplace operator. The marketplace operator's status is OperatorAvailable after the marketplace recovers.

Comment 7 errata-xmlrpc 2019-06-04 10:47:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758