Bug 1700416 - [marketplace] Cluster Operator Status incorrectly reporting: "Cluster operator marketplace is still updating"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.1.0
Assignee: Alexander Greene
QA Contact: Fan Jia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-16 13:52 UTC by Kevin Rizza
Modified: 2019-06-04 10:47 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When the marketplace-operator starts up, it starts a cluster status monitoring routine. In some cases, the marketplace-operator incorrectly marks the status as failing when a large enough number of operands fail to reconcile successfully (for example, OperatorSources that point to an invalid app registry path).
Consequence: OpenShift install and upgrade tests fail.
Fix: Update the marketplace-operator to report availability based on whether or not it is able to reconcile CatalogSourceConfigs and OperatorSources. Report Degraded if OperatorSources or CatalogSourceConfigs fall below the golden ratio, but keep the Available condition set to true.
Result: OpenShift install and upgrade tests pass again.
Clone Of:
Environment:
Last Closed: 2019-06-04 10:47:37 UTC
Target Upstream Version:
Embargoed:


Attachments
Instances of "Cluster operator marketplace is still updating" (354.80 KB, image/png)
2019-04-16 16:17 UTC, W. Trevor King


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:0758 | 0 | None | None | None | 2019-06-04 10:47:43 UTC

Description Kevin Rizza 2019-04-16 13:52:44 UTC
Description of problem:
When the marketplace-operator starts up, it starts a cluster status monitoring routine. In some cases, the marketplace-operator will incorrectly mark the status as failing when a large enough number of operands fail to reconcile successfully (for example, OperatorSources that are pointing to an invalid app registry path).

In the case we saw this morning, our default OperatorSources point to Quay.io app registries, and all Quay.io services were down. In that situation, the marketplace-operator always marks its OperatorStatus as OperatorFailing, and installs and upgrades fail to complete.
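To make the failure mode concrete, here is a minimal Go sketch of the pre-fix behavior as described above. It is not the operator-marketplace code: the type ReconcileResult, the function legacyOperatorFailing, the maxFailures threshold, and the OperatorSource names are all hypothetical. The point it illustrates is that the operator-wide status is derived purely from operand outcomes, so an external outage (every OperatorSource failing to reach Quay.io) flips the whole operator to Failing even though the operator is still processing events.

```go
package main

import "fmt"

// ReconcileResult records the outcome of one reconcile of an operand
// (an OperatorSource or CatalogSourceConfig). Hypothetical type.
type ReconcileResult struct {
	Name    string
	Success bool
}

// legacyOperatorFailing is a hypothetical model of the pre-fix behavior:
// once enough operands fail to reconcile, the whole operator is reported
// as Failing, regardless of whether the operator itself is healthy.
// maxFailures is an assumed threshold; the bug report does not give one.
func legacyOperatorFailing(results []ReconcileResult, maxFailures int) bool {
	failures := 0
	for _, r := range results {
		if !r.Success {
			failures++
		}
	}
	return failures >= maxFailures
}

func main() {
	// Simulate the Quay.io outage: every default OperatorSource fails to
	// reach its app registry. Names are illustrative only.
	results := []ReconcileResult{
		{Name: "redhat-operators", Success: false},
		{Name: "certified-operators", Success: false},
		{Name: "community-operators", Success: false},
	}
	fmt.Println(legacyOperatorFailing(results, 3)) // true
}
```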



How reproducible:
Always


Steps to Reproduce:
1. Add several OperatorSources pointing to invalid URLs/namespaces
2. Restart the marketplace operator

Actual results:
OperatorStatus for marketplace operator is set to OperatorFailing


Expected results:
OperatorStatus should be set to OperatorAvailable, since the operator itself is still handling events successfully.

Comment 1 W. Trevor King 2019-04-16 16:17:25 UTC
Created attachment 1555602
Instances of "Cluster operator marketplace is still updating"

Example run [1]:

level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-hkwt7dyg-2249a.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: Cluster operator marketplace is still updating: timed out waiting for the condition"

This seems to have been mostly resolved, but it showed up again in the recent run [2].

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_origin-aggregated-logging/1598/pull-ci-openshift-origin-aggregated-logging-master-e2e-aws/457
[2]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/302/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws/731

Comment 2 Aravindh Puthiyaparambil 2019-04-18 20:19:48 UTC
https://github.com/operator-framework/operator-marketplace/pull/165

Comment 3 Alexander Greene 2019-05-01 14:22:58 UTC
https://github.com/operator-framework/operator-marketplace/pull/165 has been merged and is ready for QA review.

Comment 4 Alexander Greene 2019-05-01 14:23:52 UTC
Solution:
Update the marketplace-operator to report availability based on whether or not it
is able to reconcile CatalogSourceConfigs and OperatorSources. Report Degraded
if OperatorSources or CatalogSourceConfigs fall below the golden ratio, but keep
the Available condition set to true.
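A minimal Go sketch of the condition mapping described in this solution follows. It is an illustration under stated assumptions, not the code merged in PR 165: the types OperatorConditions and SyncStats, the function computeConditions, and the threshold parameter are hypothetical. The report calls the threshold the "golden ratio" but does not state its value, so the 0.5 below is an arbitrary placeholder. The key design choice is that Available depends only on whether the operator can reconcile at all, while operand failures can only set Degraded.

```go
package main

import "fmt"

// OperatorConditions is a hypothetical stand-in for the ClusterOperator
// conditions; it is not the real openshift/api type.
type OperatorConditions struct {
	Available bool
	Degraded  bool
}

// SyncStats counts operand reconcile outcomes (OperatorSources and
// CatalogSourceConfigs) over some window. Hypothetical type.
type SyncStats struct {
	Attempted int
	Succeeded int
}

// computeConditions sketches the fix: Available reflects whether the
// operator is able to reconcile at all; Degraded is set when the ratio of
// successful reconciles falls below threshold, without clearing Available.
// threshold is an assumed parameter; its real value is not in the report.
func computeConditions(canReconcile bool, stats SyncStats, threshold float64) OperatorConditions {
	cond := OperatorConditions{Available: canReconcile}
	if stats.Attempted == 0 {
		// No operand reconciles yet, so there is nothing to judge.
		return cond
	}
	ratio := float64(stats.Succeeded) / float64(stats.Attempted)
	cond.Degraded = ratio < threshold
	return cond
}

func main() {
	// Outage scenario: the operator is healthy, but all 6 operand
	// reconciles fail (compare the 6 invalid OperatorSources used in QA).
	c := computeConditions(true, SyncStats{Attempted: 6, Succeeded: 0}, 0.5)
	fmt.Printf("Available=%v Degraded=%v\n", c.Available, c.Degraded)
	// prints "Available=true Degraded=true"
}
```

With this split, an outage of an external registry surfaces as Degraded rather than blocking the install or upgrade, which matches the expected results above.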

Comment 5 Fan Jia 2019-05-05 02:26:55 UTC
Test environment:
cluster version: 4.1.0-0.nightly-2019-05-04-070249
marketplace commit: 4dbca9c55b80a7763f2afe392157c8843d5882ea

Test result:
Added 6 OperatorSources pointing to invalid URLs/namespaces and restarted the marketplace operator. Once the marketplace operator recovered, its status was OperatorAvailable.

Comment 7 errata-xmlrpc 2019-06-04 10:47:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

