Description of problem:
When the marketplace-operator starts up, it starts a cluster status monitoring routine. In some cases, the marketplace-operator incorrectly marks its status as failing when a large enough number of operands fail to reconcile (for example, OperatorSources that point to an invalid app registry path). In the case we saw this morning, our default OperatorSources point to Quay.io app registries, and all Quay.io services were down. In that case, the marketplace-operator always marks its OperatorStatus as OperatorFailing, and installs and upgrades fail to complete.

How reproducible:
Always

Steps to Reproduce:
1. Add a bunch of OperatorSources pointing to invalid URLs/namespaces
2. Restart the marketplace operator

Actual results:
OperatorStatus for the marketplace operator is set to OperatorFailing

Expected results:
OperatorStatus should be set to OperatorAvailable, since the operator itself is happily handling events.
Created attachment 1555602
Instances of "Cluster operator marketplace is still updating"

Example run [1]:

level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-hkwt7dyg-2249a.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: Cluster operator marketplace is still updating: timed out waiting for the condition"

This seems to have been mostly resolved, but it showed up again in the recent [2].

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_origin-aggregated-logging/1598/pull-ci-openshift-origin-aggregated-logging-master-e2e-aws/457
[2]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/302/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws/731
https://github.com/operator-framework/operator-marketplace/pull/165
https://github.com/operator-framework/operator-marketplace/pull/165 has been merged and is ready for QA review.
Solution: Update the marketplace-operator to report availability based on whether or not it is able to reconcile CatalogSourceConfigs and OperatorSources. Report Degraded if OperatorSources or CatalogSourceConfigs fall below the golden ratio, but keep the Available condition set to true.
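
The split between the two conditions can be sketched roughly as follows. This is an illustration only, not the code from the pull request: the `SyncWindow` structure, the `handling_events` flag, the function name, and the `min_success_ratio` value of 0.5 are all assumptions made for the example, and the actual "golden ratio" used by the operator is not stated in this report.

```python
from dataclasses import dataclass


@dataclass
class SyncWindow:
    """Counts gathered over one monitoring window (hypothetical structure)."""
    handling_events: bool   # the operator itself is picking up and processing events
    syncs_attempted: int    # reconciles of OperatorSources/CatalogSourceConfigs attempted
    syncs_succeeded: int    # reconciles that completed without error


def cluster_operator_conditions(window, min_success_ratio=0.5):
    """Return the Available and Degraded conditions as booleans.

    min_success_ratio stands in for the "golden ratio" from the comment
    above; 0.5 is an assumed, illustrative value.
    """
    # Available reflects only whether the operator is able to run its
    # reconcile loop, not whether the operands it reconciles are healthy.
    available = window.handling_events

    # Degraded reflects operand health: too few successful reconciles.
    # With no attempts in the window there is nothing to judge.
    if window.syncs_attempted == 0:
        degraded = False
    else:
        ratio = window.syncs_succeeded / window.syncs_attempted
        degraded = ratio < min_success_ratio

    return {"Available": available, "Degraded": degraded}


# Six OperatorSources that all point to invalid URLs: the operator is still
# handling events, so it stays Available while reporting Degraded.
print(cluster_operator_conditions(SyncWindow(True, 6, 0)))
```

Under these assumptions, a Quay.io outage makes every reconcile fail, which flips Degraded but leaves Available untouched, so installs and upgrades waiting on the Available condition are not blocked.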
test env:
cv: 4.1.0-0.nightly-2019-05-04-070249
marketplace commit: 4dbca9c55b80a7763f2afe392157c8843d5882ea

test result:
Added 6 OperatorSources pointing to invalid URLs/namespaces and restarted the marketplace operator. The marketplace operator's status is OperatorAvailable after the marketplace recovers.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758