Description of problem:
On startup, the marketplace-operator launches a cluster status monitoring routine. In some cases, it incorrectly marks its status as failing when enough operands fail to reconcile successfully (for example, OperatorSources that point to an invalid app registry path).
In the case we saw this morning, our default OperatorSources were pointing to Quay.io app registries and all Quay.io services were down. In that situation, the marketplace-operator always marks its OperatorStatus as OperatorFailing, and installs and upgrades fail to complete.
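To make the failure mode concrete, here is a minimal sketch of threshold-based status reporting of the kind described above. It is not the marketplace-operator's actual code: `Condition`, `reconcileCounts`, `legacyStatus`, and the `failureThreshold` parameter are hypothetical names chosen for illustration. The point it shows is that an outage in an external registry makes every operand fail, which flips the whole operator to Failing even though the operator itself is healthy.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for a ClusterOperator
// condition type; the real operator uses the OpenShift API types.
type Condition string

const (
	OperatorAvailable Condition = "Available"
	OperatorFailing   Condition = "Failing"
)

// reconcileCounts tallies the outcome of the most recent reconcile of each
// operand (OperatorSources and CatalogSourceConfigs).
type reconcileCounts struct {
	succeeded int
	failed    int
}

// legacyStatus mimics the reported behavior: once the share of failed
// operands reaches failureThreshold (a hypothetical parameter), the whole
// operator is reported as Failing, regardless of the operator's own health.
func legacyStatus(c reconcileCounts, failureThreshold float64) Condition {
	total := c.succeeded + c.failed
	if total == 0 {
		return OperatorAvailable
	}
	if float64(c.failed)/float64(total) >= failureThreshold {
		return OperatorFailing
	}
	return OperatorAvailable
}

func main() {
	// All default OperatorSources point at an unreachable registry (for
	// example during a Quay.io outage), so every reconcile fails.
	fmt.Println(legacyStatus(reconcileCounts{succeeded: 0, failed: 3}, 0.5))
}
```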
Steps to Reproduce:
1. Add several OperatorSources pointing to invalid URLs/namespaces
2. Restart the marketplace operator
Actual results:
The marketplace operator's OperatorStatus is set to OperatorFailing.
Expected results:
The OperatorStatus should be set to OperatorAvailable, since the operator itself is still handling events correctly.
Created attachment 1555602
Instances of "Cluster operator marketplace is still updating"
Example run:
level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-hkwt7dyg-2249a.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: Cluster operator marketplace is still updating: timed out waiting for the condition"
This seems to have been mostly resolved, but it has shown up again recently.
https://github.com/operator-framework/operator-marketplace/pull/165 has been merged and is ready for QA review.
Update the marketplace-operator to report availability based on whether or not
it is able to reconcile CatalogSourceConfigs and OperatorSources. Report
Degraded if OperatorSources or CatalogSourceConfigs fall below the golden
ratio, but keep the Available condition set to true.
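A minimal sketch of the status logic the commit message describes, under stated assumptions: `ClusterStatus`, `computeStatus`, and `minSuccessRatio` are hypothetical names, and `minSuccessRatio` stands in for the "golden ratio" threshold, whose actual value is not given here. Availability depends only on whether the operator can run its reconcile loops; a low share of successfully reconciled operands sets Degraded without clearing Available.

```go
package main

import "fmt"

// ClusterStatus is a hypothetical, simplified view of the conditions the
// marketplace operator reports on its ClusterOperator.
type ClusterStatus struct {
	Available bool
	Degraded  bool
	Message   string
}

// computeStatus reports Available based on whether the operator can reconcile
// at all; a success ratio below minSuccessRatio (a stand-in for the "golden
// ratio") only sets Degraded and leaves Available true.
func computeStatus(canReconcile bool, succeeded, total int, minSuccessRatio float64) ClusterStatus {
	if !canReconcile {
		return ClusterStatus{Available: false, Message: "unable to reconcile operands"}
	}
	status := ClusterStatus{Available: true}
	if total > 0 {
		ratio := float64(succeeded) / float64(total)
		if ratio < minSuccessRatio {
			status.Degraded = true
			status.Message = fmt.Sprintf("%d of %d operands reconciled successfully", succeeded, total)
		}
	}
	return status
}

func main() {
	// Six OperatorSources pointing at invalid URLs/namespaces: every
	// reconcile fails, but the operator is still processing events.
	s := computeStatus(true, 0, 6, 0.5)
	fmt.Printf("Available=%t Degraded=%t (%s)\n", s.Available, s.Degraded, s.Message)
}
```

Separating the two conditions means a registry outage shows up as Degraded, which is visible to administrators, without blocking installs and upgrades that wait on Available.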
Added 6 OperatorSources pointing to invalid URLs/namespaces and restarted the marketplace operator; after the marketplace operator recovered, its status was OperatorAvailable.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.