Bug 1946790 - Marketplace operator flakes Available=False OperatorStarting during updates
Summary: Marketplace operator flakes Available=False OperatorStarting during updates
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Anik
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-06 20:47 UTC by W. Trevor King
Modified: 2021-07-27 22:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1885376
Environment:
[bz-OLM] clusteroperator/marketplace should not change condition/Available
Last Closed: 2021-07-27 22:57:54 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-marketplace pull 395 | 0 | None | open | Bug 1946790: Update clusteroperator status conditions on startup | 2021-04-22 20:34:11 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:58:17 UTC

Description W. Trevor King 2021-04-06 20:47:20 UTC
Not bug 1885376's install-time issue anymore, but we're seeing this in a lot of 4.8 update CI:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=OperatorStarting' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade (all) - 17 runs, 59% failed, 160% of failures match = 94% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 17 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-openstack-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 4 runs, 100% failed, 75% of failures match = 75% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-vsphere-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 15 runs, 93% failed, 107% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
pull-ci-openshift-cluster-api-provider-baremetal-master-e2e-metal-ipi-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
...
pull-ci-operator-framework-operator-marketplace-master-e2e-aws-upgrade (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-upgrade-single-node (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
rehearse-17190-periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
rehearse-17438-pull-ci-operator-framework-operator-marketplace-release-4.9-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
release-openshift-ocp-installer-upgrade-remote-libvirt-ppc64le-4.7-to-4.8 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
release-openshift-ocp-installer-upgrade-remote-libvirt-s390x-4.7-to-4.8 (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
release-openshift-okd-installer-e2e-aws-upgrade (all) - 7 runs, 71% failed, 100% of failures match = 71% impact

Picking [1] to poke at:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade/1379265480375668736/build-log.txt | grep 'Step e2e-aws-upgrade\|clusteroperator/marketplace'
INFO[2021-04-06T02:53:19Z] Step e2e-aws-upgrade-ipi-install-hosted-loki succeeded after 10.128319423s. 
...
INFO[2021-04-06T03:29:49Z] Step e2e-aws-upgrade-ipi-install-install-stableinitial succeeded after 35m50.049552241s. 
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:06.708 E clusteroperator/marketplace condition/Available status/False reason/OperatorStarting changed: Determining status 
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:06.708 W clusteroperator/marketplace condition/Progressing status/True reason/OperatorStarting changed: Progressing towards release version: 4.8.0-0.ci-2021-04-05-224633 
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:06.708 W clusteroperator/marketplace condition/Upgradeable status/False reason/OperatorStarting changed: Determining status 
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:06.708 - 20s   E clusteroperator/marketplace condition/Available status/False reason/Determining status 
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:06.708 - 20s   W clusteroperator/marketplace condition/Progressing status/True reason/Progressing towards release version: 4.8.0-0.ci-2021-04-05-224633 
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:26.720 W clusteroperator/marketplace condition/Available status/True reason/OperatorAvailable changed: Available release version: 4.8.0-0.ci-2021-04-05-224633 
...

That's over an hour after install.  My guess is that you have a race between [2] and whatever populates clusterOperator (an API call that flaked out?  An informer that hasn't populated its local cache yet?  Something else?).

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade/1379265480375668736
[2]: https://github.com/operator-framework/operator-marketplace/blob/906be3d7cf74d4d6e056c29005291fa8f3a16ac4/pkg/status/status.go#L206
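
For anyone triaging similar runs, here is a minimal sketch (not part of the CI tooling) of how the length of the Available=False window could be measured from a build log. The two sample lines are copied from the excerpt above; the function name, the regular expression, and the year argument are assumptions made for illustration, since the event timestamps in the log carry no year.

```python
import re
from datetime import datetime

# Two event lines copied from the build-log excerpt in this report.
LOG = """\
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:06.708 E clusteroperator/marketplace condition/Available status/False reason/OperatorStarting changed: Determining status
INFO[2021-04-06T04:38:40Z] Apr 06 04:03:26.720 W clusteroperator/marketplace condition/Available status/True reason/OperatorAvailable changed: Available release version: 4.8.0-0.ci-2021-04-05-224633
"""

# Hypothetical pattern for "changed:" events on the Available condition.
# The "- 20s" interval summary lines in the log do not match it.
EVENT = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [EWI] "
    r"clusteroperator/(?P<name>\S+) condition/Available "
    r"status/(?P<status>True|False) reason/\S+ changed:"
)


def unavailable_windows(log, operator="marketplace", year=2021):
    """Return the durations, in seconds, of each Available=False window.

    The year is an assumption supplied by the caller because the event
    timestamps only contain month, day, and time.
    """
    windows = []
    went_false = None
    for line in log.splitlines():
        match = EVENT.search(line)
        if not match or match.group("name") != operator:
            continue
        stamp = datetime.strptime(
            f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S.%f"
        )
        if match.group("status") == "False":
            # Remember only the first transition to False in a window.
            if went_false is None:
                went_false = stamp
        elif went_false is not None:
            windows.append((stamp - went_false).total_seconds())
            went_false = None
    return windows


if __name__ == "__main__":
    print(unavailable_windows(LOG))
```

On the two sample lines this reports a single window of roughly 20 seconds, which agrees with the "- 20s" interval lines in the log excerpt.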

Comment 2 Jian Zhang 2021-04-26 08:08:15 UTC
I checked the latest upgrade logs and didn't find any marketplace clusteroperator flake errors.

mac:~ jianzhang$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade/build-log.txt | grep 'Step e2e-aws-upgrade\|clusteroperator/marketplace'
mac:~ jianzhang$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade/1386515686704025600/build-log.txt | grep 'Step e2e-aws-upgrade\|clusteroperator/marketplace'
INFO[2021-04-26T03:04:13Z] Step e2e-aws-upgrade-ipi-install-hosted-loki succeeded after 10s. 
INFO[2021-04-26T03:04:24Z] Step e2e-aws-upgrade-ipi-conf succeeded after 10s. 
INFO[2021-04-26T03:04:34Z] Step e2e-aws-upgrade-ipi-conf-aws succeeded after 10s. 
INFO[2021-04-26T03:04:44Z] Step e2e-aws-upgrade-ipi-install-monitoringpvc succeeded after 10s. 
INFO[2021-04-26T03:04:54Z] Step e2e-aws-upgrade-ipi-install-rbac succeeded after 10s. 
INFO[2021-04-26T03:41:14Z] Step e2e-aws-upgrade-ipi-install-install-stableinitial succeeded after 36m20s. 
INFO[2021-04-26T04:50:44Z] Apr 26 04:22:33.286 I clusteroperator/marketplace versions: operator 4.8.0-0.ci-2021-04-25-044134 -> 4.8.0-0.ci-2021-04-25-055226 
INFO[2021-04-26T04:50:45Z] Step e2e-aws-upgrade-openshift-e2e-test failed after 1h9m30s. 
INFO[2021-04-26T04:51:45Z] Step e2e-aws-upgrade-openshift-e2e-test-capabilities-check succeeded after 1m0s. 
INFO[2021-04-26T04:52:05Z] Step e2e-aws-upgrade-gather-aws-console succeeded after 20s. 
INFO[2021-04-26T04:54:45Z] Step e2e-aws-upgrade-gather-must-gather succeeded after 2m40s. 
INFO[2021-04-26T04:56:35Z] Step e2e-aws-upgrade-gather-extra succeeded after 1m50s. 
INFO[2021-04-26T04:57:15Z] Step e2e-aws-upgrade-gather-audit-logs succeeded after 40s. 
INFO[2021-04-26T05:01:35Z] Step e2e-aws-upgrade-ipi-deprovision-deprovision succeeded after 4m20s.

LGTM, marking it as verified.

Comment 5 errata-xmlrpc 2021-07-27 22:57:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

