Bug 2074612
| Summary: | OLM failed to recreate the SA for a CatalogSource that has no poll interval | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | jun |
| Component: | OLM | Assignee: | Alexander Greene <agreene> |
| OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | agreene, akrzos, imiller, jiazha, keyoung |
| Version: | 4.9 | Keywords: | Reopened, Triaged |
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: The CheckRegistryServer function used by grpc catalogSources did not confirm that the serviceAccount associated with the catalogSource exists.
Consequence: An unhealthy catalogSource with no serviceAccount could exist.
Fix: Update the GRPC CheckRegistryServer function to check whether the serviceAccount exists, so that the serviceAccount is recreated if it is not found.
Result: OLM will recreate serviceAccounts owned by GRPC CatalogSources if they do not exist.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-10 11:06:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2080609 | ||
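The Doc Text above describes the fix only in prose. As a rough illustration of the idea (a toy model in Python, not OLM's actual Go code), the sketch below shows why a health check that skips one owned resource can never trigger that resource's recreation. Every name here (`Cluster`, `OWNED`, `CHECKED_BEFORE_FIX`, `CHECKED_AFTER_FIX`, `check_registry_server`, `reconcile`) is hypothetical, and the set of owned resources is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: the kinds of objects that currently exist for one CatalogSource."""
    existing: set = field(default_factory=set)


# Assumption for this sketch: a grpc CatalogSource owns these three kinds.
OWNED = ("pod", "service", "serviceaccount")

# What the health check looks at before and after the fix (hypothetical).
CHECKED_BEFORE_FIX = ("pod", "service")
CHECKED_AFTER_FIX = ("pod", "service", "serviceaccount")


def check_registry_server(cluster, checked):
    """Return True only if every checked resource kind exists."""
    return all(kind in cluster.existing for kind in checked)


def reconcile(cluster, checked):
    """Recreate all owned resources when the health check fails.

    Returns True if a recreation happened, False if the check passed.
    """
    if check_registry_server(cluster, checked):
        return False
    cluster.existing.update(OWNED)
    return True


if __name__ == "__main__":
    # The serviceAccount was deleted; pod and service still exist.
    cluster = Cluster({"pod", "service"})
    print(reconcile(cluster, CHECKED_BEFORE_FIX))  # check passes, nothing recreated
    print(reconcile(cluster, CHECKED_AFTER_FIX))   # check fails, SA recreated
```

With the pre-fix check, the missing serviceAccount goes unnoticed because the check never looks for it; once the check covers the serviceAccount, the same reconcile step restores it.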
1, Create a cluster that contains the PR with the fix.
mac:~ jianzhang$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-05-061544|grep olm
W0505 15:42:57.237937 17035 helpers.go:151] Defaulting of registry auth file to "${HOME}/.docker/config.json" is deprecated. The default will be switched to podman config locations in the future version.
operator-lifecycle-manager https://github.com/openshift/operator-framework-olm 5d74cef25c663ff581abdb87fa1a94fe7a222144
operator-registry https://github.com/openshift/operator-framework-olm 5d74cef25c663ff581abdb87fa1a94fe7a222144
mac:~ jianzhang$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-05-05-061544 True False 2m46s Cluster version is 4.11.0-0.nightly-2022-05-05-061544
2, Create a CatalogSource without the `updateStrategy`
mac:~ jianzhang$ cat cs-bug.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: bug-operator
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/olmqe/learn-operator-index:v2
displayName: Bug Operators
publisher: OLM QE
mac:~ jianzhang$ oc create -f cs-bug.yaml
catalogsource.operators.coreos.com/bug-operator created
3, Delete its SA.
mac:~ jianzhang$ oc get sa
NAME SECRETS AGE
bug-operator 2 3m39s
builder 2 31m
certified-operators 2 36m
community-operators 2 36m
default 2 40m
deployer 2 31m
marketplace-operator 2 40m
redhat-marketplace 2 36m
redhat-operators 2 36m
mac:~ jianzhang$ oc delete sa bug-operator
serviceaccount "bug-operator" deleted
4, The SA is recreated as expected. LGTM, marking as verified.
mac:~ jianzhang$ oc get sa
NAME SECRETS AGE
bug-operator 2 7s
builder 2 31m
certified-operators 2 37m
community-operators 2 37m
default 2 41m
deployer 2 31m
marketplace-operator 2 40m
redhat-marketplace 2 37m
redhat-operators 2 37m
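Steps 3 and 4 above were checked by hand. A minimal sketch of automating the wait, assuming `oc` is on the PATH and already logged in to the cluster, could look like the following. The helper names, the timeout, and the polling interval are hypothetical choices for illustration; the `run` and `sleep` parameters exist only so the loop can be exercised without a cluster.

```python
import subprocess
import time


def service_account_exists(name, namespace, run=subprocess.run):
    """Return True if `oc get sa NAME -n NAMESPACE` exits with status 0."""
    result = run(
        ["oc", "get", "sa", name, "-n", namespace],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def wait_for_service_account(name, namespace, timeout=120.0, interval=5.0,
                             run=subprocess.run, sleep=time.sleep):
    """Poll until the ServiceAccount exists or the timeout expires.

    Returns True if the ServiceAccount was found, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if service_account_exists(name, namespace, run=run):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

After `oc delete sa bug-operator`, calling `wait_for_service_account("bug-operator", "openshift-marketplace")` would return `True` once OLM has recreated the SA, and `False` if it has not reappeared within the timeout.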
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
Description of problem:
During a ZTP deployment test of 1000 SNO clusters, about 1% of the clusters get stuck on the subscription policy because of this kind of error from operator subscriptions:
  - message: 'error using catalog community-operators (in namespace openshift-marketplace): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup community-operators.openshift-marketplace.svc on [fd02::a]:53: server misbehaving"'
    reason: ErrorPreventedResolution
    status: "True"
    type: ResolutionFailed
Version-Release number of selected component (if applicable):
How reproducible:
100% in scale test
Steps to Reproduce:
1. SNO deployment with DU profile at scale (50 clusters or 100 clusters per hour)
Actual results:
Subscription policy is non-compliant
Expected results:
Subscription policy should become compliant
Additional info: