Description of problem:

The Performance Addon Operator fails to install on the cluster after ACM (ver 2.2.3) pushes CRs for the CatalogSource, Namespace, OperatorGroup, and Subscription. All CRs are pushed (approximately) simultaneously.

The subscription reports the installplan failed:

  Conditions:
    Last Transition Time:   2021-05-13T16:45:23Z
    Message:                all available catalogsources are healthy
    Reason:                 AllCatalogSourcesHealthy
    Status:                 False
    Type:                   CatalogSourcesUnhealthy
    Last Transition Time:   2021-05-12T23:25:20Z
    Reason:                 InstallCheckFailed
    Status:                 True
    Type:                   InstallPlanFailed
  Current CSV:              performance-addon-operator.v4.8.0
  Install Plan Generation:  1
  Install Plan Ref:
    API Version:       operators.coreos.com/v1alpha1
    Kind:              InstallPlan
    Name:              install-dsmw9
    Namespace:         openshift-performance-addon-operator
    Resource Version:  23862
    UID:               4ef2b462-8cf6-40ff-95d5-5cdf9ce9db28
  Installplan:
    API Version:  operators.coreos.com/v1alpha1
    Kind:         InstallPlan
    Name:         install-dsmw9
    Uuid:         4ef2b462-8cf6-40ff-95d5-5cdf9ce9db28
  Last Updated:   2021-05-13T16:45:23Z
  State:          UpgradePending

The installplan reports:

$ oc describe installplan install-dsmw9 -n openshift-performance-addon-operator
Name:         install-dsmw9
Namespace:    openshift-performance-addon-operator
Labels:       operators.coreos.com/performance-addon-operator.openshift-performance-addon-operator=
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         InstallPlan
Metadata:
  Creation Timestamp:  2021-05-12T23:25:19Z
  Generate Name:       install-
  Generation:          1
<snip>
Status:
  Bundle Lookups:
    Catalog Source Ref:
      Name:       performance-addon-operator
      Namespace:  openshift-marketplace
    Conditions:
      Message:  bundle contents have not yet been persisted to installplan status
      Reason:   BundleNotUnpacked
      Status:   True
      Type:     BundleLookupNotPersisted
      Message:  unpack job not yet started
      Reason:   JobNotStarted
      Status:   True
      Type:     BundleLookupPending
    Identifier:  performance-addon-operator.v4.8.0
    Path:        quay.io/openshift-kni/performance-addon-operator-bundle:4.8-snapshot
    Properties:  {"properties":[{"type":"olm.gvk","value":{"group":"performance.openshift.io","kind":"PerformanceProfile","version":"v1"}},{"type":"olm.gvk","value":{"group":"performance.openshift.io","kind":"PerformanceProfile","version":"v1alpha1"}},{"type":"olm.gvk","value":{"group":"performance.openshift.io","kind":"PerformanceProfile","version":"v2"}},{"type":"olm.package","value":{"packageName":"performance-addon-operator","version":"4.8.0"}}]}
    Replaces:
  Catalog Sources:
  Conditions:
    Last Transition Time:  2021-05-12T23:25:19Z
    Last Update Time:      2021-05-12T23:25:19Z
    Message:               invalid operator group - no operator group found that is managing this namespace
    Reason:                InstallCheckFailed
    Status:                False
    Type:                  Installed
  Phase:                   Failed

The catalogsource shows as READY:

  Spec:
    Display Name:  Openshift Performance Addon Operator
    Icon:
      base64data:
      Mediatype:
    Image:        quay.io/openshift-kni/performance-addon-operator-index:4.8-snapshot
    Publisher:    Red Hat
    Source Type:  grpc
  Status:
    Connection State:
      Address:              performance-addon-operator.openshift-marketplace.svc:50051
      Last Connect:         2021-05-13T21:24:34Z
      Last Observed State:  READY

The catalog operator logs show transitioning health of the catalogsource:

$ oc logs -n openshift-operator-lifecycle-manager catalog-operator-76dbcc4787-r5w2p | grep performance
time="2021-05-13T16:29:48Z" level=info msg=syncing id=+SERn ip=install-dsmw9 namespace=openshift-performance-addon-operator phase=Failed
time="2021-05-13T16:29:57Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=performance-addon-operator state.State=CONNECTING"
time="2021-05-13T16:30:04Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=performance-addon-operator state.State=TRANSIENT_FAILURE"
time="2021-05-13T16:30:04Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.30.22.231:50051: connect: connection refused\"" catalog="{performance-addon-operator openshift-marketplace}"
time="2021-05-13T16:30:05Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=performance-addon-operator state.State=CONNECTING"
time="2021-05-13T16:30:06Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=performance-addon-operator state.State=TRANSIENT_FAILURE"
time="2021-05-13T16:30:07Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=performance-addon-operator state.State=CONNECTING"
time="2021-05-13T16:30:07Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=performance-addon-operator state.State=READY"

Version-Release number of selected component (if applicable):
SNO: 4.8.0-0.nightly-2021-05-12-122225
ACM: 2.2.3

How reproducible:
Most of the time

Steps to Reproduce:
We have a pipeline which creates CRs, wraps them in ACM policy, deploys SNO, and enrolls the cluster in ACM.

Actual results:
PAO is not installed

Expected results:
PAO installs

Additional info:
We are still triaging this issue and will add more detail.
Ian,

> pushes CRs for the CatalogSource, Namespace, OperatorGroup, and Subscription. All CRs are pushed (approximately) simultaneously

That is most likely the issue here. If the catalog-operator sees an unreconciled OperatorGroup without a status, it flags it as an invalid OperatorGroup, and the InstallPlan fails. At the moment the InstallPlan does not retry while waiting for the OperatorGroup to reconcile successfully, so the immediate solution would be to create the CatalogSource and OperatorGroup in step 1, wait for the successful reconciliation of the OperatorGroup, and then create the Subscription for your operator in step 2.

ps: the github issue link that you shared returns a 404

Adding NeedInfo to indicate we're waiting on feedback about the solution.
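A minimal sketch of that two-step ordering in shell, assuming per-resource manifest files; the file names, resource names, and the wait loop are illustrative, not part of any supported tooling:

# Step 1: create the Namespace, CatalogSource, and OperatorGroup first
oc apply -f namespace.yaml
oc apply -f catalogsource.yaml
oc apply -f operatorgroup.yaml

# Wait for the OperatorGroup to be reconciled, i.e. until its status
# lists the namespaces it manages (names here are assumptions)
until [ -n "$(oc get operatorgroup performance-addon-operator \
    -n openshift-performance-addon-operator \
    -o jsonpath='{.status.namespaces}')" ]; do
  sleep 5
done

# Step 2: only then create the Subscription
oc apply -f subscription.yaml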
I found a clear reproducer:

1. Create the PAO namespace
2. Create the PAO catalog source
3. Create the PAO subscription

and wait; OLM will create the install plan with the error message:
"invalid operator group - no operator group found that is managing this namespace"

Once you create the operator group, nothing happens, so you will need to re-create the subscription. I would expect OLM to reconcile again once the operator group is created.
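For anyone hitting this before a fix lands, the manual workaround looks roughly like this (the subscription name and manifest file name are assumptions based on the steps above):

# The failed InstallPlan is never retried, so after creating the
# operator group, delete and re-create the subscription
oc delete subscription performance-addon-operator -n openshift-performance-addon-operator
oc apply -f subscription.yaml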
Hey Artyom,

Is there a reason you're not creating the OperatorGroup before creating the Subscription? All of our documents mention the OperatorGroup as a prerequisite. See this for example: https://olm.operatorframework.io/docs/tasks/install-operator-with-olm/#prerequisites

InstallPlans were designed to be single-execution resources that initiate an installation and live on as a record of that transaction. So what you're seeing is the expected behavior, i.e. the error message on the InstallPlan is a record of an attempt at installing the Operator without any OperatorGroup present. When you create the OperatorGroup and re-create the Subscription, the new InstallPlan is then the record of your second attempt at installing the Operator, this time with a valid OperatorGroup.
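For reference, the missing prerequisite is a small manifest; a minimal OperatorGroup for the namespace in this report (metadata values are illustrative) would be along these lines:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: performance-addon-operator
  namespace: openshift-performance-addon-operator
spec:
  targetNamespaces:
  - openshift-performance-addon-operator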
In our deployments, it is often the case that all resources are created via "oc create -f <all_files_under_dir>", and it is possible to hit a race condition here (the install plan is created before the API server is updated with the operator group). I understand that the behavior is documented, but IMHO it is not very user-friendly and can be a source of errors during deployments via the CLI.
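To illustrate the race (the directory and file names are hypothetical): with a layout like

manifests/
  01-namespace.yaml
  02-catalogsource.yaml
  03-operatorgroup.yaml
  04-subscription.yaml

a single "oc create -f manifests/" submits all four objects back to back, so the catalog-operator can process the Subscription and create its InstallPlan before the OperatorGroup has been assigned a status.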
*** Bug 1972925 has been marked as a duplicate of this bug. ***
Kevin,

What are the next steps here? This will definitely impact our customers using ACM.

/KenY
Ken,

We've discussed a possible solution where the creation of the InstallPlan is blocked until a valid OperatorGroup is detected. We should have a PR up implementing that solution soon.
https://github.com/operator-framework/operator-lifecycle-manager/pull/2215 - Upstream pull request
verify:

1) install cluster

[root@preserve-olm-agent-test 1960455]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-07-15-015134   True        False         107m    Cluster version is 4.9.0-0.nightly-2021-07-15-015134
[root@preserve-olm-agent-test 1960455]# oc adm release info registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-07-15-015134 --commits | grep operator-lifecycle-manager
  operator-lifecycle-manager   https://github.com/openshift/operator-framework-olm   8740cee32bc0973361238df1ae8af3f87f7d6588

2) install catsrc, sub

[root@preserve-olm-agent-test 1960455]# cat catsrc.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ditto-operator-index
  namespace: openshift-marketplace
spec:
  displayName: Test
  publisher: OLM-QE
  sourceType: grpc
  image: quay.io/olmqe/ditto-index:v1-4.8-xzha
  updateStrategy:
    registryPoll:
      interval: 10m

[root@preserve-olm-agent-test 1960455]# cat sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ditto-operator
  namespace: test-1
spec:
  channel: "alpha"
  installPlanApproval: Automatic
  name: ditto-operator
  source: ditto-operator-index
  sourceNamespace: openshift-marketplace

[root@preserve-olm-agent-test 1960455]# oc apply -f catsrc.yaml
catalogsource.operators.coreos.com/ditto-operator-index created
[root@preserve-olm-agent-test 1960455]# oc new-project test-1
[root@preserve-olm-agent-test 1960455]# oc apply -f sub.yaml
subscription.operators.coreos.com/ditto-operator created
[root@preserve-olm-agent-test 1960455]# oc get ip -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: InstallPlan
  metadata:
    creationTimestamp: "2021-07-16T03:29:55Z"
    generateName: install-
    generation: 1
    labels:
      operators.coreos.com/ditto-operator.test-1: ""
    name: install-hk2qb
    namespace: test-1
    ownerReferences:
    - apiVersion: operators.coreos.com/v1alpha1
      blockOwnerDeletion: false
      controller: false
      kind: Subscription
      name: ditto-operator
      uid: 12bfa98e-eea5-4835-9479-e70f28cad301
    resourceVersion: "76969"
    uid: affdd1bc-1759-4cb9-9e37-9c197e577418
  spec:
    approval: Automatic
    approved: true
    clusterServiceVersionNames:
    - ditto-operator.v0.1.1
    generation: 1
  status:
    bundleLookups:
    - catalogSourceRef:
        name: ditto-operator-index
        namespace: openshift-marketplace
      conditions:
      - message: bundle contents have not yet been persisted to installplan status
        reason: BundleNotUnpacked
        status: "True"
        type: BundleLookupNotPersisted
      - message: unpack job not yet started
        reason: JobNotStarted
        status: "True"
        type: BundleLookupPending
      identifier: ditto-operator.v0.1.1
      path: quay.io/olmqe/ditto-operator:0.1.1
      properties: '{"properties":[{"type":"olm.gvk","value":{"group":"iot.eclipse.org","kind":"Ditto","version":"v1alpha1"}},{"type":"olm.package","value":{"packageName":"ditto-operator","version":"0.1.1"}}]}'
      replaces: ditto-operator.v0.1.0
    catalogSources: []
    conditions:
    - lastTransitionTime: "2021-07-16T03:29:55Z"
      lastUpdateTime: "2021-07-16T03:30:14Z"
      message: no operator group found that is managing this namespace
      reason: InstallCheckFailed
      status: "False"
      type: Installed
    phase: Installing
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

3) install og

[root@preserve-olm-agent-test 1960455]# oc apply -f og.yaml
operatorgroup.operators.coreos.com/og-single created

check ip/csv

[root@preserve-olm-agent-test 1960455]# oc get ip -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: InstallPlan
....
    conditions:
    - lastTransitionTime: "2021-07-16T03:31:09Z"
      lastUpdateTime: "2021-07-16T03:31:09Z"
      status: "True"
      type: Installed
    phase: Complete
.....

[root@preserve-olm-agent-test 1960455]# oc get csv
NAME                    DISPLAY         VERSION   REPLACES                PHASE
ditto-operator.v0.1.1   Eclipse Ditto   0.1.1     ditto-operator.v0.1.0   Succeeded

4) check event

[root@preserve-olm-agent-test 1960455]# oc get events --sort-by='.lastTimestamp'
LAST SEEN   TYPE     REASON               OBJECT                                        MESSAGE
7m29s       Normal   Scheduled            pod/ditto-operator-75df74ff55-mqbpr           Successfully assigned test-1/ditto-operator-75df74ff55-mqbpr to ip-10-0-179-31.us-east-2.compute.internal
9m19s       Normal   CreatedSCCRanges     namespace/test-1                              created SCC ranges
7m30s       Normal   RequirementsUnknown  clusterserviceversion/ditto-operator.v0.1.1   requirements not yet checked
7m30s       Normal   InstallWaiting       clusterserviceversion/ditto-operator.v0.1.1   installing: waiting for deployment ditto-operator to become ready: deployment "ditto-operator" not available: Deployment does not have minimum availability.
7m30s       Normal   InstallWaiting       clusterserviceversion/ditto-operator.v0.1.1   installing: waiting for deployment ditto-operator to become ready: waiting for spec update of deployment "ditto-operator" to be observed...
7m30s       Normal   InstallSucceeded     clusterserviceversion/ditto-operator.v0.1.1   waiting for install components to report healthy
7m30s       Normal   SuccessfulCreate     replicaset/ditto-operator-75df74ff55          Created pod: ditto-operator-75df74ff55-mqbpr
7m30s       Normal   AllRequirementsMet   clusterserviceversion/ditto-operator.v0.1.1   all requirements found, attempting install
7m30s       Normal   ScalingReplicaSet    deployment/ditto-operator                     Scaled up replica set ditto-operator-75df74ff55 to 1
7m27s       Normal   AddedInterface       pod/ditto-operator-75df74ff55-mqbpr           Add eth0 [10.129.2.20/23] from openshift-sdn
7m27s       Normal   Pulling              pod/ditto-operator-75df74ff55-mqbpr           Pulling image "docker.io/ctron/ditto-operator:0.1.1"
7m21s       Normal   Started              pod/ditto-operator-75df74ff55-mqbpr           Started container ditto-operator
7m21s       Normal   Created              pod/ditto-operator-75df74ff55-mqbpr           Created container ditto-operator
7m21s       Normal   Pulled               pod/ditto-operator-75df74ff55-mqbpr           Successfully pulled image "docker.io/ctron/ditto-operator:0.1.1" in 6.441539021s
7m20s       Normal   InstallSucceeded     clusterserviceversion/ditto-operator.v0.1.1   install strategy completed with no errors

LGTM, verified.
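Note: the og.yaml applied in step 3 is not shown above; judging by the names in the output (og-single in project test-1), it would presumably be a minimal OperatorGroup such as:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: og-single
  namespace: test-1
spec:
  targetNamespaces:
  - test-1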
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759