Bug 1913132
Summary: | The installation of OpenShift Virtualization reports success early, before it has actually succeeded | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Guohua Ouyang <gouyang>
Component: | OLM | Assignee: | Alexander Greene <agreene>
OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | medium | |
Priority: | medium | CC: | agreene, aos-bugs, cnv-qe-bugs, gouyang, ocohen, stirabos, yzamir
Version: | 4.7 | |
Target Milestone: | --- | |
Target Release: | 4.7.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause: OLM recently introduced a new controller that updates the deployments defined in a CSV with an environment variable used to identify the OperatorCondition owned by the operator.
Consequence: The deployment was updated immediately after OLM created it, producing a second rollout and a choppy installation.
Fix: OLM now creates the deployment with the OperatorCondition environment variable already set.
Result: OLM no longer updates the deployment's environment variables immediately after creating it. (A CLI check is sketched just below the metadata table.)
|
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2021-02-24 15:50:15 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | video shows install failure after it reports success (attachment 1744768) | |
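Not part of the original report: a minimal CLI sketch of how the fixed behavior described in the Doc Text above might be confirmed, assuming the injected variable is named `OPERATOR_CONDITION_NAME` and using the etcd operator install from the comments below as an example (deployment name and namespace are illustrative).

```sh
# Check whether the OperatorCondition environment variable is already present
# in the Deployment as OLM created it (assumed variable name:
# OPERATOR_CONDITION_NAME; deployment/namespace taken from the etcd example below).
oc get deployment etcd-operator -n default \
  -o jsonpath='{.spec.template.spec.containers[*].env[*].name}{"\n"}'

# With the fix, only one ReplicaSet (a single rollout) should exist right after
# installation; two ReplicaSets per operator deployment indicate the old
# behavior of updating the Deployment immediately after creation.
oc get rs -n default
```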
It looks like an installer issue; moving to Installation.

From my initial investigation, this looks like a regression in OLM. During the initial phase of the installation (after creating the subscription, before creating the HCO CR), pods are being rolled out: for each OLM-controlled deployment, two ReplicaSets are created. This causes the HCO operator to report "Ready" for a brief moment, until another pod is rotated.

This behavior is observed on OCP 4.7.0-fc.0, but not on OCP 4.6.9, with the same index image containing CNV 2.6.0: registry-proxy.engineering.redhat.com/rh-osbs/iib:36168 <==> hco-bundle-registry-container-v2.6.0-454

Installation on OCP 4.7.0-fc.0:

    $ oc get rs
    NAME                                         DESIRED   CURRENT   READY   AGE
    cdi-operator-7d46b49c9f                      1         1         1       23m
    cdi-operator-b7b778cc5                       0         0         0       23m
    cluster-network-addons-operator-5ffccdf57    0         0         0       23m
    cluster-network-addons-operator-759b89f64c   1         1         1       23m
    hco-operator-658dc8f879                      1         1         1       23m
    hco-operator-755cc7d989                      0         0         0       23m
    hco-webhook-56d6fb844d                       1         1         1       23m
    hco-webhook-6dd746cddb                       0         0         0       23m
    hostpath-provisioner-operator-5b6f57d6d9     0         0         0       23m
    hostpath-provisioner-operator-f4649cfd9      1         1         1       23m
    kubevirt-ssp-operator-6dffbcbcfb             0         0         0       23m
    kubevirt-ssp-operator-8649744554             1         1         1       23m
    node-maintenance-operator-7d49bf99ff         0         0         0       23m
    node-maintenance-operator-d5c8786c           1         1         1       23m
    virt-operator-6dcf7ffb84                     0         0         0       23m
    virt-operator-8696645c98                     2         2         2       23m
    vm-import-operator-56bf9fccd4                0         0         0       23m
    vm-import-operator-65c86b748                 1         1         1       23m

Installation on OCP 4.6.9:

    $ oc get rs
    NAME                                         DESIRED   CURRENT   READY   AGE
    cdi-operator-7959bcd65b                      1         1         1       13m
    cluster-network-addons-operator-5678b84f6b   1         1         1       13m
    hco-operator-99c776db8                       1         1         1       13m
    hco-webhook-795df79cd5                       1         1         1       13m
    hostpath-provisioner-operator-58b48bc6fd     1         1         1       13m
    kubevirt-ssp-operator-6578f4b6fc             1         1         1       13m
    node-maintenance-operator-69cf9bf685         1         1         1       13m
    virt-operator-64655949c7                     2         2         2       13m
    virt-template-validator-844bf5ddc9           2         2         2       10m
    vm-import-operator-7bbf8fb485                1         1         1       13m

For reference, I installed another Red Hat supported operator (ACM) and encountered the same issue:

    $ oc get clusterversion
    NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.7.0-fc.0   True        False         6d19h   Cluster version is 4.7.0-fc.0

    $ oc get rs -n open-cluster-management
    NAME                                                        DESIRED   CURRENT   READY   AGE
    cluster-manager-7457b7f8f9                                  3         3         3       31m
    cluster-manager-8558df4566                                  0         0         0       30m
    hive-operator-647fb55f9f                                    1         1         1       31m
    hive-operator-85bcc96cff                                    0         0         0       30m
    multicluster-observability-operator-5967f776c8              1         1         1       31m
    multicluster-observability-operator-8465647ccd              0         0         0       30m
    multicluster-operators-application-75477bf55d               1         1         1       31m
    multicluster-operators-application-999757f6b                0         0         0       30m
    multicluster-operators-hub-subscription-98f794f9            0         0         0       30m
    multicluster-operators-hub-subscription-f6bd5bd99           1         1         1       31m
    multicluster-operators-standalone-subscription-7f697f8db8   1         1         1       31m
    multicluster-operators-standalone-subscription-db5ddc968    0         0         0       30m
    multiclusterhub-operator-5dcbcb7bbf                         1         1         1       31m
    multiclusterhub-operator-698b5dc7fc                         0         0         0       30m

This strengthens the suspicion that it is an OLM bug.

Moving to OLM, as advised by @agreene.

Cluster version is 4.7.0-0.nightly-2021-01-07-235021

    [root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-69b986886c-r7hr7 -- olm --version
    OLM version: 0.17.0
    git commit: ac075ae4d1081a49c15c8c2edfeb71d8d3e0363e

1. Subscribe to the etcdoperator in the "default" project.
    [root@preserve-olm-env data]# cat og.yaml
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: test-og
      namespace: default
    spec:
      targetNamespaces:
      - default

    [root@preserve-olm-env data]# cat sub-etcd-community.yaml
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: etcd
      namespace: default
    spec:
      channel: singlenamespace-alpha
      installPlanApproval: Automatic
      name: etcd
      source: community-operators
      sourceNamespace: openshift-marketplace
      startingCSV: etcdoperator.v0.9.4

    [root@preserve-olm-env data]# oc create -f og.yaml
    operatorgroup.operators.coreos.com/test-og created
    [root@preserve-olm-env data]# oc create -f sub-etcd-community.yaml
    subscription.operators.coreos.com/etcd created

2. Check the ReplicaSets.

    [root@preserve-olm-env data]# oc get csv -n default
    NAME                  DISPLAY   VERSION   REPLACES              PHASE
    etcdoperator.v0.9.4   etcd      0.9.4     etcdoperator.v0.9.2   Succeeded
    [root@preserve-olm-env data]# oc get deployment -n default
    NAME            READY   UP-TO-DATE   AVAILABLE   AGE
    etcd-operator   1/1     1            1           33s
    [root@preserve-olm-env data]# oc get rs -n default
    NAME                       DESIRED   CURRENT   READY   AGE
    etcd-operator-74cd66bbff   1         1         1       43s

Only one ReplicaSet was generated, which looks good to me. Verifying.

@gouyang, now that the bug in OLM has been resolved, could you please verify that the CNV installation issue when using the OperatorHub UI is no longer observed? Thanks.

(In reply to Oren Cohen from comment #7)
> @gouyang, now that the bug in OLM has been resolved, could you
> please verify that the CNV installation issue when using the OperatorHub UI
> is no longer observed? Thanks.

Verified that the issue no longer occurs on the latest environment.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
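As a follow-up to the ReplicaSet listings above, here is a rough, generic detection sketch (not from the original report); the `openshift-cnv` namespace is used only as an example, and the revision annotation is the standard `deployment.kubernetes.io/revision`.

```sh
# List each Deployment's rollout revision: a revision greater than 1 shortly
# after installation suggests the Deployment was updated right after creation,
# which is the double-rollout symptom seen on OCP 4.7.0-fc.0 above.
oc get deploy -n openshift-cnv \
  -o custom-columns='NAME:.metadata.name,REVISION:.metadata.annotations.deployment\.kubernetes\.io/revision'

# Equivalently, two ReplicaSets per operator Deployment right after install
# matches the affected behavior; a single ReplicaSet matches the fixed behavior.
oc get rs -n openshift-cnv
```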
Created attachment 1744768 [details]
video shows install failure after it reports success

Description of problem:
When installing OpenShift Virtualization from the OCP console, the installation reports success early, then suddenly shows a failure and goes back to installing; it eventually succeeds after that.

Version-Release number of selected component (if applicable):
OCP 4.7 + CNV 2.6.0

How reproducible:
100%

Steps to Reproduce:
1. On the OCP console, go to Operators -> OperatorHub.
2. Enter 'Openshift Virtualization' in the filter.
3. Select 2.6.0 and install it.

Actual results:
The installation of OpenShift Virtualization reports success early, before it has actually succeeded.

Expected results:
Once the page shows success, no failures occur.

Additional info:
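For anyone reproducing this from the CLI alongside the console steps, a small observation sketch (assuming OpenShift Virtualization installs into the `openshift-cnv` namespace; adjust if different):

```sh
# Watch the CSV phase during installation: with the affected builds the phase
# could briefly report Succeeded and then regress; with the fix it should
# reach Succeeded once and stay there.
oc get csv -n openshift-cnv -w

# In a second terminal, watch ReplicaSets being created during the rollout.
oc get rs -n openshift-cnv -w
```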