Description of problem:
On many of the OSD clusters we have CSVs stuck in the "Replacing" phase. Notice that old versions are stuck in the "Replacing" phase.

=============================================================================================
$ oc get csv -A
NAMESPACE                           NAME                                               DISPLAY                           VERSION           REPLACES                                           PHASE
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-apiserver-operator        configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-apiserver                 configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-apiserver                 configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-authentication-operator   configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-authentication-operator   configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-authentication            configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-authentication            configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-build-test                configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-build-test                configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
=============================================================================================

Version-Release number of selected component (if applicable):
4.3.18, but I have seen this on 4.3.19 as well. I don't believe this is specific to these versions; they are just the versions we currently have installed.

How reproducible:
I am unsure how to reproduce this issue.

Actual results:
Old CSV versions remain in the "Replacing" phase, as shown in the output above.

Expected results:
I would expect these CSVs to be in the "Succeeded" state and the old ones to be gone.

Additional info:
I noticed that this isn't occurring on our staging clusters.
Our staging clusters are normally short-lived (less than a week), but I have noticed this on about half of our prod clusters (which are long-lived).

To clean this up, this is what I've found that works. In this case, the configure-alertmanager-operator is deployed to the openshift-monitoring namespace.

$ oc project openshift-monitoring
$ oc get csv | grep -v NAME | awk '{print $1}' | xargs oc delete csv
$ oc get installplan | grep -v NAME | awk '{print $1}' | xargs oc delete installplan
$ oc delete subscription configure-alertmanager-operator

We then sync the subscription back, and it tends to clean up. At this point, "oc get csv -A" shows the old versions of configure-alertmanager-operator removed, with the latest in the "Succeeded" phase.
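Before deleting anything, it can help to list exactly which CSVs are stuck. The following is a minimal sketch, assuming the column layout of the "oc get csv -A" output shown in the description (NAMESPACE first, NAME second, PHASE last). The helper name stuck_csvs is hypothetical and is not part of oc.

```shell
# stuck_csvs: read `oc get csv -A` output on stdin and print
# "<namespace> <name>" for every row whose last column is "Replacing".
# Hypothetical helper; assumes PHASE is the last column of each row.
stuck_csvs() {
  # NR > 1 skips the header row; $NF is the last whitespace-separated field.
  awk 'NR > 1 && $NF == "Replacing" { print $1, $2 }'
}

# Usage against a live cluster (not run here):
#   oc get csv -A | stuck_csvs
```

Piping the result through "wc -l" gives a quick count that can be compared before and after the cleanup.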
CC:hdogra
Hi Daniel,

> Can QE verify that this issue exists in master (4.6)? I think a way to reproduce this is to simply install the configure-alertmanager-operator via a subscription and monitor its upgrade cycle across namespaces.

Sorry for the late reply. Yes, sure. But I couldn't find "configure-alertmanager-operator" in the default OperatorSource of the OCP 4.6 cluster.

mac:~ jianzhang$ oc get packagemanifest |grep -i alertmanager
mac:~ jianzhang$

@Matt, could you provide the detailed steps to install this operator? Which OperatorSource does it come from? Thanks!
Hi Matt,

> It's being built and stored in app-sre quay repo.

I created an OperatorSource to consume this quay repo, but it failed, as follows:

mac:~ jianzhang$ oc create -f operatorsource-sre.yaml
operatorsource.operators.coreos.com/sre-operators created
mac:~ jianzhang$ cat operatorsource-sre.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: sre-operators
  namespace: openshift-marketplace
spec:
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: app-sre
  type: appregistry
mac:~ jianzhang$ oc get operatorsource
sre-operators   appregistry   https://quay.io/cnr   app-sre   Red Hat   Failed   The OperatorSource endpoint returned an empty manifest list   84s

> Here is the CSV:
> ...

Currently, the operator cannot be installed by providing only the CSV object, because the ServiceAccount is created via the Subscription, not by the CSV. As follows:

mac:~ jianzhang$ oc project openshift-monitoring
Now using project "openshift-monitoring" on server "https://api.qe-jiazha23.qe.devcluster.openshift.com:6443".
mac:~ jianzhang$
mac:~ jianzhang$ oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES   PHASE
elasticsearch-operator.4.5.0-202006180838   Elasticsearch Operator   4.5.0-202006180838              Succeeded
mac:~ jianzhang$ oc create -f csv-configure-alertmanager-operator.yaml
clusterserviceversion.operators.coreos.com/configure-alertmanager-operator.v0.1.178-762dea8 created
mac:~ jianzhang$ oc get csv
NAME                                               DISPLAY                           VERSION           REPLACES                                           PHASE
configure-alertmanager-operator.v0.1.178-762dea8   configure-alertmanager-operator   0.1.178-762dea8   configure-alertmanager-operator.v0.1.176-900bd02   Pending
mac:~ jianzhang$ oc describe csv configure-alertmanager-operator.v0.1.178-762dea8
...
  Requirement Status:
    Group:
    Kind:     ServiceAccount
    Message:  Service account does not exist
    Name:     configure-alertmanager-operator
    Status:   NotPresent
    Version:  v1

Anyway, I guess that repo (app-sre) is private. Could you give me read permission so that I can install it in my cluster? Thanks! My quay account is jiazha.
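The "Requirement Status" block is where a Pending CSV reports what it is still waiting for. A minimal sketch for pulling the unmet requirements out of "oc describe csv" output follows. It assumes the field order shown above (Kind and Name appear before Status within each requirement) and does not try to handle nested entries. The helper name missing_requirements is hypothetical.

```shell
# missing_requirements: read `oc describe csv <name>` output on stdin and
# print "<kind> <name>" for every requirement whose Status is NotPresent.
# Hypothetical helper; assumes Kind: and Name: precede Status: in each entry.
missing_requirements() {
  awk '
    $1 == "Kind:" { kind = $2 }   # remember the most recent Kind
    $1 == "Name:" { name = $2 }   # remember the most recent Name
    $1 == "Status:" && $2 == "NotPresent" { print kind, name }
  '
}

# Usage against a live cluster (not run here):
#   oc describe csv configure-alertmanager-operator.v0.1.178-762dea8 | missing_requirements
```

For the output quoted in this comment, the only unmet requirement reported would be the ServiceAccount named configure-alertmanager-operator.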
1. Set up the latest 4.6 cluster

[root@preserve-olm-env data]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-06-26-035408   True        False         24m     Cluster version is 4.6.0-0.nightly-2020-06-26-035408

2. Create the CatalogSource to provide the "configure-alertmanager-operator"

[root@preserve-olm-env data]# cat cs.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: alert-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/app-sre/configure-alertmanager-operator-registry:production-762dea8
  displayName: Alert Operator
  publisher: grpc
[root@preserve-olm-env data]# oc create -f cs.yaml
catalogsource.operators.coreos.com/alert-operator created

3. Install this operator

[root@preserve-olm-env data]# oc get sub -n default
NAME                              PACKAGE                           SOURCE           CHANNEL
configure-alertmanager-operator   configure-alertmanager-operator   alert-operator   production
[root@preserve-olm-env data]# oc get ip -n default
NAME            CSV                                                APPROVAL    APPROVED
install-nrcwd   configure-alertmanager-operator.v0.1.178-762dea8   Automatic   true
[root@preserve-olm-env data]# oc get csv -n default
NAME                                               DISPLAY                           VERSION                 REPLACES   PHASE
configure-alertmanager-operator.v0.1.178-762dea8   configure-alertmanager-operator   0.1.178-762dea8                    Succeeded
elasticsearch-operator.4.5.0-202006271533.p0       Elasticsearch Operator            4.5.0-202006271533.p0              Succeeded
[root@preserve-olm-env data]# oc get pods -n default
NAME                                              READY   STATUS    RESTARTS   AGE
configure-alertmanager-operator-679fbd459-gl497   1/1     Running   0          27s
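Step 3 shows the resulting Subscription but not the manifest that created it. The following is a sketch of what that manifest might look like, not the one actually used. The name, package, source, channel and namespace are taken from the "oc get sub -n default" output, and sourceNamespace is taken from the CatalogSource in step 2. The helper name write_subscription and the file name sub.yaml are hypothetical, and an OperatorGroup is assumed to already exist in the target namespace.

```shell
# write_subscription FILE: write a Subscription manifest to FILE.
# Hypothetical helper; field values mirror the `oc get sub -n default`
# output above, and an existing OperatorGroup in "default" is assumed.
write_subscription() {
  cat > "$1" <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: configure-alertmanager-operator
  namespace: default
spec:
  channel: production
  name: configure-alertmanager-operator
  source: alert-operator
  sourceNamespace: openshift-marketplace
EOF
}

# Usage against a live cluster (not run here):
#   write_subscription sub.yaml && oc create -f sub.yaml
```

No installPlanApproval is set in this sketch. The "oc get ip" output above shows the InstallPlan was approved automatically.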
Hi Matt, since it is no longer reproducible, I'm going to close this issue. If you run into it again, please reopen so that we can investigate.