Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1843184

Summary: OLM - CSV's are stuck in the "Replacing" PHASE
Product: OpenShift Container Platform
Reporter: Matt Woodson <mwoodson>
Component: OLM
Assignee: Evan Cordell <ecordell>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
CC: bluddy, dsover, hdogra, jiazha, krizza, nhale, nmalik, pbergene, travi
Version: 4.3.z
Keywords: ServiceDeliveryImpact
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2020-08-20 14:55:54 UTC
Type: Bug

Description Matt Woodson 2020-06-02 18:43:51 UTC
Description of problem:
On many of our OSD (OpenShift Dedicated) clusters, CSVs are stuck in the "Replacing" phase. In the output below, every old version remains in "Replacing" and only the latest version reaches "Succeeded".

=============================================================================================
$ oc get csv -A

NAMESPACE                                               NAME                                               DISPLAY                           VERSION           REPLACES                                           PHASE
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-apiserver                                     configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-authentication                                configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-build-test                                    configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-build-test                                    configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
=============================================================================================
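
To gauge how widespread the problem is on a cluster, the listing above can be summarized per namespace. The following is a minimal sketch and not part of the original report: count_replacing is a hypothetical helper name, and it assumes the default table output of "oc get csv -A" with PHASE as the last column.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Reads `oc get csv -A` table output on stdin and prints
# "<namespace> <count>" for each namespace that has CSVs in Replacing.
# Assumes NAMESPACE is the first column and PHASE is the last one.
count_replacing() {
  awk '$NF == "Replacing" { n[$1]++ }
       END { for (ns in n) print ns, n[ns] }' | sort
}

# Usage against a live cluster:
#   oc get csv -A | count_replacing
```

Matching on the last field rather than a fixed column number keeps the count correct when the DISPLAY column contains spaces.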



Version-Release number of selected component (if applicable):
4.3.18, but we have seen this on 4.3.19 as well. I don't believe this is specific to these versions; they are just the versions we currently have installed.

How reproducible:
I am unsure how to reproduce this issue.


Actual results:
Old CSV versions remain in the "Replacing" phase indefinitely.

Expected results:
I would expect the current CSV to be in the "Succeeded" phase and the old ones to be removed.

Additional info:
I noticed that this isn't occurring on our staging clusters, which are normally short-lived (less than a week). We have seen it on about half of our prod clusters, which are long-lived.



To clean this up, this is what I've found that works:

In this case, the configure-alertmanager-operator is deployed to the openshift-monitoring namespace.

$ oc project openshift-monitoring
$ oc get csv | grep -v NAME | awk '{print $1}'  | xargs oc delete csv
$ oc get installplan | grep -v NAME | awk '{print $1}'  | xargs oc delete installplan
$ oc delete subscription configure-alertmanager-operator

We then sync the subscription back, and it tends to clean up. At this point, "oc get csv -A" shows the old versions of configure-alertmanager-operator removed and the latest version in the "Succeeded" phase.
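
The manual cleanup steps above can be sketched as a small function with a dry-run guard. This is an illustration under assumptions, not a vetted procedure: cleanup_operator and DRY_RUN are hypothetical names, and "oc delete --all" is swapped in for the grep/awk/xargs pipelines. Like the original commands, it deletes every CSV and InstallPlan in the namespace, including those of other operators.

```shell
#!/bin/sh
# Hypothetical wrapper around the cleanup steps (not from the original report).
# Arguments: <namespace> <subscription-name>
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
cleanup_operator() {
  ns=$1
  sub=$2
  run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  # --all removes every object of that kind in the namespace,
  # which is the same effect as the original pipelines.
  $run oc delete csv --all -n "$ns"
  $run oc delete installplan --all -n "$ns"
  $run oc delete subscription "$sub" -n "$ns"
}

# Usage (prints the three commands without running them):
#   cleanup_operator openshift-monitoring configure-alertmanager-operator
```

Defaulting to a dry run makes it harder to wipe a namespace by accident; the Subscription still has to be synced back afterwards, as described above.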

Comment 5 Himanshu Dogra 2020-06-16 07:13:56 UTC
CC:hdogra

Comment 6 Jian Zhang 2020-06-19 05:50:00 UTC
Hi Daniel,

> Can QE verify that this issue exists in master (4.6)? I think a way to reproduce this to simply install the configure-alertmanager-operator via a subscription and monitor its upgrade cycle across namespaces.

Sorry for the late reply. Yes, sure. However, I couldn't find "configure-alertmanager-operator" in the default OperatorSources of an OCP 4.6 cluster.

mac:~ jianzhang$ oc get packagemanifest |grep -i alertmanager
mac:~ jianzhang$ 

@Matt, could you provide the detailed steps to install this operator? Which OperatorSource does it come from? Thanks!

Comment 10 Jian Zhang 2020-06-24 06:58:24 UTC
Hi Matt,

> It's being built and stored in app-sre quay repo.

I created an OperatorSource to consume this quay repo, but it failed, as follows:

mac:~ jianzhang$ oc create -f operatorsource-sre.yaml 
operatorsource.operators.coreos.com/sre-operators created
mac:~ jianzhang$ cat operatorsource-sre.yaml 
---
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: sre-operators
  namespace: openshift-marketplace
spec:
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: app-sre
  type: appregistry

mac:~ jianzhang$ oc get operatorsource
sre-operators         appregistry   https://quay.io/cnr   app-sre                                                 Red Hat     Failed      The OperatorSource endpoint returned an empty manifest list   84s

> Here is the CSV:
> ...

Currently, the operator cannot be installed from the CSV object alone, because the ServiceAccount is created through the Subscription, not by the CSV, as shown below:

mac:~ jianzhang$ oc project openshift-monitoring 
Now using project "openshift-monitoring" on server "https://api.qe-jiazha23.qe.devcluster.openshift.com:6443".
mac:~ jianzhang$ 
mac:~ jianzhang$ oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES   PHASE
elasticsearch-operator.4.5.0-202006180838   Elasticsearch Operator   4.5.0-202006180838              Succeeded
mac:~ jianzhang$ oc create -f csv-configure-alertmanager-operator.yaml
clusterserviceversion.operators.coreos.com/configure-alertmanager-operator.v0.1.178-762dea8 created
mac:~ jianzhang$ oc get csv
NAME                                               DISPLAY                           VERSION              REPLACES                                           PHASE
configure-alertmanager-operator.v0.1.178-762dea8   configure-alertmanager-operator   0.1.178-762dea8      configure-alertmanager-operator.v0.1.176-900bd02   Pending

mac:~ jianzhang$ oc describe csv configure-alertmanager-operator.v0.1.178-762dea8 
...
  Requirement Status:
    Group:    
    Kind:     ServiceAccount
    Message:  Service account does not exist
    Name:     configure-alertmanager-operator
    Status:   NotPresent
    Version:  v1
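
When a CSV stays in Pending like this, the unmet requirements can be pulled out of the describe output directly. The following is a minimal sketch and not part of the original comment: missing_requirements is a hypothetical helper name, and it assumes the "Requirement Status" layout shown above, where Kind, Message, and Name precede Status within each entry.

```shell
#!/bin/sh
# Hypothetical helper (not from the original comment).
# Reads `oc describe csv <name>` output on stdin and prints
# "<Kind>/<Name>: <Message>" for each requirement with Status NotPresent.
# Assumes Kind, Message and Name appear before Status in each entry.
missing_requirements() {
  awk '
    $1 == "Kind:"    { kind = $2 }
    $1 == "Message:" { sub(/^[ \t]*Message:[ \t]*/, ""); msg = $0 }
    $1 == "Name:"    { name = $2 }
    $1 == "Status:" && $2 == "NotPresent" { print kind "/" name ": " msg }
  '
}

# Usage against a live cluster:
#   oc describe csv configure-alertmanager-operator.v0.1.178-762dea8 | missing_requirements
```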

Anyway, I guess that the repo (app-sre) is private. Could you give me read permission so that I can install it in my cluster? Thanks! My quay account is jiazha.

Comment 13 Jian Zhang 2020-06-29 11:48:49 UTC
1. Set up the latest 4.6 cluster
[root@preserve-olm-env data]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-06-26-035408   True        False         24m     Cluster version is 4.6.0-0.nightly-2020-06-26-035408

2. Create the CatalogSource that provides the "configure-alertmanager-operator"
[root@preserve-olm-env data]# cat cs.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: alert-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/app-sre/configure-alertmanager-operator-registry:production-762dea8
  displayName: Alert Operator
  publisher: grpc
[root@preserve-olm-env data]# oc create -f cs.yaml 
catalogsource.operators.coreos.com/alert-operator created

3. Install the operator
[root@preserve-olm-env data]# oc get sub -n default
NAME                              PACKAGE                           SOURCE           CHANNEL
configure-alertmanager-operator   configure-alertmanager-operator   alert-operator   production
[root@preserve-olm-env data]# oc get ip -n default
NAME            CSV                                                APPROVAL    APPROVED
install-nrcwd   configure-alertmanager-operator.v0.1.178-762dea8   Automatic   true
[root@preserve-olm-env data]# oc get csv -n default
NAME                                               DISPLAY                           VERSION                 REPLACES   PHASE
configure-alertmanager-operator.v0.1.178-762dea8   configure-alertmanager-operator   0.1.178-762dea8                    Succeeded
elasticsearch-operator.4.5.0-202006271533.p0       Elasticsearch Operator            4.5.0-202006271533.p0              Succeeded
[root@preserve-olm-env data]# oc get pods -n default
NAME                                              READY   STATUS    RESTARTS   AGE
configure-alertmanager-operator-679fbd459-gl497   1/1     Running   0          27s
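
The Subscription manifest used in step 3 is not shown in the comment. A minimal sketch that would match the "oc get sub" and "oc get ip" output above might look like the following. It is a reconstruction under assumptions: subscription_manifest is a hypothetical helper name, sourceNamespace is inferred from the CatalogSource in step 2, installPlanApproval is inferred from the InstallPlan listing, and an OperatorGroup covering the "default" namespace is assumed to exist already.

```shell
#!/bin/sh
# Hypothetical helper (not from the original comment): prints a Subscription
# manifest consistent with the `oc get sub -n default` output above.
# Assumed values: sourceNamespace (from the CatalogSource's namespace) and
# installPlanApproval (from the InstallPlan's APPROVAL column).
subscription_manifest() {
  cat <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: configure-alertmanager-operator
  namespace: default
spec:
  channel: production
  name: configure-alertmanager-operator
  source: alert-operator
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF
}

# Usage against a live cluster:
#   subscription_manifest | oc apply -f -
```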

Comment 19 Ben Luddy 2020-08-20 14:55:54 UTC
Hi Matt, since it is no longer reproducible, I'm going to close this issue. If you run into it again, please reopen so that we can investigate.