Bug 1843184 - OLM - CSV's are stuck in the "Replacing" PHASE
Summary: OLM - CSV's are stuck in the "Replacing" PHASE
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-02 18:43 UTC by Matt Woodson
Modified: 2020-08-20 14:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-20 14:55:54 UTC
Target Upstream Version:
Embargoed:



Description Matt Woodson 2020-06-02 18:43:51 UTC
Description of problem:
On many of our OSD clusters, CSVs are stuck in the "Replacing" phase. Note the old versions that remain in "Replacing" in the output below.

=============================================================================================
$ oc get csv -A

NAMESPACE                                               NAME                                               DISPLAY                           VERSION           REPLACES                                           PHASE
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-apiserver-operator                            configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-apiserver                                     configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-apiserver                                     configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-authentication-operator                       configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-authentication                                configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.159-48e5f67   configure-alertmanager-operator   0.1.159-48e5f67   configure-alertmanager-operator.v0.1.148-1d9f69d   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.161-55157b5   configure-alertmanager-operator   0.1.161-55157b5   configure-alertmanager-operator.v0.1.159-48e5f67   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.163-01634f3   configure-alertmanager-operator   0.1.163-01634f3   configure-alertmanager-operator.v0.1.161-55157b5   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.166-9975325   configure-alertmanager-operator   0.1.166-9975325   configure-alertmanager-operator.v0.1.163-01634f3   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.169-a21bcaa   configure-alertmanager-operator   0.1.169-a21bcaa   configure-alertmanager-operator.v0.1.166-9975325   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.171-dba3c73   configure-alertmanager-operator   0.1.171-dba3c73   configure-alertmanager-operator.v0.1.169-a21bcaa   Replacing
openshift-authentication                                configure-alertmanager-operator.v0.1.173-15d7032   configure-alertmanager-operator   0.1.173-15d7032   configure-alertmanager-operator.v0.1.171-dba3c73   Succeeded
openshift-build-test                                    configure-alertmanager-operator.v0.1.146-aba1526   configure-alertmanager-operator   0.1.146-aba1526   configure-alertmanager-operator.v0.1.144-55ee0b5   Replacing
openshift-build-test                                    configure-alertmanager-operator.v0.1.148-1d9f69d   configure-alertmanager-operator   0.1.148-1d9f69d   configure-alertmanager-operator.v0.1.146-aba1526   Replacing
=============================================================================================



Version-Release number of selected component (if applicable):
4.3.18, but have seen this on 4.3.19 as well.  I don't believe this is specific to these versions, just what version we currently have installed.

How reproducible:
I am unsure how to reproduce this issue.


Actual results:
Old CSV versions remain in the "Replacing" phase.

Expected results:
I would expect these CSVs to be in the "Succeeded" phase and the old ones to be removed.


Additional info:
I noticed that this isn't occurring on our staging clusters, which are normally short-lived (less than a week), but we have seen it on about half of our prod clusters (which are long-lived).



To clean this up, this is what I've found that works:

In this case, the configure-alertmanager-operator is deployed to the openshift-monitoring namespace.

$ oc project openshift-monitoring
$ oc get csv | grep -v NAME | awk '{print $1}'  | xargs oc delete csv
$ oc get installplan | grep -v NAME | awk '{print $1}'  | xargs oc delete installplan
$ oc delete subscription configure-alertmanager-operator

We then sync the subscription back, and it tends to clean up. At this point, "oc get csv -A" shows the old versions of configure-alertmanager-operator removed, with the latest in the "Succeeded" phase.
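
The cleanup above deletes every CSV in the namespace. Before doing that, it may help to list only the CSVs that are actually stuck. The following is a minimal sketch, not a tested procedure: `list_replacing_csvs` is a hypothetical helper name, and it assumes the PHASE value is always the last column of the `oc get csv -A` output, as in the listing in the description.

```shell
# Hypothetical helper: read `oc get csv -A` output on stdin and print
# "namespace/name" for every row whose last column (PHASE) is "Replacing".
# Matching on the last field avoids problems with DISPLAY values that
# contain spaces and with an empty REPLACES column.
list_replacing_csvs() {
  awk '$NF == "Replacing" { print $1 "/" $2 }'
}
```

Usage would be `oc get csv -A | list_replacing_csvs`. The helper only reads its input and deletes nothing, so it can be run on a production cluster to gauge how widespread the problem is.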

Comment 5 Himanshu Dogra 2020-06-16 07:13:56 UTC
CC:hdogra

Comment 6 Jian Zhang 2020-06-19 05:50:00 UTC
Hi Daniel,

> Can QE verify that this issue exists in master (4.6)? I think a way to reproduce this to simply install the configure-alertmanager-operator via a subscription and monitor its upgrade cycle across namespaces.

Sorry for the late reply. Yes, sure. However, I couldn't find "configure-alertmanager-operator" in the default OperatorSources of the OCP 4.6 cluster.

mac:~ jianzhang$ oc get packagemanifest |grep -i alertmanager
mac:~ jianzhang$ 

@Matt, could you provide the detailed steps to install this operator? Which OperatorSource does it come from? Thanks!

Comment 10 Jian Zhang 2020-06-24 06:58:24 UTC
Hi Matt,

> It's being built and stored in app-sre quay repo.

I created an OperatorSource to consume this quay repo, but it failed, as follows:

mac:~ jianzhang$ oc create -f operatorsource-sre.yaml 
operatorsource.operators.coreos.com/sre-operators created
mac:~ jianzhang$ cat operatorsource-sre.yaml 
---
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: sre-operators
  namespace: openshift-marketplace
spec:
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: app-sre
  type: appregistry

mac:~ jianzhang$ oc get operatorsource
sre-operators         appregistry   https://quay.io/cnr   app-sre                                                 Red Hat     Failed      The OperatorSource endpoint returned an empty manifest list   84s

> Here is the CSV:
> ...

Currently, the operator cannot be installed by providing only the CSV object, because the ServiceAccount is created by the Subscription, not the CSV, as follows:

mac:~ jianzhang$ oc project openshift-monitoring 
Now using project "openshift-monitoring" on server "https://api.qe-jiazha23.qe.devcluster.openshift.com:6443".
mac:~ jianzhang$ 
mac:~ jianzhang$ oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES   PHASE
elasticsearch-operator.4.5.0-202006180838   Elasticsearch Operator   4.5.0-202006180838              Succeeded
mac:~ jianzhang$ oc create -f csv-configure-alertmanager-operator.yaml
clusterserviceversion.operators.coreos.com/configure-alertmanager-operator.v0.1.178-762dea8 created
mac:~ jianzhang$ oc get csv
NAME                                               DISPLAY                           VERSION              REPLACES                                           PHASE
configure-alertmanager-operator.v0.1.178-762dea8   configure-alertmanager-operator   0.1.178-762dea8      configure-alertmanager-operator.v0.1.176-900bd02   Pending

mac:~ jianzhang$ oc describe csv configure-alertmanager-operator.v0.1.178-762dea8 
...
  Requirement Status:
    Group:    
    Kind:     ServiceAccount
    Message:  Service account does not exist
    Name:     configure-alertmanager-operator
    Status:   NotPresent
    Version:  v1

Anyway, I guess the repo (app-sre) is private. Could you give me read permission so that I can install the operator in my cluster? Thanks! My quay account is jiazha.

Comment 13 Jian Zhang 2020-06-29 11:48:49 UTC
1. Set up the latest 4.6 cluster
[root@preserve-olm-env data]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-06-26-035408   True        False         24m     Cluster version is 4.6.0-0.nightly-2020-06-26-035408

2. Create the CatalogSource to provide the "configure-alertmanager-operator"
[root@preserve-olm-env data]# cat cs.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: alert-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/app-sre/configure-alertmanager-operator-registry:production-762dea8
  displayName: Alert Operator
  publisher: grpc
[root@preserve-olm-env data]# oc create -f cs.yaml 
catalogsource.operators.coreos.com/alert-operator created

3. Install this operator
[root@preserve-olm-env data]# oc get sub -n default
NAME                              PACKAGE                           SOURCE           CHANNEL
configure-alertmanager-operator   configure-alertmanager-operator   alert-operator   production
[root@preserve-olm-env data]# oc get ip -n default
NAME            CSV                                                APPROVAL    APPROVED
install-nrcwd   configure-alertmanager-operator.v0.1.178-762dea8   Automatic   true
[root@preserve-olm-env data]# oc get csv -n default
NAME                                               DISPLAY                           VERSION                 REPLACES   PHASE
configure-alertmanager-operator.v0.1.178-762dea8   configure-alertmanager-operator   0.1.178-762dea8                    Succeeded
elasticsearch-operator.4.5.0-202006271533.p0       Elasticsearch Operator            4.5.0-202006271533.p0              Succeeded
[root@preserve-olm-env data]# oc get pods -n default
NAME                                              READY   STATUS    RESTARTS   AGE
configure-alertmanager-operator-679fbd459-gl497   1/1     Running   0          27s
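
Step 3 shows the resulting Subscription but not the manifest used to create it. The following is a sketch of what such a manifest might look like, not the one actually used in this test. `subscription_manifest` is a hypothetical helper. The package, channel, source, and namespace values in the usage line come from the `oc get sub -n default` output above, and `sourceNamespace` is assumed to match the namespace of the CatalogSource in step 2.

```shell
# Hypothetical helper: print a Subscription manifest on stdout.
# Arguments: package name, channel, CatalogSource name, target namespace.
# sourceNamespace is assumed to be openshift-marketplace, where the
# CatalogSource in step 2 was created.
subscription_manifest() {
  pkg="$1"; channel="$2"; source="$3"; ns="$4"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ${pkg}
  namespace: ${ns}
spec:
  channel: ${channel}
  name: ${pkg}
  source: ${source}
  sourceNamespace: openshift-marketplace
EOF
}
```

For example, `subscription_manifest configure-alertmanager-operator production alert-operator default | oc create -f -` would create the Subscription. This assumes an OperatorGroup already exists in the target namespace, which the steps above do not show.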

Comment 19 Ben Luddy 2020-08-20 14:55:54 UTC
Hi Matt, since it is no longer reproducible, I'm going to close this issue. If you run into it again, please reopen so that we can investigate.

