Bug 1737081

Summary: CatalogSource Status should have information on last observed state(s)
Product: OpenShift Container Platform Reporter: Abu Kashem <akashem>
Component: OLM Assignee: Evan Cordell <ecordell>
OLM sub component: OLM QA Contact: Salvatore Colangelo <scolange>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: medium CC: bandrade, chezhang, chuo, jfan, jiazha, scolange
Version: 4.2.0   
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:34:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Abu Kashem 2019-08-02 14:37:18 UTC
Description of problem:
Currently, the Status of a CatalogSource gives the user no information about its current state(s). For example, if the registry pod backing the CatalogSource is crash looping, the status shows nothing.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create a ConfigMap object that has a bad operator manifest. 
 - Use this package manifest https://raw.githubusercontent.com/jianzhangbjz/v3-testfiles/v4.0/olm/configmap/operator-minKubeVersion.yaml
 - Change `currentCSV` under `packages` to an invalid CSV name such as `does-not-exist`.


2. Create a CatalogSource object that uses the ConfigMap above.
 - Use https://raw.githubusercontent.com/jianzhangbjz/v3-testfiles/v4.0/olm/catalogsource/catalogsource.yaml.
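
The two steps above can be sketched as shell commands. This is a minimal sketch, assuming `curl`, `sed`, and `oc` are available and the session is logged in to a test cluster. The helper name `break_current_csv` and the local file name are illustrative and not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: rewrite every `currentCSV:` value in a package
# manifest so that it points at a CSV name that does not exist.
break_current_csv() {
  manifest="$1"
  bad_name="${2:-does-not-exist}"
  # Keep the indentation and any leading "- " list marker; replace only the value.
  sed -E "s/^([[:space:]]*(-[[:space:]]+)?currentCSV:).*/\1 ${bad_name}/" "$manifest" > "${manifest}.tmp" &&
    mv "${manifest}.tmp" "$manifest"
}

# Usage (run manually against a test cluster):
#   curl -sSLO https://raw.githubusercontent.com/jianzhangbjz/v3-testfiles/v4.0/olm/configmap/operator-minKubeVersion.yaml
#   break_current_csv operator-minKubeVersion.yaml
#   oc create -f operator-minKubeVersion.yaml
#   oc create -f https://raw.githubusercontent.com/jianzhangbjz/v3-testfiles/v4.0/olm/catalogsource/catalogsource.yaml
```

The helper only defines a function, so nothing touches the cluster until the commented commands are run by hand.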


Actual results:
 - Although the registry pod is in `CrashLoopBackOff` state, the status of the CatalogSource object does not show any information to the user. 


Expected results:
 - In the case of an unhealthy CatalogSource, its Status should display relevant information to the user. 


Additional info:

Comment 2 Evan Cordell 2019-08-27 14:43:09 UTC
*** Bug 1746044 has been marked as a duplicate of this bug. ***

Comment 3 Salvatore Colangelo 2019-08-28 15:38:37 UTC
[scolange@scolange BUG-1737081]$ oc get pods -n default
No resources found.
[scolange@scolange BUG-1737081]$ oc get  catalogsource -n default
NAME                                   DISPLAY               TYPE       PUBLISHER   AGE
installed-community-global-operators   Community Operators   internal   Community   13m
[scolange@scolange BUG-1737081]$ oc get configMap -n openshift-operators
NAME                                   DATA   AGE
installed-community-global-operators   3      14m
[scolange@scolange BUG-1737081]$ oc describe  catalogsource -n default
Name:         installed-community-global-operators
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         CatalogSource
Metadata:
  Creation Timestamp:  2019-08-28T15:23:59Z
  Generation:          1
  Resource Version:    43595
  Self Link:           /apis/operators.coreos.com/v1alpha1/namespaces/default/catalogsources/installed-community-global-operators
  UID:                 dbbd855d-c9a7-11e9-a5e8-06d49a7a48d2
Spec:
  Config Map:    installed-community-global-operators
  Display Name:  Community Operators
  Icon:
    Base 64 Data:  
    Mediatype:     
  Publisher:       Community
  Source Type:     internal
Status:
  Message:  failed to get catalog config map installed-community-global-operators: configmap "installed-community-global-operators" not found
  Reason:   ConfigMapError
Events:     <none>
[scolange@scolange BUG-1737081]$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.ci-2019-08-28-103038   True        False         105m    Cluster version is 4.2.0-0.ci-2019-08-28-103038

Comment 4 Salvatore Colangelo 2019-08-28 16:16:01 UTC
No reason is shown in the CatalogSource:

Step1.

Change the currentCSV

[scolange@scolange BUG-1737081]$ grep currentCSV  operator-minKubeVersion.yaml
      - currentCSV: etc.2

oc create -f operator-minKubeVersion.yaml

Step2.
[scolange@scolange BUG-1737081]$ cat catalogsource.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: installed-community-global-operators
  namespace: openshift-operators
spec:
  configMap: installed-community-global-operators
  displayName: Community Operators
  icon:
    base64data: ""
    mediatype: ""
  publisher: Community
  sourceType: internal

oc create -f catalogsource.yaml

Step3. 

[scolange@scolange BUG-1737081]$ oc get catalogsource -n openshift-operators
NAME                                   DISPLAY               TYPE       PUBLISHER   AGE
installed-community-global-operators   Community Operators   internal   Community   21m

[scolange@scolange BUG-1737081]$ oc get configmaps -n openshift-operators
NAME                                   DATA   AGE
installed-community-global-operators   3      48m


[scolange@scolange BUG-1737081]$ oc get pods -n openshift-operators
NAME                                         READY   STATUS             RESTARTS   AGE
installed-community-global-operators-jkkw8   0/1     CrashLoopBackOff   6          9m12s


[scolange@scolange BUG-1737081]$ oc get catalogsource -n openshift-operators -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    creationTimestamp: 2019-08-28T15:49:32Z
    generation: 1
    name: installed-community-global-operators
    namespace: openshift-operators
    resourceVersion: "57242"
    selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/catalogsources/installed-community-global-operators
    uid: 6dc6d5b7-c9ab-11e9-a5e8-06d49a7a48d2
  spec:
    configMap: installed-community-global-operators
    displayName: Community Operators
    icon:
      base64data: ""
      mediatype: ""
    publisher: Community
    sourceType: internal
  status:
    configMapReference:
      lastUpdateTime: 2019-08-28T15:49:33Z
      name: installed-community-global-operators
      namespace: openshift-operators
      resourceVersion: "50893"
      uid: add702de-c9a7-11e9-b0e7-0a73a39bbbc8
    connectionState:
      address: installed-community-global-operators.openshift-operators.svc.cluster.local:50051
      lastConnect: 2019-08-28T16:11:11Z
      lastObservedState: TRANSIENT_FAILURE
    registryService:
      createdAt: 2019-08-28T15:49:34Z
      port: "50051"
      protocol: grpc
      serviceName: installed-community-global-operators
      serviceNamespace: openshift-operators
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The only status the CatalogSource shows is: lastObservedState: TRANSIENT_FAILURE



It should show a message and reason, as in the previous comment:

Status:
  Message:  failed to get catalog config map installed-community-global-operators: configmap "installed-community-global-operators" not found
  Reason:   ConfigMapError
Events:     <none>

Comment 5 Evan Cordell 2019-08-28 22:56:17 UTC
In the above resource, this section:

```
    connectionState:
      address: installed-community-global-operators.openshift-operators.svc.cluster.local:50051
      lastConnect: 2019-08-28T16:11:11Z
      lastObservedState: TRANSIENT_FAILURE
```

was added from the linked PR. 
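
For reference, the new field can be read on its own with a JSONPath query. The following is a minimal sketch, assuming `oc` is logged in to the cluster. The function names `catalog_state` and `catalog_is_ready` are illustrative, and treating `READY` as the healthy value is an assumption based on the gRPC connectivity state names, not something stated in this bug.

```shell
#!/bin/sh
# Hypothetical helper: print status.connectionState.lastObservedState
# for one CatalogSource.
catalog_state() {
  name="$1"
  namespace="$2"
  oc get catalogsource "$name" -n "$namespace" \
    -o jsonpath='{.status.connectionState.lastObservedState}'
}

# Hypothetical helper: succeed only when the last observed state is READY
# (assumed here to be the healthy gRPC connectivity state).
catalog_is_ready() {
  [ "$(catalog_state "$1" "$2")" = "READY" ]
}

# Usage:
#   catalog_state installed-community-global-operators openshift-operators
#   catalog_is_ready installed-community-global-operators openshift-operators || echo "not ready"
```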

While I agree with you that we can do a better job and provide even more information, this BZ was opened to track this specific issue. I would consider this an additional feature request to consider implementing for 4.3.

What do you think?

Comment 8 Salvatore Colangelo 2019-08-29 08:23:03 UTC
LGTM

[scolange@scolange ~]$ oc exec catalog-operator-5db6468968-lllzr -- olm --version
OLM version: 0.11.0
git commit: 2959328fba0d1909ff9f24b365c2d8acbd3a19da
[scolange@scolange ~]$ oc exec olm-operator-5495dc9579-vf59l -- olm --version
OLM version: 0.11.0
git commit: 2959328fba0d1909ff9f24b365c2d8acbd3a19da

Comment 9 errata-xmlrpc 2019-10-16 06:34:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 10 Jian Zhang 2019-11-18 03:18:12 UTC
Evan,

Works for me, thanks!