Bug 1847593

Summary: Operator source failings not reporting
Product: OpenShift Container Platform
Reporter: Tom Buskey <tbuskey>
Component: OLM
Assignee: Evan Cordell <ecordell>
OLM sub component: OLM
QA Contact: Tom Buskey <tbuskey>
Status: CLOSED NOTABUG
Docs Contact:
Severity: high
Priority: unspecified
CC: bandrade, jiazha, krizza, kuiwang, scolange, tbuskey, yhui
Version: 4.5
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-24 06:05:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Tom Buskey 2020-06-16 16:28:52 UTC
Description of problem:
Polarion test case OCP-24596, "Report default operator source failings to telemetry":
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-24596

If you edit the marketplace-operator deployment and replace the container image with a known bad image, oc get opsrc should show failures.

It should also trigger alerts in Prometheus.
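
For anyone reproducing this, here is a minimal sketch of the two commands involved, written as a small Python helper that only builds the argument lists. The namespace (openshift-marketplace), the container name (marketplace-operator), and the image reference are assumptions for illustration and are not taken from the Polarion case.

```python
import shlex
from typing import List

# Assumed names, not stated in this report: adjust to the cluster under test.
NAMESPACE = "openshift-marketplace"
DEPLOYMENT = "marketplace-operator"
CONTAINER = "marketplace-operator"


def set_image_command(bad_image: str) -> List[str]:
    """Build the argv that points the operator container at bad_image."""
    return [
        "oc", "set", "image",
        f"deployment/{DEPLOYMENT}",
        f"{CONTAINER}={bad_image}",
        "-n", NAMESPACE,
    ]


def list_opsrc_command() -> List[str]:
    """Build the argv that lists OperatorSources and their status."""
    return ["oc", "get", "opsrc", "-n", NAMESPACE]


if __name__ == "__main__":
    # Hypothetical image reference that is expected not to exist.
    commands = (
        set_image_command("quay.io/example/does-not-exist:latest"),
        list_opsrc_command(),
    )
    for argv in commands:
        print(shlex.join(argv))
```

The helper prints the commands rather than running them, so they can be reviewed before being pasted into a shell on the shared environment.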

Version-Release number of selected component (if applicable):

oc version
Client Version: openshift-clients-4.6.0-202005210021-28-g711c56a65
Server Version: 4.5.0-0.nightly-2020-06-15-215359
Kubernetes Version: v1.18.3+91d0edd

How reproducible:

Steps to Reproduce:
1. Follow the Polarion test case linked above. The setup section has already been performed and does not need to be repeated.

Actual results:
Every OperatorSource listed by oc get opsrc has a Status of Succeeded.

Expected results:
A failure with a 404

Additional info:
Shared env used
OCP_4.5_Functional Test(Manual)_ UPI_vSphere 6.7_Restricted_HTTPS_Proxy_RHEL 7.7&RHCOS 4.5_Disk Encyption off_FIPS on_OpenShift-SDN (network policy)_IPv4_Etcd Encyption On_CRIO-1.18_Fluentd_Etcd-3.4.x_Google_File System_vSphere Disk_Object_NFS V4_overlay2_OVS-2.13

Comment 1 Kevin Rizza 2020-06-16 18:11:06 UTC
Hi Tom,

I'm not sure this is actually an issue; if it is, I would like to understand it better. Right now, OperatorSources do not track the state of the registry pod. The OperatorSource generates a CatalogSource, and that object tracks the state of the pod. My expectation is that these metrics should point to the CatalogSource to understand the health of those pods.
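
To illustrate checking health at the CatalogSource level instead of the OperatorSource level, here is a rough sketch that scans the JSON from something like `oc get catalogsource -n openshift-marketplace -o json`. It assumes the connection state is exposed at status.connectionState.lastObservedState and that READY means healthy. The sample data is made up for the example and is not output from the environment in this bug.

```python
from typing import Any, Dict, List


def unhealthy_catalog_sources(catalog_source_list: Dict[str, Any]) -> List[str]:
    """Return names of CatalogSources whose last observed state is not READY.

    Assumes the status.connectionState.lastObservedState field; a missing
    status is treated as unhealthy.
    """
    unhealthy = []
    for item in catalog_source_list.get("items", []):
        name = item.get("metadata", {}).get("name", "<unnamed>")
        state = (
            item.get("status", {})
            .get("connectionState", {})
            .get("lastObservedState")
        )
        if state != "READY":
            unhealthy.append(name)
    return unhealthy


# Hypothetical sample shaped like a Kubernetes List object.
sample = {
    "kind": "List",
    "items": [
        {
            "metadata": {"name": "redhat-operators"},
            "status": {"connectionState": {"lastObservedState": "READY"}},
        },
        {
            "metadata": {"name": "certified-operators"},
            "status": {"connectionState": {"lastObservedState": "CONNECTING"}},
        },
    ],
}

if __name__ == "__main__":
    print(unhealthy_catalog_sources(sample))
```

In practice the input would come from `json.load` on the command output; the function itself does not call the cluster.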