Hello Shawn, Thanks for submitting this bug. I have only been able to reproduce the bug by clearing the status of the CR, which places it back in the initial state and increments the metric [1]. Doing so inflates the metric's count, which is not corrected until the marketplace-operator pod is restarted. Could you share any information about the cluster? Have any non-default OperatorSources or CatalogSourceConfigs been installed on the cluster? Can you share the logs of the marketplace operator pod? Thanks again! Ref: [1] https://github.com/operator-framework/operator-marketplace/blob/cb4ff45ccdddde154af4a8f9e7d24a72c78a346c/pkg/operatorsource/initial.go#L51
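The reproduction described above can be modeled roughly as follows. This is a minimal sketch and not the operator's actual code: the class name, the dictionary layout of the CR, and the `Succeeded` phase handling are all assumptions made for illustration. It only shows why an in-memory count that is bumped whenever a CR is seen in its initial state ends up too high once the status is cleared.

```python
# Hypothetical model of the behavior described above; names and structure
# are illustrative assumptions, not taken from operator-marketplace.
class OperatorSourceCounter:
    def __init__(self):
        # Held in memory only, so a restart of the process resets it to zero.
        self.count = 0

    def reconcile(self, cr):
        # A CR with no status phase is treated as being in the initial state.
        if not cr.get("status", {}).get("phase"):
            self.count += 1
            cr["status"] = {"phase": "Succeeded"}


counter = OperatorSourceCounter()
cr = {"name": "certified-operators", "status": {}}

counter.reconcile(cr)  # first reconcile: counted once
counter.reconcile(cr)  # already reconciled: count unchanged
cr["status"] = {}      # status cleared by hand
counter.reconcile(cr)  # treated as initial again: counted a second time

print(counter.count)
```

In this model a single CR ends up counted twice, and the count only returns to a correct value if the counter is rebuilt from scratch, which matches the observation that a pod restart clears the problem.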
The cluster only has the 4 default OperatorSources $ oc get operatorsource --all-namespaces NAMESPACE NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGE AGE openshift-marketplace certified-operators appregistry https://quay.io/cnr certified-operators Certified Operators Red Hat Succeeded The object has been successfully reconciled 38m openshift-marketplace community-operators appregistry https://quay.io/cnr community-operators Community Operators Red Hat Succeeded The object has been successfully reconciled 38m openshift-marketplace redhat-marketplace appregistry https://quay.io/cnr redhat-marketplace Red Hat Marketplace Red Hat Succeeded The object has been successfully reconciled 38m openshift-marketplace redhat-operators appregistry https://quay.io/cnr redhat-operators Red Hat Operators Red Hat Succeeded The object has been successfully reconciled 38m And no CatalogSourceConfigs $ oc get catalogsourceconfigs --all-namespaces No resources found. Here is the marketplace operator pod log, with a grep for "Created" $ oc -n openshift-marketplace logs marketplace-operator-8694c7bf96-wb92h | grep Created time="2020-08-03T19:33:16Z" level=info msg="[status] Created ClusterOperator" time="2020-08-03T19:33:48Z" level=info msg="Created Deployment certified-operators with registry command: [appregistry-server -r https://quay.io/cnr|certified-operators -o 
cyberarmor-operator-certified,aci-containers-operator,vfunction-server-operator,appdynamics-operator,cortex-operator,ibm-monitoring-grafana-operator-app,cortex-hub-operator,universalagent-operator-certified,appsody-operator-certified,cortex-certifai-operator,nxrm-operator-certified,ibm-mongodb-operator-app,gpu-operator-certified,cortex-healthcare-hub-operator,aqua-certified,joget-openshift-operator,f5-bigip-ctlr-operator,portshift-operator,rocketchat-operator-certified,vprotect-operator,cic-operator-with-crds,robin-operator,anchore-engine,kube-arangodb,kubeturbo-marketplace-certified,transform-adv-operator,couchbase-enterprise-certified,aqua-operator-certified,ibm-auditlogging-operator-app,ibm-spectrum-scale-csi,redis-enterprise-operator-cert,synopsys-certified,dotscience-operator,nginx-ingress-operator,akka-cluster-operator-certified,ibm-licensing-operator-app,storageos-1tb,ibm-platform-api-operator-app,twistlock-certified,cpx-cic-operator,ivory-server-app,cass-operator,sysdig-certified,t8c-certified,cortex-fabric-operator,couchdb-operator-certified,hazelcast-jet-enterprise-operator,hspc-operator,planetscale-certified,tidb-operator-certified,eddi-operator-certified,driverlessai-deployment-operator-certified,insightedge-enterprise-operator2,mongodb-enterprise,storageos,cic-operator,ibm-block-csi-operator,yugabyte-operator,nxiq-operator-certified,ibm-helm-repo-operator-app,gitlab-operator,anaconda-team-edition,kong-offline-operator,traefikee-certified,ocean-operator,joget-dx-operator,seldon-operator-certified,rapidbiz-operator-certified,fp-predict-plus-operator-certified,federatorai-certified,presto-operator,kubemq-operator-marketplace,ubix-operator,ibm-management-ingress-operator-app,orca,falco-certified,cih-operator-certified,hpe-csi-operator,atomicorp-helm-operator-certified,oneagent-certified,citrix-adc-istio-ingress-gateway-operator,perceptilabs-operator-package,transadv-operator,runtime-component-operator-certified,infinibox-operator-certified,triggermesh-opera
tor,aws-event-sources-operator,openunison-ocp-certified,kong,storageos-10tb,timemachine-operator,node-red-operator-certified,newrelic-infrastructure,ibm-helm-api-operator-app,xcrypt-operator,percona-server-mongodb-operator-certified,zabbix-operator-certified,alcide-kaudit-operator,neuvector-certified-operator,anzograph-operator,nuodb-ce-certified,kubeturbo-certified,appranix-cps,openshiftartifactoryha-operator,splunk-certified,open-liberty-certified,k8s-triliovault,memql-certified,traefikee-redhat-certified,ibm-spectrum-symphony-operator,linstor-operator,instana-agent,wavefront-operator,tigera-operator,kubemq-operator,tf-operator,percona-xtradb-cluster-operator-certified,cockroachdb-certified,uma-operator,open-enterprise-spinnaker,citrix-cpx-istio-sidecar-injector-operator,cert-manager-operator,openshiftxray-operator,cnvrg-operator,crunchy-postgres-operator,portworx-certified,datadog-operator-certified,redhat-marketplace-operator,hazelcast-enterprise-certified,sematext,here-service-operator-certified]" name=certified-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:48Z" level=info msg="Created Service certified-operators" name=certified-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:48Z" level=info msg="Created CatalogSource certified-operators" name=certified-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created Deployment redhat-operators with registry command: [appregistry-server -r https://quay.io/cnr|redhat-operators -o 
sriov-network-operator,apicast-operator,service-registry-operator,fuse-apicurito,fuse-online,local-storage-operator,clusterresourceoverride,quay-operator,container-security-operator,aws-ebs-csi-driver-operator,manila-csi-driver-operator,codeready-workspaces,businessautomation-operator,openshiftansibleservicebroker,amq7-cert-manager,amq7-interconnect-operator,3scale-operator,nfd,cluster-kube-descheduler-operator,amq-streams,amq-online,datagrid,servicemeshoperator,eap,ocs-operator,kubevirt-hyperconverged,quay-bridge-operator,red-hat-camel-k,rhsso-operator,amq-broker-rhel8,dv-operator,metering-ocp,kiali-ossm,jaeger-product,cam-operator,ptp-operator,performance-addon-operator,fuse-console,cluster-logging,openshifttemplateservicebroker,serverless-operator,amq-broker,amq-broker-lts,openshift-pipelines-operator-rh,advanced-cluster-management,vertical-pod-autoscaler,elasticsearch-operator]" name=redhat-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created Service redhat-operators" name=redhat-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created CatalogSource redhat-operators" name=redhat-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created Deployment community-operators with registry command: [appregistry-server -r https://quay.io/cnr|community-operators -o 
redis-operator,eunomia,ham-deploy,jaeger,myvirtualdirectory,seldon-operator,maistraoperator,ditto-operator,buildv2-operator,cockroachdb,atlasmap-operator,namespace-configuration-operator,awss3-operator-registry,spinnaker-operator,infinispan,cert-utils-operator,keycloak-operator,argocd-operator,nexus-operator-m88i,metering,ibmcloud-operator,podium-operator-bundle,sealed-secrets-operator-helm,akka-cluster-operator,iot-simulator,kubeturbo,teiid,apicast-community-operator,keda,starter-kit-operator,api-operator,kubernetes-imagepuller-operator,group-sync-operator,strimzi-kafka-operator,microsegmentation-operator,hyperfoil-bundle,ember-csi-operator,hazelcast-jet-operator,prometheus-exporter-operator,etcd,eclipse-che,lightbend-console-operator,kogito-operator,federatorai,3scale-community-operator,ibm-spectrum-scale-csi-operator,skydive-operator,knative-kafka-operator,t8c,special-resource-operator,multicluster-operators-subscription,prisma-cloud-compute-console-operator,radanalytics-spark,event-streams-topic,datadog-operator,node-problem-detector,postgresql,knative-camel-operator,grafana-operator,service-binding-operator,kubestone,jenkins-operator,sysflow-operator,pystol,descheduler,planetscale,aqua,opendatahub-operator,must-gather-operator,crossplane,traefikee-operator,keepalived-operator,horreum-operator,cost-mgmt-operator,prometheus,federation,opsmx-spinnaker-operator,openshift-pipelines-operator,resource-locker-operator,neuvector-community-operator,splunk,syndesis,enmasse,ripsaw,ibmcloud-iam-operator,camel-k,microcks,composable-operator,ibm-block-csi-operator-community,enc-key-sync,argocd-operator-helm,lib-bucket-provisioner,apicurio-registry,spark-gcp,kiali,aws-efs-operator,apicurito,kubefed,hazelcast-operator,percona-server-mongodb-operator,dell-csi-operator,codeready-toolchain-operator,snapscheduler,konveyor-operator,nsm-operator-registry,percona-xtradb-cluster-operator,elastic-cloud-eck,snyk-operator,egressip-ipam-operator,wso2am-operator,global-load-balancer-operato
r,openebs,submariner,hive-operator,esindex-operator,hawtio-operator,postgresql-operator-dev4devs-com]" name=community-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created Service community-operators" name=community-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created CatalogSource community-operators" name=community-operators namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created Deployment redhat-marketplace with registry command: [appregistry-server -r https://quay.io/cnr|redhat-marketplace -o here-service-operator-certified-rhmp,federatorai-certified-rhmp,openunison-ocp-certified-rhmp,rapidbiz-operator-certified-rhmp,cass-operator-rhmp,portshift-operator-rhmp,ibm-block-csi-operator-rhmp,hazelcast-enterprise-certified-rhmp,perceptilabs-operator-package-rhmp,kubemq-operator-marketplace-rhmp,uma-operator-rhmp,cockroachdb-certified-rhmp,xcrypt-operator-rhmp,robin-operator-rhmp,memql-certified-rhmp,neuvector-certified-operator-rhmp,cortex-fabric-operator-rhmp,orca-rhmp,instana-agent-rhmp,cyberarmor-operator-certified-rhmp,linstor-operator-rhmp,anzograph-operator-rhmp,open-enterprise-spinnaker-rhmp,crunchy-postgres-operator-rhmp,enterprise-operator-rhmp,couchbase-enterprise-certified-rhmp,appranix-cps-rhmp,akka-cluster-operator-certified-rhmp,seldon-operator-certified-rhmp,node-red-operator-certified-rhmp,tf-operator-rhmp,joget-dx-operator-rhmp,hazelcast-jet-enterprise-operator-rhmp,atomicorp-helm-operator-certified-rhmp,cert-manager-operator-rhmp,oneagent-certified-rhmp,ivory-server-app-rhmp,cpx-cic-operator-rhmp,timemachine-operator-rhmp,cic-operator-with-crds-rhmp,cnvrg-operator-certifyed-rhmp,joget-openshift-operator-rhmp,eddi-operator-certified-rhmp,presto-operator-rhmp,storageos-rhmp,vprotect-operator-rhmp,anaconda-team-edition-rhmp,cortex-certifai-operator-rhmp,insightedge-enterprise-operator2-
rhmp,sysdig-certified-rhmp,fp-predict-plus-operator-certified-rhmp,kong-offline-operator-rhmp,nxiq-operator-certified-rhmp,nxrm-operator-certified-rhmp,aqua-operator-certified-rhmp,traefikee-redhat-certified-rhmp,zabbix-operator-certified-rhmp,k8s-triliovault-rhmp,vfunction-server-operator-rhmp,kubeturbo-certified-rhmp]" name=redhat-marketplace namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created Service redhat-marketplace" name=redhat-marketplace namespace=openshift-marketplace type=OperatorSource time="2020-08-03T19:33:49Z" level=info msg="Created CatalogSource redhat-marketplace" name=redhat-marketplace namespace=openshift-marketplace type=OperatorSource
Jones, Thank you for adding additional information. I have noticed a bug where the `custom_resource_definition` metric is incremented in the reconciler for the CatalogSourceConfig (CSC) resource [1]. This bug increments the metric whenever a CSC is reconciled, and the inflated value persists after the CSC is deleted, even if no CSCs are present on the cluster. This is the only way I have been able to reproduce the issue so far and may be the cause of the alert. I would like to see if the marketplace operator has ever reconciled a CSC. Sadly, Marketplace never prints "created" when reconciling a CSC. If possible, please attach the full set of marketplace logs. If this is not possible, instead of grepping the marketplace operator logs for "created", could you grep for "Reconciling CatalogSourceConfig" (printed here [2])? This will let me know if the marketplace operator ever reconciled a CSC, which would cause the issue mentioned above. If the logs indicate that no CSC was ever reconciled, I would like to know the value of the customResourceType label on the custom_resource_definition metric. This label's value will pinpoint which controller in Marketplace is incrementing the metric. You can query for this metric by visiting the OpenShift UI, opening the monitoring tab on the left-hand dropdown, selecting the metrics option, and querying for custom_resource_metric in the search bar. I will attach a picture showing you where to look. I appreciate your assistance and I will do my best to resolve this issue in a timely manner. Thank you, Alex Ref: [1] https://github.com/operator-framework/operator-marketplace/pull/303/files [2] https://github.com/operator-framework/operator-marketplace/blob/34fd70b3c0782bc7eaeb752666a7614c30ccf545/pkg/controller/catalogsourceconfig/catalogsourceconfig_controller.go#L94
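The log check requested above can also be done on a saved copy of the pod log. The following is a minimal sketch, assuming the log has been saved as plain text; the function name `matching_lines` is hypothetical, and the two sample lines are copied from the log excerpt earlier in this report.

```python
def matching_lines(lines, needle="Reconciling CatalogSourceConfig"):
    """Return every log line that contains the given substring."""
    return [line for line in lines if needle in line]


# Sample lines taken from the log excerpt posted earlier in this report.
sample = [
    'time="2020-08-03T19:33:16Z" level=info msg="[status] Created ClusterOperator"',
    'time="2020-08-03T19:33:48Z" level=info msg="Created Service certified-operators" '
    'name=certified-operators namespace=openshift-marketplace type=OperatorSource',
]

# No CSC reconcile message appears in these two lines.
print(matching_lines(sample))

# The same helper works with any other substring.
print(len(matching_lines(sample, "Created Service")))
```

Because the helper accepts any iterable of strings, an open file object for the saved log can be passed in directly. An empty result would suggest that no CSC was reconciled during the lifetime of that pod.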
Created attachment 1710252 Requested Metric Query
I should note that up to two timeseries may be present and the label should equal `CatalogSourceConfig` or `OperatorSource`.
Created attachment 1710349 marketplace operator pod logs I've attached the full log of my marketplace operator pod.
A recent PR [1] removed the OperatorSources and CatalogSourceConfigs from marketplace, so this bug will no longer happen on 4.6. The fix originally introduced via the PR attached to this BZ will need to be backported to 4.5 and 4.4. Ref: [1] https://github.com/operator-framework/operator-marketplace/pull/323
Placing back on QE. I can't backport to 4.5 unless QE verifies this bug.
Verified on 4.6. There are no opsrc or csc CRDs. LGTM -- kuiwang@Kuis-MacBook-Pro 1862481 % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.6.0-0.nightly-2020-08-13-091737 True False 6h57m Cluster version is 4.6.0-0.nightly-2020-08-13-091737 kuiwang@Kuis-MacBook-Pro 1862481 % oc get crd|grep -i operatorsource kuiwang@Kuis-MacBook-Pro 1862481 % oc get crd|grep -i catalogsourceconfig --
Hi all, what is the status of getting this fix backported to 4.4? Thanks.
A 4.5 backport [1] must merge first but is currently waiting for a patch manager to approve it. REF: [1] https://github.com/operator-framework/operator-marketplace/pull/324
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196