Created attachment 1624012
CatalogSource

Description of problem:

The catalog-operator pod constantly reports failed health-check attempts against catalog registries.

Version-Release number of selected component (if applicable):
v4.1.18

How reproducible:
Consistently, in a v4.1.18 cluster, for different CatalogSource objects.

Steps to Reproduce:
1. Deploy the attached CatalogSource into a cluster.
2. Watch the catalog-operator pod logs.

Actual results:

The catalog appears functional and responds as expected to gRPC queries:

    oc run grpcurl-query -n openshift-operators --rm=true --restart=Never \
      --attach=true --image=quay.io/rogbas/grpcurl -- -plaintext \
      prometheus-catalog-registry:50051 api.Registry/ListPackages

    {
      "name": "prometheus"
    }

But the catalog-operator pod constantly outputs messages like:

    time="2019-10-09T19:25:15Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{prometheus-catalog-registry openshift-operators}" id=hSMFu source=prometheus-catalog-registry
    time="2019-10-09T19:25:23Z" level=info msg="building connection to registry" currentSource="{prometheus-catalog-registry openshift-operators}" id=31yeh source=prometheus-catalog-registry
    time="2019-10-09T19:25:23Z" level=info msg="client hasn't yet become healthy, attempt a health check" currentSource="{prometheus-catalog-registry openshift-operators}" id=31yeh source=prometheus-catalog-registry
    time="2019-10-09T19:25:29Z" level=info msg="building connection to registry" currentSource="{prometheus-catalog-registry openshift-operators}" id=El6De source=prometheus-catalog-registry
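To see how often these health-check messages occur for each catalog, the log can be summarized per source. This is a minimal sketch, assuming every line ends with a logfmt `source=<name>` field as in the excerpt above; `summarize_unhealthy` is a hypothetical helper name, not part of OLM or `oc`.

```shell
# Hypothetical helper: count catalog-operator "client hasn't yet become healthy"
# messages per catalog source, reading log lines on stdin.
# Assumes each line carries a logfmt field " source=<name>", as in the excerpt above.
summarize_unhealthy() {
  grep "client hasn't yet become healthy" \
    | sed -n 's/.* source=\([^ ]*\).*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'   # print "<source> <count>"
}
```

For example, piping `oc logs -n openshift-operator-lifecycle-manager deployment/catalog-operator` into `summarize_unhealthy` prints one `<source> <count>` line per catalog; for the four lines quoted above it prints `prometheus-catalog-registry 2`.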
I am looking into this now.
(In reply to Rogerio Bastos from comment #0)

Hello Rogerio,

I was unable to reproduce the behavior you described after deploying a 4.1.18 cluster, as shown below:

    $ oc get clusteroperator
    NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
    authentication                       4.1.18    True        False         False      24m
    ...
    marketplace                          4.1.18    True        False         False      29m
    ...
    operator-lifecycle-manager           4.1.18    True        False         False      33m
    operator-lifecycle-manager-catalog   4.1.18    True        False         False      33m

    $ oc get subscriptions --all-namespaces
    NAMESPACE                              NAME            PACKAGE         SOURCE                                      CHANNEL
    openshift-marketplace                  prometheus      prometheus      installed-community-openshift-marketplace   beta
    openshift-operator-lifecycle-manager   packageserver   packageserver   olm-operators                               alpha
    openshift-operators                    amq-streams     amq-streams     installed-redhat-openshift-operators        stable

    # The AMQ Streams operator in the openshift-operators namespace...
    $ oc get pod -n openshift-operators
    NAME                                            READY   STATUS    RESTARTS   AGE
    amq-streams-cluster-operator-7b6558fdc6-l895c   1/1     Running   0          3m32s

    # The Prometheus operator in the openshift-marketplace namespace...
    $ oc get pods -n openshift-marketplace
    NAME                                                         READY   STATUS    RESTARTS   AGE
    certified-operators-6db694488c-fdn9d                         1/1     Running   0          30m
    community-operators-5494945db9-hftzp                         1/1     Running   0          30m
    installed-community-openshift-marketplace-7f69d49697-5kh52   1/1     Running   0          92s
    installed-community-openshift-operators-6b5ffd988f-qgbzk     1/1     Running   0          10m
    installed-redhat-openshift-operators-77db67777b-9pt6q        1/1     Running   0          4m4s
    marketplace-operator-8459dc96dd-w9zsj                        1/1     Running   0          31m
    prometheus-operator-b74d786b4-pdwtr                          1/1     Running   0          66s
    redhat-operators-789df5478c-p6qlv                            1/1     Running   0          30m

How are you building your CatalogSource object?
Moving to 4.3 as this is not release blocking for 4.2. We will continue to try to reproduce there and backport any applicable fixes to z-stream releases.
As requested, the catalog image is being built with the following structure:

    manifests
    ├── 0.32.0
    │   ├── prometheus.alertmanager.crd.yaml
    │   ├── prometheus.csv.yaml
    │   ├── prometheus.podmonitors.crd.yaml
    │   ├── prometheus.prometheus.crd.yaml
    │   ├── prometheus.prometheusrule.crd.yaml
    │   └── prometheus.servicemonitor.crd.yaml
    └── prometheus.package.yaml

...using the following Dockerfile:

    FROM quay.io/openshift/origin-operator-registry:latest
    ARG SRC_BUNDLES
    COPY ${SRC_BUNDLES} manifests
    RUN initializer
    CMD ["registry-server", "-t", "/tmp/terminate.log"]
Hello @rbastos,

The attachment you provided was a Subscription, not a CatalogSource. In an effort to reproduce your issue, I recreated the manifest directory and the image using the Dockerfile you provided and the following manifest files:
https://github.com/operator-framework/community-operators/tree/master/community-operators/prometheus

I then created the following CatalogSource:
```
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: prometheus-catalog-registry
  namespace: olm
spec:
  displayName: Prometheus Catalog Source
  image: quay.io/agreene/catalog-operator:latest
  publisher: OperatorHub.io
  sourceType: grpc
```

After creating the CatalogSource, the Prometheus operator was deployed successfully:
```
$ oc get pods
NAME                                   READY   STATUS    RESTARTS   AGE
catalog-operator-74f5d45f65-89clr      1/1     Running   0          83m
olm-operator-6c9b6c5c9d-xbn5s          1/1     Running   0          83m
operatorhubio-catalog-b28vt            1/1     Running   0          83m
packageserver-5f4d84f757-q9srv         1/1     Running   0          83m
prometheus-catalog-registry-ldqgg      1/1     Running   0          3m15s
prometheus-operator-6df4755cb4-gxv7d   1/1     Running   0          2m40s
```

I was not able to reproduce your issue. Could you share your CatalogSource?
(In reply to Alexander Greene from comment #5)

Note: This was on a 4.3 cluster.
Created attachment 1626604
Catalog Source yaml file
I just added the CatalogSource YAML file as an attachment. Could you please confirm whether you get the same error message in the catalog-operator output? Thanks a lot for testing.
@Rogerio Bastos I apologize for the delay. This is not a bug.

For multitenancy purposes, a Subscription can only pull from a CatalogSource deployed in its own namespace, UNLESS the CatalogSource exists in a special "Global Catalog Source" namespace. You can see which namespace is marked as global here:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/manifests/0000_50_olm_08-catalog-operator.deployment.yaml#L28

As such, the logs you shared were generated because you created your CatalogSource in the `openshift-operators` namespace and the Subscription in the `prometheus` namespace. Your Subscription and CatalogSource will work if you either move the CatalogSource to the `openshift-marketplace` namespace OR move your Subscription to the same namespace as your CatalogSource.
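The namespace rule explained above can be sketched as a small shell predicate. This is a minimal illustration, not OLM's resolver code: `catalog_visible` is a hypothetical helper, and defaulting the global namespace to `openshift-marketplace` is an assumption drawn from the advice to move the CatalogSource there. The authoritative value is whatever the catalog-operator deployment linked above configures.

```shell
# Hypothetical sketch of the CatalogSource visibility rule (not OLM source code).
# Usage: catalog_visible SUBSCRIPTION_NS CATALOG_NS [GLOBAL_NS]
# GLOBAL_NS defaults to openshift-marketplace (an assumption; the real value is
# configured on the catalog-operator deployment).
catalog_visible() {
  sub_ns=$1
  catalog_ns=$2
  global_ns=${3:-openshift-marketplace}
  # A Subscription can use a CatalogSource in its own namespace...
  [ "$catalog_ns" = "$sub_ns" ] && return 0
  # ...or one that lives in the global catalog namespace.
  [ "$catalog_ns" = "$global_ns" ]
}
```

With this sketch, `catalog_visible prometheus openshift-operators` fails, which corresponds to the reported setup, while `catalog_visible prometheus openshift-marketplace` and `catalog_visible openshift-operators openshift-operators` succeed, matching the two fixes suggested above.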