Description of problem: Trying to enable the disconnected community catalog on an IPv6 restricted network. The catalog pod running in openshift-marketplace is constantly in CrashLoopBackOff / OOMKilled. No operators are served, so we cannot deploy community operators in a disconnected IPv6 environment.

How reproducible: Always (the pod restarts continuously).

Steps to Reproduce:
1. Went through the documented disconnected OLM procedure to enable community operators.
2. Created CatalogSource.yaml:

```yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-community-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry.XXXX:5000/opcatalog/community-operators:v1
  displayName: My ACM Community Operator Catalog
  publisher: grpc
```

3. Applied it with `oc apply -f CatalogSource.yaml`.

Actual results:

marketplace pods:
```
NAME                                    READY   STATUS             RESTARTS   AGE
marketplace-operator-86c6c54c55-5cv6f   1/1     Running            0          84m
my-community-operator-catalog-rk6sq     0/1     CrashLoopBackOff   9          24m
```

catalogsource:
```
NAME                            DISPLAY                             TYPE   PUBLISHER   AGE
my-community-operator-catalog   My ACM Community Operator Catalog   grpc   grpc        69m
```

packagemanifest:
```
No resources found in openshift-marketplace namespace.
```

Expected results: A running catalog pod serving package manifests.

Additional info:
Steps:
1. `oc adm catalog build` for the community catalog
2. `oc adm catalog mirror`
3. `oc apply -f community-operators-manifests`
4. `oc patch OperatorHub cluster --type json ...`
5. `oc apply -f catalogsource.yml`
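For anyone reproducing this, the steps above expand to roughly the following commands. This is a sketch based on the 4.3 disconnected OLM procedure, not the exact commands used here: the registry host and image tag are placeholders, and the `--from` registry image should match your cluster version. It requires a live cluster and mirror registry.

```shell
# 1. Build the community catalog image (registry host/tag are placeholders).
oc adm catalog build \
    --appregistry-org community-operators \
    --from=registry.redhat.io/openshift4/ose-operator-registry:v4.3 \
    --to=registry.example.com:5000/opcatalog/community-operators:v1

# 2. Mirror the operator content referenced by the catalog into the local registry;
#    this also generates manifests (ImageContentSourcePolicy, mapping) on disk.
oc adm catalog mirror \
    registry.example.com:5000/opcatalog/community-operators:v1 \
    registry.example.com:5000

# 3. Apply the manifests generated by the mirror step.
oc apply -f ./community-operators-manifests

# 4. Disable the default (online) catalog sources on the cluster.
oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

# 5. Create the CatalogSource pointing at the mirrored catalog image.
oc apply -f catalogsource.yml
```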
Please provide any logs from the crashlooping pod, any logs from the disconnected mirroring process, and, if possible, a copy of the image that is being referenced (registry.XXXX:5000/opcatalog/community-operators:v1)
pod log:

```
[kni@r640-u01 ~]$ cat catalogimage.pod.log
time="2020-03-18T12:26:32Z" level=info msg="serving registry" database=/bundles.db port=50051
```

pod describe:

```
Name:         my-community-operator-catalog-rk6sq
Namespace:    openshift-marketplace
Priority:     0
Node:         openshift-worker-0.qe1.kni.lab.eng.bos.redhat.com/2620:52:0:1386::37
Start Time:   Tue, 17 Mar 2020 19:03:39 -0400
Labels:       olm.catalogSource=my-community-operator-catalog
Annotations:  k8s.ovn.org/pod-networks:
                {"default":{"ip_address":"fd01::5:a006:cfff:fe00:3b/64","mac_address":"a2:06:cf:00:00:3b","gateway_ip":"fd01:0:0:5::1"}}
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "ovn-kubernetes",
                    "interface": "eth0",
                    "ips": [
                        "fd01::5:a006:cfff:fe00:3b"
                    ],
                    "mac": "a2:06:cf:00:00:3b",
                    "dns": {}
                }]
              openshift.io/scc: privileged
Status:       Running
IP:           fd01::5:a006:cfff:fe00:3b
IPs:
  IP:  fd01::5:a006:cfff:fe00:3b
Containers:
  registry-server:
    Container ID:   cri-o://0fc56da80f5115b283cd977640714daf768e02c5aab2d4f60761d04ad99928dd
    Image:          registry.qe1.kni.lab.eng.bos.redhat.com:5000/opcatalog/community-operators:v1
    Image ID:       registry.qe1.kni.lab.eng.bos.redhat.com:5000/opcatalog/community-operators@sha256:b017b2f8ddacc32d3d226449b0202d3bf5a85e9be2afa7b04af7148138bebd0f
    Port:           50051/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 18 Mar 2020 08:26:32 -0400
      Finished:     Wed, 18 Mar 2020 08:26:45 -0400
    Ready:          False
    Restart Count:  152
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  50Mi
    Liveness:   exec [grpc_health_probe -addr=localhost:50051] delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [grpc_health_probe -addr=localhost:50051] delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hzmcn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-hzmcn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hzmcn
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:
Events:
  Type     Reason     Age                     From                                                         Message
  ----     ------     ----                    ----                                                         -------
  Warning  Unhealthy  138m (x12 over 13h)     kubelet, openshift-worker-0.qe1.kni.lab.eng.bos.redhat.com   Liveness probe failed:
  Warning  Unhealthy  83m (x14 over 13h)      kubelet, openshift-worker-0.qe1.kni.lab.eng.bos.redhat.com   Readiness probe failed: command timed out
  Warning  Unhealthy  33m (x19 over 13h)      kubelet, openshift-worker-0.qe1.kni.lab.eng.bos.redhat.com   Readiness probe failed:
  Normal   Pulled     28m (x148 over 13h)     kubelet, openshift-worker-0.qe1.kni.lab.eng.bos.redhat.com   Container image "registry.qe1.kni.lab.eng.bos.redhat.com:5000/opcatalog/community-operators:v1" already present on machine
  Warning  BackOff    3m14s (x3531 over 13h)  kubelet, openshift-worker-0.qe1.kni.lab.eng.bos.redhat.com   Back-off restarting failed container
```
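The OOMKilled / exit code 137 state shown in the describe output can also be confirmed directly from the container status, without scanning the full describe dump. A sketch (pod name taken from the output above; requires access to the affected cluster):

```shell
# Print the reason the last container instance terminated.
# For this bug it reports: OOMKilled
oc -n openshift-marketplace get pod my-community-operator-catalog-rk6sq \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
```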
quay.io/cdoan/community-operators:v1

* Since the image is created by `oc adm catalog build ...`, not sure how valuable it is.
* Is there a way to limit the images that are included in the built catalog image?
Sorry about that. Repushed the image; it can be pulled with: `docker pull dhubchris/community-operators:v1`
In my case, I am seeing this in a disconnected IPv6 environment on OCP 4.3.
There is currently a 100Mi memory limit on the pod created for a CatalogSource. The community catalog exceeds this limit, so the container is being OOMKilled by kube. A fix is prepped and ready to merge; please see the linked PR. Once the PR merges, we will backport to 4.3 so that no workarounds are required.

In the meantime, there is a somewhat straightforward workaround:

1. Create a Pod that points to the catalog image in the openshift-marketplace namespace. In this example I am using the image from Constantin's cluster; you could also do this with dhubchris/community-operators:v1.

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: disconnected-operator-catalog-community-fixed
  namespace: openshift-marketplace
  labels:
    olm.catalogSource: disconnected-operator-catalog-community
spec:
  nodeSelector:
    beta.kubernetes.io/os: linux
  restartPolicy: Always
  serviceAccountName: default
  serviceAccount: default
  imagePullSecrets:
    - name: default-dockercfg-nlhhd
  enableServiceLinks: true
  terminationGracePeriodSeconds: 30
  containers:
    - name: registry-server
      image: registry.ocp-edge-cluster-cdv2.qe.lab.redhat.com:5000/restricted_olm/community-operators:v1
      imagePullPolicy: IfNotPresent
      ports:
        - name: grpc
          containerPort: 50051
          protocol: TCP
      resources:
        requests:
          cpu: 10m
          memory: 50Mi
      readinessProbe:
        exec:
          command:
            - grpc_health_probe
            - '-addr=localhost:50051'
        initialDelaySeconds: 5
        timeoutSeconds: 5
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      livenessProbe:
        exec:
          command:
            - grpc_health_probe
            - '-addr=localhost:50051'
        initialDelaySeconds: 10
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
  tolerations:
    - operator: Exists
```

2. Create a CatalogSource that points to the address of the Pod you just created. This can be an IP address or a DNS address, but must include the port (50051 by default):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: disconnected-operator-catalog-community-fixed
  namespace: openshift-marketplace
spec:
  address: '[fd01::1:5088:b7ff:fe00:28]:50051'
  displayName: Community Operators
  sourceType: grpc
```

The Console UI does not let you create catalogs with the `address` field: you will either need to create it with kubectl, or create a placeholder CatalogSource in the console and then edit it to remove the `image` field and use the `address` field instead. If done correctly, the status of the CatalogSource should indicate a successful connection to OLM and show the connection as `READY`.
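Since a pod IP can change if the pod is deleted and recreated, a somewhat more stable variant of the same workaround is to put a Service in front of the catalog Pod and use its cluster DNS name in the `address` field. This is an untested sketch: the Service name is made up, and the selector assumes the `olm.catalogSource` label used on the workaround Pod.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: disconnected-operator-catalog-community-svc  # hypothetical name
  namespace: openshift-marketplace
spec:
  selector:
    olm.catalogSource: disconnected-operator-catalog-community
  ports:
    - name: grpc
      protocol: TCP
      port: 50051
      targetPort: 50051
```

With this in place, the CatalogSource `address` would be the Service's DNS name plus port, e.g. `disconnected-operator-catalog-community-svc.openshift-marketplace.svc:50051`, instead of a raw pod IP.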
1. Create a 4.5 cluster that has the fix PR merged.

```
mac:~ jianzhang$ oc adm release info registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-03-20-071514 --commits | grep lifecycle
  operator-lifecycle-manager  https://github.com/operator-framework/operator-lifecycle-manager  a6162e46f31455d4f93b8215772b0dd8969652a0
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-03-20-071514   True        False         34m     Cluster version is 4.5.0-0.nightly-2020-03-20-071514
```

2. Create a CatalogSource object with the "dhubchris/community-operators:v1" image. Its pod works well.

```
mac:~ jianzhang$ cat cs-1805410.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: bug-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: dhubchris/community-operators:v1
  displayName: Bug Operators
  publisher: Red Hat
mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
bug-operator-47b4c                      1/1     Running   0          7m29s
certified-operators-75cbb755dd-wls6q    1/1     Running   0          60m
community-operators-5b6f745df-xbjt9     1/1     Running   0          60m
marketplace-operator-778449d4dd-sq7mz   1/1     Running   0          61m
redhat-marketplace-f98dbd4fb-twsqg      1/1     Running   0          60m
redhat-operators-dd9bcff79-vnwwg        1/1     Running   0          60m
mac:~ jianzhang$ oc get packagemanifest | grep -i bug
microcks                           Bug Operators   7m58s
postgresql-operator-dev4devs-com   Bug Operators   7m58s
...
```

3. Check the requested CPU/memory of this pod: there are no Limits.

LGTM, verified.
```
mac:~ jianzhang$ oc get pods -n openshift-marketplace bug-operator-47b4c -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.129.2.8"
          ],
          "dns": {},
          "default-route": [
              "10.129.2.1"
          ]
      }]
    openshift.io/scc: anyuid
  creationTimestamp: "2020-03-20T11:25:49Z"
  generateName: bug-operator-
  labels:
    olm.catalogSource: bug-operator
  name: bug-operator-47b4c
  namespace: openshift-marketplace
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: CatalogSource
    name: bug-operator
    uid: aa7a762d-1658-42a4-8280-d79d8639e916
  resourceVersion: "33285"
  selfLink: /api/v1/namespaces/openshift-marketplace/pods/bug-operator-47b4c
  uid: 57d7ebf1-6269-4c02-a5fc-495af8ed6af9
spec:
  containers:
  - image: dhubchris/community-operators:v1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=localhost:50051
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: registry-server
    ports:
    - containerPort: 50051
      name: grpc
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=localhost:50051
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    securityContext:
      capabilities:
        drop:
        - MKNOD
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-hctfc
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-hfgrp
  nodeName: ip-10-0-143-45.us-west-2.compute.internal
  nodeSelector:
    beta.kubernetes.io/os: linux
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c23,c7
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - operator: Exists
  volumes:
  - name: default-token-hctfc
    secret:
      defaultMode: 420
      secretName: default-token-hctfc
...
```
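To spot-check the fix without reading the whole manifest, the resources stanza can be inspected directly. A sketch (pod name taken from the transcript above; requires access to the verification cluster):

```shell
# Print only the resources stanza of the catalog pod; after the fix it
# should contain a "requests" section and no "limits" key.
oc -n openshift-marketplace get pod bug-operator-47b4c \
    -o jsonpath='{.spec.containers[0].resources}{"\n"}'
```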
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409