Description of problem:
After upgrading to 4.4.6, the community and certified operator catalog pods are crashing. The liveness probe fails:

kind: Event
lastTimestamp: "2020-08-26T10:50:53Z"
message: |
  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
metadata:
  creationTimestamp: "2020-08-26T10:31:43Z"
  name: certified-operators-5bcb56768c-ccj2d.162ecacdee3fa4aa
  namespace: openshift-marketplace
  resourceVersion: "705436"
  selfLink: /api/v1/namespaces/openshift-marketplace/events/certified-operators-5bcb56768c-ccj2d.162ecacdee3fa4aa
  uid: a1a0ca25-6ca7-4714-9ace-de7e8b975ac6
reason: Unhealthy
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: infra0.ao.example.com
type: Warning

certified-operators-5bcb56768c-64fxs    0/1   CrashLoopBackOff   27   118m
certified-operators-cddd74b58-k86fv     0/1   Running            6    14m
community-operators-698654bb96-zd4s6    0/1   CrashLoopBackOff   13   51m
community-operators-786f694c8d-gl7bj    0/1   Running            6    14m
marketplace-operator-7c4959c648-fwmn7   1/1   Running            0    15m
redhat-marketplace-5874897f8f-527hz     1/1   Running            0    14m
redhat-operators-7d877d5977-jp8wz       1/1   Running            0    14m

Events:
  Type     Reason     Age                   From                            Message
  ----     ------     ----                  ----                            -------
  Normal   Scheduled  30m                   default-scheduler               Successfully assigned openshift-marketplace/community-operators-7b96c7dd85-ssv4w to infra0.ao.example.com
  Normal   Created    30m                   kubelet, infra0.ao.example.com  Created container community-operators
  Normal   Started    30m                   kubelet, infra0.ao.example.com  Started container community-operators
  Warning  Unhealthy  28m (x9 over 29m)     kubelet, infra0.ao.example.com  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Normal   Pulled     28m (x2 over 30m)     kubelet, infra0.ao.example.com  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:821853c24977f49986d51cf2a3756dc3d067fc3122c27ef60db9445f67d66c5c" already present on machine
  Normal   Killing    28m                   kubelet, infra0.ao.example.com  Container community-operators failed liveness probe, will be restarted
  Warning  Unhealthy  28m                   kubelet, infra0.ao.example.com  Readiness probe failed:
  Warning  Unhealthy  4m53s (x81 over 29m)  kubelet, infra0.ao.example.com  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Warning  BackOff    24s (x43 over 15m)    kubelet, infra0.ao.example.com  Back-off restarting failed container

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Pods are in CrashLoopBackOff state after the upgrade.

Expected results:
Pods should be in Running state.

Additional info:
Marking as VERIFIED:

OCP: 4.4.0-0.nightly-2020-09-26-084423
OLM version: 0.14.2
git commit: 6307c54ea472e772de9d421201ce5a1ef1f7413

oc get pod -n openshift-marketplace -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.spec.containers[*].readinessProbe.initialDelaySeconds}{"\n"}{end}'
certified-operators-795fd8965-vrk6n -- 60
community-operators-7797c6cb7b-w9lrl -- 60
marketplace-operator-5bc68bdfb7-5f2nl --
qe-app-registry-6796f94cc8-9z9rq -- 60
redhat-marketplace-7d8dcfd6d8-mkn4d -- 60
redhat-operators-96c4d7745-pmfzv -- 60
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.4.27 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4063
This still persists on version 4.4.27:

stack.hpecloud.org:/home/stack>oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.27    True        False         21m     Cluster version is 4.4.27
stack.hpecloud.org:/home/stack>

stack.hpecloud.org:/home/stack>oc get po -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-78957bf87f-znxd8    1/1     Running   0          15m
community-operators-6f45588956-89l9f    0/1     Running   4          11m
community-operators-7957d59f7d-2868l    0/1     Running   5          11m
marketplace-operator-5df598b96b-8xpgc   1/1     Running   0          49m
redhat-marketplace-778757464b-nxwq4     1/1     Running   4          48m
redhat-operators-5745dd5649-t996z       1/1     Running   3          48m
stack.hpecloud.org:/home/stack>

I already followed the article below and changed the value. The "CrashLoopBackOff" message is gone now, but the pods never became ready.
https://access.redhat.com/solutions/5388381
Now I am getting the "CrashLoopBackOff" error again.

stack.hpecloud.org:/home/stack>oc get po -n openshift-marketplace
NAME                                    READY   STATUS             RESTARTS   AGE
certified-operators-78957bf87f-znxd8    1/1     Running            0          28m
community-operators-6f45588956-89l9f    0/1     CrashLoopBackOff   7          24m
community-operators-7957d59f7d-2868l    0/1     CrashLoopBackOff   8          24m
marketplace-operator-5df598b96b-8xpgc   1/1     Running            0          61m
redhat-marketplace-778757464b-nxwq4     1/1     Running            4          61m
redhat-operators-5745dd5649-t996z       1/1     Running            3          61m
stack.hpecloud.org:/home/stack>
Arunabha, can you get the logs from the crashlooping community-operators pod? Additionally, can you `oc describe` the pod?
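For reference, the information requested above can be gathered with commands along these lines (the pod name below is an example taken from the earlier output; substitute the currently crashlooping pod):

```shell
# Logs from the running container, and from the previous (crashed) container
oc -n openshift-marketplace logs community-operators-6f45588956-89l9f
oc -n openshift-marketplace logs community-operators-6f45588956-89l9f --previous

# Full pod description, including probe settings and recent events
oc -n openshift-marketplace describe pod community-operators-6f45588956-89l9f
```

The `--previous` flag matters here: since the container is being restarted by the kubelet, the logs of the crashed instance are usually more informative than those of the freshly started one.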
(ocp2) centos@bastion2:/home/centos>oc describe po certified-operators-7894cc7667-gp98t -n openshift-marketplace
Name:         certified-operators-7894cc7667-gp98t
Namespace:    openshift-marketplace
Priority:     0
Node:         ocp2-m4ndr-worker-qhqfc/10.0.1.156
Start Time:   Mon, 19 Oct 2020 16:37:34 +0000
Labels:       marketplace.operatorSource=certified-operators
              pod-template-hash=7894cc7667
Annotations:  k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.46"
                    ],
                    "dns": {},
                    "default-route": [
                        "10.131.0.1"
                    ]
                }]
              openshift-marketplace-update-hash: 163f721aa3b1c8b3
              openshift.io/scc: restricted
Status:       Running
IP:           10.131.0.46
IPs:
  IP:  10.131.0.46
Controlled By:  ReplicaSet/certified-operators-7894cc7667
Containers:
  certified-operators:
    Container ID:  cri-o://b5d868929528318532d4b694de19e6989a4886ed17096e43100d4d7b8b1510e8
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d
    Port:          50051/TCP
    Host Port:     0/TCP
    Command:
      appregistry-server
      -r
      https://quay.io/cnr|certified-operators
      -o
      storageos,cortex-operator,ibm-management-ingress-operator-app,ibm-helm-repo-operator-app,newrelic-infrastructure,ibm-auditlogging-operator-app,crunchy-postgres-operator,sematext,gitlab-operator,aci-containers-operator,tf-operator,memql-certified,t8c-certified,nginx-ingress-operator,cyberarmor-operator-certified,vprotect-operator,neuvector-certified-operator,redhat-marketplace-operator,splunk-certified,anzograph-operator,nsx-container-plugin-operator,redis-enterprise-operator-cert,seldon-deploy-operator,tidb-operator-certified,kong,couchdb-operator-certified,ibm-spectrum-symphony-operator,eddi-operator-certified,tigera-operator,openshiftxray-operator,cic-operator-with-crds,can-operator,driverlessai-deployment-operator-certified,portshift-operator,here-service-operator-certified,openshiftartifactoryha-operator,gpu-operator-certified,hpe-csi-operator,atomicorp-helm-operator-certified,node-red-operator-certified,linstor-operator,orca,federatorai-certified,citrix-adc-istio-ingress-gateway-operator,open-liberty-certified,vfunction-server-operator,aws-event-sources-operator-certified,hazelcast-enterprise-certified,f5-bigip-ctlr-operator,zoperator,kubemq-operator-marketplace,cortex-hub-operator,percona-server-mongodb-operator-certified,traefikee-certified,mongodb-enterprise-advanced-ibm,k10-kasten-operator,percona-xtradb-cluster-operator-certified,presto-operator,insightedge-enterprise-operator2,rapidbiz-operator-certified,triggermesh-operator,transform-adv-operator,mongodb-enterprise,cic-operator,k8s-triliovault,xcrypt-operator,oneagent-certified,anchore-engine,citrix-cpx-istio-sidecar-injector-operator,robin-operator,openunison-ocp-certified,instana-agent,appranix-cps,datadog-operator-certified,hazelcast-jet-enterprise-operator,infinibox-operator-certified,ibm-mongodb-operator-app,cockroachdb-certified,appsody-operator-certified,ubix-operator,akka-cluster-operator-certified,ibm-monitoring-grafana-operator-app,cnvrg-operator-marketplace,ch-appliance-operator,cloud-native-postgresql,uma-operator,cortex-fabric-operator,cass-operator,portshift-controller-operator,fep-helm-operator-certified,planetscale-certified,aqua-operator-certified,nxrm-operator-certified,cih-operator-certified,cert-manager-operator,universalagent-operator-certified,portworx-certified,ibm-spectrum-scale-csi,zabbix-operator-certified,anaconda-team-edition,hspc-operator,stonebranch-universalagent-operator-certified,appdynamics-operator,kubeturbo-certified,cpx-cic-operator,ibm-block-csi-operator,wavefront-operator,kong-offline-operator,kube-arangodb,yugabyte-operator,dell-csi-operator-certified,sysdig-certified,seldon-operator-certified,cortex-certifai-operator,kubemq-operator,ibm-helm-api-operator-app,nxiq-operator-certified,fp-predict-plus-operator-certified,storageos2,couchbase-enterprise-certified,synopsys-certified,joget-openshift-operator,open-enterprise-spinnaker,runtime-component-operator-certified,timemachine-operator,nuodb-ce-certified,joget-dx-operator,nastel-navigator-operator-certified,traefikee-redhat-certified,rocketchat-operator-certified,cortex-healthcare-hub-operator,growth-stack-operator-certified,twistlock-certified,perceptilabs-operator-package,alcide-kaudit-operator,falco-certified,densify-operator,armo-operator-certified,ivory-server-app,ocean-operator,ibm-platform-api-operator-app,coralogix-operator-certified
    State:          Running
      Started:      Mon, 19 Oct 2020 17:37:43 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 19 Oct 2020 17:35:07 +0000
      Finished:     Mon, 19 Oct 2020 17:37:41 +0000
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:     10m
      memory:  100Mi
    Liveness:   exec [grpc_health_probe -addr=localhost:50051] delay=60s timeout=1s period=10s #success=1 #failure=10
    Readiness:  exec [grpc_health_probe -addr=localhost:50051] delay=120s timeout=1s period=10s #success=1 #failure=10
    Environment:
      HTTP_PROXY:
      HTTPS_PROXY:
      NO_PROXY:
    Mounts:
      /etc/pki/ca-trust/extracted/pem/ from marketplace-trusted-ca (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rstgp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  marketplace-trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      marketplace-trusted-ca
    Optional:  false
  default-token-rstgp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rstgp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                              Message
  ----     ------     ----                  ----                              -------
  Normal   Scheduled  61m                   default-scheduler                 Successfully assigned openshift-marketplace/certified-operators-7894cc7667-gp98t to ocp2-m4ndr-worker-qhqfc
  Warning  Unhealthy  59m (x3 over 59m)     kubelet, ocp2-m4ndr-worker-qhqfc  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Normal   Killing    59m                   kubelet, ocp2-m4ndr-worker-qhqfc  Container certified-operators failed liveness probe, will be restarted
  Normal   Pulled     59m (x2 over 61m)     kubelet, ocp2-m4ndr-worker-qhqfc  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d" already present on machine
  Normal   Created    59m (x2 over 61m)     kubelet, ocp2-m4ndr-worker-qhqfc  Created container certified-operators
  Normal   Started    59m (x2 over 61m)     kubelet, ocp2-m4ndr-worker-qhqfc  Started container certified-operators
  Warning  BackOff    6m43s (x87 over 40m)  kubelet, ocp2-m4ndr-worker-qhqfc  Back-off restarting failed container
  Warning  Unhealthy  110s (x148 over 60m)  kubelet, ocp2-m4ndr-worker-qhqfc  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
(ocp2) centos@bastion2:/home/centos>
(ocp2) centos@bastion2:/home/centos>oc describe po community-operators-5cd868bdd-4vbvk -n openshift-marketplace
Name:         community-operators-5cd868bdd-4vbvk
Namespace:    openshift-marketplace
Priority:     0
Node:         ocp2-m4ndr-worker-qhqfc/10.0.1.156
Start Time:   Mon, 19 Oct 2020 16:37:40 +0000
Labels:       marketplace.operatorSource=community-operators
              pod-template-hash=5cd868bdd
Annotations:  k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.47"
                    ],
                    "dns": {},
                    "default-route": [
                        "10.131.0.1"
                    ]
                }]
              openshift-marketplace-update-hash: 163f721bea95d529
              openshift.io/scc: restricted
Status:       Running
IP:           10.131.0.47
IPs:
  IP:  10.131.0.47
Controlled By:  ReplicaSet/community-operators-5cd868bdd
Containers:
  community-operators:
    Container ID:  cri-o://25de23c7ac5b99de32bc8e1ddcf02b5d5f3be3904ba998a5a98854320179d61d
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d
    Port:          50051/TCP
    Host Port:     0/TCP
    Command:
      appregistry-server
      -r
      https://quay.io/cnr|community-operators
      -o
      ember-csi-operator,federation,jenkins-operator,sysflow-operator,reportportal-operator,codeready-toolchain-operator,cost-mgmt-operator,openebs,kubeturbo,hazelcast-jet-operator,percona-xtradb-cluster-operator,ibmcloud-iam-operator,openshift-qiskit-operator,postgresql,microsegmentation-operator,oadp-operator,opsmx-spinnaker-operator,podium-operator-bundle,horreum-operator,redis-operator,cert-utils-operator,crossplane,group-sync-operator,klusterlet,azure-service-operator,gitops-operator,maistraoperator,postgresql-operator-dev4devs-com,portworx-essentials,ripsaw,kubernetes-imagepuller-operator,t8c,snyk-operator,metering,awss3-operator-registry,myvirtualdirectory,prisma-cloud-compute-console-operator,sealed-secrets-operator-helm,apicurio-registry,prometheus,ibmcloud-operator,submariner,api-operator,dell-csi-operator,strimzi-kafka-operator,knative-kafka-operator,seldon-operator,hive-operator,hazelcast-operator,tidb-operator,planetscale,atlasmap-operator,eclipse-che,aws-efs-operator,apicast-community-operator,nexus-operator-m88i,hyperfoil-bundle,global-load-balancer-operator,knative-camel-operator,radanalytics-spark,service-binding-operator,percona-server-mongodb-operator,kogito-operator,ditto-operator,mcad-operator,microcks,event-streams-topic,traefikee-operator,argocd-operator,lib-bucket-provisioner,egressip-ipam-operator,iot-simulator,keycloak-operator,starter-kit-operator,jupyterlab-operator,cockroachdb,resource-locker-operator,keepalived-operator,elastic-cloud-eck,grafana-operator,hawtio-operator,hawkbit-operator,jaeger,spinnaker-operator,lightbend-console-operator,multicluster-operators-subscription,eunomia,prometheus-exporter-operator,openshift-nfd-operator,opendatahub-operator,nsm-operator-registry,teiid,must-gather-operator,enc-key-sync,federatorai,apicurito,enmasse,composable-operator,ibm-block-csi-operator-community,ibm-quantum-operator,node-problem-detector,namespace-configuration-operator,3scale-community-operator,special-resource-operator,keda,datadog-operator,spark-gcp,syndesis,splunk,pystol,cluster-manager,openshift-ibm-quantum-operator,camel-k,ibm-spectrum-scale-csi-operator,neuvector-community-operator,konveyor-operator,buildv2-operator,snapscheduler,infinispan,kubefed,kubestone,wso2am-operator,ham-deploy,aqua,kiali,esindex-operator,argocd-operator-helm,skydive-operator,etcd,akka-cluster-operator
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 19 Oct 2020 17:38:07 +0000
      Finished:     Mon, 19 Oct 2020 17:40:43 +0000
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:     10m
      memory:  100Mi
    Liveness:   exec [grpc_health_probe -addr=localhost:50051] delay=60s timeout=1s period=10s #success=1 #failure=10
    Readiness:  exec [grpc_health_probe -addr=localhost:50051] delay=120s timeout=1s period=10s #success=1 #failure=10
    Environment:
      HTTP_PROXY:
      HTTPS_PROXY:
      NO_PROXY:
    Mounts:
      /etc/pki/ca-trust/extracted/pem/ from marketplace-trusted-ca (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rstgp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  marketplace-trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      marketplace-trusted-ca
    Optional:  false
  default-token-rstgp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rstgp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                              Message
  ----     ------     ----                  ----                              -------
  Normal   Scheduled  64m                   default-scheduler                 Successfully assigned openshift-marketplace/community-operators-5cd868bdd-4vbvk to ocp2-m4ndr-worker-qhqfc
  Normal   Killing    61m                   kubelet, ocp2-m4ndr-worker-qhqfc  Container community-operators failed liveness probe, will be restarted
  Normal   Pulled     61m (x2 over 63m)     kubelet, ocp2-m4ndr-worker-qhqfc  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d" already present on machine
  Normal   Created    61m (x2 over 63m)     kubelet, ocp2-m4ndr-worker-qhqfc  Created container community-operators
  Normal   Started    61m (x2 over 63m)     kubelet, ocp2-m4ndr-worker-qhqfc  Started container community-operators
  Warning  Unhealthy  23m (x87 over 62m)    kubelet, ocp2-m4ndr-worker-qhqfc  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Warning  Unhealthy  12m (x31 over 62m)    kubelet, ocp2-m4ndr-worker-qhqfc  Liveness probe failed: command timed out
  Warning  BackOff    8m51s (x87 over 42m)  kubelet, ocp2-m4ndr-worker-qhqfc  Back-off restarting failed container
  Warning  Unhealthy  3m50s (x58 over 61m)  kubelet, ocp2-m4ndr-worker-qhqfc  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
(ocp2) centos@bastion2:/home/centos>
(ocp2) centos@bastion2:/home/centos>oc describe po certified-operators-7894cc7667-gp98t -n openshift-marketplace
Name:         certified-operators-7894cc7667-gp98t
Namespace:    openshift-marketplace
Priority:     0
Node:         ocp2-m4ndr-worker-qhqfc/10.0.1.156
Start Time:   Mon, 19 Oct 2020 16:37:34 +0000
Labels:       marketplace.operatorSource=certified-operators
              pod-template-hash=7894cc7667
Annotations:  k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.46"
                    ],
                    "dns": {},
                    "default-route": [
                        "10.131.0.1"
                    ]
                }]
              openshift-marketplace-update-hash: 163f721aa3b1c8b3
              openshift.io/scc: restricted
Status:       Running
IP:           10.131.0.46
IPs:
  IP:  10.131.0.46
Controlled By:  ReplicaSet/certified-operators-7894cc7667
Containers:
  certified-operators:
    Container ID:  cri-o://b5d868929528318532d4b694de19e6989a4886ed17096e43100d4d7b8b1510e8
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d
    Port:          50051/TCP
    Host Port:     0/TCP
    Command:
      appregistry-server
      -r
      https://quay.io/cnr|certified-operators
      -o
      storageos,cortex-operator,ibm-management-ingress-operator-app,ibm-helm-repo-operator-app,newrelic-infrastructure,ibm-auditlogging-operator-app,crunchy-postgres-operator,sematext,gitlab-operator,aci-containers-operator,tf-operator,memql-certified,t8c-certified,nginx-ingress-operator,cyberarmor-operator-certified,vprotect-operator,neuvector-certified-operator,redhat-marketplace-operator,splunk-certified,anzograph-operator,nsx-container-plugin-operator,redis-enterprise-operator-cert,seldon-deploy-operator,tidb-operator-certified,kong,couchdb-operator-certified,ibm-spectrum-symphony-operator,eddi-operator-certified,tigera-operator,openshiftxray-operator,cic-operator-with-crds,can-operator,driverlessai-deployment-operator-certified,portshift-operator,here-service-operator-certified,openshiftartifactoryha-operator,gpu-operator-certified,hpe-csi-operator,atomicorp-helm-operator-certified,node-red-operator-certified,linstor-operator,orca,federatorai-certified,citrix-adc-istio-ingress-gateway-operator,open-liberty-certified,vfunction-server-operator,aws-event-sources-operator-certified,hazelcast-enterprise-certified,f5-bigip-ctlr-operator,zoperator,kubemq-operator-marketplace,cortex-hub-operator,percona-server-mongodb-operator-certified,traefikee-certified,mongodb-enterprise-advanced-ibm,k10-kasten-operator,percona-xtradb-cluster-operator-certified,presto-operator,insightedge-enterprise-operator2,rapidbiz-operator-certified,triggermesh-operator,transform-adv-operator,mongodb-enterprise,cic-operator,k8s-triliovault,xcrypt-operator,oneagent-certified,anchore-engine,citrix-cpx-istio-sidecar-injector-operator,robin-operator,openunison-ocp-certified,instana-agent,appranix-cps,datadog-operator-certified,hazelcast-jet-enterprise-operator,infinibox-operator-certified,ibm-mongodb-operator-app,cockroachdb-certified,appsody-operator-certified,ubix-operator,akka-cluster-operator-certified,ibm-monitoring-grafana-operator-app,cnvrg-operator-marketplace,ch-appliance-operator,cloud-native-postgresql,uma-operator,cortex-fabric-operator,cass-operator,portshift-controller-operator,fep-helm-operator-certified,planetscale-certified,aqua-operator-certified,nxrm-operator-certified,cih-operator-certified,cert-manager-operator,universalagent-operator-certified,portworx-certified,ibm-spectrum-scale-csi,zabbix-operator-certified,anaconda-team-edition,hspc-operator,stonebranch-universalagent-operator-certified,appdynamics-operator,kubeturbo-certified,cpx-cic-operator,ibm-block-csi-operator,wavefront-operator,kong-offline-operator,kube-arangodb,yugabyte-operator,dell-csi-operator-certified,sysdig-certified,seldon-operator-certified,cortex-certifai-operator,kubemq-operator,ibm-helm-api-operator-app,nxiq-operator-certified,fp-predict-plus-operator-certified,storageos2,couchbase-enterprise-certified,synopsys-certified,joget-openshift-operator,open-enterprise-spinnaker,runtime-component-operator-certified,timemachine-operator,nuodb-ce-certified,joget-dx-operator,nastel-navigator-operator-certified,traefikee-redhat-certified,rocketchat-operator-certified,cortex-healthcare-hub-operator,growth-stack-operator-certified,twistlock-certified,perceptilabs-operator-package,alcide-kaudit-operator,falco-certified,densify-operator,armo-operator-certified,ivory-server-app,ocean-operator,ibm-platform-api-operator-app,coralogix-operator-certified
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 19 Oct 2020 17:37:43 +0000
      Finished:     Mon, 19 Oct 2020 17:40:21 +0000
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:     10m
      memory:  100Mi
    Liveness:   exec [grpc_health_probe -addr=localhost:50051] delay=60s timeout=1s period=10s #success=1 #failure=10
    Readiness:  exec [grpc_health_probe -addr=localhost:50051] delay=120s timeout=1s period=10s #success=1 #failure=10
    Environment:
      HTTP_PROXY:
      HTTPS_PROXY:
      NO_PROXY:
    Mounts:
      /etc/pki/ca-trust/extracted/pem/ from marketplace-trusted-ca (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rstgp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  marketplace-trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      marketplace-trusted-ca
    Optional:  false
  default-token-rstgp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rstgp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                              Message
  ----     ------     ----                   ----                              -------
  Normal   Scheduled  65m                    default-scheduler                 Successfully assigned openshift-marketplace/certified-operators-7894cc7667-gp98t to ocp2-m4ndr-worker-qhqfc
  Warning  Unhealthy  62m (x3 over 63m)      kubelet, ocp2-m4ndr-worker-qhqfc  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Normal   Killing    62m                    kubelet, ocp2-m4ndr-worker-qhqfc  Container certified-operators failed liveness probe, will be restarted
  Normal   Pulled     62m (x2 over 65m)      kubelet, ocp2-m4ndr-worker-qhqfc  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7eb4e13f22a6234c70e02203e5c9adebf9fc7088b9e2d11b7a5cc614f90f413d" already present on machine
  Normal   Created    62m (x2 over 65m)      kubelet, ocp2-m4ndr-worker-qhqfc  Created container certified-operators
  Normal   Started    62m (x2 over 65m)      kubelet, ocp2-m4ndr-worker-qhqfc  Started container certified-operators
  Warning  Unhealthy  5m12s (x148 over 64m)  kubelet, ocp2-m4ndr-worker-qhqfc  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Warning  BackOff    11s (x109 over 43m)    kubelet, ocp2-m4ndr-worker-qhqfc  Back-off restarting failed container
(ocp2) centos@bastion2:/home/centos>
Can we increase the probe timeout value? Currently it is only "1s":

Liveness:   exec [grpc_health_probe -addr=localhost:50051] delay=60s timeout=1s period=10s #success=1 #failure=10
Readiness:  exec [grpc_health_probe -addr=localhost:50051] delay=120s timeout=1s period=10s #success=1 #failure=10
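For illustration only, raising the probe timeout on the catalog pod's deployment would look like the fragment below. This is a sketch, not a supported fix: the marketplace operator manages these deployments and may reconcile manual edits back, and the 5s value is an arbitrary example chosen to rule out a slow `grpc_health_probe` response as the cause.

```yaml
# Hypothetical probe settings with a longer timeout (timeoutSeconds raised
# from 1 to 5 as an example); other values match the describe output above.
livenessProbe:
  exec:
    command: ["grpc_health_probe", "-addr=localhost:50051"]
  initialDelaySeconds: 60
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 10
readinessProbe:
  exec:
    command: ["grpc_health_probe", "-addr=localhost:50051"]
  initialDelaySeconds: 120
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 10
```

If the pod stays up with a longer timeout, that points at the health check being slow rather than the gRPC server never coming up; if it still crash loops, the server itself is likely failing.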
I spun up a 4.4.27 cluster and could not reproduce the issue. There are multiple reasons this pod could crash loop, and the original cases attached to this BZ have been resolved. If you can still reproduce, can you please open a new BZ that includes the logs from this pod (if available)?