*** Bug 1730908 has been marked as a duplicate of this bug. ***
After deploying the logging with 3 ES nodes using `WORKDIR=/home/vmasarik/.openshift TEST_LOGGING=false BUILD_IMAGES=true PUSH_IMAGES=true DEPLOY_CLUSTER=false DEPLOY_LOGGING=true USE_OLM=false USE_CUSTOM_IMAGES=true hack/get-cluster-run-tests.sh`, there seems to be one pod whose hash in the name is shorter than the others, for example:

```
elasticsearch-cdm-wqyx76al-1-5dd95865f-8bt6r    2/2   Running   0   12m
elasticsearch-cdm-wqyx76al-2-5f748cf546-jv6r4   2/2   Running   0   74s
elasticsearch-cdm-wqyx76al-3-79fcb65bd9-s8jxg   2/2   Running   0   14m
```

In this instance I was able to reproduce the fact that pods were not being recreated. Here is my command history:

```
12951  oc describe po elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j
Name:               elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j
Namespace:          openshift-logging
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-164-205.ec2.internal/10.0.164.205
Start Time:         Thu, 29 Aug 2019 15:55:42 +0200
Labels:             cluster-name=elasticsearch
                    component=elasticsearch
                    es-node-client=true
                    es-node-data=true
                    es-node-master=true
                    node-name=elasticsearch-cdm-k8anfnxv-2
                    pod-template-hash=9bd4848d
                    tuned.openshift.io/elasticsearch=true
Annotations:        openshift.io/scc: restricted
Status:             Running
IP:                 10.131.0.28
Controlled By:      ReplicaSet/elasticsearch-cdm-k8anfnxv-2-9bd4848d
Containers:
  elasticsearch:
    Container ID:   cri-o://e8fe27929e72b92d32c9bdbdfeb753a9239b5820187f8581185a684ebaafb309
    Image:          image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5:latest
    Image ID:       image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5@sha256:54a5df9a07f9d48e32c30d621e7fce062e4cb905275b2a30dae38700c351d6a0
    Ports:          9300/TCP, 9200/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 29 Aug 2019 15:55:55 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  4Gi
    Requests:
      cpu:     500m
      memory:  4Gi
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3
    Environment:
      DC_NAME:                  elasticsearch-cdm-k8anfnxv-2
      NAMESPACE:                openshift-logging (v1:metadata.namespace)
      KUBERNETES_TRUST_CERT:    true
      SERVICE_DNS:              elasticsearch-cluster
      CLUSTER_NAME:             elasticsearch
      INSTANCE_RAM:             4Gi
      HEAP_DUMP_LOCATION:       /elasticsearch/persistent/heapdump.hprof
      RECOVER_AFTER_TIME:       5m
      READINESS_PROBE_TIMEOUT:  30
      POD_LABEL:                cluster=elasticsearch
      IS_MASTER:                true
      HAS_DATA:                 true
    Mounts:
      /elasticsearch/persistent from elasticsearch-storage (rw)
      /etc/openshift/elasticsearch/secret from certificates (rw)
      /usr/share/java/elasticsearch/config from elasticsearch-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-n2vpt (ro)
  proxy:
    Container ID:  cri-o://7948624af558ebe0d6b01503107b203bbae5cd8211bbfa660b3438820720922b
    Image:         registry.redhat.io/openshift4/ose-oauth-proxy:v4.1.13-201908210601
    Image ID:      registry.redhat.io/openshift4/ose-oauth-proxy@sha256:434359f9a7e6aad4a049ffb862fbad5df2bda41c9ffe2fd1e82a49ad28df02f5
    Port:          60000/TCP
    Host Port:     0/TCP
    Args:
      --https-address=:60000
      --provider=openshift
      --upstream=https://127.0.0.1:9200
      --tls-cert=/etc/proxy/secrets/tls.crt
      --tls-key=/etc/proxy/secrets/tls.key
      --upstream-ca=/etc/proxy/elasticsearch/admin-ca
      --openshift-service-account=elasticsearch
      -openshift-sar={"resource": "namespaces", "verb": "get"}
      -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
      --pass-user-bearer-token
      --cookie-secret=ija/R1W4NfluNTIuD+OqDA==
    State:          Running
      Started:      Thu, 29 Aug 2019 15:55:55 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/proxy/elasticsearch from certificates (rw)
      /etc/proxy/secrets from elasticsearch-metrics (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-n2vpt (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  elasticsearch-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      elasticsearch
    Optional:  false
  elasticsearch-storage:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch
    Optional:    false
  elasticsearch-metrics:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-metrics
    Optional:    false
  elasticsearch-token-n2vpt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-token-n2vpt
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                                   Message
  ----     ------     ----                   ----                                   -------
  Normal   Scheduled  15m                    default-scheduler                      Successfully assigned openshift-logging/elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j to ip-10-0-164-205.ec2.internal
  Normal   Pulling    15m                    kubelet, ip-10-0-164-205.ec2.internal  Pulling image "image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5:latest"
  Normal   Pulled     15m                    kubelet, ip-10-0-164-205.ec2.internal  Successfully pulled image "image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5:latest"
  Normal   Created    15m                    kubelet, ip-10-0-164-205.ec2.internal  Created container elasticsearch
  Normal   Started    15m                    kubelet, ip-10-0-164-205.ec2.internal  Started container elasticsearch
  Normal   Pulled     15m                    kubelet, ip-10-0-164-205.ec2.internal  Container image "registry.redhat.io/openshift4/ose-oauth-proxy:v4.1.13-201908210601" already present on machine
  Normal   Created    15m                    kubelet, ip-10-0-164-205.ec2.internal  Created container proxy
  Normal   Started    15m                    kubelet, ip-10-0-164-205.ec2.internal  Started container proxy
  Warning  Unhealthy  10m (x34 over 14m)     kubelet, ip-10-0-164-205.ec2.internal  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 503]
  Warning  Unhealthy  4m57s (x2 over 6m27s)  kubelet, ip-10-0-164-205.ec2.internal  Readiness probe errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
  Warning  Unhealthy  22s (x12 over 15m)     kubelet, ip-10-0-164-205.ec2.internal  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]

12953  oc delete po elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j   # Pod was redeployed but has a short hash as well.
12958  oc delete rs elasticsearch-cdm-k8anfnxv-2-9bd4848d         # RS is not recreated
```

ESO logs:

```
time="2019-08-29T13:56:27Z" level=info msg="Watching logging.openshift.io/v1, Elasticsearch, , 5000000000"
time="2019-08-29T14:00:35Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-29T14:01:54Z" level=info msg="Waiting for cluster to complete recovery: red / green"
time="2019-08-29T14:01:54Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-cdm-k8anfnxv-2: red / green"
time="2019-08-29T14:04:02Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:04:48Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:05:35Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:10:44Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:10:55Z" level=warning msg="Unknown type for low: <nil>"
time="2019-08-29T14:10:55Z" level=warning msg="Unknown type for high: <nil>"
time="2019-08-29T14:11:20Z" level=info msg="Waiting for cluster to complete recovery: red / green"
time="2019-08-29T14:11:39Z" level=info msg="Waiting for cluster to complete recovery: red / green"
time="2019-08-29T14:12:07Z" level=info msg="Waiting for cluster to complete recovery: red / green"
```

CLO logs:

```
time="2019-08-29T13:56:32Z" level=info msg="Watching logging.openshift.io/v1, ClusterLogging, openshift-logging, 5000000000"
time="2019-08-29T13:56:38Z" level=info msg="Elasticsearch image change found, updating elasticsearch"
time="2019-08-29T13:56:40Z" level=error msg="Unable to read file to get contents: open /tmp/_working_dir/kibana-proxy-oauth.secret: no such file or directory"
time="2019-08-29T13:56:42Z" level=info msg="Kibana image(s) change found, updating \"kibana\""
time="2019-08-29T13:56:43Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:56:45Z" level=info msg="Curator image change found, updating \"curator\""
time="2019-08-29T13:56:50Z" level=info msg="Collector image change found, updating \"rsyslog\""
time="2019-08-29T13:56:51Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:57:04Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:57:11Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:57:30Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:57:43Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:57:50Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:58:03Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:58:09Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:58:17Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T13:59:07Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:00:24Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:00:32Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:00:44Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:01:03Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:01:23Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:01:42Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:04:05Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:04:43Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:08:27Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:08:35Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:08:47Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:10:31Z" level=info msg="Updating status of Elasticsearch"
```

I deleted the short-hash pod and it was recreated, but when I deleted any of the ES ReplicaSets or Deployments, none of them were recreated. Since I had done nothing but deploy logging and delete the short-hash pod and some RS/Deployments, I thought I could reproduce it with the CLO log level set to debug. I redeployed the logging and changed the log level. I again had 3 ES pods, one with a short hash in its name. I deleted a pod with a long hash name, and it was recreated. There were no interesting entries in the CLO logs, just the 5 startup lines that are always printed (version and such). *SIGH* Now that I think about it, I never deleted any RS or Deployment during that run. So there goes my debugging, and that makes the information above fairly useless, sorry.

A few other things I noticed: an ES pod can be `Running 2/2` and then be terminated immediately without any error. Also, sometimes after the cluster starts, all the pods are terminated one by one and replaced. In that case there is no ES pod with a short hash name, and if I delete one it is properly replaced by a new pod.
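For the next reproduction attempt, it may help to record which controller owns each object that fails to come back. A sketch of the checks (the `cluster-name=elasticsearch` label selector is taken from the describe output above; adjust if your labels differ):

```shell
# Who owns the ES ReplicaSets? If the owner is a Deployment, deleting the RS
# should normally trigger the Deployment controller to recreate it.
oc -n openshift-logging get rs -l cluster-name=elasticsearch \
  -o custom-columns=NAME:.metadata.name,OWNER-KIND:.metadata.ownerReferences[0].kind,OWNER:.metadata.ownerReferences[0].name

# Who owns the ES Deployments? If the owner is the Elasticsearch CR, only the
# elasticsearch-operator (not a built-in controller) is responsible for
# recreating a deleted Deployment.
oc -n openshift-logging get deployment -l cluster-name=elasticsearch \
  -o custom-columns=NAME:.metadata.name,OWNER-KIND:.metadata.ownerReferences[0].kind,OWNER:.metadata.ownerReferences[0].name
```

That would distinguish a garbage-collection/ownership problem from the operator simply not reconciling.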
```
NAME                                            READY   STATUS              RESTARTS   AGE
cluster-logging-operator-84fbcff567-ghbkr       1/1     Running             0          4m24s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l   1/2     Running             0          5m44s
elasticsearch-cdm-uirubjv6-2-cdfddc54f-svv4p    1/2     Running             0          4m42s
elasticsearch-cdm-uirubjv6-3-68b5c7857b-5l4pc   1/2     Running             0          2m7s
kibana-59d58d4b49-6w8f4                         2/2     Running             0          4m11s
rsyslog-bwjvf                                   1/1     Running             0          3m11s
rsyslog-c862w                                   1/1     Running             0          3m48s
rsyslog-gc4nh                                   1/1     Running             0          3m27s
rsyslog-s5vlz                                   1/1     Running             0          3m16s
rsyslog-sn9g6                                   1/1     Running             0          3m31s
rsyslog-sx6xj                                   1/1     Running             0          3m38s
elasticsearch-cdm-uirubjv6-2-cdfddc54f-svv4p    2/2     Running             0          4m55s
elasticsearch-cdm-uirubjv6-3-68b5c7857b-5l4pc   2/2     Running             0          2m20s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l   2/2     Running             0          6m
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l   2/2     Terminating         0          6m17s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l   0/2     Terminating         0          6m18s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l   0/2     Terminating         0          6m18s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l   0/2     Terminating         0          6m26s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l   0/2     Terminating         0          6m26s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q   0/2     Pending             0          0s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q   0/2     Pending             0          0s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q   0/2     ContainerCreating   0          0s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q   1/2     Running             0          27s
rsyslog-s5vlz                                   0/1     OOMKilled           0          4m27s
rsyslog-s5vlz                                   1/1     Running             1          4m28s
```
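One more note on the short hash itself: the suffix in the RS/pod name is the `pod-template-hash` label, which Kubernetes derives by hashing the pod template, and the encoded hash can legitimately vary in length. So a shorter suffix alone (e.g. `9bd4848d` vs. `5f748cf546`) may not indicate anything wrong with that node's template. A quick way to compare them (a sketch; the `component=elasticsearch` label is assumed from the describe output above):

```shell
# Show every ES pod with its pod-template-hash label as an extra column
oc -n openshift-logging get pods -l component=elasticsearch -L pod-template-hash

# The hash should match the suffix of the owning ReplicaSet's name
oc -n openshift-logging get rs -l component=elasticsearch \
  -o custom-columns=NAME:.metadata.name,HASH:.metadata.labels.pod-template-hash
```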