Bug 1746542
| Summary: | resources spec don't get updated by the cluster logging operator | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jeff Cantrill <jcantril> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED WONTFIX | QA Contact: | Anping Li <anli> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0 | CC: | anli, aos-bugs, christoph.obexer, rmeggins, rspazzol, stwalter |
| Target Milestone: | --- | | |
| Target Release: | 4.1.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1732936 | Environment: | |
| Last Closed: | 2020-01-20 18:16:28 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1732936 | | |
| Bug Blocks: | | | |
Comment 1
Jeff Cantrill
2019-08-28 19:26:33 UTC
After deploying logging with 3 ES nodes using `WORKDIR=/home/vmasarik/.openshift TEST_LOGGING=false BUILD_IMAGES=true PUSH_IMAGES=true DEPLOY_CLUSTER=false DEPLOY_LOGGING=true USE_OLM=false USE_CUSTOM_IMAGES=true hack/get-cluster-run-tests.sh`, there seems to be one pod whose pod-template-hash in its name is shorter than the others, for example:
```
elasticsearch-cdm-wqyx76al-1-5dd95865f-8bt6r 2/2 Running 0 12m
elasticsearch-cdm-wqyx76al-2-5f748cf546-jv6r4 2/2 Running 0 74s
elasticsearch-cdm-wqyx76al-3-79fcb65bd9-s8jxg 2/2 Running 0 14m
```
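For what it's worth, the shorter hash may not by itself indicate a problem, since the `pod-template-hash` comes from the owning ReplicaSet's pod template and its encoded length can vary. A quick way to compare the ReplicaSets behind these pods is sketched below; the `cluster-name=elasticsearch` label is taken from the describe output further down and the pod name from the listing above:
```
# List the ReplicaSets backing the ES node deployments, with their labels
oc -n openshift-logging get rs -l cluster-name=elasticsearch --show-labels

# Confirm which ReplicaSet owns the short-hash pod
oc -n openshift-logging get pod elasticsearch-cdm-wqyx76al-1-5dd95865f-8bt6r \
  -o jsonpath='{.metadata.ownerReferences[0].name}{"\n"}'
```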
In this instance I was able to reproduce the fact that pods were not being recreated. Here is my command history.
```
12951 oc describe po elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j
Name: elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j
Namespace: openshift-logging
Priority: 0
PriorityClassName: <none>
Node: ip-10-0-164-205.ec2.internal/10.0.164.205
Start Time: Thu, 29 Aug 2019 15:55:42 +0200
Labels: cluster-name=elasticsearch
component=elasticsearch
es-node-client=true
es-node-data=true
es-node-master=true
node-name=elasticsearch-cdm-k8anfnxv-2
pod-template-hash=9bd4848d
tuned.openshift.io/elasticsearch=true
Annotations: openshift.io/scc: restricted
Status: Running
IP: 10.131.0.28
Controlled By: ReplicaSet/elasticsearch-cdm-k8anfnxv-2-9bd4848d
Containers:
elasticsearch:
Container ID: cri-o://e8fe27929e72b92d32c9bdbdfeb753a9239b5820187f8581185a684ebaafb309
Image: image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5:latest
Image ID: image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5@sha256:54a5df9a07f9d48e32c30d621e7fce062e4cb905275b2a30dae38700c351d6a0
Ports: 9300/TCP, 9200/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Thu, 29 Aug 2019 15:55:55 +0200
Ready: False
Restart Count: 0
Limits:
cpu: 500m
memory: 4Gi
Requests:
cpu: 500m
memory: 4Gi
Readiness: exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3
Environment:
DC_NAME: elasticsearch-cdm-k8anfnxv-2
NAMESPACE: openshift-logging (v1:metadata.namespace)
KUBERNETES_TRUST_CERT: true
SERVICE_DNS: elasticsearch-cluster
CLUSTER_NAME: elasticsearch
INSTANCE_RAM: 4Gi
HEAP_DUMP_LOCATION: /elasticsearch/persistent/heapdump.hprof
RECOVER_AFTER_TIME: 5m
READINESS_PROBE_TIMEOUT: 30
POD_LABEL: cluster=elasticsearch
IS_MASTER: true
HAS_DATA: true
Mounts:
/elasticsearch/persistent from elasticsearch-storage (rw)
/etc/openshift/elasticsearch/secret from certificates (rw)
/usr/share/java/elasticsearch/config from elasticsearch-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-n2vpt (ro)
proxy:
Container ID: cri-o://7948624af558ebe0d6b01503107b203bbae5cd8211bbfa660b3438820720922b
Image: registry.redhat.io/openshift4/ose-oauth-proxy:v4.1.13-201908210601
Image ID: registry.redhat.io/openshift4/ose-oauth-proxy@sha256:434359f9a7e6aad4a049ffb862fbad5df2bda41c9ffe2fd1e82a49ad28df02f5
Port: 60000/TCP
Host Port: 0/TCP
Args:
--https-address=:60000
--provider=openshift
--upstream=https://127.0.0.1:9200
--tls-cert=/etc/proxy/secrets/tls.crt
--tls-key=/etc/proxy/secrets/tls.key
--upstream-ca=/etc/proxy/elasticsearch/admin-ca
--openshift-service-account=elasticsearch
-openshift-sar={"resource": "namespaces", "verb": "get"}
-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
--pass-user-bearer-token
--cookie-secret=ija/R1W4NfluNTIuD+OqDA==
State: Running
Started: Thu, 29 Aug 2019 15:55:55 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/proxy/elasticsearch from certificates (rw)
/etc/proxy/secrets from elasticsearch-metrics (rw)
/var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-n2vpt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
elasticsearch-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: elasticsearch
Optional: false
elasticsearch-storage:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
certificates:
Type: Secret (a volume populated by a Secret)
SecretName: elasticsearch
Optional: false
elasticsearch-metrics:
Type: Secret (a volume populated by a Secret)
SecretName: elasticsearch-metrics
Optional: false
elasticsearch-token-n2vpt:
Type: Secret (a volume populated by a Secret)
SecretName: elasticsearch-token-n2vpt
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned openshift-logging/elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j to ip-10-0-164-205.ec2.internal
Normal Pulling 15m kubelet, ip-10-0-164-205.ec2.internal Pulling image "image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5:latest"
Normal Pulled 15m kubelet, ip-10-0-164-205.ec2.internal Successfully pulled image "image-registry.openshift-image-registry.svc:5000/openshift/origin-logging-elasticsearch5:latest"
Normal Created 15m kubelet, ip-10-0-164-205.ec2.internal Created container elasticsearch
Normal Started 15m kubelet, ip-10-0-164-205.ec2.internal Started container elasticsearch
Normal Pulled 15m kubelet, ip-10-0-164-205.ec2.internal Container image "registry.redhat.io/openshift4/ose-oauth-proxy:v4.1.13-201908210601" already present on machine
Normal Created 15m kubelet, ip-10-0-164-205.ec2.internal Created container proxy
Normal Started 15m kubelet, ip-10-0-164-205.ec2.internal Started container proxy
Warning Unhealthy 10m (x34 over 14m) kubelet, ip-10-0-164-205.ec2.internal Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 503]
Warning Unhealthy 4m57s (x2 over 6m27s) kubelet, ip-10-0-164-205.ec2.internal Readiness probe errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1
Warning Unhealthy 22s (x12 over 15m) kubelet, ip-10-0-164-205.ec2.internal Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]
12953 oc delete po elasticsearch-cdm-k8anfnxv-2-9bd4848d-2cv5j
# Pod was redeployed but has short hash as well.
12958 oc delete rs elasticsearch-cdm-k8anfnxv-2-9bd4848d
# RS is not recreated
```
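A minimal sketch of what I would check next, under the assumption that the missing RS belongs to a Deployment named `elasticsearch-cdm-k8anfnxv-2` (as the RS name and the `node-name` label suggest): whether that Deployment still exists, whether its rollout ever completes, and whether it was left paused, since a paused Deployment is one plausible reason a deleted RS would not come back:
```
# Does the owning Deployment still exist, and does it report missing replicas?
oc -n openshift-logging get deployment elasticsearch-cdm-k8anfnxv-2 -o wide

# Rollout status should hang or complain if the RS and pods never come back
oc -n openshift-logging rollout status deployment/elasticsearch-cdm-k8anfnxv-2 --timeout=60s

# If the Deployment was left paused, a deleted RS is not recreated
oc -n openshift-logging get deployment elasticsearch-cdm-k8anfnxv-2 \
  -o jsonpath='{.spec.paused}{"\n"}'
```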
ESO logs:
```
time="2019-08-29T13:56:27Z" level=info msg="Watching logging.openshift.io/v1, Elasticsearch, , 5000000000"
time="2019-08-29T14:00:35Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-29T14:01:54Z" level=info msg="Waiting for cluster to complete recovery: red / green"
time="2019-08-29T14:01:54Z" level=info msg="Waiting for cluster to be fully recovered before restarting elasticsearch-cdm-k8anfnxv-2: red / green"
time="2019-08-29T14:04:02Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:04:48Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:05:35Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:10:44Z" level=info msg="Waiting for cluster to complete recovery: / green"
time="2019-08-29T14:10:55Z" level=warning msg="Unknown type for low: <nil>"
time="2019-08-29T14:10:55Z" level=warning msg="Unknown type for high: <nil>"
time="2019-08-29T14:11:20Z" level=info msg="Waiting for cluster to complete recovery: red / green"
time="2019-08-29T14:11:39Z" level=info msg="Waiting for cluster to complete recovery: red / green"
time="2019-08-29T14:12:07Z" level=info msg="Waiting for cluster to complete recovery: red / green"
```
CLO logs:
```
time="2019-08-29T13:56:32Z" level=info msg="Watching logging.openshift.io/v1, ClusterLogging, openshift-logging, 5000000000"
time="2019-08-29T13:56:38Z" level=info msg="Elasticsearch image change found, updating elasticsearch"
time="2019-08-29T13:56:40Z" level=error msg="Unable to read file to get contents: open /tmp/_working_dir/kibana-proxy-oauth.secret: no such file or directory"
time="2019-08-29T13:56:42Z" level=info msg="Kibana image(s) change found, updating \"kibana\""
time="2019-08-29T13:56:43Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:56:45Z" level=info msg="Curator image change found, updating \"curator\""
time="2019-08-29T13:56:50Z" level=info msg="Collector image change found, updating \"rsyslog\""
time="2019-08-29T13:56:51Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:57:04Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:57:11Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:57:30Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:57:43Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:57:50Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:58:03Z" level=info msg="Updating status of Kibana for \"instance\""
time="2019-08-29T13:58:09Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T13:58:17Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T13:59:07Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:00:24Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:00:32Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:00:44Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:01:03Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:01:23Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:01:42Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:04:05Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:04:43Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:08:27Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:08:35Z" level=info msg="Updating status of Elasticsearch"
time="2019-08-29T14:08:47Z" level=info msg="Updating status of Rsyslog"
time="2019-08-29T14:10:31Z" level=info msg="Updating status of Elasticsearch"
```
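The CLO log above only mentions image changes; since the summary is about the resources spec not being propagated, this is roughly how I would compare the spec at each level. The `.spec.logStore.elasticsearch.resources` path is my reading of the 4.1 ClusterLogging API, and the CR names (`instance`, `elasticsearch`) are the defaults the operators use:
```
# What the ClusterLogging CR asks for
oc -n openshift-logging get clusterlogging instance \
  -o jsonpath='{.spec.logStore.elasticsearch.resources}{"\n"}'

# What the Elasticsearch CR ended up with (dump the whole spec to avoid guessing field paths)
oc -n openshift-logging get elasticsearch elasticsearch -o yaml

# What actually landed on one of the node deployments
oc -n openshift-logging get deployment elasticsearch-cdm-k8anfnxv-2 \
  -o jsonpath='{.spec.template.spec.containers[0].resources}{"\n"}'
```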
I deleted the short-hash pod and it was recreated, but when I deleted any RS or any deployment of ES, none of them were recreated. Since I had done nothing but deploy logging and delete the short-hash pod and some RS/deployments, I thought I could reproduce this with the CLO log level set to debug.
I redeployed logging and changed the log level. I again had 3 ES pods, one with a short-hash name. I deleted a pod with a long-hash name and it was recreated. There were no interesting entries in the CLO logs, just the 5 lines that are always printed at startup (version and such). *SIGH* now that I think about it, I never deleted any RS or deployment this time, so there goes my debugging, and it makes this information fairly useless, sorry.
A few other things I noticed: an ES pod can be `Running 2/2` and then get terminated immediately without any error. Also, sometimes after starting the cluster, all the pods are terminated one by one and replaced. In that case there is no ES pod with a short-hash name, and if I delete one it is properly replaced by a new pod.
```
NAME READY STATUS RESTARTS AGE
cluster-logging-operator-84fbcff567-ghbkr 1/1 Running 0 4m24s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l 1/2 Running 0 5m44s
elasticsearch-cdm-uirubjv6-2-cdfddc54f-svv4p 1/2 Running 0 4m42s
elasticsearch-cdm-uirubjv6-3-68b5c7857b-5l4pc 1/2 Running 0 2m7s
kibana-59d58d4b49-6w8f4 2/2 Running 0 4m11s
rsyslog-bwjvf 1/1 Running 0 3m11s
rsyslog-c862w 1/1 Running 0 3m48s
rsyslog-gc4nh 1/1 Running 0 3m27s
rsyslog-s5vlz 1/1 Running 0 3m16s
rsyslog-sn9g6 1/1 Running 0 3m31s
rsyslog-sx6xj 1/1 Running 0 3m38s
elasticsearch-cdm-uirubjv6-2-cdfddc54f-svv4p 2/2 Running 0 4m55s
elasticsearch-cdm-uirubjv6-3-68b5c7857b-5l4pc 2/2 Running 0 2m20s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l 2/2 Running 0 6m
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l 2/2 Terminating 0 6m17s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l 0/2 Terminating 0 6m18s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l 0/2 Terminating 0 6m18s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l 0/2 Terminating 0 6m26s
elasticsearch-cdm-uirubjv6-1-59ddb497c5-jwv9l 0/2 Terminating 0 6m26s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q 0/2 Pending 0 0s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q 0/2 Pending 0 0s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q 0/2 ContainerCreating 0 0s
elasticsearch-cdm-uirubjv6-1-68948bb74d-whx4q 1/2 Running 0 27s
rsyslog-s5vlz 0/1 OOMKilled 0 4m27s
rsyslog-s5vlz 1/1 Running 1 4m28s
```
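For completeness, the kind of change whose propagation this bug is about can be made with a patch like the one below and then watched; the resource values are placeholders and the field path is the same assumption as in the previous sketch:
```
# Bump the ES resources in the ClusterLogging CR ...
oc -n openshift-logging patch clusterlogging instance --type merge \
  -p '{"spec":{"logStore":{"elasticsearch":{"resources":{"requests":{"cpu":"600m","memory":"6Gi"},"limits":{"memory":"6Gi"}}}}}}'

# ... then watch whether the operator rolls out new ReplicaSets/pods with the new spec
oc -n openshift-logging get pods -l cluster-name=elasticsearch -w
```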