1824006 – ES pod stuck in CrashLoopBackOff status after upgraded from 4.3 to 4.4 -- Could not resolve placeholder 'POD_IP'

Bug 1824006 - ES pod stuck in CrashLoopBackOff status after upgraded from 4.3 to 4.4 -- Could not resolve placeholder 'POD_IP'

Summary: ES pod stuck in CrashLoopBackOff status after upgraded from 4.3 to 4.4 -- Cou...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.4.z
Assignee:	Jeff Cantrill
QA Contact:	Qiaoling Tang
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1825056 (view as bug list)
Depends On:	1826781
Blocks:	1825056
TreeView+	depends on / blocked

Reported:	2020-04-15 03:18 UTC by Qiaoling Tang
Modified:	2023-12-15 17:41 UTC (History)
CC List:	12 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1825056 (view as bug list)
Environment:
Last Closed:	2020-10-05 14:55:58 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift elasticsearch-operator pull 336	None	closed	Bug 1824006: Backported fix for ES ENV	2021-01-25 11:57:45 UTC
Github	openshift elasticsearch-operator pull 352	None	closed	Bug 1824006: fix update of env vars	2021-01-25 11:57:45 UTC
Red Hat Product Errata	RHBA-2020:2133	None	None	None	2020-05-18 13:35:23 UTC

Description Qiaoling Tang 2020-04-15 03:18:34 UTC

Description of problem:
Upgrade clusterlogging from 4.3 to 4.4, the ES pod stuck in CrashLoopBackOff status.

$ oc get pod -owide
NAME                                            READY   STATUS             RESTARTS   AGE     IP            NODE                                        NOMINATED NODE   READINESS GATES
cluster-logging-operator-5d44b567cf-m57pf       1/1     Running            0          8m11s   10.131.0.31   ip-10-0-60-119.us-east-2.compute.internal   <none>           <none>
elasticsearch-cdm-o0ylrfxl-1-6d79ffc687-f2fn7   1/2     CrashLoopBackOff   6          7m59s   10.131.0.32   ip-10-0-60-119.us-east-2.compute.internal   <none>           <none>
fluentd-5z9tr                                   1/1     Running            0          6m1s    10.130.0.36   ip-10-0-60-248.us-east-2.compute.internal   <none>           <none>
fluentd-8pk7h                                   1/1     Running            0          7m29s   10.131.0.33   ip-10-0-60-119.us-east-2.compute.internal   <none>           <none>
fluentd-f4sg8                                   1/1     Running            0          6m59s   10.128.2.34   ip-10-0-78-116.us-east-2.compute.internal   <none>           <none>
fluentd-r5vmp                                   1/1     Running            0          6m39s   10.129.0.46   ip-10-0-52-176.us-east-2.compute.internal   <none>           <none>
fluentd-rr65n                                   1/1     Running            0          6m18s   10.128.0.16   ip-10-0-68-97.us-east-2.compute.internal    <none>           <none>
fluentd-tgnlj                                   1/1     Running            0          5m37s   10.129.2.18   ip-10-0-54-24.us-east-2.compute.internal    <none>           <none>
kibana-666b9ff487-blhgf                         2/2     Running            0          7m50s   10.128.2.33   ip-10-0-78-116.us-east-2.compute.internal   <none>           <none>

$ oc logs -c proxy elasticsearch-cdm-o0ylrfxl-1-6d79ffc687-f2fn7
2020/04/15 02:45:00 provider.go:118: Defaulting client-id to system:serviceaccount:openshift-logging:elasticsearch
2020/04/15 02:45:00 provider.go:123: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token
2020/04/15 02:45:00 provider.go:311: Delegation of authentication and authorization to OpenShift is enabled for bearer tokens and client certificates.
2020/04/15 02:45:00 oauthproxy.go:200: mapping path "/" => upstream "https://localhost:9200/"
2020/04/15 02:45:00 oauthproxy.go:227: OAuthProxy configured for  Client ID: system:serviceaccount:openshift-logging:elasticsearch
2020/04/15 02:45:00 oauthproxy.go:237: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2020/04/15 02:45:00 oauthproxy.go:249: WARN: Configured to pass client specified bearer token upstream.
      Only use this option if you're sure that the upstream will not leak or abuse the given token.
      Bear in mind that the token could be a long lived token or hard to revoke.
2020/04/15 02:45:00 http.go:61: HTTP: listening on :4180
2020/04/15 02:45:00 http.go:107: HTTPS: listening on [::]:60000
I0415 02:45:00.640136       1 dynamic_serving_content.go:129] Starting serving::/etc/proxy/secrets/tls.crt::/etc/proxy/secrets/tls.key
2020/04/15 02:45:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:45:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:45:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:45:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:46:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:46:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:46:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:46:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:47:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:47:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:47:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:47:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:48:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:48:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:48:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:48:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:49:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:49:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:49:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:49:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:50:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:50:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:50:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:50:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:51:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:51:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:51:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:51:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:52:19 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:52:26 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:52:49 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused
2020/04/15 02:52:56 reverseproxy.go:437: http: proxy error: dial tcp [::1]:9200: connect: connection refused

$ oc logs -c elasticsearch elasticsearch-cdm-o0ylrfxl-1-6d79ffc687-f2fn7
[2020-04-15 02:50:58,412][INFO ][container.run            ] Begin Elasticsearch startup script
[2020-04-15 02:50:58,415][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-04-15 02:50:58,416][INFO ][container.run            ] Inspecting the maximum RAM available...
[2020-04-15 02:50:58,418][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m'
[2020-04-15 02:50:58,419][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2020-04-15 02:50:58,423][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-04-15 02:51:00,248][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-04-15 02:51:00,249][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'
[2020-04-15 02:51:00,249][INFO ][container.run            ] Checking if Elasticsearch is ready
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Exception in thread "main" java.lang.IllegalArgumentException: Could not resolve placeholder 'POD_IP'
        at org.elasticsearch.common.settings.PropertyPlaceholder.parseStringValue(PropertyPlaceholder.java:116)
        at org.elasticsearch.common.settings.PropertyPlaceholder.replacePlaceholders(PropertyPlaceholder.java:69)
        at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1109)
        at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1069)
        at org.elasticsearch.node.InternalSettingsPreparer.initializeSettings(InternalSettingsPreparer.java:137)
        at org.elasticsearch.node.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:115)
        at org.elasticsearch.cli.EnvironmentAwareCommand.createEnv(EnvironmentAwareCommand.java:75)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:70)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
        at org.elasticsearch.cli.Command.main(Command.java:90)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)


$ oc logs elasticsearch-operator-6f4994bc74-7rss9 
time="2020-04-15T02:44:25Z" level=warning msg="Unable to parse loglevel \"\""
{"level":"info","ts":1586918665.8147256,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1586918665.81475,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1586918665.8147547,"logger":"cmd","msg":"Version of operator-sdk: v0.8.2"}
{"level":"info","ts":1586918665.8150382,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1586918665.9269526,"logger":"leader","msg":"Found existing lock","LockOwner":"elasticsearch-operator-5d85b86dbd-hwvgr"}
{"level":"info","ts":1586918665.9327946,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1586918667.072788,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1586918669.4545178,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1586918673.993244,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1586918682.6981413,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1586918682.7784624,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1586918682.7786903,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1586918682.884639,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1586918682.8846705,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1586918682.9849257,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1586918683.0851054,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
time="2020-04-15T02:44:46Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 1 shards in preparation for cluster restart"
time="2020-04-15T02:45:57Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T02:45:57Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-o0ylrfxl-1: Node elasticsearch-cdm-o0ylrfxl-1 has not rejoined cluster elasticsearch yet"
time="2020-04-15T02:48:59Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T02:52:01Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T02:55:03Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T02:58:05Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T03:01:07Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T03:04:10Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T03:07:12Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"
time="2020-04-15T03:10:14Z" level=info msg="Timed out waiting for elasticsearch-cdm-o0ylrfxl-1 to rejoin cluster"

$ oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                     PHASE
clusterlogging.4.4.0-202004091036           Cluster Logging          4.4.0-202004091036   clusterlogging.4.3.13-202004131016           Succeeded
elasticsearch-operator.4.4.0-202004141917   Elasticsearch Operator   4.4.0-202004141917   elasticsearch-operator.4.3.13-202004131016   Succeeded


Version-Release number of selected component (if applicable):
ose-elasticsearch-operator-v4.4.0-202004141917


How reproducible:
Always

Steps to Reproduce:
1. deploy logging 4.3
2. upgrade it to 4.4
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jeff Cantrill 2020-04-15 15:25:38 UTC

Please attach:

1. Elasticsearch CR
2. Elasticsearch deployment yaml
3. Elasticsearch pod yaml
4. Elasticsearch configmap
5. Elasticsearch operator logs

This ENV VAR should exist for this release [1][2]

[1] https://github.com/openshift/elasticsearch-operator/blob/release-4.4/pkg/k8shandler/common.go#L292
[2] https://github.com/openshift/elasticsearch-operator/blob/release-4.4/pkg/k8shandler/configuration_tmpl.go#L18

Comment 3 Qiaoling Tang 2020-04-16 01:21:12 UTC

If deploy logging 4.4, the `POD_IP` env var could be found in the ES pod, if upgrade from 4.3 to 4.4, there doesn't have this env var.

Comment 4 Jeff Cantrill 2020-04-17 14:19:26 UTC

The issue is the operator is not comparing the ENV VARs for the containers and thus is not rolling out a new deployment

Comment 5 Jeff Cantrill 2020-04-17 16:05:39 UTC

*** Bug 1825056 has been marked as a duplicate of this bug. ***

Comment 7 Anping Li 2020-04-24 12:55:12 UTC

If the Elasticsearch clusters had been upgraded to 4.4, the ES pods looks like below.  you can fix it by deleting the elasticsearch deployments in sequence.

NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-cd9746755-w94h5        1/1     Running            0          54m
curator-1587731400-rnmwl                        0/1     Completed          0          5m8s
elasticsearch-cdm-6o5ww5ha-1-7b77d8bb5-9dh9t    1/2     CrashLoopBackOff   15         54m
elasticsearch-cdm-6o5ww5ha-2-87bf69d87-4wsbf    2/2     Running            0          59m
elasticsearch-cdm-6o5ww5ha-3-746d6b8f66-g57mp   2/2     Running            0          58m
fluentd-2ccqc                                   1/1     Running            0          53m
fluentd-4h2qt                                   1/1     Running            0          52m
fluentd-9jq2l                                   1/1     Running            1          54m
fluentd-g4hzk                                   1/1     Running            0          54m
fluentd-g9cs6                                   1/1     Running            0          53m
fluentd-l4j8j                                   1/1     Running            0          53m
kibana-5cd48f8bcd-xjlj5                         2/2     Running            0          54m

The workaound:
1) Delete the first CrashLoopBackOff deployment. 
$oc delete deployment elasticsearch-cdm-6o5ww5ha-1
2) Waiting until the other ES pod CrashLoopBackOff, then delete its deployment. 
$oc delete deployment elasticsearch-cdm-6o5ww5ha-2

Comment 8 Michael Burke 2020-04-24 19:54:11 UTC

Is the plan to fix this issue for GA? 

If not, should we add this information to the official update docs? I added a Known Issue to the release notes.

Comment 9 Ben Parees 2020-04-29 02:04:12 UTC

*** Bug 1829064 has been marked as a duplicate of this bug. ***

Comment 10 Ben Parees 2020-04-30 16:44:04 UTC

setting to urgent, this needs to go out in the first 4.4.z.

Comment 12 IgorKarpukhin 2020-05-04 13:24:54 UTC

This PR https://github.com/openshift/elasticsearch-operator/pull/336 resolves this bug

Comment 15 Qiaoling Tang 2020-05-06 01:43:36 UTC

Upgrade failed again:

There didn't have any logs indicate the env var changed in the ES deployment, the POD_IP env var wasn't added to the ES deployment.

$ oc logs -n openshift-operators-redhat elasticsearch-operator-75c868554d-kwjns 
time="2020-05-06T01:37:36Z" level=warning msg="Unable to parse loglevel \"\""
{"level":"info","ts":1588729056.00425,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1588729056.0042713,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1588729056.0042765,"logger":"cmd","msg":"Version of operator-sdk: v0.8.2"}
{"level":"info","ts":1588729056.0045552,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1588729056.1929462,"logger":"leader","msg":"Found existing lock","LockOwner":"elasticsearch-operator-6bfb774467-fms2f"}
{"level":"info","ts":1588729056.200799,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1588729057.3292577,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1588729059.7134275,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1588729064.2511082,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1588729072.9628963,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1588729073.122422,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1588729073.1226575,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588729073.2924252,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1588729073.2924507,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1588729073.3927507,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1588729073.4928908,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
time="2020-05-06T01:37:56Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2020-05-06T01:39:01Z" level=info msg="Timed out waiting for elasticsearch-cdm-ldnvhe0o-1 to rejoin cluster"
time="2020-05-06T01:39:01Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-ldnvhe0o-1: Node elasticsearch-cdm-ldnvhe0o-1 has not rejoined cluster elasticsearch yet"
time="2020-05-06T01:41:34Z" level=info msg="Timed out waiting for elasticsearch-cdm-ldnvhe0o-1 to rejoin cluster"

$ oc get pod
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-5d7cf5b97d-s2cts       1/1     Running            0          2m36s
elasticsearch-cdm-ldnvhe0o-1-56f689488f-c2xqn   1/2     CrashLoopBackOff   4          2m29s
fluentd-5cwc9                                   1/1     Running            0          47s
fluentd-6zx2d                                   1/1     Running            0          97s
fluentd-cqln8                                   1/1     Running            0          72s
fluentd-l45qt                                   1/1     Running            0          3s
fluentd-n6fdn                                   1/1     Running            0          2m5s
fluentd-rkbrn                                   1/1     Running            0          26s
kibana-578d7ddd66-n87jx                         2/2     Running            0          2m21s

$ oc logs -c elasticsearch elasticsearch-cdm-ldnvhe0o-1-56f689488f-c2xqn 
[2020-05-06 01:39:41,586][INFO ][container.run            ] Begin Elasticsearch startup script
[2020-05-06 01:39:41,599][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-05-06 01:39:41,601][INFO ][container.run            ] Inspecting the maximum RAM available...
[2020-05-06 01:39:41,603][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m'
[2020-05-06 01:39:41,605][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2020-05-06 01:39:41,609][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-05-06 01:39:43,492][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-05-06 01:39:43,493][INFO ][container.run            ] [2020-05-06 01:39:43,493][INFO ][container.run            ] Checking if Elasticsearch is ready
ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'
Exception in thread "main" java.lang.IllegalArgumentException: Could not resolve placeholder 'POD_IP'
	at org.elasticsearch.common.settings.PropertyPlaceholder.parseStringValue(PropertyPlaceholder.java:116)
	at org.elasticsearch.common.settings.PropertyPlaceholder.replacePlaceholders(PropertyPlaceholder.java:69)
	at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1109)
	at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1069)
	at org.elasticsearch.node.InternalSettingsPreparer.initializeSettings(InternalSettingsPreparer.java:137)
	at org.elasticsearch.node.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:115)
	at org.elasticsearch.cli.EnvironmentAwareCommand.createEnv(EnvironmentAwareCommand.java:75)
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:70)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
	at org.elasticsearch.cli.Command.main(Command.java:90)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)

$ oc get deploy elasticsearch-cdm-ldnvhe0o-1 -oyaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2020-05-06T01:36:15Z"
  generation: 5
  labels:
    cluster-name: elasticsearch
    component: elasticsearch
    es-node-client: "true"
    es-node-data: "true"
    es-node-master: "true"
    node-name: elasticsearch-cdm-ldnvhe0o-1
    tuned.openshift.io/elasticsearch: "true"
  name: elasticsearch-cdm-ldnvhe0o-1
  namespace: openshift-logging
  ownerReferences:
  - apiVersion: logging.openshift.io/v1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 9f6de074-dd12-4422-aea2-98125214d1b8
  resourceVersion: "61471"
  selfLink: /apis/apps/v1/namespaces/openshift-logging/deployments/elasticsearch-cdm-ldnvhe0o-1
  uid: 0bc91513-6cbf-4a56-ad1a-35c30f170183
spec:
  paused: true
  progressDeadlineSeconds: 1800
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      cluster-name: elasticsearch
      es-node-client: "true"
      es-node-data: "true"
      es-node-master: "true"
      node-name: elasticsearch-cdm-ldnvhe0o-1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        cluster-name: elasticsearch
        component: elasticsearch
        es-node-client: "true"
        es-node-data: "true"
        es-node-master: "true"
        node-name: elasticsearch-cdm-ldnvhe0o-1
        tuned.openshift.io/elasticsearch: "true"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: es-node-client
                  operator: In
                  values:
                  - "true"
                - key: es-node-data
                  operator: In
                  values:
                  - "true"
                - key: es-node-master
                  operator: In
                  values:
                  - "true"
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - env:
        - name: DC_NAME
          value: elasticsearch-cdm-ldnvhe0o-1
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: KUBERNETES_TRUST_CERT
          value: "true"
        - name: SERVICE_DNS
          value: elasticsearch-cluster
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: INSTANCE_RAM
          value: 4Gi
        - name: HEAP_DUMP_LOCATION
          value: /elasticsearch/persistent/heapdump.hprof
        - name: RECOVER_AFTER_TIME
          value: 5m
        - name: READINESS_PROBE_TIMEOUT
          value: "30"
        - name: POD_LABEL
          value: cluster=elasticsearch
        - name: IS_MASTER
          value: "true"
        - name: HAS_DATA
          value: "true"
        image: quay.io/openshift-qe-optional-operators/ose-logging-elasticsearch5@sha256:deeb2c8d91dd8e0f3050db5be111230515b5d06184ed99ba69441e3ae30fb866
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9300
          name: cluster
          protocol: TCP
        - containerPort: 9200
          name: restapi
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /usr/share/elasticsearch/probe/readiness.sh
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 4Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /elasticsearch/persistent
          name: elasticsearch-storage
        - mountPath: /usr/share/java/elasticsearch/config
          name: elasticsearch-config
        - mountPath: /etc/openshift/elasticsearch/secret
          name: certificates
      - args:
        - --http-address=:4180
        - --https-address=:60000
        - --provider=openshift
        - --upstream=https://localhost:9200
        - --tls-cert=/etc/proxy/secrets/tls.crt
        - --tls-key=/etc/proxy/secrets/tls.key
        - --upstream-ca=/etc/proxy/elasticsearch/admin-ca
        - --openshift-service-account=elasticsearch
        - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
        - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
        - --pass-user-bearer-token
        - --cookie-secret=ZHu6FwjOM/5TP62AiiwZDQ==
        image: quay.io/openshift-qe-optional-operators/ose-oauth-proxy@sha256:5fe478210770b21c1eb26c1570bcbda40bc5a79011580ff5ebd4c701a5b04eb2
        imagePullPolicy: IfNotPresent
        name: proxy
        ports:
        - containerPort: 60000
          name: metrics
          protocol: TCP
        resources:
          limits:
            memory: 64Mi
          requests:
            cpu: 100m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/proxy/secrets
          name: elasticsearch-metrics
        - mountPath: /etc/proxy/elasticsearch
          name: certificates
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: elasticsearch
      serviceAccountName: elasticsearch
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.kubernetes.io/disk-pressure
        operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: elasticsearch
        name: elasticsearch-config
      - name: elasticsearch-storage
        persistentVolumeClaim:
          claimName: elasticsearch-elasticsearch-cdm-ldnvhe0o-1
      - name: certificates
        secret:
          defaultMode: 420
          secretName: elasticsearch
      - name: elasticsearch-metrics
        secret:
          defaultMode: 420
          secretName: elasticsearch-metrics
status:
  conditions:
  - lastTransitionTime: "2020-05-06T01:37:56Z"
    lastUpdateTime: "2020-05-06T01:37:56Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-05-06T01:38:00Z"
    lastUpdateTime: "2020-05-06T01:38:00Z"
    message: Deployment is paused
    reason: DeploymentPaused
    status: Unknown
    type: Progressing
  observedGeneration: 5
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1


$ oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                     PHASE
clusterlogging.4.4.0-202005041337           Cluster Logging          4.4.0-202005041337   clusterlogging.4.3.19-202005041055           Succeeded
elasticsearch-operator.4.4.0-202005051217   Elasticsearch Operator   4.4.0-202005051217   elasticsearch-operator.4.3.19-202005041055   Succeeded


EO image: openshift-ose-elasticsearch-operator-v4.4.0-202005051217

Comment 16 XiuJuan Wang 2020-05-15 01:32:29 UTC

The fix is merged, and qe passed test with new elasticsearch-operator images.

Comment 17 Qiaoling Tang 2020-05-15 01:36:43 UTC

Tested with elasticsearch-operator.4.4.0-202005131452, the upgrade succeeded.

$ oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                     PHASE
clusterlogging.4.4.0-202005121717           Cluster Logging          4.4.0-202005121717   clusterlogging.4.3.19-202005041055           Succeeded
elasticsearch-operator.4.4.0-202005131452   Elasticsearch Operator   4.4.0-202005131452   elasticsearch-operator.4.3.19-202005041055   Succeeded

$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-d6468c6f7-jq4mc        1/1     Running     0          18m
curator-1589505600-j2zc6                        0/1     Completed   0          62s
elasticsearch-cdm-enke4rg3-1-774f7c8f4b-vdsnx   2/2     Running     0          16m
elasticsearch-cdm-enke4rg3-2-66c6685987-hmdfr   2/2     Running     0          14m
elasticsearch-cdm-enke4rg3-3-6777f76b6b-rbrl7   2/2     Running     0          12m
fluentd-4pdrn                                   1/1     Running     0          17m
fluentd-lxqg2                                   1/1     Running     0          16m
fluentd-qmxsf                                   1/1     Running     0          17m
fluentd-rrs7z                                   1/1     Running     0          16m
fluentd-sp4cf                                   1/1     Running     0          17m
fluentd-wxpnb                                   1/1     Running     0          17m
fluentd-xdr92                                   1/1     Running     0          16m
kibana-849c98d867-kgrzc                         2/2     Running     0          18m

Comment 19 errata-xmlrpc 2020-05-18 13:35:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2133

Comment 20 Anping Li 2020-06-05 07:57:10 UTC

Hit this issue again. when upgrade from clusterlogging.4.3.24-202006012115 to clusterlogging.4.4.0-202006011837

#oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                     PHASE
clusterlogging.4.4.0-202006011837           Cluster Logging          4.4.0-202006011837   clusterlogging.4.3.24-202006012115           Succeeded
elasticsearch-operator.4.4.0-202006011837   Elasticsearch Operator   4.4.0-202006011837   elasticsearch-operator.4.3.24-202006011815   Succeeded


Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-06-05 07:53:54,611][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-06-05 07:53:54,613][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'
[2020-06-05 07:53:54,613][INFO ][container.run            ] Checking if Elasticsearch is ready
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Exception in thread "main" java.lang.IllegalArgumentException: Could not resolve placeholder 'POD_IP'
	at org.elasticsearch.common.settings.PropertyPlaceholder.parseStringValue(PropertyPlaceholder.java:116)
	at org.elasticsearch.common.settings.PropertyPlaceholder.replacePlaceholders(PropertyPlaceholder.java:69)
	at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1109)
	at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1069)
	at org.elasticsearch.node.InternalSettingsPreparer.initializeSettings(InternalSettingsPreparer.java:137)
	at org.elasticsearch.node.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:115)
	at org.elasticsearch.cli.EnvironmentAwareCommand.createEnv(EnvironmentAwareCommand.java:75)
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:70)
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:134)
	at org.elasticsearch.cli.Command.main(Command.java:90)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)

Comment 21 Jeff Cantrill 2020-06-07 23:42:13 UTC

What is different from [1] during testing that we have regressed?  Please provide:

* the starting operator image version
* the starting elasticsearch configmap
* the ending elasticsearch configmap
* the starting elasticsearch yaml
* the ending elasticsearch yaml

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1824006#c17

Comment 22 Anping Li 2020-06-08 02:46:55 UTC

Couldn't reproduce now. 

#oc get csv
NAME                                        DISPLAY                  VERSION              REPLACES                                     PHASE
clusterlogging.4.4.0-202006061254           Cluster Logging          4.4.0-202006061254   clusterlogging.4.3.25-202006060952           Succeeded
elasticsearch-operator.4.4.0-202006061254   Elasticsearch Operator   4.4.0-202006061254   elasticsearch-operator.4.3.25-202006060952   Succeeded

Comment 36 Luke Meyer 2020-10-05 19:50:58 UTC

Please:
> If the solution does not work for you, OPEN A NEW BUG REPORT.

By design we cannot re-ship a bug that has already been shipped.

Note You need to log in before you can comment on or make changes to this bug.