Bug 1847365

Summary: Elasticsearch plugin is not accessible from Kibana dashboard on OCP 4.3 on POWER platform
Product: OpenShift Container Platform
Component: Multi-Arch
Version: 4.3.0
Hardware: ppc64le
OS: Linux
Type: Bug
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Target Release: 4.3.z
Reporter: Archana Prabhakar <aprabhak>
Assignee: Dennis Gilmore <dgilmore>
QA Contact: Barry Donahue <bdonahue>
CC: cbaus, clnperez, danili, mkumatag, pdsilva, yselkowi
Last Closed: 2020-08-11 21:34:39 UTC
Bug Depends On: 1855072
Attachments: Elasticsearch plugin is red on kibana dashboard

Description Archana Prabhakar 2020-06-16 09:43:37 UTC
Created attachment 1697587
Elasticsearch plugin is red on kibana dashboard

Description of problem:

Followed the instructions at https://docs.openshift.com/container-platform/4.3/logging/cluster-logging-deploying.html to install the additional Cluster Logging and Elasticsearch operators on OCP 4.3.18.

The Kibana dashboard opens, but the Elasticsearch plugin is red, as shown in the attached file.


Followed the instructions at https://docs.openshift.com/container-platform/4.3/logging/config/cluster-logging-elasticsearch.html to create and expose the Elasticsearch route.

```
[root@arc-es-ec43-bastion elk]# oc get pods -n openshift-logging
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-logging-operator-7fdc89799c-rk4b7       1/1     Running   0          4m17s
elasticsearch-cdm-ak8cqf3b-1-7b8746c755-hlxjm   1/2     Running   0          55s
elasticsearch-cdm-ak8cqf3b-2-cdbb8ff65-pr2c2    1/2     Running   0          21s
elasticsearch-cdm-ak8cqf3b-3-558b97b6c-242pz    1/2     Running   0          18s
fluentd-5dlbd                                   1/1     Running   0          46s
fluentd-f9qlw                                   1/1     Running   0          46s
fluentd-g2bd9                                   1/1     Running   0          45s
fluentd-vl8gl                                   1/1     Running   0          47s
fluentd-wdmvw                                   1/1     Running   0          51s
kibana-77d496d75d-8lcdr                         2/2     Running   0          54s
```
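In this listing, every elasticsearch-cdm pod reports 1/2 in the READY column, while the fluentd and Kibana pods are fully ready. As a sketch only, the hypothetical Python helper not_fully_ready below picks such pods out of oc get pods text output; it assumes the default column layout shown above (NAME first, READY as an x/y count).

```python
def not_fully_ready(oc_get_pods_output: str) -> list[str]:
    """Return names of pods whose READY column is "x/y" with x < y.

    Hypothetical helper: assumes the default `oc get pods` table layout,
    with NAME in the first column and a READY header after it.
    """
    lines = [ln for ln in oc_get_pods_output.splitlines() if ln.strip()]
    ready_col = lines[0].split().index("READY")
    pods = []
    for line in lines[1:]:
        fields = line.split()
        ready, total = (int(n) for n in fields[ready_col].split("/"))
        if ready < total:
            pods.append(fields[0])
    return pods


# Rows copied from the listing above (abbreviated).
SAMPLE = """\
NAME                                            READY   STATUS    RESTARTS   AGE
elasticsearch-cdm-ak8cqf3b-1-7b8746c755-hlxjm   1/2     Running   0          55s
elasticsearch-cdm-ak8cqf3b-2-cdbb8ff65-pr2c2    1/2     Running   0          21s
fluentd-5dlbd                                   1/1     Running   0          46s
kibana-77d496d75d-8lcdr                         2/2     Running   0          54s
"""

if __name__ == "__main__":
    print(not_fully_ready(SAMPLE))
```

The same helper works on the -o wide listing later in this report, because the extra columns come after READY.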

```
[root@arc-es-ec43-bastion elk]# oc get service elasticsearch
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.30.108.6   <none>        9200/TCP   97s

oc get route elasticsearch -o jsonpath={.spec.host}
elasticsearch-openshift-logging.apps.arc-es-ec43.redhat.com

[root@arc-es-ec43-bastion elk]# curl -tlsv1.2 --insecure "https://elasticsearch-openshift-logging.apps.arc-es-ec43.redhat.com/.operations.*/_search?size=1" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3131    0  3131    0     0   203k      0 --:--:-- --:--:-- --:--:--  203k
parse error: Invalid numeric literal at line 2, column 0

```
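The jq parse error shows that the 3131-byte body returned through the route was not valid JSON, so jq could not parse it; the transcript does not show what the body actually contained. A small Python sketch (hypothetical helper describe_body) for checking a saved response body before handing it to a JSON parser:

```python
import json


def describe_body(body: str) -> str:
    """Say whether an HTTP response body is JSON, which jq requires.

    Hypothetical helper: for a non-JSON body, return the parser's message
    plus a short preview of the first line instead of failing.
    """
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        first_line = body.strip().splitlines()[0] if body.strip() else "<empty>"
        return f"not JSON ({exc.msg}): {first_line[:60]!r}"
    return "valid JSON"


if __name__ == "__main__":
    print(describe_body('{"hits": {"total": 1, "hits": []}}'))
    # Invented example of a non-JSON body, not taken from this report.
    print(describe_body("<html>\n<body>example error page</body>\n</html>"))
```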

```
[root@arc-es-ec43-bastion elk]# oc get routes --all-namespaces
NAMESPACE                  NAME                HOST/PORT                                                            PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift     oauth-openshift.apps.arc-es-ec43.redhat.com                                 oauth-openshift     6443    passthrough/Redirect   None
openshift-console          console             console-openshift-console.apps.arc-es-ec43.redhat.com                       console             https   reencrypt/Redirect     None
openshift-console          downloads           downloads-openshift-console.apps.arc-es-ec43.redhat.com                     downloads           http    edge/Redirect          None
openshift-logging          elasticsearch       elasticsearch-openshift-logging.apps.arc-es-ec43.redhat.com                 elasticsearch       <all>   reencrypt              None
openshift-logging          kibana              kibana-openshift-logging.apps.arc-es-ec43.redhat.com                        kibana              <all>   reencrypt/Redirect     None
openshift-monitoring       alertmanager-main   alertmanager-main-openshift-monitoring.apps.arc-es-ec43.redhat.com          alertmanager-main   web     reencrypt/Redirect     None
openshift-monitoring       grafana             grafana-openshift-monitoring.apps.arc-es-ec43.redhat.com                    grafana             https   reencrypt/Redirect     None
openshift-monitoring       prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.arc-es-ec43.redhat.com             prometheus-k8s      web     reencrypt/Redirect     None
openshift-monitoring       thanos-querier      thanos-querier-openshift-monitoring.apps.arc-es-ec43.redhat.com             thanos-querier      web     reencrypt/Redirect     None
```

The elasticsearch service is created, but no endpoints (pod IP addresses) are listed for it:

```
[root@arc-es-ec43-bastion ~]# oc describe svc elasticsearch -n openshift-logging
Name:              elasticsearch
Namespace:         openshift-logging
Labels:            cluster-name=elasticsearch
Annotations:       <none>
Selector:          cluster-name=elasticsearch,es-node-client=true
Type:              ClusterIP
IP:                172.30.108.6
Port:              elasticsearch  9200/TCP
TargetPort:        restapi/TCP
Endpoints:
Session Affinity:  None
Events:            <none>
[root@arc-es-ec43-bastion ~]#

[root@arc-es-ec43-bastion ~]# oc get ep
NAME                    ENDPOINTS                                                           AGE
elasticsearch                                                                               14d
elasticsearch-cluster   10.130.0.5:9300,10.130.0.6:9300,10.131.0.18:9300                    14d
elasticsearch-metrics                                                                       14d
fluentd                 10.128.0.77:24231,10.128.2.72:24231,10.129.0.74:24231 + 2 more...   14d
kibana                  10.131.1.0:3000                                                     14d
[root@arc-es-ec43-bastion ~]#
```
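Both elasticsearch and elasticsearch-metrics have an empty ENDPOINTS column, while the other services list pod addresses. A hypothetical Python helper for spotting such rows in oc get ep text output, assuming the default NAME, ENDPOINTS and AGE columns:

```python
def services_without_endpoints(oc_get_ep_output: str) -> list[str]:
    """Return the names of Endpoints objects whose ENDPOINTS column is empty.

    Hypothetical helper: assumes the default `oc get ep` columns NAME,
    ENDPOINTS and AGE, so a data row with only two whitespace-separated
    fields (name and age) lists no endpoint addresses.
    """
    rows = [r for r in oc_get_ep_output.splitlines()[1:] if r.strip()]  # skip header
    return [r.split()[0] for r in rows if len(r.split()) == 2]


# Rows copied from the oc get ep output above.
SAMPLE_EP = """\
NAME                    ENDPOINTS                                                           AGE
elasticsearch                                                                               14d
elasticsearch-cluster   10.130.0.5:9300,10.130.0.6:9300,10.131.0.18:9300                    14d
elasticsearch-metrics                                                                       14d
fluentd                 10.128.0.77:24231,10.128.2.72:24231,10.129.0.74:24231 + 2 more...   14d
kibana                  10.131.1.0:3000                                                     14d
"""

if __name__ == "__main__":
    print(services_without_endpoints(SAMPLE_EP))
```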

However, oc get pods shows the expected IP addresses for the pods that match the service selector:
```
[root@arc-es-ec43-bastion ~]# oc get pods -n openshift-logging -l cluster-name=elasticsearch,es-node-client=true -o=wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP            NODE                              NOMINATED NODE   READINESS GATES
elasticsearch-cdm-ak8cqf3b-1-7b8746c755-6cqkc   1/2     Running   1          8d    10.130.0.5    worker-0.arc-es-ec43.redhat.com   <none>           <none>
elasticsearch-cdm-ak8cqf3b-2-cdbb8ff65-xw6ks    1/2     Running   3          8d    10.130.0.6    worker-0.arc-es-ec43.redhat.com   <none>           <none>
elasticsearch-cdm-ak8cqf3b-3-558b97b6c-fqwt9    1/2     Running   5          8d    10.131.0.18   worker-1.arc-es-ec43.redhat.com   <none>           <none>
```
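All three pods match the service selector cluster-name=elasticsearch,es-node-client=true, but each is only 1/2 ready. A pod is Ready only when all of its containers are ready, and a standard Service lists only Ready pods as usable endpoints (unless publishNotReadyAddresses is set), which would be consistent with the empty Endpoints field. The simplified Python model below illustrates that rule. The Pod class and ready_endpoints function are hypothetical, and port 9200 stands in for the named restapi target port as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    containers_ready: int
    containers_total: int
    labels: dict[str, str] = field(default_factory=dict)


def ready_endpoints(selector: dict[str, str], pods: list[Pod], port: int) -> list[str]:
    """Simplified model of which pods a Service lists as ready endpoints.

    A pod contributes "ip:port" only if it carries every selector label and
    all of its containers are ready. Readiness gates, named-port resolution
    and publishNotReadyAddresses are deliberately ignored.
    """
    return [
        f"{p.ip}:{port}"
        for p in pods
        if all(p.labels.get(k) == v for k, v in selector.items())
        and p.containers_ready == p.containers_total
    ]


SELECTOR = {"cluster-name": "elasticsearch", "es-node-client": "true"}
LABELS = dict(SELECTOR)  # the ES pods above match the selector
ES_PODS = [
    Pod("elasticsearch-cdm-ak8cqf3b-1-7b8746c755-6cqkc", "10.130.0.5", 1, 2, LABELS),
    Pod("elasticsearch-cdm-ak8cqf3b-2-cdbb8ff65-xw6ks", "10.130.0.6", 1, 2, LABELS),
    Pod("elasticsearch-cdm-ak8cqf3b-3-558b97b6c-fqwt9", "10.131.0.18", 1, 2, LABELS),
]

if __name__ == "__main__":
    print(ready_endpoints(SELECTOR, ES_PODS, 9200))  # empty: no pod is fully ready
```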

Version-Release number of selected component (if applicable):
4.3.18

How reproducible:
On all GA builds of OCP 4.3.18 on POWER.

Steps to Reproduce:
1. Install OCP 4.3.18 on a PowerVM server.
2. Follow the documentation steps at https://docs.openshift.com/container-platform/4.3/logging/cluster-logging-deploying.html to enable the Cluster Logging and Elasticsearch operators.
3. The Kibana dashboard opens, but the Elasticsearch plugin is red, as shown in the attached file.

Actual results:

After enabling the Elasticsearch and Cluster Logging operators on an OCP 4.3 cluster, the Kibana dashboard is accessible, but the Elasticsearch plugin shows a red status.

The Elasticsearch route has been created and added to the local system's /etc/hosts file.

Expected results:


Additional info:

Comment 1 Archana Prabhakar 2020-06-16 09:46:06 UTC
Address resolution works properly: running curl from the Kibana pod resolves the service address to 172.30.108.6, but the connection cannot be established because the traffic is not forwarded to any pod port.

[root@arc-es-ec43-bastion ~]# oc debug pod/kibana-68d8f7694d-wrn6x -n openshift-logging
Defaulting container name to kibana.
Use 'oc describe pod/kibana-68d8f7694d-wrn6x-debug -n openshift-logging' to see all of the containers in this pod.

Starting pod/kibana-68d8f7694d-wrn6x-debug ...
Pod IP: 10.130.0.80
If you don't see a command prompt, try pressing enter.
sh-4.2$ curl -v https://elasticsearch.openshift-logging.svc.cluster.local:9300
^C
sh-4.2$ curl -v https://elasticsearch.openshift-logging.svc.cluster.local:9200
* About to connect() to elasticsearch.openshift-logging.svc.cluster.local port 9200 (#0)
*   Trying 172.30.108.6...
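The curl from the debug pod resolves the service name to 172.30.108.6 but stops at "Trying 172.30.108.6...". To tell a connection that completes apart from one that is refused or silently times out, a minimal Python probe could look like the sketch below. tcp_probe is a hypothetical helper, and it checks only TCP connectivity, not TLS or HTTP.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome.

    Returns "connected", "refused" or "timeout"; any other OS error is
    reported by its exception class name (e.g. "gaierror" for DNS failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return type(exc).__name__
```

Run from a pod inside the cluster, tcp_probe("172.30.108.6", 9200) would report which of these cases applies to the service IP in this report.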

Comment 2 Archana Prabhakar 2020-06-16 09:46:37 UTC
I see the following messages in the ES pods:

[root@arc-es-ec43-bastion ~]# oc logs elasticsearch-cdm-ak8cqf3b-2-cdbb8ff65-xw6ks -c elasticsearch
[2020-05-31 11:12:46,140][INFO ][container.run            ] Begin Elasticsearch startup script
[2020-05-31 11:12:46,256][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-05-31 11:12:46,261][INFO ][container.run            ] Inspecting the maximum RAM available...
[2020-05-31 11:12:46,285][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms8192m -Xmx8192m'
[2020-05-31 11:12:46,287][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2020-05-31 11:12:46,729][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-05-31 11:13:05,012][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-05-31 11:13:05,016][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms8192m -Xmx8192m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'
[2020-05-31 11:13:05,090][INFO ][container.run            ] Checking if Elasticsearch is ready
[2020-05-31 11:18:10,916][ERROR][container.run            ] Timed out waiting for Elasticsearch to be ready
cat: elasticsearch_connect_log.txt: No such file or directory
[root@arc-es-ec43-bastion ~]#
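The startup script began checking readiness at 11:13:05 and gave up at 11:18:10, roughly five minutes later, without Elasticsearch ever becoming ready. A small Python sketch (hypothetical helper seconds_between) that measures such a gap from the container log, assuming the bracketed timestamp format shown above:

```python
from datetime import datetime

# Timestamp layout used by the container.run lines, e.g. "2020-05-31 11:13:05,090".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(log_text: str, start_marker: str, end_marker: str) -> float:
    """Seconds between the first lines containing start_marker and end_marker.

    Assumes each relevant line begins with a "[YYYY-MM-DD HH:MM:SS,mmm]" stamp.
    """
    def stamp_for(marker: str) -> datetime:
        for line in log_text.splitlines():
            if marker in line and line.startswith("["):
                return datetime.strptime(line[1:line.index("]")], TS_FORMAT)
        raise ValueError(f"marker not found: {marker!r}")

    return (stamp_for(end_marker) - stamp_for(start_marker)).total_seconds()


LOG = """\
[2020-05-31 11:13:05,090][INFO ][container.run            ] Checking if Elasticsearch is ready
[2020-05-31 11:18:10,916][ERROR][container.run            ] Timed out waiting for Elasticsearch to be ready
"""

if __name__ == "__main__":
    print(seconds_between(LOG, "Checking if Elasticsearch is ready", "Timed out waiting"))
```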


The readiness probe is failing, and Elasticsearch itself is not running properly:

[root@arc-es-ec43-bastion ~]# oc exec -ti elasticsearch-cdm-ak8cqf3b-1-7b8746c755-6cqkc -n openshift-logging sh
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-ak8cqf3b-1-7b8746c755-6cqkc -n openshift-logging' to see all of the containers in this pod.
sh-4.2$
sh-4.2$
sh-4.2$ es_util --query=/_cat/indices?v -v
* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connection refused
* Failed connect to localhost:9200; Connection refused
* Closing connection 0
sh-4.2$ 
sh-4.2$ /usr/share/elasticsearch/probe/readiness.sh
Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]
sh-4.2$
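The probe's "response code: 000" matches curl's convention of reporting 000 when no HTTP response is received at all, which is consistent with the "Connection refused" errors from es_util just above. The contents of readiness.sh are not shown in this report. The Python function below is a hypothetical stand-in that follows the same 000 convention, not a reproduction of that script.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 3.0) -> str:
    """Return the HTTP status as a 3-digit string, or "000" if none was received.

    Follows curl's %{http_code} convention. Hypothetical stand-in for a
    readiness check; it does not reproduce readiness.sh.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"{resp.status:03d}"
    except urllib.error.HTTPError as exc:
        # The server replied, but with an error status.
        return f"{exc.code:03d}"
    except OSError:
        # Connection refused, timeout or DNS failure (URLError is an OSError):
        # no HTTP response at all.
        return "000"
```

Inside the Elasticsearch container, http_status("http://localhost:9200/") would return "000" for as long as nothing listens on that port.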

Comment 3 Carvel Baus 2020-06-25 13:29:51 UTC
I noticed that the namespace you used is not the one recommended by the referenced documentation. Is it possible that something was misconfigured by using a different namespace than the one the instructions consistently use (i.e. openshift-logging instead of openshift-operators-redhat)?

Comment 4 Dan Li 2020-06-26 21:00:36 UTC
BZ #1807201 is a work in progress that aims to fix Elasticsearch functionality, and it could potentially block this bug. I am linking 1807201 to this bug.

Comment 5 pdsilva 2020-06-30 13:48:40 UTC
Encountered this issue on OCP 4.4.9 on Power:

# oc version
Client Version: 4.4.9
Server Version: 4.4.9
Kubernetes Version: v1.17.1+912792b

# oc get subscription
NAME              PACKAGE           SOURCE             CHANNEL
cluster-logging   cluster-logging   redhat-operators   4.4

# oc get csv
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.4.0-202006211643.p0           Cluster Logging          4.4.0-202006211643.p0              Succeeded
elasticsearch-operator.4.4.0-202006211643.p0   Elasticsearch Operator   4.4.0-202006211643.p0              Succeeded

# oc get pods -n openshift-logging
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-logging-operator-74c9cf49bc-vk4cj       1/1     Running   0          29m
elasticsearch-cdm-00ygk06t-1-64d6b99f4b-dgb6v   1/2     Running   0          28m
elasticsearch-cdm-00ygk06t-2-56cf7dffc-dhgp9    1/2     Running   0          28m
elasticsearch-cdm-00ygk06t-3-664f84cdd6-lntdz   1/2     Running   0          28m
fluentd-6t6dw                                   1/1     Running   0          29m
fluentd-8vdkm                                   1/1     Running   0          29m
fluentd-jbbdd                                   1/1     Running   0          29m
fluentd-lrf6d                                   1/1     Running   0          29m
fluentd-n87zr                                   1/1     Running   0          29m
fluentd-tpgrf                                   1/1     Running   0          29m
fluentd-v4tv5                                   1/1     Running   0          29m
kibana-855d757cbd-swg79                         2/2     Running   2          29m

Events:
  Type     Reason     Age                   From                                     Message
  ----     ------     ----                  ----                                     -------
  Normal   Scheduled  29m                   default-scheduler                        Successfully assigned openshift-logging/elasticsearch-cdm-00ygk06t-2-56cf7dffc-dhgp9 to worker-1.test-4604.example.com
  Normal   Pulled     29m                   kubelet, worker-1.test-4604.example.com  Container image "registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:2940a8ce2837ee02afcf5de485d8e7eb3584c9ce56602c2625efe5603d53a1b2" already present on machine
  Normal   Created    29m                   kubelet, worker-1.test-4604.example.com  Created container elasticsearch
  Normal   Started    29m                   kubelet, worker-1.test-4604.example.com  Started container elasticsearch
  Normal   Pulled     29m                   kubelet, worker-1.test-4604.example.com  Container image "registry.redhat.io/openshift4/ose-oauth-proxy@sha256:03289a1d986efec545ac68c9bd8839a3a1ef0ad4c5a082d4c392cd27c5143b21" already present on machine
  Normal   Created    29m                   kubelet, worker-1.test-4604.example.com  Created container proxy
  Normal   Started    29m                   kubelet, worker-1.test-4604.example.com  Started container proxy
  Warning  Unhealthy  4m2s (x299 over 28m)  kubelet, worker-1.test-4604.example.com  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]

Comment 6 Dan Li 2020-07-14 12:49:22 UTC
Bug #1855072 is still WIP; therefore, this bug could potentially be blocked by it.

Comment 7 Dan Li 2020-07-21 15:10:56 UTC
The blocking bug 1855072 is still WIP, so the fix for this bug is unlikely to land in the current sprint before August 1. Adding the UpcomingSprint tag.

Comment 8 Dan Li 2020-07-21 19:16:52 UTC
Setting the Target Release to match the blocking bug (4.3.z)

Comment 9 Dan Li 2020-08-06 17:24:40 UTC
Hi Archana, could you re-test this bug to confirm that it is still an issue in 4.3? The blocking bug has been resolved.

Comment 10 Christy Norman 2020-08-10 18:29:45 UTC
This has been re-tested and the issue seems to have disappeared. However, as far as I know, we never saw any seccomp errors. I'll let Archana correct me if I'm mistaken; *I* didn't see any when I looked at her setup. :)

Comment 11 Archana Prabhakar 2020-08-11 05:46:56 UTC
We tested this on 4.3.32, build 4.3.0-0.nightly-ppc64le-2020-07-25-094111.
It is working fine. No other issues seen. We can close this bug.

Comment 12 Archana Prabhakar 2020-08-11 05:48:45 UTC
# oc get clusterversion
NAME      VERSION                                     AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-ppc64le-2020-07-25-094111   True        False         12h     Cluster version is 4.3.0-0.nightly-ppc64le-2020-07-25-094111

# oc get csv
NAME                                            DISPLAY                  VERSION                  REPLACES   PHASE
clusterlogging.4.3.31-202007272153.p0           Cluster Logging          4.3.31-202007272153.p0              Succeeded
elasticsearch-operator.4.3.31-202007272153.p0   Elasticsearch Operator   4.3.31-202007272153.p0              Succeeded

Comment 13 Dan Li 2020-08-11 21:34:39 UTC
Closing per feedback.