Bug 1868916

Summary: Elasticsearch pod can't start on s390x clusters
Product: OpenShift Container Platform
Reporter: Nicolas Nosenzo <nnosenzo>
Component: Multi-Arch
Assignee: Dennis Gilmore <dgilmore>
Status: CLOSED ERRATA
QA Contact: Jeremy Poulin <jpoulin>
Severity: high
Docs Contact:
Priority: high
Version: 4.5
CC: aos-bugs, cbaus, dahernan, danili, dgilmore, dorzel, dslavens, erich, hjochman, jjanz, lakshmi.ravichandran1, nstielau, openshift-bugs-escalate, peterm, qitang, raja.sekhar, rcarrier, sabeher2, yselkowi
Target Milestone: ---
Target Release: 4.6.0
Hardware: s390x
OS: Unspecified
Whiteboard: Logging
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:28:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (description, flags):
  Log output from elasticsearch pod with errors for 4.5.5 on s390x (none)
  Log output of working elasticsearch pod on 4.2 on x86 (none)
  Log tested with default memory settings (none)
  yaml of the current clusterlogging instance used (none)
  Log output of ES pod container error for 4.5 on s390x (none)
  yaml definition of ClusterLogging instance (none)
  ES pods status screenshot (none)

Description Nicolas Nosenzo 2020-08-14 10:25:19 UTC
Description of problem:

Elasticsearch pod can't start on s390x; the same configuration works on an x86 cluster.
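Installed operator versions; a hedged sketch of the command that would produce a listing like the one below (namespace assumed):

oc get csv -n openshift-logging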

NAME                                           DISPLAY                  VERSION                 REPLACES                                       PHASE
clusterlogging.4.5.0-202007311600.p0           Cluster Logging          4.5.0-202007311600.p0   clusterlogging.4.4.0-202008051553.p0           Succeeded
elasticsearch-operator.4.5.0-202007301938.p0   Elasticsearch Operator   4.5.0-202007301938.p0   elasticsearch-operator.4.4.0-202007312002.p0   Succeeded

But the results are even worse: the elasticsearch pod is not getting ready anymore.

[hjochmann@oc4500535505 ocp-cl]$ oc get pods
NAME                                           READY   STATUS     RESTARTS   AGE
cluster-logging-operator-7b57c54b69-h4c9l      1/1     Running    0          55m
elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v   1/2     Running    0          18m
fluentd-2n74x                                  0/1     Init:0/1   7          18m
fluentd-62sq6                                  0/1     Init:0/1   7          18m
fluentd-ff82f                                  0/1     Init:0/1   7          18m
fluentd-hwb58                                  0/1     Init:0/1   7          18m
fluentd-mb9cs                                  0/1     Init:0/1   7          18m
fluentd-rvfjc                                  0/1     Init:0/1   7          18m
kibana-78b865b58b-86vzf                        2/2     Running    0          17m

Logging was switched to standard container output. Some interesting bits from the log:
[2020-08-13T15:13:43,829][WARN ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] version [6.8.1-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
and the SSL bug is also still there:
[2020-08-13T15:19:11,607][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-s5brzwre-1] SSL Problem Received close_notify during handshake
javax.net.ssl.SSLException: Received close_notify during handshake
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1667) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1635) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1777) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1090) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:913) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:783) ~[?:?]
	at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626) ~[?:1.8.0_262]
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]

Version-Release number of selected component (if applicable):
4.5.5

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Carvel Baus 2020-08-17 18:22:03 UTC
Can you provide steps to reproduce this?

I have the Elasticsearch operator running on 4.5.6 s390x. I also have cluster logging running on 4.5.6 s390x, and it seems both are having trouble in the log snippets you provided. That makes me wonder if there is something else going on to cause this.

Comment 3 Carvel Baus 2020-08-17 18:26:19 UTC
To clarify, both operators successfully deploy on 4.5.6 s390x. The "both having trouble" remark refers to the log snippets submitted on this BZ.

Comment 4 Hanspeter Jochmann 2020-08-18 06:27:18 UTC
Hi Carvel,

Nicolas created the bug from a ticket we have opened.

Steps to reproduce:
1) Set up an OCP cluster on Z-Linux (s390x) with 3 master nodes and 3 worker nodes
2) Follow the steps outlined at https://docs.openshift.com/container-platform/4.5/logging/cluster-logging-deploying.html#cluster-logging-deploy-cli_cluster-logging-deploying
3) As we are short on memory, and for easier debugging, I changed the clo-instance.yaml in step 5:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: "openshift-logging"
spec:
  managementState: "Managed"  
  logStore:
    type: "elasticsearch"  
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 1 
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage:
        size: 200G
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"  
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *" 
  collection:
    logs:
      type: "fluentd"  
      fluentd: {}

4) Wait for the pods to start; I see the following:
[hjochmann@oc4500535505 ~]$ oc get pods
NAME                                           READY   STATUS                  RESTARTS   AGE
cluster-logging-operator-7b57c54b69-h4c9l      1/1     Running                 0          4d15h
elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v   1/2     Running                 0          4d15h
fluentd-2n74x                                  0/1     Init:CrashLoopBackOff   1094       4d15h
fluentd-62sq6                                  0/1     Init:CrashLoopBackOff   1092       4d15h
fluentd-ff82f                                  0/1     Init:CrashLoopBackOff   1094       4d15h
fluentd-hwb58                                  0/1     Init:CrashLoopBackOff   1093       4d15h
fluentd-mb9cs                                  0/1     Init:CrashLoopBackOff   1093       4d15h
fluentd-rvfjc                                  0/1     Init:CrashLoopBackOff   1093       4d15h
kibana-78b865b58b-86vzf                        2/2     Running                 0          4d15h
[hjochmann@oc4500535505 ~]$ 

The elasticsearch pod only gets to 1/2 ready; as a result, all fluentd pods go into CrashLoopBackOff after a short time.
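For reference, a minimal sketch of commands that can show which container is not ready and why (the pod name is taken from the output above; the container names elasticsearch and proxy are assumptions):

# describe shows per-container ready state and the last probe/exit reasons
oc -n openshift-logging describe pod elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v
# per-container logs (container names are assumed)
oc -n openshift-logging logs elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v -c elasticsearch
oc -n openshift-logging logs elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v -c proxy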

Here is the output of the elasticsearch pod log from startup until the first error; after that, the SSL error repeats endlessly.
[2020-08-13 15:13:26,723][INFO ][container.run            ] Begin Elasticsearch startup script
[2020-08-13 15:13:26,728][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-08-13 15:13:26,731][INFO ][container.run            ] Inspecting the maximum RAM available...
[2020-08-13 15:13:26,734][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m'
[2020-08-13 15:13:26,735][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch//secret
[2020-08-13 15:13:26,756][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch//secret/admin.p12 to /etc/elasticsearch//secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch//secret/elasticsearch.p12 to /etc/elasticsearch//secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch//secret/logging-es.p12 to /etc/elasticsearch//secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-08-13 15:13:31,000][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-08-13 15:13:31,001][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Xloggc:/elasticsearch/persistent/elasticsearch/logs/gc.log -XX:ErrorFile=/elasticsearch/persistent/elasticsearch/logs/error.log'
[2020-08-13 15:13:31,002][INFO ][container.run            ] Checking if Elasticsearch is ready
OpenJDK 64-Bit Zero VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
[2020-08-13T15:13:42,577][WARN ][o.e.c.l.LogConfigurator  ] [elasticsearch-cdm-s5brzwre-1] Some logging configurations have %marker but don't have %node_name. We will automatically add %node_name to the pattern to ease the migration for users who customize log4j2.properties but will stop this behavior in 7.0. You should manually replace `%node_name` with `[%node_name]%marker ` in these locations:
  /etc/elasticsearch/log4j2.properties
[2020-08-13T15:13:43,823][INFO ][o.e.e.NodeEnvironment    ] [elasticsearch-cdm-s5brzwre-1] using [1] data paths, mounts [[/elasticsearch/persistent (/dev/mapper/coreos-luks-root-nocrypt)]], net usable_space [14.9gb], net total_space [44.5gb], types [xfs]
[2020-08-13T15:13:43,824][INFO ][o.e.e.NodeEnvironment    ] [elasticsearch-cdm-s5brzwre-1] heap size [1021.9mb], compressed ordinary object pointers [false]
[2020-08-13T15:13:43,828][INFO ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] node name [elasticsearch-cdm-s5brzwre-1], node ID [mE9KzlVrSmWZDLG2gJoKdA]
[2020-08-13T15:13:43,828][INFO ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] version[6.8.1-SNAPSHOT], pid[1], build[oss/zip/Unknown/Unknown], OS[Linux/4.18.0-193.12.1.el8_2.s390x/s390x], JVM[Oracle Corporation/OpenJDK 64-Bit Zero VM/1.8.0_262/25.262-b10]
[2020-08-13T15:13:43,829][INFO ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] JVM arguments [-XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch-2113324547621836264, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -XX:+UnlockExperimentalVMOptions, -XX:+UseCGroupMemoryLimitForHeap, -XX:MaxRAMFraction=2, -XX:InitialRAMFraction=2, -XX:MinRAMFraction=2, -Xms1024m, -Xmx1024m, -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof, -Xloggc:/elasticsearch/persistent/elasticsearch/logs/gc.log, -XX:ErrorFile=/elasticsearch/persistent/elasticsearch/logs/error.log, -Djdk.tls.ephemeralDHKeySize=2048, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=oss, -Des.distribution.type=zip]
[2020-08-13T15:13:43,829][WARN ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] version [6.8.1-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
[2020-08-13T15:13:55,932][INFO ][o.e.p.p.PrometheusExporterPlugin] [elasticsearch-cdm-s5brzwre-1] starting Prometheus exporter plugin
[2020-08-13T15:13:58,900][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] ES Config path is /etc/elasticsearch
[2020-08-13T15:13:59,269][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] OpenSSL not available (this is not an error, we simply fallback to built-in JDK SSL) because of java.lang.ClassNotFoundException: io.netty.internal.tcnative.SSL
[2020-08-13T15:13:59,695][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] Config directory is /etc/elasticsearch/, from there the key- and truststore files are resolved relatively
[2020-08-13T15:13:59,714][INFO ][c.a.o.s.s.u.SSLCertificateHelper] [elasticsearch-cdm-s5brzwre-1] No alias given, use the first one: elasticsearch
[2020-08-13T15:13:59,795][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] HTTPS client auth mode OPTIONAL
[2020-08-13T15:13:59,798][INFO ][c.a.o.s.s.u.SSLCertificateHelper] [elasticsearch-cdm-s5brzwre-1] No alias given, use the first one: logging-es
[2020-08-13T15:13:59,814][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] TLS Transport Client Provider : JDK
[2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] TLS Transport Server Provider : JDK
[2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] TLS HTTP Provider             : JDK
[2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] Enabled TLS protocols for transport layer : [TLSv1.1, TLSv1.2]
[2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] Enabled TLS protocols for HTTP layer      : [TLSv1.1, TLSv1.2]
[2020-08-13T15:14:08,624][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Clustername: elasticsearch
[2020-08-13T15:14:08,811][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Directory /etc/elasticsearch has insecure file permissions (should be 0700)
[2020-08-13T15:14:08,812][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Directory /etc/elasticsearch/scripts has insecure file permissions (should be 0700)
[2020-08-13T15:14:08,813][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Directory /etc/elasticsearch/secret has insecure file permissions (should be 0700)
[2020-08-13T15:14:08,813][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/admin.p12 has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,814][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/admin.jks has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,815][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/elasticsearch.p12 has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,815][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/searchguard.key has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,816][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/logging-es.p12 has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,816][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/key has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,817][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/truststore has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,817][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/searchguard.truststore has insecure file permissions (should be 0600)
[2020-08-13T15:14:08,818][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/index_settings has insecure file permissions (should be 0600)
[2020-08-13T15:14:09,184][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [aggs-matrix-stats]
[2020-08-13T15:14:09,184][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [analysis-common]
[2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [ingest-common]
[2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [ingest-user-agent]
[2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [lang-expression]
[2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [lang-mustache]
[2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [lang-painless]
[2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [mapper-extras]
[2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [parent-join]
[2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [percolator]
[2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [rank-eval]
[2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [reindex]
[2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [repository-url]
[2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [transport-netty4]
[2020-08-13T15:14:09,187][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded module [tribe]
[2020-08-13T15:14:09,187][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded plugin [opendistro_security]
[2020-08-13T15:14:09,188][INFO ][o.e.p.PluginsService     ] [elasticsearch-cdm-s5brzwre-1] loaded plugin [prometheus-exporter]
[2020-08-13T15:14:09,284][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Disabled https compression by default to mitigate BREACH attacks. You can enable it by setting 'http.compression: true' in elasticsearch.yml
[2020-08-13T15:15:04,055][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured categories on rest layer to ignore: [AUTHENTICATED, GRANTED_PRIVILEGES]
[2020-08-13T15:15:04,056][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured categories on transport layer to ignore: [AUTHENTICATED, GRANTED_PRIVILEGES]
[2020-08-13T15:15:04,056][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured Users to ignore: [kibanaserver]
[2020-08-13T15:15:04,057][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured Users to ignore for read compliance events: [kibanaserver]
[2020-08-13T15:15:04,057][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured Users to ignore for write compliance events: [kibanaserver]
[2020-08-13T15:15:04,094][ERROR][c.a.o.s.a.s.SinkProvider ] [elasticsearch-cdm-s5brzwre-1] Default endpoint could not be created, auditlog will not work properly.
[2020-08-13T15:15:04,105][WARN ][c.a.o.s.a.r.AuditMessageRouter] [elasticsearch-cdm-s5brzwre-1] No default storage available, audit log may not work properly. Please check configuration.
[2020-08-13T15:15:04,106][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Message routing enabled: false
[2020-08-13T15:15:04,139][WARN ][c.a.o.s.c.ComplianceConfig] [elasticsearch-cdm-s5brzwre-1] If you plan to use field masking pls configure opendistro_security.compliance.salt to be a random string of 16 chars length identical on all nodes
[2020-08-13T15:15:04,139][INFO ][c.a.o.s.c.ComplianceConfig] [elasticsearch-cdm-s5brzwre-1] PII configuration [auditLogPattern=null,  auditLogIndex=null]: {}
[2020-08-13T15:15:04,909][DEBUG][o.e.a.ActionModule       ] [elasticsearch-cdm-s5brzwre-1] Using REST wrapper from plugin com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin
[2020-08-13T15:15:05,285][INFO ][o.e.d.DiscoveryModule    ] [elasticsearch-cdm-s5brzwre-1] using discovery type [zen] and host providers [settings]
Registering Handler
[2020-08-13T15:15:08,210][INFO ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] initialized
[2020-08-13T15:15:08,211][INFO ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] starting ...
[2020-08-13T15:15:08,905][INFO ][o.e.t.TransportService   ] [elasticsearch-cdm-s5brzwre-1] publish_address {10.129.2.27:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}, {10.129.2.27:9300}
[2020-08-13T15:15:08,940][INFO ][o.e.b.BootstrapChecks    ] [elasticsearch-cdm-s5brzwre-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2020-08-13T15:15:08,962][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-s5brzwre-1] Check if .security index exists ...
[2020-08-13T15:15:09,004][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [elasticsearch-cdm-s5brzwre-1] no known master node, scheduling a retry
[2020-08-13T15:15:14,029][WARN ][o.e.d.z.UnicastZenPing   ] [elasticsearch-cdm-s5brzwre-1] timed out after [5s] resolving host [elasticsearch-cluster.openshift-logging.svc]
[2020-08-13T15:15:17,104][INFO ][o.e.c.s.MasterService    ] [elasticsearch-cdm-s5brzwre-1] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {elasticsearch-cdm-s5brzwre-1}{mE9KzlVrSmWZDLG2gJoKdA}{4s72Y-VSQLqLAQAoF8zqjQ}{10.129.2.27}{10.129.2.27:9300}
[2020-08-13T15:15:17,130][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-cdm-s5brzwre-1] new_master {elasticsearch-cdm-s5brzwre-1}{mE9KzlVrSmWZDLG2gJoKdA}{4s72Y-VSQLqLAQAoF8zqjQ}{10.129.2.27}{10.129.2.27:9300}, reason: apply cluster state (from master [master {elasticsearch-cdm-s5brzwre-1}{mE9KzlVrSmWZDLG2gJoKdA}{4s72Y-VSQLqLAQAoF8zqjQ}{10.129.2.27}{10.129.2.27:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2020-08-13T15:15:17,213][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-s5brzwre-1] .security index does not exist yet, use either securityadmin to initialize cluster or wait until cluster is fully formed and up
[2020-08-13T15:15:17,232][INFO ][o.e.g.GatewayService     ] [elasticsearch-cdm-s5brzwre-1] recovered [0] indices into cluster_state
[2020-08-13T15:15:17,237][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch-cdm-s5brzwre-1] publish_address {10.129.2.27:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}, {10.129.2.27:9200}
[2020-08-13T15:15:17,237][INFO ][o.e.n.Node               ] [elasticsearch-cdm-s5brzwre-1] started
[2020-08-13T15:15:17,238][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] 4 Open Distro Security modules loaded so far: [Module [type=AUDITLOG, implementing class=com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl], Module [type=REST_MANAGEMENT_API, implementing class=com.amazon.opendistroforelasticsearch.security.dlic.rest.api.OpenDistroSecurityRestApiActions], Module [type=DLSFLS, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.OpenDistroSecurityFlsDlsIndexSearcherWrapper], Module [type=MULTITENANCY, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.PrivilegesInterceptorImpl]]
[2020-08-13 15:15:20,598][INFO ][container.run            ] Elasticsearch is ready and listening
/usr/share/elasticsearch/init ~
[2020-08-13 15:15:20,604][INFO ][container.run            ] Starting init script: 0001-jaeger
[2020-08-13 15:15:20,605][INFO ][container.run            ] Completed init script: 0001-jaeger
[2020-08-13 15:15:20,646][INFO ][container.run            ] Forcing the seeding of ACL documents
[2020-08-13 15:15:25,434][INFO ][container.run            ] Seeding the security ACL index.  Will wait up to 604800 seconds.
[2020-08-13 15:15:25,435][INFO ][container.run            ] Seeding the security ACL index.  Will wait up to 604800 seconds.
Open Distro Security Admin v6
Will connect to localhost:9300 ... done
Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{FGbqJDqUTJ-d1qp_JzrBzg}{localhost}{127.0.0.1:9300}]
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{FGbqJDqUTJ-d1qp_JzrBzg}{localhost}{127.0.0.1:9300}]]
	at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
	at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60)
	at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133)


0
[2020-08-13 15:16:22,752][WARN ][container.run            ] Error seeding the security ACL index... retrying in 10 seconds - 0 retries so far
[2020-08-13 15:16:22,753][WARN ][container.run            ] Seeding will continue to fail until the cluster status is YELLOW
[2020-08-13 15:16:27,774][INFO ][container.run            ] Remaining red indices: 0
Open Distro Security Admin v6
Will connect to localhost:9300 ... done
Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{uEUFqM1JSOqr95TFpuNtzg}{localhost}{127.0.0.1:9300}]
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{uEUFqM1JSOqr95TFpuNtzg}{localhost}{127.0.0.1:9300}]]
	at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
	at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60)
	at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133)


1
Open Distro Security Admin v6
Will connect to localhost:9300 ... done
Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{qdSmx88dStq0JBwahTtnag}{localhost}{127.0.0.1:9300}]
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{qdSmx88dStq0JBwahTtnag}{localhost}{127.0.0.1:9300}]]
	at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
	at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60)
	at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133)


2
Open Distro Security Admin v6
Will connect to localhost:9300 ... done
Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{y5MgN7ulRPeDpKvbxJcVzA}{localhost}{127.0.0.1:9300}]
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{y5MgN7ulRPeDpKvbxJcVzA}{localhost}{127.0.0.1:9300}]]
	at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
	at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60)
	at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468)
	at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133)


3
[2020-08-13T15:19:11,607][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-s5brzwre-1] SSL Problem Received close_notify during handshake
javax.net.ssl.SSLException: Received close_notify during handshake
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1667) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1635) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1777) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1090) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:913) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:783) ~[?:?]
	at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626) ~[?:1.8.0_262]
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
[2020-08-13T15:19:11,689][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-s5brzwre-1] SSL Problem Received close_notify during handshake
javax.net.ssl.SSLException: Received close_notify during handshake
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1667) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1635) ~[?:?]

I will try to attach a longer version of the log.

Regards 
  Hanspeter

Comment 5 Hanspeter Jochmann 2020-08-18 06:29:18 UTC
Created attachment 1711684 [details]
Log output from elasticsearch pod with errors for 4.5.5 on s390x

Comment 6 Nicolas Nosenzo 2020-08-18 06:29:46 UTC
Hi Hanspeter, judging from the ClusterLogging instance, you're probably not meeting the memory requirements for ES:

"""
Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and limits
"""

Comment 7 Hanspeter Jochmann 2020-08-18 06:40:18 UTC
Hi Nicolas, I know that, but we are not facing an out-of-memory error, and this is only a minimal instance for a first test with the customer.
As soon as we go into real use, the memory will be increased.

Hanspeter

Comment 8 Hanspeter Jochmann 2020-08-18 07:59:38 UTC
Hi, I did a quick cross-check with the cluster logging instance settings on 4.2 on x86 (didn't have a 4.5.x cluster on x86 available), and there it works with the small memory footprint for testing.

These are the settings used (clo-instance-4.2.yaml):
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 1 
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage:
#        storageClassName: "dummy" 
        size: 200G
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"  
    kibana:
      replicas: 1
  curation:
    type: "curator"  
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"  
      fluentd: {}

[hjochmann@oc4500535505 ocp-cl]$ oc get pods
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-logging-operator-79f8c5496c-vn4r5      1/1     Running   0          11m
elasticsearch-cdm-brwb2f9l-1-86cb8c749-tvh59   2/2     Running   0          2m49s
fluentd-gxb82                                  1/1     Running   0          2m49s
fluentd-lvgm2                                  1/1     Running   0          2m49s
fluentd-ndb78                                  1/1     Running   0          2m49s
fluentd-r7xnc                                  1/1     Running   0          2m49s
fluentd-srqhv                                  1/1     Running   0          2m49s
kibana-848f59b66f-kqqpd                        2/2     Running   0          2m49s


I will attach the log.

Comment 9 Hanspeter Jochmann 2020-08-18 08:01:32 UTC
Created attachment 1711698 [details]
Log output of working elasticsearch pod on 4.2 on x86

The same would be expected for 4.5 on s390x

Comment 10 Nicolas Nosenzo 2020-08-18 08:06:04 UTC
I would say the proper way to test this is to set the suggested memory on the s390x arch, rather than testing the operator with a small (not recommended) amount of memory across different platforms. The JVM may not behave the same across all archs.

Comment 11 Hanspeter Jochmann 2020-08-19 06:39:52 UTC
Cross-checked with bigger memory. The problem is still the same:


Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{9rMmCXGbQPGCKo7zvc28Hg}{localhost}{127.0.0.1:9300}]

[2020-08-19T06:32:40,137][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-bue4eb3a-1] SSL Problem Received close_notify during handshake
javax.net.ssl.SSLException: Received close_notify during handshake
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1667) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1635) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1777) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1090) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:913) ~[?:?]
	at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:783) ~[?:?]
	at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626) ~[?:1.8.0_262]
	at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]

I will attach the log and the yaml of the logging instance.


Hanspeter

Comment 12 Hanspeter Jochmann 2020-08-19 06:40:33 UTC
Created attachment 1711804 [details]
Log tested with default memory settings

Comment 13 Hanspeter Jochmann 2020-08-19 06:41:03 UTC
Created attachment 1711805 [details]
yaml of the current clusterlogging instance used

Comment 15 Hanspeter Jochmann 2020-08-21 12:26:39 UTC
Hi, do you have any news?

Are you working on a fix for the Elasticsearch container?

I did a bit of research on the error message, and it looks like the JVM inside the container has a problem with the SSL library or the certificate store.
Internally at IBM we have a working Elasticsearch container that uses the IBM J9 JVM. I can offer to put you in contact with the team that created this Elasticsearch container.
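For what it's worth, a hedged sketch (pod and container names are assumptions, taken from comment 4) of how to confirm which JVM the shipped image actually runs, since the startup log above reports the OpenJDK Zero VM:

# print the JVM vendor/version the container uses
oc -n openshift-logging exec elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v -c elasticsearch -- java -version
# dump the JVM system properties (includes java.vm.name and security-related settings)
oc -n openshift-logging exec elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v -c elasticsearch -- java -XshowSettings:properties -version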

Regards
 Hanspeter

Comment 16 Carvel Baus 2020-08-21 13:19:02 UTC
Are you saying the Elasticsearch container deployed here, the one this bug is about, is not the one from OperatorHub as shipped with 4.5?

Comment 17 Hanspeter Jochmann 2020-08-21 15:36:13 UTC
No, we use the standard OperatorHub controllers and pods.

IBM has a product called "IBM Cloud Platform Common Services" that we tested with the customer to see whether it could serve as a workaround until the bug is fixed. It also includes an Elasticsearch-based logging stack, and the Elasticsearch container included in that stack works on s390x.
https://www.ibm.com/support/knowledgecenter/SSHKN6/kc_welcome_cs.html

But the customer prefers to use the original Cluster Logging Operator, which is why we need to get this fixed ASAP.

Hanspeter

Comment 18 Carvel Baus 2020-08-24 23:22:13 UTC
We have been able to reproduce the issue on s390x, and we found that it does not happen on ppc64le either.

We are currently working on isolating the issue and toward a possible fix. We will update as soon as we know more.

Comment 19 Sanjaya 2020-08-25 08:30:23 UTC
Hi,
I have tried installing cluster-logging on OCP 4.5 (s390x) with a 3-master / 3-worker node configuration, both from OperatorHub and from the ART operator source as suggested by Jeremy, to include the fix for the system_call_filter issues on s390x.
- This was done with an increased memory setting for the ES pods (allocating 16GB), in combination with the Local Storage Operator for PV/PVC provisioning (a hedged storage sketch follows below).
- Currently, testing of cluster-logging is blocked by this issue.
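For reference, a hedged sketch (the storage class name is assumed, not taken from this BZ) of how the ClusterLogging storage section would point at a class created by the Local Storage Operator:

      storage:
        storageClassName: "local-sc"   # assumed name of the LocalVolume-backed storage class
        size: 200G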

I have also shared error logs and screenshots for reference.

Comment 20 Sanjaya 2020-08-25 08:32:50 UTC
Created attachment 1712499 [details]
Log output of ES pod container error for 4.5 on s390x

Comment 21 Sanjaya 2020-08-25 08:36:19 UTC
Created attachment 1712500 [details]
yaml definition of ClusterLogging instance

Comment 22 Sanjaya 2020-08-25 08:39:37 UTC
Created attachment 1712501 [details]
ES pods status screenshot

Comment 34 errata-xmlrpc 2020-10-27 16:28:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196