Bug 1868916
Summary: | Elasticsearch pod can't start on s390x clusters | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Nicolas Nosenzo <nnosenzo> |
Component: | Multi-Arch | Assignee: | Dennis Gilmore <dgilmore> |
Status: | CLOSED ERRATA | QA Contact: | Jeremy Poulin <jpoulin> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.5 | CC: | aos-bugs, cbaus, dahernan, danili, dgilmore, dorzel, dslavens, erich, hjochman, jjanz, lakshmi.ravichandran1, nstielau, openshift-bugs-escalate, peterm, qitang, raja.sekhar, rcarrier, sabeher2, yselkowi |
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | s390x | ||
OS: | Unspecified | ||
Whiteboard: | Logging | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:28:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Nicolas Nosenzo
2020-08-14 10:25:19 UTC
Can you provide steps to reproduce this? I have the elastic operator running on 4.5.6 s390x. I also have cluster logging running on 4.5.6 s390x, and it seems both are having trouble in the log snippets you provided. That makes me wonder if something else is going on here to cause this. To clarify: both operators successfully deploy on 4.5.6 s390x; the reference to "both having trouble" is to the log snippets submitted on this BZ.

Hi Carvel, Nicolas created the bug from a ticket we have opened. Steps to reproduce:

1) Set up an OCP cluster on Z-Linux with 3 master nodes and 3 worker nodes.

2) Follow the steps outlined at https://docs.openshift.com/container-platform/4.5/logging/cluster-logging-deploying.html#cluster-logging-deploy-cli_cluster-logging-deploying

3) As we are short on memory, and for easier debugging, I changed the clo-instance.yaml in step 5:

```yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 1
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage:
        size: 200G
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
```

4) Wait for the pods to start, and I see the following:

```console
[hjochmann@oc4500535505 ~]$ oc get pods
NAME                                           READY   STATUS                  RESTARTS   AGE
cluster-logging-operator-7b57c54b69-h4c9l      1/1     Running                 0          4d15h
elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v   1/2     Running                 0          4d15h
fluentd-2n74x                                  0/1     Init:CrashLoopBackOff   1094       4d15h
fluentd-62sq6                                  0/1     Init:CrashLoopBackOff   1092       4d15h
fluentd-ff82f                                  0/1     Init:CrashLoopBackOff   1094       4d15h
fluentd-hwb58                                  0/1     Init:CrashLoopBackOff   1093       4d15h
fluentd-mb9cs                                  0/1     Init:CrashLoopBackOff   1093       4d15h
fluentd-rvfjc                                  0/1     Init:CrashLoopBackOff   1093       4d15h
kibana-78b865b58b-86vzf                        2/2     Running                 0          4d15h
[hjochmann@oc4500535505 ~]$
```

The elasticsearch pod only becomes 1/2 ready; as a result, all fluentd pods go into CrashLoopBackOff after a short time. Here is the output of the elasticsearch pod log from startup until the first error; after that, the SSL error repeats endlessly. [2020-08-13 15:13:26,723][INFO ][container.run ] Begin Elasticsearch startup script [2020-08-13 15:13:26,728][INFO ][container.run ] Comparing the specified RAM to the maximum recommended for Elasticsearch... [2020-08-13 15:13:26,731][INFO ][container.run ] Inspecting the maximum RAM available... [2020-08-13 15:13:26,734][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m' [2020-08-13 15:13:26,735][INFO ][container.run ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch//secret [2020-08-13 15:13:26,756][INFO ][container.run ] Building required jks files and truststore Importing keystore /etc/elasticsearch//secret/admin.p12 to /etc/elasticsearch//secret/admin.jks... Entry for alias 1 successfully imported. Import command completed: 1 entries successfully imported, 0 entries failed or cancelled Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12". Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. 
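When fluentd pods sit in Init:CrashLoopBackOff like this, the failing init container's log usually shows why (in this setup it is waiting for Elasticsearch to become ready). A sketch of the commands I would use to gather that; the init container name here is an assumption and should be taken from the `describe` output for your release:

```shell
# Show which init container of a failing fluentd pod is crashing
oc -n openshift-logging describe pod fluentd-2n74x

# Tail the log of that init container (name assumed; confirm via `describe`)
oc -n openshift-logging logs fluentd-2n74x -c fluentd-init

# The elasticsearch pod reports READY 1/2, so one of its two containers is
# unready; inspect both the pod events and the elasticsearch container log
oc -n openshift-logging describe pod elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v
oc -n openshift-logging logs elasticsearch-cdm-s5brzwre-1-566b9897d-l8g4v -c elasticsearch
```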
It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/admin.jks -destkeystore /etc/elasticsearch//secret/admin.jks -deststoretype pkcs12". Importing keystore /etc/elasticsearch//secret/elasticsearch.p12 to /etc/elasticsearch//secret/elasticsearch.jks... Entry for alias 1 successfully imported. Import command completed: 1 entries successfully imported, 0 entries failed or cancelled Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12". Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/elasticsearch.jks -destkeystore /etc/elasticsearch//secret/elasticsearch.jks -deststoretype pkcs12". Importing keystore /etc/elasticsearch//secret/logging-es.p12 to /etc/elasticsearch//secret/logging-es.jks... Entry for alias 1 successfully imported. Import command completed: 1 entries successfully imported, 0 entries failed or cancelled Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12". 
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch//secret/logging-es.jks -destkeystore /etc/elasticsearch//secret/logging-es.jks -deststoretype pkcs12". Certificate was added to keystore Certificate was added to keystore [2020-08-13 15:13:31,000][INFO ][container.run ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof [2020-08-13 15:13:31,001][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Xloggc:/elasticsearch/persistent/elasticsearch/logs/gc.log -XX:ErrorFile=/elasticsearch/persistent/elasticsearch/logs/error.log' [2020-08-13 15:13:31,002][INFO ][container.run ] Checking if Elasticsearch is ready OpenJDK 64-Bit Zero VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N [2020-08-13T15:13:42,577][WARN ][o.e.c.l.LogConfigurator ] [elasticsearch-cdm-s5brzwre-1] Some logging configurations have %marker but don't have %node_name. We will automatically add %node_name to the pattern to ease the migration for users who customize log4j2.properties but will stop this behavior in 7.0. 
You should manually replace `%node_name` with `[%node_name]%marker ` in these locations: /etc/elasticsearch/log4j2.properties [2020-08-13T15:13:43,823][INFO ][o.e.e.NodeEnvironment ] [elasticsearch-cdm-s5brzwre-1] using [1] data paths, mounts [[/elasticsearch/persistent (/dev/mapper/coreos-luks-root-nocrypt)]], net usable_space [14.9gb], net total_space [44.5gb], types [xfs] [2020-08-13T15:13:43,824][INFO ][o.e.e.NodeEnvironment ] [elasticsearch-cdm-s5brzwre-1] heap size [1021.9mb], compressed ordinary object pointers [false] [2020-08-13T15:13:43,828][INFO ][o.e.n.Node ] [elasticsearch-cdm-s5brzwre-1] node name [elasticsearch-cdm-s5brzwre-1], node ID [mE9KzlVrSmWZDLG2gJoKdA] [2020-08-13T15:13:43,828][INFO ][o.e.n.Node ] [elasticsearch-cdm-s5brzwre-1] version[6.8.1-SNAPSHOT], pid[1], build[oss/zip/Unknown/Unknown], OS[Linux/4.18.0-193.12.1.el8_2.s390x/s390x], JVM[Oracle Corporation/OpenJDK 64-Bit Zero VM/1.8.0_262/25.262-b10] [2020-08-13T15:13:43,829][INFO ][o.e.n.Node ] [elasticsearch-cdm-s5brzwre-1] JVM arguments [-XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch-2113324547621836264, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -XX:+UnlockExperimentalVMOptions, -XX:+UseCGroupMemoryLimitForHeap, -XX:MaxRAMFraction=2, -XX:InitialRAMFraction=2, 
-XX:MinRAMFraction=2, -Xms1024m, -Xmx1024m, -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof, -Xloggc:/elasticsearch/persistent/elasticsearch/logs/gc.log, -XX:ErrorFile=/elasticsearch/persistent/elasticsearch/logs/error.log, -Djdk.tls.ephemeralDHKeySize=2048, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=oss, -Des.distribution.type=zip] [2020-08-13T15:13:43,829][WARN ][o.e.n.Node ] [elasticsearch-cdm-s5brzwre-1] version [6.8.1-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production [2020-08-13T15:13:55,932][INFO ][o.e.p.p.PrometheusExporterPlugin] [elasticsearch-cdm-s5brzwre-1] starting Prometheus exporter plugin [2020-08-13T15:13:58,900][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] ES Config path is /etc/elasticsearch [2020-08-13T15:13:59,269][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] OpenSSL not available (this is not an error, we simply fallback to built-in JDK SSL) because of java.lang.ClassNotFoundException: io.netty.internal.tcnative.SSL [2020-08-13T15:13:59,695][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] Config directory is /etc/elasticsearch/, from there the key- and truststore files are resolved relatively [2020-08-13T15:13:59,714][INFO ][c.a.o.s.s.u.SSLCertificateHelper] [elasticsearch-cdm-s5brzwre-1] No alias given, use the first one: elasticsearch [2020-08-13T15:13:59,795][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] HTTPS client auth mode OPTIONAL [2020-08-13T15:13:59,798][INFO ][c.a.o.s.s.u.SSLCertificateHelper] [elasticsearch-cdm-s5brzwre-1] No alias given, use the first one: logging-es [2020-08-13T15:13:59,814][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] TLS Transport Client Provider : JDK [2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] 
[elasticsearch-cdm-s5brzwre-1] TLS Transport Server Provider : JDK [2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] TLS HTTP Provider : JDK [2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] Enabled TLS protocols for transport layer : [TLSv1.1, TLSv1.2] [2020-08-13T15:13:59,815][INFO ][c.a.o.s.s.DefaultOpenDistroSecurityKeyStore] [elasticsearch-cdm-s5brzwre-1] Enabled TLS protocols for HTTP layer : [TLSv1.1, TLSv1.2] [2020-08-13T15:14:08,624][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Clustername: elasticsearch [2020-08-13T15:14:08,811][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Directory /etc/elasticsearch has insecure file permissions (should be 0700) [2020-08-13T15:14:08,812][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Directory /etc/elasticsearch/scripts has insecure file permissions (should be 0700) [2020-08-13T15:14:08,813][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Directory /etc/elasticsearch/secret has insecure file permissions (should be 0700) [2020-08-13T15:14:08,813][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/admin.p12 has insecure file permissions (should be 0600) [2020-08-13T15:14:08,814][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/admin.jks has insecure file permissions (should be 0600) [2020-08-13T15:14:08,815][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/elasticsearch.p12 has insecure file permissions (should be 0600) [2020-08-13T15:14:08,815][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/searchguard.key has insecure file permissions (should be 0600) [2020-08-13T15:14:08,816][WARN 
][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/logging-es.p12 has insecure file permissions (should be 0600) [2020-08-13T15:14:08,816][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/key has insecure file permissions (should be 0600) [2020-08-13T15:14:08,817][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/truststore has insecure file permissions (should be 0600) [2020-08-13T15:14:08,817][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/secret/searchguard.truststore has insecure file permissions (should be 0600) [2020-08-13T15:14:08,818][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] File /etc/elasticsearch/index_settings has insecure file permissions (should be 0600) [2020-08-13T15:14:09,184][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [aggs-matrix-stats] [2020-08-13T15:14:09,184][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [analysis-common] [2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [ingest-common] [2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [ingest-user-agent] [2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [lang-expression] [2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [lang-mustache] [2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [lang-painless] [2020-08-13T15:14:09,185][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [mapper-extras] [2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [parent-join] 
[2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [percolator] [2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [rank-eval] [2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [reindex] [2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [repository-url] [2020-08-13T15:14:09,186][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [transport-netty4] [2020-08-13T15:14:09,187][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded module [tribe] [2020-08-13T15:14:09,187][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded plugin [opendistro_security] [2020-08-13T15:14:09,188][INFO ][o.e.p.PluginsService ] [elasticsearch-cdm-s5brzwre-1] loaded plugin [prometheus-exporter] [2020-08-13T15:14:09,284][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] Disabled https compression by default to mitigate BREACH attacks. 
You can enable it by setting 'http.compression: true' in elasticsearch.yml [2020-08-13T15:15:04,055][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured categories on rest layer to ignore: [AUTHENTICATED, GRANTED_PRIVILEGES] [2020-08-13T15:15:04,056][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured categories on transport layer to ignore: [AUTHENTICATED, GRANTED_PRIVILEGES] [2020-08-13T15:15:04,056][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured Users to ignore: [kibanaserver] [2020-08-13T15:15:04,057][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured Users to ignore for read compliance events: [kibanaserver] [2020-08-13T15:15:04,057][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Configured Users to ignore for write compliance events: [kibanaserver] [2020-08-13T15:15:04,094][ERROR][c.a.o.s.a.s.SinkProvider ] [elasticsearch-cdm-s5brzwre-1] Default endpoint could not be created, auditlog will not work properly. [2020-08-13T15:15:04,105][WARN ][c.a.o.s.a.r.AuditMessageRouter] [elasticsearch-cdm-s5brzwre-1] No default storage available, audit log may not work properly. Please check configuration. 
[2020-08-13T15:15:04,106][INFO ][c.a.o.s.a.i.AuditLogImpl ] [elasticsearch-cdm-s5brzwre-1] Message routing enabled: false [2020-08-13T15:15:04,139][WARN ][c.a.o.s.c.ComplianceConfig] [elasticsearch-cdm-s5brzwre-1] If you plan to use field masking pls configure opendistro_security.compliance.salt to be a random string of 16 chars length identical on all nodes [2020-08-13T15:15:04,139][INFO ][c.a.o.s.c.ComplianceConfig] [elasticsearch-cdm-s5brzwre-1] PII configuration [auditLogPattern=null, auditLogIndex=null]: {} [2020-08-13T15:15:04,909][DEBUG][o.e.a.ActionModule ] [elasticsearch-cdm-s5brzwre-1] Using REST wrapper from plugin com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin [2020-08-13T15:15:05,285][INFO ][o.e.d.DiscoveryModule ] [elasticsearch-cdm-s5brzwre-1] using discovery type [zen] and host providers [settings] Registering Handler [2020-08-13T15:15:08,210][INFO ][o.e.n.Node ] [elasticsearch-cdm-s5brzwre-1] initialized [2020-08-13T15:15:08,211][INFO ][o.e.n.Node ] [elasticsearch-cdm-s5brzwre-1] starting ... [2020-08-13T15:15:08,905][INFO ][o.e.t.TransportService ] [elasticsearch-cdm-s5brzwre-1] publish_address {10.129.2.27:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}, {10.129.2.27:9300} [2020-08-13T15:15:08,940][INFO ][o.e.b.BootstrapChecks ] [elasticsearch-cdm-s5brzwre-1] bound or publishing to a non-loopback address, enforcing bootstrap checks [2020-08-13T15:15:08,962][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-s5brzwre-1] Check if .security index exists ... 
[2020-08-13T15:15:09,004][DEBUG][o.e.a.a.i.e.i.TransportIndicesExistsAction] [elasticsearch-cdm-s5brzwre-1] no known master node, scheduling a retry [2020-08-13T15:15:14,029][WARN ][o.e.d.z.UnicastZenPing ] [elasticsearch-cdm-s5brzwre-1] timed out after [5s] resolving host [elasticsearch-cluster.openshift-logging.svc] [2020-08-13T15:15:17,104][INFO ][o.e.c.s.MasterService ] [elasticsearch-cdm-s5brzwre-1] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {elasticsearch-cdm-s5brzwre-1}{mE9KzlVrSmWZDLG2gJoKdA}{4s72Y-VSQLqLAQAoF8zqjQ}{10.129.2.27}{10.129.2.27:9300} [2020-08-13T15:15:17,130][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-cdm-s5brzwre-1] new_master {elasticsearch-cdm-s5brzwre-1}{mE9KzlVrSmWZDLG2gJoKdA}{4s72Y-VSQLqLAQAoF8zqjQ}{10.129.2.27}{10.129.2.27:9300}, reason: apply cluster state (from master [master {elasticsearch-cdm-s5brzwre-1}{mE9KzlVrSmWZDLG2gJoKdA}{4s72Y-VSQLqLAQAoF8zqjQ}{10.129.2.27}{10.129.2.27:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]]) [2020-08-13T15:15:17,213][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-s5brzwre-1] .security index does not exist yet, use either securityadmin to initialize cluster or wait until cluster is fully formed and up [2020-08-13T15:15:17,232][INFO ][o.e.g.GatewayService ] [elasticsearch-cdm-s5brzwre-1] recovered [0] indices into cluster_state [2020-08-13T15:15:17,237][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch-cdm-s5brzwre-1] publish_address {10.129.2.27:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}, {10.129.2.27:9200} [2020-08-13T15:15:17,237][INFO ][o.e.n.Node ] [elasticsearch-cdm-s5brzwre-1] started [2020-08-13T15:15:17,238][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-s5brzwre-1] 4 Open Distro Security modules loaded so far: [Module [type=AUDITLOG, implementing class=com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl], Module [type=REST_MANAGEMENT_API, 
implementing class=com.amazon.opendistroforelasticsearch.security.dlic.rest.api.OpenDistroSecurityRestApiActions], Module [type=DLSFLS, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.OpenDistroSecurityFlsDlsIndexSearcherWrapper], Module [type=MULTITENANCY, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.PrivilegesInterceptorImpl]] [2020-08-13 15:15:20,598][INFO ][container.run ] Elasticsearch is ready and listening /usr/share/elasticsearch/init ~ [2020-08-13 15:15:20,604][INFO ][container.run ] Starting init script: 0001-jaeger [2020-08-13 15:15:20,605][INFO ][container.run ] Completed init script: 0001-jaeger [2020-08-13 15:15:20,646][INFO ][container.run ] Forcing the seeding of ACL documents [2020-08-13 15:15:25,434][INFO ][container.run ] Seeding the security ACL index. Will wait up to 604800 seconds. [2020-08-13 15:15:25,435][INFO ][container.run ] Seeding the security ACL index. Will wait up to 604800 seconds. Open Distro Security Admin v6 Will connect to localhost:9300 ... done Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{FGbqJDqUTJ-d1qp_JzrBzg}{localhost}{127.0.0.1:9300}] ERR: Cannot connect to Elasticsearch. 
Please refer to elasticsearch logfile for more information Trace: NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{FGbqJDqUTJ-d1qp_JzrBzg}{localhost}{127.0.0.1:9300}]] at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352) at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248) at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60) at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133) 0 [2020-08-13 15:16:22,752][WARN ][container.run ] Error seeding the security ACL index... retrying in 10 seconds - 0 retries so far [2020-08-13 15:16:22,753][WARN ][container.run ] Seeding will continue to fail until the cluster status is YELLOW [2020-08-13 15:16:27,774][INFO ][container.run ] Remaining red indices: 0 Open Distro Security Admin v6 Will connect to localhost:9300 ... done Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{uEUFqM1JSOqr95TFpuNtzg}{localhost}{127.0.0.1:9300}] ERR: Cannot connect to Elasticsearch. 
Please refer to elasticsearch logfile for more information Trace: NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{uEUFqM1JSOqr95TFpuNtzg}{localhost}{127.0.0.1:9300}]] at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352) at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248) at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60) at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133) 1 Open Distro Security Admin v6 Will connect to localhost:9300 ... done Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{qdSmx88dStq0JBwahTtnag}{localhost}{127.0.0.1:9300}] ERR: Cannot connect to Elasticsearch. 
Please refer to elasticsearch logfile for more information Trace: NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{qdSmx88dStq0JBwahTtnag}{localhost}{127.0.0.1:9300}]] at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352) at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248) at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60) at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133) 2 Open Distro Security Admin v6 Will connect to localhost:9300 ... done Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{y5MgN7ulRPeDpKvbxJcVzA}{localhost}{127.0.0.1:9300}] ERR: Cannot connect to Elasticsearch. 
Please refer to elasticsearch logfile for more information Trace: NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{y5MgN7ulRPeDpKvbxJcVzA}{localhost}{127.0.0.1:9300}]] at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352) at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248) at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:60) at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:388) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403) at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:391) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.execute(OpenDistroSecurityAdmin.java:468) at com.amazon.opendistroforelasticsearch.security.tools.OpenDistroSecurityAdmin.main(OpenDistroSecurityAdmin.java:133) 3 [2020-08-13T15:19:11,607][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-s5brzwre-1] SSL Problem Received close_notify during handshake javax.net.ssl.SSLException: Received close_notify during handshake at sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?] at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1667) ~[?:?] at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1635) ~[?:?] at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1777) ~[?:?] at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1090) ~[?:?] at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:913) ~[?:?] at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:783) ~[?:?] 
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626) ~[?:1.8.0_262] at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) 
[netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262] [2020-08-13T15:19:11,689][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-s5brzwre-1] SSL Problem Received close_notify during handshake javax.net.ssl.SSLException: Received close_notify during handshake at sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?] at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1667) ~[?:?] at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1635) ~[?:?]

I will try to attach a longer version of the log. Regards, Hanspeter

Created attachment 1711684 [details]
Log output from elasticsearch pod with errors for 4.5.5 on s390x
Hi Hanspeter, from the ClusterLogging instance, you're probably not meeting the conditions for ES memory:

"""
Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and limits
"""

Hi Nicolas, I know that, but we are not facing an out-of-memory error, and this is only a minimal instance for a first test with the customer. As soon as we go into real use, the memory will be increased. Hanspeter

Hi, I did a quick cross-check with the cluster logging instance settings on 4.2 on x86 (I didn't have a 4.5.x cluster on x86 available), and there it works with the small memory footprint for testing. These are the settings used (clo-instance-4.2.yaml):

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 1
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage:
        # storageClassName: "dummy"
        size: 200G
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}

[hjochmann@oc4500535505 ocp-cl]$ oc get pods
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-logging-operator-79f8c5496c-vn4r5      1/1     Running   0          11m
elasticsearch-cdm-brwb2f9l-1-86cb8c749-tvh59   2/2     Running   0          2m49s
fluentd-gxb82                                  1/1     Running   0          2m49s
fluentd-lvgm2                                  1/1     Running   0          2m49s
fluentd-ndb78                                  1/1     Running   0          2m49s
fluentd-r7xnc                                  1/1     Running   0          2m49s
fluentd-srqhv                                  1/1     Running   0          2m49s
kibana-848f59b66f-kqqpd                        2/2     Running   0          2m49s

I will attach the log.

Created attachment 1711698 [details]
Log output of working elasticsearch pod on 4.2 on x86
The same would be expected for 4.5 on s390x
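For comparison with the 2Gi test instance above, a logStore stanza that meets the 16G-per-node recommendation quoted earlier in the thread might look like the following sketch. This is illustrative only, not a tested configuration; the remaining ClusterLogging fields are assumed to stay as in the instance above:

```yaml
# Hypothetical logStore stanza sized to the documented 16G-per-node
# recommendation (illustrative only; requests and limits kept equal
# as the docs quoted above require).
logStore:
  type: "elasticsearch"
  elasticsearch:
    nodeCount: 1
    resources:
      limits:
        memory: 16Gi
      requests:
        cpu: 200m
        memory: 16Gi
    storage:
      size: 200G
    redundancyPolicy: "ZeroRedundancy"
```

As later comments in this thread show, the SSL handshake failure reproduced even with the larger memory setting, so the memory sizing alone did not explain the crash.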
I would say the proper way to test this is with the suggested memory on the s390x arch, rather than testing the operator with a small (not recommended) amount of memory across different platforms. The JVM may not behave the same across all architectures.

Cross-checked with bigger memory; the problem is still the same:

Unable to check whether cluster is sane: None of the configured nodes are available: [{#transport#-1}{9rMmCXGbQPGCKo7zvc28Hg}{localhost}{127.0.0.1:9300}]
[2020-08-19T06:32:40,137][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-bue4eb3a-1] SSL Problem Received close_notify during handshake
javax.net.ssl.SSLException: Received close_notify during handshake
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:214) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1667) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1635) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1777) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1090) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:913) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:783) ~[?:?]
    at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:626) ~[?:1.8.0_262]
    at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]

I attach the log and the YAML of the logging instance. Hanspeter

Created attachment 1711804 [details]
Log tested with default memory settings
Created attachment 1711805 [details]
yaml of the current clusterlogging instance used
Hi, do you have any news? Are you working on a fix for the Elasticsearch container? I did a bit of research on the error message, and it looks like the JVM inside the container has a problem with the SSL library or the certificate store. Internally at IBM we have a working Elasticsearch container that uses the IBM J9 JVM. I can offer to put you in contact with the team that created this Elasticsearch container. Regards, Hanspeter

Are you saying the Elasticsearch container deployed, that this bug is about, is not the one from OperatorHub as shipped with 4.5?

No, we use the standard OperatorHub controllers and pods. IBM has a product called "IBM Cloud Platform Common Services" that we tested with the customer as a possible workaround until the bug is fixed. It also includes an Elasticsearch-based logging stack, and the Elasticsearch container included in that stack works on s390x. https://www.ibm.com/support/knowledgecenter/SSHKN6/kc_welcome_cs.html But the customer prefers to use the original Cluster Logging Operator, which is why we need to get this fixed as soon as possible. Hanspeter

We have been able to reproduce the issue on s390x and found it does not happen on ppc64le either. We are currently working on isolating the issue and toward a possible fix/solution. We will update as soon as we definitely know more.

Hi, I have tried installing cluster-logging on OCP 4.5 (s390x) with a 3-master, 3-worker node configuration, both from OperatorHub and using the ART operator source as suggested by Jeremy to include the fix for the system_call_filter issues on s390x. I also used an increased memory setting for the ES pods (allocating 16GB), in combination with the Local Storage Operator for PV/PVC provisioning. Currently, the testing of cluster-logging is blocked by this issue. I have also shared error logs and screenshots for reference.

Created attachment 1712499 [details]
Log output of ES pod container error for 4.5 on s390x
Created attachment 1712500 [details]
yaml definition of ClusterLogging instance
Created attachment 1712501 [details]
ES pods status screenshot
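For context on the system_call_filter fix mentioned earlier in the thread: Elasticsearch installs a seccomp system call filter during bootstrap, and on architectures where that support is incomplete the filter can be disabled via elasticsearch.yml. The setting name below is a real Elasticsearch option, but whether the shipped s390x fix uses this exact knob is an assumption; this is only a hedged illustration, not the actual fix:

```yaml
# elasticsearch.yml fragment (illustrative only).
# bootstrap.system_call_filter disables Elasticsearch's seccomp
# system call filter; it is NOT confirmed to be the shipped fix.
bootstrap:
  system_call_filter: false
```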
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196 |