Bug 1842412 - ServiceMesh custom installation fails on System Z (s390x)
Summary: ServiceMesh custom installation fails on System Z (s390x)
Keywords:
Status: CLOSED DUPLICATE of bug 1807201
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.3.0
Hardware: s390
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Dennis Gilmore
QA Contact: Barry Donahue
URL:
Whiteboard: multi-arch
Depends On:
Blocks:
 
Reported: 2020-06-01 08:31 UTC by Rashmi Sakhalkar
Modified: 2020-06-12 15:57 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-12 15:57:13 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Failure_logs (10.40 KB, text/plain)
2020-06-01 08:31 UTC, Rashmi Sakhalkar

Description Rashmi Sakhalkar 2020-06-01 08:31:40 UTC
Created attachment 1694043 [details]
Failure_logs

Description of problem:
ServiceMesh control plane creation is not successful due to a failure in the Elasticsearch pod.

Version-Release number of selected component (if applicable):


How reproducible:
Create the SMCP using the YAML below:

apiVersion: maistra.io/v1
kind: ServiceMeshControlPlane
metadata:
  name: basic-install
  namespace: istio-system
spec:
  istio:
    gateways:
      istio-egressgateway:
        autoscaleEnabled: false
      istio-ingressgateway:
        autoscaleEnabled: false
    mixer:
      policy:
        autoscaleEnabled: false
      telemetry:
        autoscaleEnabled: false
    pilot:
      autoscaleEnabled: false
      traceSampling: 100
    kiali:
      enabled: true
    grafana:
      enabled: true
    tracing:
      enabled: true
      jaeger:
        template: production-elasticsearch
        elasticsearch:
          nodeCount: 1
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
            limits:
              memory: "4Gi"

Steps to Reproduce:
1. Create the Elasticsearch, Kiali, Jaeger, and ServiceMesh operators.
2. Create the istio-system project.
3. Create the SMCP using the above YAML.
Detailed logs of execution attached.

Actual results:
The SMCP remains in the InstallSuccessful state.
Elasticsearch pod failures are seen in the istio-system project.
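
A minimal sketch of how the failure can be observed (the smcp short name is assumed from the maistra.io/v1 CRD used above, and istio-system is the namespace from the YAML):

# Check the control plane status reported by the operator
oc get smcp basic-install -n istio-system -o yaml

# Check which component pods are failing
oc get pods -n istio-system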

Expected results:
SMCP should be created successfully.

Additional info:

Comment 1 Rashmi Sakhalkar 2020-06-01 08:39:02 UTC
[root@localhost ocp]# ./oc version
Client Version: 4.3.19
Server Version: 4.3.19
Kubernetes Version: v1.16.2
[root@localhost ocp]#


The Elasticsearch pod failure is seen on both System Z (s390x) and Power (ppc64le).

Comment 2 Lukas Vlcek 2020-06-01 09:20:51 UTC
I think you are running into this:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html

You probably need to check ES node logs to see more details.

oc logs <pod> -c elasticsearch

Comment 3 Rashmi Sakhalkar 2020-06-01 10:36:51 UTC
Error in logs:-

ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk


Output:-
[root@localhost ocp]# ./oc logs elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch  -n istio-system
[2020-06-01 10:33:49,589][INFO ][container.run            ] Begin Elasticsearch startup script
[2020-06-01 10:33:49,592][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-06-01 10:33:49,595][INFO ][container.run            ] Inspecting the maximum RAM available...
[2020-06-01 10:33:49,602][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m'
[2020-06-01 10:33:49,603][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2020-06-01 10:33:49,607][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-06-01 10:33:52,915][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-06-01 10:33:52,915][INFO ][container.run            ] Checking if Elasticsearch is ready
[2020-06-01 10:33:52,916][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'

### LICENSE NOTICE Search Guard ###

If you use one or more of the following features in production
make sure you have a valid Search Guard license
(See https://floragunn.com/searchguard-validate-license)

* Kibana Multitenancy
* LDAP authentication/authorization
* Active Directory authentication/authorization
* REST Management API
* JSON Web Token (JWT) authentication/authorization
* Kerberos authentication/authorization
* Document- and Fieldlevel Security (DLS/FLS)
* Auditlogging

In case of any doubt mail to <sales>
###################################
Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation.

### LICENSE NOTICE Search Guard ###

If you use one or more of the following features in production
make sure you have a valid Search Guard license
(See https://floragunn.com/searchguard-validate-license)

* Kibana Multitenancy
* LDAP authentication/authorization
* Active Directory authentication/authorization
* REST Management API
* JSON Web Token (JWT) authentication/authorization
* Kerberos authentication/authorization
* Document- and Fieldlevel Security (DLS/FLS)
* Auditlogging

In case of any doubt mail to <sales>
###################################
Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[root@localhost ocp]#

Comment 4 Lukas Vlcek 2020-06-01 11:16:28 UTC
Sorry, I was not clear.
We need to check what is in the Elasticsearch log. You can find the location of the ES logs by following our dump script:

https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh#L291

You can see there are a few locations you need to check for logs. Alternatively, just run this dump script and upload the archive file so that we can check.

We need to see if the log contains more details; for example, this error has been reported in connection with the kernel version. See:
https://discuss.elastic.co/t/how-to-get-logs-for-system-call-filters-failed-to-install/185596
https://discuss.elastic.co/t/elasticsearch-is-throwing-bootstrap-error-and-its-unable-to-load-system-call-filters-when-im-trying-to-run-on-linux/88517

Comment 5 Rashmi Sakhalkar 2020-06-01 13:14:22 UTC
I tried to execute the script, but it fails because the container is not running.

Even a simple date command cannot be executed.

[root@localhost ocp]# ./oc exec elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch date
error: Internal error occurred: error executing command in container: container is not created or running
[root@localhost ocp]# 

[root@localhost ~]# ./oc get pods
NAME                                                     READY   STATUS             RESTARTS   AGE
elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb   1/2     CrashLoopBackOff   822        3d3h
grafana-7d6b5c4ccf-r9ld8                                 2/2     Running            0          3d3h
istio-citadel-75674db4d5-kk2sl                           1/1     Running            0          3d3h
istio-egressgateway-656997ff47-gbrwg                     1/1     Running            0          3d3h
istio-galley-595f769dc5-2v7m7                            1/1     Running            0          3d3h
istio-ingressgateway-748df5748c-8d4bv                    1/1     Running            0          3d3h
istio-pilot-85c4846569-m94b4                             2/2     Running            0          3d3h
istio-policy-9c7cf98f8-4nkkm                             2/2     Running            0          3d3h
istio-sidecar-injector-74589cfb79-b2n44                  1/1     Running            0          3d3h
istio-telemetry-df4d745d5-bgkjw                          2/2     Running            0          3d3h
jaeger-collector-6c846c488c-nqz5f                        0/1     CrashLoopBackOff   903        3d3h
jaeger-es-index-cleaner-1590969300-6p9wl                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-jmmjh                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-q5fw7                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-rz2zk                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-xzvxz                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-z5rtb                 0/1     Error              0          13h
jaeger-query-5b75fbf477-bgw4n                            2/3     CrashLoopBackOff   905        3d3h
kiali-579f9d9fdc-lsb58                                   1/1     Running            0          3d3h
prometheus-67bfdddf9-qhqm4                               2/2     Running            0          3d3h
[root@localhost ~]#


Script output:
---- Unable to get ES logs from pod elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
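
Since the container is in CrashLoopBackOff, one check that usually still works (a sketch, not part of the dump script) is reading the log of the previous crashed run instead of exec'ing into the container:

# Read the log from the last terminated run of the elasticsearch container
oc logs elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch -n istio-system --previous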

Comment 6 Rashmi Sakhalkar 2020-06-01 13:28:10 UTC
[root@localhost ~]# ./oc get pods --selector component=elasticsearch -o name
pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb


[root@localhost ~]# ./oc exec elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -n istio-system' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("elasticsearch")


[root@localhost ~]# ./oc describe  pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
Name:         elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
Namespace:    istio-system
Priority:     0
Node:         worker-1.nour-kvm-poc.zkvmocp.notld/192.168.79.25
Start Time:   Fri, 29 May 2020 05:16:21 -0400
Labels:       cluster-name=elasticsearch
              component=elasticsearch
              es-node-client=true
              es-node-data=true
              es-node-master=true
              node-name=elasticsearch-cdm-istiosystemjaeger-1
              pod-template-hash=7bb5df8fc5
              tuned.openshift.io/elasticsearch=true
Annotations:  k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.128.2.54"
                    ],
                    "dns": {},
                    "default-route": [
                        "10.128.2.1"
                    ]
                }]
              openshift.io/scc: restricted
Status:       Running
IP:           10.128.2.54
IPs:
  IP:           10.128.2.54
Controlled By:  ReplicaSet/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5
Containers:
  elasticsearch:
    Container ID:   cri-o://4a34c41016bf9de9d786f3228cc072ff0b9ef79f31b5eda53b8bd9a85149fcc8
    Image:          registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f
    Image ID:       registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f
    Ports:          9300/TCP, 9200/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    78
      Started:      Mon, 01 Jun 2020 09:20:31 -0400
      Finished:     Mon, 01 Jun 2020 09:20:58 -0400
    Ready:          False
    Restart Count:  824
    Limits:
      memory:  4Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3
    Environment:
      DC_NAME:                  elasticsearch-cdm-istiosystemjaeger-1
      NAMESPACE:                istio-system (v1:metadata.namespace)
      KUBERNETES_TRUST_CERT:    true
      SERVICE_DNS:              elasticsearch-cluster
      CLUSTER_NAME:             elasticsearch
      INSTANCE_RAM:             4Gi
      HEAP_DUMP_LOCATION:       /elasticsearch/persistent/heapdump.hprof
      RECOVER_AFTER_TIME:       5m
      READINESS_PROBE_TIMEOUT:  30
      POD_LABEL:                cluster=elasticsearch
      IS_MASTER:                true
      HAS_DATA:                 true
    Mounts:
      /elasticsearch/persistent from elasticsearch-storage (rw)
      /etc/openshift/elasticsearch/secret from certificates (rw)
      /usr/share/java/elasticsearch/config from elasticsearch-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-d9qxs (ro)
  proxy:
    Container ID:  cri-o://47b5c22c2db2adbbeb9dd6e61b726723f808570d9523fc4cf8ccf322222f8743
    Image:         registry.redhat.io/openshift4/ose-oauth-proxy@sha256:46e796b768c848bb24d19ca028cd87c73a0b330601758b9d9f25869b94586725
    Image ID:      registry.redhat.io/openshift4/ose-oauth-proxy@sha256:46e796b768c848bb24d19ca028cd87c73a0b330601758b9d9f25869b94586725
    Port:          60000/TCP
    Host Port:     0/TCP
    Args:
      --https-address=:60000
      --provider=openshift
      --upstream=https://127.0.0.1:9200
      --tls-cert=/etc/proxy/secrets/tls.crt
      --tls-key=/etc/proxy/secrets/tls.key
      --upstream-ca=/etc/proxy/elasticsearch/admin-ca
      --openshift-service-account=elasticsearch
      -openshift-sar={"resource": "namespaces", "verb": "get"}
      -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
      --pass-user-bearer-token
      --cookie-secret=dRH6aHl98+nypiW4QObtGA==
    State:          Running
      Started:      Fri, 29 May 2020 05:16:24 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  64Mi
    Requests:
      cpu:        100m
      memory:     64Mi
    Environment:  <none>
    Mounts:
      /etc/proxy/elasticsearch from certificates (rw)
      /etc/proxy/secrets from elasticsearch-metrics (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-d9qxs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  elasticsearch-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      elasticsearch
    Optional:  false
  elasticsearch-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch
    Optional:    false
  elasticsearch-metrics:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-metrics
    Optional:    false
  elasticsearch-token-d9qxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-token-d9qxs
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                                          Message
  ----     ------     ----                    ----                                          -------
  Warning  Unhealthy  96m (x2835 over 3d4h)   kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]
  Normal   Pulled     41m (x818 over 3d4h)    kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Container image "registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f" already present on machine
  Warning  BackOff    77s (x19296 over 3d4h)  kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Back-off restarting failed container
[root@localhost ~]#

Comment 7 Jeff Cantrill 2020-06-01 17:40:09 UTC
Moving to the Multi-Arch team because the failure occurs on an alternate architecture.

Comment 9 Yaakov Selkowitz 2020-06-04 17:48:25 UTC
(In reply to Lukas Vlcek from comment #2)
> I think you are running into this:
> https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html

Can't tell if that is the issue here or not, but if so, see bug 1807201.

Comment 10 Rashmi Sakhalkar 2020-06-08 06:52:45 UTC
(In reply to Lukas Vlcek from comment #2)
> I think you are running into this:
> https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html
> 
> You probably need to check ES node logs to see more details.
> 
> oc logs <pod> -c elasticsearch

The following error is seen in the Elasticsearch pod logs:

ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Comment 11 Lukas Vlcek 2020-06-08 09:32:44 UTC
IMO this is a duplicate of bug 1807201.

The system call filter check fails, hence the ES node does not start.
Either consider disabling this particular bootstrap check (see https://bugzilla.redhat.com/show_bug.cgi?id=1807201#c16)
or provide a code patch (again, see https://bugzilla.redhat.com/show_bug.cgi?id=1807201 for some discussion about this).

Comment 12 Rashmi Sakhalkar 2020-06-10 14:08:53 UTC
(In reply to Lukas Vlcek from comment #11)
> IMO this is duplicate of bug 1807201
> 
> System call filter check fails hence the ES node does not start.
> Either consider disabling this particular bootstrap check (see
> https://bugzilla.redhat.com/show_bug.cgi?id=1807201#c16)
> or you need to provide code patch (again see
> https://bugzilla.redhat.com/show_bug.cgi?id=1807201 for some discussion
> about this).

@Lukas I assume disabling the filter check would need to be done as part of the Elasticsearch operator build?

Comment 13 Lukas Vlcek 2020-06-11 09:05:15 UTC
@Rashmi, most likely. It needs to be set before the ES node starts (because it has to be stated in the elasticsearch.yml file), and it should be set the same way on all cluster nodes.
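
For illustration only, a minimal sketch of where the setting discussed above would land, assuming the standard Elasticsearch 5.x bootstrap option and the "elasticsearch" ConfigMap shown in the pod description in comment 6 (not a confirmed fix for this bug):

# elasticsearch.yml is rendered into the pod from the "elasticsearch" ConfigMap
# in istio-system; the bootstrap check in question is controlled by:
#   bootstrap.system_call_filter: false
# Inspect the currently deployed configuration with:
oc get configmap elasticsearch -n istio-system -o yaml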

Comment 14 Yaakov Selkowitz 2020-06-12 15:57:13 UTC
(In reply to Lukas Vlcek from comment #11)
> IMO this is duplicate of bug 1807201

I concur.

*** This bug has been marked as a duplicate of bug 1807201 ***

