Created attachment 1694043 [details]
Failure_logs

Description of problem:
ServiceMesh control plane creation is not successful due to a failure in the elasticsearch pod.

Version-Release number of selected component (if applicable):

How reproducible:
Create an SMCP using the yaml below:

apiVersion: maistra.io/v1
kind: ServiceMeshControlPlane
metadata:
  name: basic-install
  namespace: istio-system
spec:
  istio:
    gateways:
      istio-egressgateway:
        autoscaleEnabled: false
      istio-ingressgateway:
        autoscaleEnabled: false
    mixer:
      policy:
        autoscaleEnabled: false
      telemetry:
        autoscaleEnabled: false
    pilot:
      autoscaleEnabled: false
      traceSampling: 100
    kiali:
      enabled: true
    grafana:
      enabled: true
    tracing:
      enabled: true
      jaeger:
        template: production-elasticsearch
        elasticsearch:
          nodeCount: 1
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
            limits:
              memory: "4Gi"

Steps to Reproduce:
1. Install the Elasticsearch, Kiali, Jaeger, and ServiceMesh operators.
2. Create the istio-system project.
3. Create the SMCP using the above yaml.

Detailed logs of the execution are attached.

Actual results:
The SMCP remains in the "Install successful" state, but elasticsearch pod failures are seen in the istio-system project.

Expected results:
The SMCP should be created successfully.

Additional info:
[root@localhost ocp]# ./oc version
Client Version: 4.3.19
Server Version: 4.3.19
Kubernetes Version: v1.16.2
[root@localhost ocp]#

The elasticsearch pod failure is seen on System Z (s390x) and on Power (ppc64le).
I think you are running into this:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html

You probably need to check the ES node logs to see more details:

oc logs <pod> -c elasticsearch
Error in logs:

ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Output:

[root@localhost ocp]# ./oc logs elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch -n istio-system
[2020-06-01 10:33:49,589][INFO ][container.run            ] Begin Elasticsearch startup script
[2020-06-01 10:33:49,592][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-06-01 10:33:49,595][INFO ][container.run            ] Inspecting the maximum RAM available...
[2020-06-01 10:33:49,602][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m'
[2020-06-01 10:33:49,603][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2020-06-01 10:33:49,607][INFO ][container.run            ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-06-01 10:33:52,915][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-06-01 10:33:52,915][INFO ][container.run            ] Checking if Elasticsearch is ready
[2020-06-01 10:33:52,916][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'

### LICENSE NOTICE Search Guard ###

If you use one or more of the following features in production
make sure you have a valid Search Guard license
(See https://floragunn.com/searchguard-validate-license)

* Kibana Multitenancy
* LDAP authentication/authorization
* Active Directory authentication/authorization
* REST Management API
* JSON Web Token (JWT) authentication/authorization
* Kerberos authentication/authorization
* Document- and Fieldlevel Security (DLS/FLS)
* Auditlogging

In case of any doubt mail to <sales>
###################################

Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation.

### LICENSE NOTICE Search Guard ###

If you use one or more of the following features in production
make sure you have a valid Search Guard license
(See https://floragunn.com/searchguard-validate-license)

* Kibana Multitenancy
* LDAP authentication/authorization
* Active Directory authentication/authorization
* REST Management API
* JSON Web Token (JWT) authentication/authorization
* Kerberos authentication/authorization
* Document- and Fieldlevel Security (DLS/FLS)
* Auditlogging

In case of any doubt mail to <sales>
###################################

Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[root@localhost ocp]#
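For quick triage of output like the above, the failed bootstrap check can be pulled out of a saved copy of the pod log with grep. This is only a sketch: the /tmp path is arbitrary, and a canned excerpt of the log stands in for a real `oc logs` capture.

```shell
# On a live cluster you would first save the pod log, e.g.:
#   oc logs <pod> -c elasticsearch -n istio-system > /tmp/es-pod.log
# Here a canned excerpt of the log shown above is used instead.
cat > /tmp/es-pod.log <<'EOF'
[2020-06-01 10:33:49,589][INFO ][container.run ] Begin Elasticsearch startup script
ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
EOF

# Print the failure summary plus the line naming the failed check.
grep -A1 'bootstrap checks failed' /tmp/es-pod.log
```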
Sorry, I was not clear. We need to check what is in the Elasticsearch log. You can find the location of the ES logs by following our dump script:

https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh#L291

You can see there are a few locations you need to check for logs. Or you can just run this dump script and upload the archive file so that we can check.

We need to see if the log contains more details; for example, this error has been reported in connection with the kernel version. See:

https://discuss.elastic.co/t/how-to-get-logs-for-system-call-filters-failed-to-install/185596
https://discuss.elastic.co/t/elasticsearch-is-throwing-bootstrap-error-and-its-unable-to-load-system-call-filters-when-im-trying-to-run-on-linux/88517
I tried to execute the script, but it fails because the container is not running. Even a simple date command cannot be executed.

[root@localhost ocp]# ./oc exec elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch date
error: Internal error occurred: error executing command in container: container is not created or running
[root@localhost ocp]#

[root@localhost ~]# ./oc get pods
NAME                                                     READY   STATUS             RESTARTS   AGE
elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb   1/2     CrashLoopBackOff   822        3d3h
grafana-7d6b5c4ccf-r9ld8                                 2/2     Running            0          3d3h
istio-citadel-75674db4d5-kk2sl                           1/1     Running            0          3d3h
istio-egressgateway-656997ff47-gbrwg                     1/1     Running            0          3d3h
istio-galley-595f769dc5-2v7m7                            1/1     Running            0          3d3h
istio-ingressgateway-748df5748c-8d4bv                    1/1     Running            0          3d3h
istio-pilot-85c4846569-m94b4                             2/2     Running            0          3d3h
istio-policy-9c7cf98f8-4nkkm                             2/2     Running            0          3d3h
istio-sidecar-injector-74589cfb79-b2n44                  1/1     Running            0          3d3h
istio-telemetry-df4d745d5-bgkjw                          2/2     Running            0          3d3h
jaeger-collector-6c846c488c-nqz5f                        0/1     CrashLoopBackOff   903        3d3h
jaeger-es-index-cleaner-1590969300-6p9wl                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-jmmjh                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-q5fw7                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-rz2zk                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-xzvxz                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-z5rtb                 0/1     Error              0          13h
jaeger-query-5b75fbf477-bgw4n                            2/3     CrashLoopBackOff   905        3d3h
kiali-579f9d9fdc-lsb58                                   1/1     Running            0          3d3h
prometheus-67bfdddf9-qhqm4                               2/2     Running            0          3d3h
[root@localhost ~]#

Script output:
---- Unable to get ES logs from pod elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
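Failing pods can be picked out of a listing like the one above by comparing the two halves of the READY column. This is only a sketch over a canned subset of that output; on a live cluster you would pipe `oc get pods --no-headers` directly.

```shell
# Canned subset of the 'oc get pods' listing above.
cat > /tmp/pods.txt <<'EOF'
elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb 1/2 CrashLoopBackOff 822 3d3h
grafana-7d6b5c4ccf-r9ld8 2/2 Running 0 3d3h
jaeger-collector-6c846c488c-nqz5f 0/1 CrashLoopBackOff 903 3d3h
kiali-579f9d9fdc-lsb58 1/1 Running 0 3d3h
EOF

# Print name and status of every pod whose ready count is below the desired count.
awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $3 }' /tmp/pods.txt
```

Note that `oc exec` requires a running container, which is why it fails here; for a pod stuck in CrashLoopBackOff, the logs of the most recently crashed instance can usually still be fetched with `oc logs <pod> -c elasticsearch --previous`.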
[root@localhost ~]# ./oc get pods --selector component=elasticsearch -o name
pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
[root@localhost ~]# ./oc exec elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -n istio-system' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("elasticsearch")
[root@localhost ~]# ./oc describe pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
Name:         elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
Namespace:    istio-system
Priority:     0
Node:         worker-1.nour-kvm-poc.zkvmocp.notld/192.168.79.25
Start Time:   Fri, 29 May 2020 05:16:21 -0400
Labels:       cluster-name=elasticsearch
              component=elasticsearch
              es-node-client=true
              es-node-data=true
              es-node-master=true
              node-name=elasticsearch-cdm-istiosystemjaeger-1
              pod-template-hash=7bb5df8fc5
              tuned.openshift.io/elasticsearch=true
Annotations:  k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.128.2.54"
                    ],
                    "dns": {},
                    "default-route": [
                        "10.128.2.1"
                    ]
                }]
              openshift.io/scc: restricted
Status:       Running
IP:           10.128.2.54
IPs:
  IP:  10.128.2.54
Controlled By:  ReplicaSet/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5
Containers:
  elasticsearch:
    Container ID:   cri-o://4a34c41016bf9de9d786f3228cc072ff0b9ef79f31b5eda53b8bd9a85149fcc8
    Image:          registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f
    Image ID:       registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f
    Ports:          9300/TCP, 9200/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    78
      Started:      Mon, 01 Jun 2020 09:20:31 -0400
      Finished:     Mon, 01 Jun 2020 09:20:58 -0400
    Ready:          False
    Restart Count:  824
    Limits:
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3
    Environment:
      DC_NAME:                  elasticsearch-cdm-istiosystemjaeger-1
      NAMESPACE:                istio-system (v1:metadata.namespace)
      KUBERNETES_TRUST_CERT:    true
      SERVICE_DNS:              elasticsearch-cluster
      CLUSTER_NAME:             elasticsearch
      INSTANCE_RAM:             4Gi
      HEAP_DUMP_LOCATION:       /elasticsearch/persistent/heapdump.hprof
      RECOVER_AFTER_TIME:       5m
      READINESS_PROBE_TIMEOUT:  30
      POD_LABEL:                cluster=elasticsearch
      IS_MASTER:                true
      HAS_DATA:                 true
    Mounts:
      /elasticsearch/persistent from elasticsearch-storage (rw)
      /etc/openshift/elasticsearch/secret from certificates (rw)
      /usr/share/java/elasticsearch/config from elasticsearch-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-d9qxs (ro)
  proxy:
    Container ID:  cri-o://47b5c22c2db2adbbeb9dd6e61b726723f808570d9523fc4cf8ccf322222f8743
    Image:         registry.redhat.io/openshift4/ose-oauth-proxy@sha256:46e796b768c848bb24d19ca028cd87c73a0b330601758b9d9f25869b94586725
    Image ID:      registry.redhat.io/openshift4/ose-oauth-proxy@sha256:46e796b768c848bb24d19ca028cd87c73a0b330601758b9d9f25869b94586725
    Port:          60000/TCP
    Host Port:     0/TCP
    Args:
      --https-address=:60000
      --provider=openshift
      --upstream=https://127.0.0.1:9200
      --tls-cert=/etc/proxy/secrets/tls.crt
      --tls-key=/etc/proxy/secrets/tls.key
      --upstream-ca=/etc/proxy/elasticsearch/admin-ca
      --openshift-service-account=elasticsearch
      -openshift-sar={"resource": "namespaces", "verb": "get"}
      -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
      --pass-user-bearer-token
      --cookie-secret=dRH6aHl98+nypiW4QObtGA==
    State:          Running
      Started:      Fri, 29 May 2020 05:16:24 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  64Mi
    Requests:
      cpu:     100m
      memory:  64Mi
    Environment:  <none>
    Mounts:
      /etc/proxy/elasticsearch from certificates (rw)
      /etc/proxy/secrets from elasticsearch-metrics (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-d9qxs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  elasticsearch-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      elasticsearch
    Optional:  false
  elasticsearch-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch
    Optional:    false
  elasticsearch-metrics:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-metrics
    Optional:    false
  elasticsearch-token-d9qxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-token-d9qxs
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                                          Message
  ----     ------     ----                    ----                                          -------
  Warning  Unhealthy  96m (x2835 over 3d4h)   kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]
  Normal   Pulled     41m (x818 over 3d4h)    kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Container image "registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f" already present on machine
  Warning  BackOff    77s (x19296 over 3d4h)  kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Back-off restarting failed container
[root@localhost ~]#
Moving to the multiarch team because the failure occurs on alternate architectures.
(In reply to Lukas Vlcek from comment #2)
> I think you are running into this:
> https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html

Can't tell if that is the issue here or not, but if so, see bug 1807201.
(In reply to Lukas Vlcek from comment #2)
> I think you are running into this:
> https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html
>
> You probably need to check ES node logs to see more details.
>
> oc logs <pod> -c elasticsearch

The below error is seen in the elasticsearch pod logs:

ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
IMO this is a duplicate of bug 1807201.

The system call filter check fails, hence the ES node does not start.
Either consider disabling this particular bootstrap check (see https://bugzilla.redhat.com/show_bug.cgi?id=1807201#c16), or you need to provide a code patch (again, see https://bugzilla.redhat.com/show_bug.cgi?id=1807201 for some discussion about this).
(In reply to Lukas Vlcek from comment #11)
> IMO this is duplicate of bug 1807201
>
> System call filter check fails hence the ES node does not start.
> Either consider disabling this particular bootstrap check (see
> https://bugzilla.redhat.com/show_bug.cgi?id=1807201#c16)
> or you need to provide a code patch (again see
> https://bugzilla.redhat.com/show_bug.cgi?id=1807201 for some discussion
> about this).

@Lukas, I assume disabling the filter check would need to happen as part of the elasticsearch operator build?
@Rashmi, most likely. It needs to be set before the ES node starts (because it needs to be stated in the elasticsearch.yml file), and it should be set the same way on all cluster nodes.
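A minimal sketch of that change, assuming the setting is placed in the elasticsearch.yml the node reads at startup (e.g. via the `elasticsearch` ConfigMap mounted at /usr/share/java/elasticsearch/config in the pod description above). Per the ES 5.6 docs linked in comment #2, the relevant setting is `bootstrap.system_call_filter`; disabling it turns off the seccomp sandbox, so it is explicitly an "at your own risk" trade-off:

```yaml
# elasticsearch.yml -- must be in place before the ES node starts and
# be identical on every node in the cluster.
# Skips only the system-call-filter bootstrap check that fails on
# s390x/ppc64le; all other bootstrap checks still run.
bootstrap.system_call_filter: false
```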
(In reply to Lukas Vlcek from comment #11)
> IMO this is duplicate of bug 1807201

I concur.

*** This bug has been marked as a duplicate of bug 1807201 ***