Bug 1842412
| Summary: | ServiceMesh custom installation fails on System Z (s390x) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rashmi Sakhalkar <rsakhalk> |
| Component: | Multi-Arch | Assignee: | Dennis Gilmore <dgilmore> |
| Status: | CLOSED DUPLICATE | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | medium | Priority: | unspecified |
| Version: | 4.3.0 | CC: | aos-bugs, dslavens, jcantril, jkandasa, lvlcek, rsakhalk, yselkowi |
| Hardware: | s390 | OS: | Linux |
| Whiteboard: | multi-arch | Type: | Bug |
| Target Milestone: | --- | Target Release: | --- |
| Doc Type: | If docs needed, set a value | Last Closed: | 2020-06-12 15:57:13 UTC |
[root@localhost ocp]# ./oc version
Client Version: 4.3.19
Server Version: 4.3.19
Kubernetes Version: v1.16.2
[root@localhost ocp]#

The elasticsearch pod failure issue is seen on both System Z (s390x) and Power (ppc64le).

I think you are running into this:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html

You probably need to check the ES node logs to see more details:

oc logs <pod> -c elasticsearch

Error in logs:

ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

Output:

[root@localhost ocp]# ./oc logs elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch -n istio-system
[2020-06-01 10:33:49,589][INFO ][container.run ] Begin Elasticsearch startup script
[2020-06-01 10:33:49,592][INFO ][container.run ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-06-01 10:33:49,595][INFO ][container.run ] Inspecting the maximum RAM available...
[2020-06-01 10:33:49,602][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m'
[2020-06-01 10:33:49,603][INFO ][container.run ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2020-06-01 10:33:49,607][INFO ][container.run ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Certificate was added to keystore
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12".
Certificate was added to keystore
Certificate was added to keystore
[2020-06-01 10:33:52,915][INFO ][container.run ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2020-06-01 10:33:52,915][INFO ][container.run ] Checking if Elasticsearch is ready
[2020-06-01 10:33:52,916][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled'
### LICENSE NOTICE Search Guard ###
If you use one or more of the following features in production
make sure you have a valid Search Guard license
(See https://floragunn.com/searchguard-validate-license)
* Kibana Multitenancy
* LDAP authentication/authorization
* Active Directory authentication/authorization
* REST Management API
* JSON Web Token (JWT) authentication/authorization
* Kerberos authentication/authorization
* Document- and Fieldlevel Security (DLS/FLS)
* Auditlogging
In case of any doubt mail to <sales>
###################################
Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation.
[the same Search Guard license notice and TLS renegotiation hint are printed a second time]
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[root@localhost ocp]#

Sorry, I was not clear. We need to check what is in the Elasticsearch log. You can find the location of the ES logs by following our dump script:
https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh#L291

You can see there are a few locations you need to check for logs. Or you can just run this dump script and upload the archive file so that we can check. We need to see if the log contains more details; for example, this error has been reported in connection with the kernel version. See:

https://discuss.elastic.co/t/how-to-get-logs-for-system-call-filters-failed-to-install/185596
https://discuss.elastic.co/t/elasticsearch-is-throwing-bootstrap-error-and-its-unable-to-load-system-call-filters-when-im-trying-to-run-on-linux/88517

I tried to execute the script, but it fails because the container is not running. Even a simple date command cannot be executed:

[root@localhost ocp]# ./oc exec elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch date
error: Internal error occurred: error executing command in container: container is not created or running
[root@localhost ocp]#

[root@localhost ~]# ./oc get pods
NAME                                                     READY   STATUS             RESTARTS   AGE
elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb   1/2     CrashLoopBackOff   822        3d3h
grafana-7d6b5c4ccf-r9ld8                                 2/2     Running            0          3d3h
istio-citadel-75674db4d5-kk2sl                           1/1     Running            0          3d3h
istio-egressgateway-656997ff47-gbrwg                     1/1     Running            0          3d3h
istio-galley-595f769dc5-2v7m7                            1/1     Running            0          3d3h
istio-ingressgateway-748df5748c-8d4bv                    1/1     Running            0          3d3h
istio-pilot-85c4846569-m94b4                             2/2     Running            0          3d3h
istio-policy-9c7cf98f8-4nkkm                             2/2     Running            0          3d3h
istio-sidecar-injector-74589cfb79-b2n44                  1/1     Running            0          3d3h
istio-telemetry-df4d745d5-bgkjw                          2/2     Running            0          3d3h
jaeger-collector-6c846c488c-nqz5f                        0/1     CrashLoopBackOff   903        3d3h
jaeger-es-index-cleaner-1590969300-6p9wl                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-jmmjh                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-q5fw7                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-rz2zk                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-xzvxz                 0/1     Error              0          13h
jaeger-es-index-cleaner-1590969300-z5rtb                 0/1     Error              0          13h
jaeger-query-5b75fbf477-bgw4n                            2/3     CrashLoopBackOff   905        3d3h
kiali-579f9d9fdc-lsb58                                   1/1     Running            0          3d3h
prometheus-67bfdddf9-qhqm4                               2/2     Running            0          3d3h
[root@localhost ~]#

Script output:
----
Unable to get ES logs from pod elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
[root@localhost ~]# ./oc get pods --selector component=elasticsearch -o name
pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
[root@localhost ~]# ./oc exec elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -n istio-system' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("elasticsearch")
[root@localhost ~]# ./oc describe pod/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
Name:           elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb
Namespace:      istio-system
Priority:       0
Node:           worker-1.nour-kvm-poc.zkvmocp.notld/192.168.79.25
Start Time:     Fri, 29 May 2020 05:16:21 -0400
Labels:         cluster-name=elasticsearch
                component=elasticsearch
                es-node-client=true
                es-node-data=true
                es-node-master=true
                node-name=elasticsearch-cdm-istiosystemjaeger-1
                pod-template-hash=7bb5df8fc5
                tuned.openshift.io/elasticsearch=true
Annotations:    k8s.v1.cni.cncf.io/networks-status:
                  [{
                      "name": "openshift-sdn",
                      "interface": "eth0",
                      "ips": [
                          "10.128.2.54"
                      ],
                      "dns": {},
                      "default-route": [
                          "10.128.2.1"
                      ]
                  }]
                openshift.io/scc: restricted
Status:         Running
IP:             10.128.2.54
IPs:
  IP:           10.128.2.54
Controlled By:  ReplicaSet/elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5
Containers:
  elasticsearch:
    Container ID:   cri-o://4a34c41016bf9de9d786f3228cc072ff0b9ef79f31b5eda53b8bd9a85149fcc8
    Image:          registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f
    Image ID:       registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f
    Ports:          9300/TCP, 9200/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    78
      Started:      Mon, 01 Jun 2020 09:20:31 -0400
      Finished:     Mon, 01 Jun 2020 09:20:58 -0400
    Ready:          False
    Restart Count:  824
    Limits:
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3
    Environment:
      DC_NAME:                  elasticsearch-cdm-istiosystemjaeger-1
      NAMESPACE:                istio-system (v1:metadata.namespace)
      KUBERNETES_TRUST_CERT:    true
      SERVICE_DNS:              elasticsearch-cluster
      CLUSTER_NAME:             elasticsearch
      INSTANCE_RAM:             4Gi
      HEAP_DUMP_LOCATION:       /elasticsearch/persistent/heapdump.hprof
      RECOVER_AFTER_TIME:       5m
      READINESS_PROBE_TIMEOUT:  30
      POD_LABEL:                cluster=elasticsearch
      IS_MASTER:                true
      HAS_DATA:                 true
    Mounts:
      /elasticsearch/persistent from elasticsearch-storage (rw)
      /etc/openshift/elasticsearch/secret from certificates (rw)
      /usr/share/java/elasticsearch/config from elasticsearch-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-d9qxs (ro)
  proxy:
    Container ID:  cri-o://47b5c22c2db2adbbeb9dd6e61b726723f808570d9523fc4cf8ccf322222f8743
    Image:         registry.redhat.io/openshift4/ose-oauth-proxy@sha256:46e796b768c848bb24d19ca028cd87c73a0b330601758b9d9f25869b94586725
    Image ID:      registry.redhat.io/openshift4/ose-oauth-proxy@sha256:46e796b768c848bb24d19ca028cd87c73a0b330601758b9d9f25869b94586725
    Port:          60000/TCP
    Host Port:     0/TCP
    Args:
      --https-address=:60000
      --provider=openshift
      --upstream=https://127.0.0.1:9200
      --tls-cert=/etc/proxy/secrets/tls.crt
      --tls-key=/etc/proxy/secrets/tls.key
      --upstream-ca=/etc/proxy/elasticsearch/admin-ca
      --openshift-service-account=elasticsearch
      -openshift-sar={"resource": "namespaces", "verb": "get"}
      -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
      --pass-user-bearer-token
      --cookie-secret=dRH6aHl98+nypiW4QObtGA==
    State:          Running
      Started:      Fri, 29 May 2020 05:16:24 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  64Mi
    Requests:
      cpu:     100m
      memory:  64Mi
    Environment:  <none>
    Mounts:
      /etc/proxy/elasticsearch from certificates (rw)
      /etc/proxy/secrets from elasticsearch-metrics (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-d9qxs (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  elasticsearch-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      elasticsearch
    Optional:  false
  elasticsearch-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch
    Optional:    false
  elasticsearch-metrics:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-metrics
    Optional:    false
  elasticsearch-token-d9qxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-token-d9qxs
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                                          Message
  ----     ------     ----                    ----                                          -------
  Warning  Unhealthy  96m (x2835 over 3d4h)   kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]
  Normal   Pulled     41m (x818 over 3d4h)    kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Container image "registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:9ea7dc01c74e93d48cf5f275589572e8daa732c73e187e04c7b60535e42d630f" already present on machine
  Warning  BackOff    77s (x19296 over 3d4h)  kubelet, worker-1.nour-kvm-poc.zkvmocp.notld  Back-off restarting failed container
[root@localhost ~]#
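
A side note on the failed oc exec above: a container in CrashLoopBackOff cannot be exec'd into, but the output of its last terminated run is still retrievable with the standard --previous flag of oc/kubectl logs, for example:

    # Read the log of the previous (crashed) run of the elasticsearch container
    ./oc logs --previous elasticsearch-cdm-istiosystemjaeger-1-7bb5df8fc5-p9dxb -c elasticsearch -n istio-system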
Moving to the multi-arch team because of the failure on an alternate architecture.

(In reply to Lukas Vlcek from comment #2)
> I think you are running into this:
> https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html

Can't tell if that is the issue here or not, but if so, see bug 1807201.

(In reply to Lukas Vlcek from comment #2)
> I think you are running into this:
> https://www.elastic.co/guide/en/elasticsearch/reference/5.6/system-call-filter-check.html
>
> You probably need to check ES node logs to see more details.
>
> oc logs <pod> -c elasticsearch

The below error is seen in the elasticsearch pod logs:

ERROR: [1] bootstrap checks failed
[1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

IMO this is a duplicate of bug 1807201.

The system call filter check fails, hence the ES node does not start. Either consider disabling this particular bootstrap check (see https://bugzilla.redhat.com/show_bug.cgi?id=1807201#c16) or you need to provide a code patch (again, see https://bugzilla.redhat.com/show_bug.cgi?id=1807201 for some discussion about this).

(In reply to Lukas Vlcek from comment #11)
> IMO this is a duplicate of bug 1807201.
>
> The system call filter check fails, hence the ES node does not start.
> Either consider disabling this particular bootstrap check (see
> https://bugzilla.redhat.com/show_bug.cgi?id=1807201#c16)
> or you need to provide a code patch (again, see
> https://bugzilla.redhat.com/show_bug.cgi?id=1807201 for some discussion
> about this).

@Lukas, I assume disabling the filter check would need to happen as part of the elasticsearch operator build?

@Rashmi, most likely. It needs to be set before the ES node starts (because it needs to be stated in the elasticsearch.yml file) and it should be set the same way on all cluster nodes.

(In reply to Lukas Vlcek from comment #11)
> IMO this is a duplicate of bug 1807201.

I concur.

*** This bug has been marked as a duplicate of bug 1807201 ***
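
For reference, the workaround discussed above amounts to a single setting in the Elasticsearch node configuration. A minimal sketch, assuming the elasticsearch-operator (or whatever templates the node config) can place this in elasticsearch.yml on every node before startup; the actual delivery mechanism is what bug 1807201 tracks:

    # elasticsearch.yml (sketch, not the shipped operator config)
    # Disables the seccomp bootstrap check that fails on s390x/ppc64le.
    # Trade-off: the node then runs without system call filtering.
    bootstrap.system_call_filter: false

This is the same escape hatch named in the ES 5.6 system-call-filter-check documentation linked in comment #2.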
Created attachment 1694043 [details]
Failure_logs

Description of problem:
ServiceMesh control plane creation is not successful due to a failure in the elasticsearch pod.

Version-Release number of selected component (if applicable):

How reproducible:
Create the SMCP using the below yaml:

apiVersion: maistra.io/v1
kind: ServiceMeshControlPlane
metadata:
  name: basic-install
  namespace: istio-system
spec:
  istio:
    gateways:
      istio-egressgateway:
        autoscaleEnabled: false
      istio-ingressgateway:
        autoscaleEnabled: false
    mixer:
      policy:
        autoscaleEnabled: false
      telemetry:
        autoscaleEnabled: false
    pilot:
      autoscaleEnabled: false
      traceSampling: 100
    kiali:
      enabled: true
    grafana:
      enabled: true
    tracing:
      enabled: true
      jaeger:
        template: production-elasticsearch
        elasticsearch:
          nodeCount: 1
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
            limits:
              memory: "4Gi"

Steps to Reproduce:
1. Create the Elasticsearch, Kiali, Jaeger, and ServiceMesh operators.
2. Create the istio-system project.
3. Create the SMCP using the above yaml.

Detailed logs of the execution are attached.

Actual results:
SMCP remains in "Install successful" state, but elasticsearch pod failures are seen in the istio-system project.

Expected results:
SMCP should be created successfully.

Additional info:
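One extra triage step for this class of failure: the elastic.co discussions linked above tie the system-call-filter bootstrap failure to kernel seccomp support, which can be checked on the worker node with standard Linux interfaces (a sketch; the /boot/config file may not be present on RHCOS nodes):

    # Seccomp mode of the current process: 0 = disabled, 1 = strict, 2 = filter
    grep Seccomp /proc/self/status
    # Kernel build-time seccomp options, if a kernel config file is present
    grep -i CONFIG_SECCOMP /boot/config-$(uname -r)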