Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1899419

Summary: The ES proxy container always restarts when FIPS is enabled in the cluster.
Product: OpenShift Container Platform
Reporter: Periklis Tsirakidis <periklis>
Component: Logging
Assignee: Periklis Tsirakidis <periklis>
Status: CLOSED ERRATA
QA Contact: Qiaoling Tang <qitang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.6
CC: anli, aos-bugs, qitang, sreber, yhe
Target Milestone: ---
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: logging-exploration
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-01-20 05:33:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1880960
Bug Blocks:

Description Periklis Tsirakidis 2020-11-19 07:47:54 UTC
This bug was initially created as a copy of Bug #1880960

I am copying this bug because: 



Description of problem:

The proxy container restarts frequently:
$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-69c9d6f5bc-njb8q       1/1     Running     0          135m
elasticsearch-cdm-soj6vq70-1-688ddf8c9f-87vt9   2/2     Running     12         135m
elasticsearch-cdm-soj6vq70-2-64876b6bd-sld68    2/2     Running     12         140m
elasticsearch-cdm-soj6vq70-3-5d54fb5d65-vbkww   2/2     Running     12         146m
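
A minimal sketch of how the restart counts in the listing above could be checked programmatically. The function name `pods_with_restarts` and the threshold of 5 are illustrative assumptions, not part of any OpenShift tooling; the sample text is copied from the `oc get pod` output above.

```python
def pods_with_restarts(table: str, threshold: int = 5) -> list[tuple[str, int]]:
    """Return (pod name, restart count) pairs whose count exceeds the threshold."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    restarts_idx = header.index("RESTARTS")

    flagged = []
    for ln in lines[1:]:
        cols = ln.split()
        restarts = int(cols[restarts_idx])
        if restarts > threshold:  # threshold is an arbitrary, hypothetical cut-off
            flagged.append((cols[name_idx], restarts))
    return flagged


# Sample copied from the `oc get pod` output in this report.
SAMPLE = """\
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-69c9d6f5bc-njb8q       1/1     Running     0          135m
elasticsearch-cdm-soj6vq70-1-688ddf8c9f-87vt9   2/2     Running     12         135m
elasticsearch-cdm-soj6vq70-2-64876b6bd-sld68    2/2     Running     12         140m
elasticsearch-cdm-soj6vq70-3-5d54fb5d65-vbkww   2/2     Running     12         146m
"""

for name, count in pods_with_restarts(SAMPLE):
    print(f"{name}: {count} restarts")
```

With the sample above, only the three elasticsearch-cdm pods are reported; the operator pod has no restarts.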

$ oc describe pod elasticsearch-cdm-soj6vq70-3-5d54fb5d65-vbkww 
  proxy:
    Container ID:  cri-o://9da3f8434af0659f409ba6690dfa1d6b3753cf10c8499e6a6b462381380a22b5
    Image:         registry.redhat.io/openshift4/ose-elasticsearch-proxy@sha256:ae8c365911b5f2e24b9fd07129ce18594834997ec7c73bd2652dfd695e75746f
    Image ID:      registry.redhat.io/openshift4/ose-elasticsearch-proxy@sha256:1d0e41e6c7ffe968149d6b9b90b56081af15a3370fe3de607a84b3c20677dbc9
    Ports:         60000/TCP, 60001/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      --listening-address=:60000
      --tls-cert=/etc/proxy/elasticsearch/logging-es.crt
      --tls-key=/etc/proxy/elasticsearch/logging-es.key
      --tls-client-ca=/etc/proxy/elasticsearch/admin-ca
      --metrics-listening-address=:60001
      --metrics-tls-cert=/etc/proxy/secrets/tls.crt
      --metrics-tls-key=/etc/proxy/secrets/tls.key
      --upstream-ca=/etc/proxy/elasticsearch/admin-ca
      --cache-expiry=60s
      --auth-backend-role=admin_reader={"namespace": "default", "verb": "get", "resource": "pods/log"}
      --auth-backend-role=prometheus={"verb": "get", "resource": "/metrics"}
      --auth-backend-role=jaeger={"verb": "get", "resource": "/jaeger", "resourceAPIGroup": "elasticsearch.jaegertracing.io"}
      --auth-backend-role=elasticsearch-operator={"namespace": "*", "verb": "*", "resource": "*", "resourceAPIGroup": "logging.openshift.io"}
      --auth-backend-role=index-management={"namespace":"openshift-logging", "verb": "*", "resource": "indices", "resourceAPIGroup": "elasticsearch.openshift.io"}
      --auth-admin-role=admin_reader
      --auth-default-role=project_user
    State:          Running
      Started:      Mon, 21 Sep 2020 05:26:57 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 21 Sep 2020 05:15:06 -0400
      Finished:     Mon, 21 Sep 2020 05:26:56 -0400
    Ready:          True
    Restart Count:  12
    Limits:
      memory:  64Mi
    Requests:
      cpu:     100m
      memory:  64Mi
    Environment:
      LOG_LEVEL:  info
    Mounts:
      /etc/proxy/elasticsearch from certificates (rw)
      /etc/proxy/secrets from elasticsearch-metrics (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from elasticsearch-token-99brw (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  elasticsearch-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      elasticsearch
    Optional:  false
  elasticsearch-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  elasticsearch-elasticsearch-cdm-soj6vq70-3
    ReadOnly:   false
  certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch
    Optional:    false
  elasticsearch-metrics:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-metrics
    Optional:    false
  elasticsearch-token-99brw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  elasticsearch-token-99brw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From                                                           Message
  ----     ------                  ----                  ----                                                           -------
  Warning  FailedScheduling        144m                                                                                 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling        144m                                                                                 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Normal   Scheduled               140m                                                                                 Successfully assigned openshift-logging/elasticsearch-cdm-soj6vq70-3-5d54fb5d65-vbkww to qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal
  Normal   SuccessfulAttachVolume  140m                  attachdetach-controller                                        AttachVolume.Attach succeeded for volume "pvc-73b52ca6-bf96-4a3a-a12c-15275b34267c"
  Normal   AddedInterface          140m                  multus                                                         Add eth0 [10.129.2.5/23]
  Normal   Pulled                  140m                  kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Container image "registry.redhat.io/openshift4/ose-logging-elasticsearch6@sha256:d083829ae9a4777f4f070acdd64298e1514e8b7895019186af22f8893656e475" already present on machine
  Normal   Created                 140m                  kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Created container elasticsearch
  Normal   Started                 140m                  kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Started container elasticsearch
  Warning  Unhealthy               140m (x3 over 140m)   kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 000]
  Warning  Unhealthy               139m                  kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Readiness probe failed: Elasticsearch node is not ready to accept HTTP requests yet [response code: 503]
  Warning  BackOff                 24m (x4 over 116m)    kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Back-off restarting failed container
  Normal   Pulled                  115s (x13 over 140m)  kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Container image "registry.redhat.io/openshift4/ose-elasticsearch-proxy@sha256:ae8c365911b5f2e24b9fd07129ce18594834997ec7c73bd2652dfd695e75746f" already present on machine
  Normal   Created                 114s (x13 over 140m)  kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Created container proxy
  Normal   Started                 114s (x13 over 140m)  kubelet, qitang2-d7vhm-worker-a-kgwwq.c.openshift-qe.internal  Started container proxy 
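
The `Last State` section above reports `Reason: OOMKilled` with `Exit Code: 137`. Exit codes above 128 conventionally mean the process was terminated by signal number (code - 128). The sketch below decodes that value and computes how long the previous proxy container ran, using the `Started` and `Finished` timestamps from the output above. The helper name `describe_exit_code` is a hypothetical name chosen for illustration.

```python
import signal
from email.utils import parsedate_to_datetime


def describe_exit_code(code: int) -> str:
    """Describe a container exit code using the 128 + signal-number convention."""
    if code > 128:
        number = code - 128
        try:
            return f"killed by {signal.Signals(number).name}"
        except ValueError:
            # The signal name is not known on this platform.
            return f"killed by signal {number}"
    return f"exited with status {code}"


# Values copied from the `oc describe pod` output in this report.
started = parsedate_to_datetime("Mon, 21 Sep 2020 05:15:06 -0400")
finished = parsedate_to_datetime("Mon, 21 Sep 2020 05:26:56 -0400")
lifetime = finished - started

print(describe_exit_code(137))
print(f"previous container ran for {lifetime}")
```

Here 137 = 128 + 9, and signal 9 is SIGKILL, which matches an out-of-memory kill. The previous container ran for 11 minutes 50 seconds before it was terminated.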


The proxy container log shows no error messages:
$ oc logs -c proxy elasticsearch-cdm-soj6vq70-3-5d54fb5d65-vbkww 
time="2020-09-21T07:32:21Z" level=info msg="mapping path \"/\" => upstream \"https://localhost:9200/\""
time="2020-09-21T07:32:21Z" level=info msg="HTTPS: listening on [::]:60001"
time="2020-09-21T07:32:21Z" level=info msg="HTTPS: listening on [::]:60000"

The worker node has enough resources:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests       Limits
  --------                   --------       ------
  cpu                        1936m (55%)    1 (28%)
  memory                     10123Mi (72%)  8480Mi (61%)
  ephemeral-storage          0 (0%)         0 (0%)
  hugepages-1Gi              0 (0%)         0 (0%)
  hugepages-2Mi              0 (0%)         0 (0%)
  attachable-volumes-gce-pd  0              0

Log store configuration:
    logStore:
      elasticsearch:
        nodeCount: 3
        redundancyPolicy: SingleRedundancy
        resources:
          requests:
            memory: 2Gi
        storage:
          size: 20Gi
          storageClassName: standard
      retentionPolicy:
        application:
          maxAge: 1d
        audit:
          maxAge: 2w
        infra:
          maxAge: 12h
      type: elasticsearch

Version-Release number of selected component (if applicable):
elasticsearch-operator.4.6.0-202009192030.p0

How reproducible:
In some clusters it is 100% reproducible; in other clusters the issue does not occur.

Steps to Reproduce:
1. Deploy logging 4.6.
2. Check the ES pods.

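Actual results:
The proxy container in the ES pods is OOMKilled and restarts repeatedly (Restart Count: 12).

Expected results:
The proxy container keeps running without restarts.
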
Additional info:

Comment 1 Periklis Tsirakidis 2020-11-26 13:44:22 UTC
*** Bug 1897419 has been marked as a duplicate of this bug. ***

Comment 3 Qiaoling Tang 2020-12-24 05:12:18 UTC
Tested with elasticsearch-operator.4.5.0-202012230113.p0 and clusterlogging.4.5.0-202012231441.p0. Unable to reproduce this issue, and the default memory request of the proxy container has been updated to 256Mi, so moving this bug to VERIFIED.
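
To put the new default in context, the sketch below converts the 64Mi limit from the original report and the 256Mi default mentioned in this comment into bytes. The helper `memory_bytes` is a hypothetical name for illustration; it handles only integer values with binary suffixes (Ki, Mi, Gi, Ti), which is an assumption made for brevity. Kubernetes quantities also allow decimal suffixes and fractional values, which this sketch does not cover.

```python
# Binary suffixes only; decimal suffixes such as "M" or "G" are deliberately not handled.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '64Mi' into a byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat the value as plain bytes


old_limit = memory_bytes("64Mi")     # limit shown in the original report
new_request = memory_bytes("256Mi")  # default request after the fix
print(f"old limit:   {old_limit} bytes")
print(f"new request: {new_request} bytes")
print(f"ratio:       {new_request // old_limit}x")
```

The new default request is four times the limit under which the proxy container was being OOMKilled.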

Comment 6 errata-xmlrpc 2021-01-20 05:33:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.27 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0035