Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1873493

Summary: ElasticsearchSecurityException after upgrade of OCP 4.5.4 to 4.5.6
Product: OpenShift Container Platform
Component: Logging
Version: 4.5
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Status: CLOSED ERRATA
Reporter: mchebbi <mchebbi>
Assignee: ewolinet
QA Contact: Anping Li <anli>
CC: aos-bugs, brejones, ewolinet, mburke
Whiteboard: logging-exploration
Last Closed: 2020-10-27 15:10:28 UTC
Type: Bug
Bug Blocks: 1913952

Description mchebbi@redhat.com 2020-08-28 13:59:19 UTC
Relevant information is available here:
https://bit.ly/3b46EPW

The customer upgraded his OCP cluster from 4.5.4 to 4.5.6. Afterwards he lost access to Kibana, which now shows a page saying "Kibana server is not ready yet".

I checked the Elasticsearch logs and found these exceptions:

~~~
[2020-08-26T12:15:10,900][WARN ][r.suppressed             ] [elasticsearch-cdm-6o5bak27-1] path: /infra-write/_rollover, params: {pretty=, index=infra-write}
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cdm-6o5bak27-3][10.252.2.10:9300][indices:admin/rollover]
Caused by: org.elasticsearch.ElasticsearchSecurityException: Unexpected exception indices:admin/rollover
        at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:274) ~[?:?]
        at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) ~[?:?]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) ~[?:?]
~~~

----------------------------


~~~
[2020-08-24T12:50:52,991][ERROR][c.a.o.s.a.BackendRegistry] [elasticsearch-cdm-6o5bak27-2] Not yet initialized 
[2020-08-24T12:51:02,801][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-6o5bak27-2] .security index exist, so we try to load the config from it
[2020-08-24T12:51:02,813][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [elasticsearch-cdm-6o5bak27-2] failed to execute on node [1OUdHa5TQyWp7Sp3FCagrw]
org.elasticsearch.transport.NodeNotConnectedException: [elasticsearch-cdm-6o5bak27-1][10.253.2.71:9300] Node not connected
        at org.elasticsearch.transport.ConnectionManager.getConnection(ConnectionManager.java:151) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.transport.TransportService.getConnection(TransportService.java:557) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:529) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:194) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:91) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:54) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
~~~
~~~
$ oc get pods
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-546cf987c4-n7rjn       1/1     Running     0          23h
curator-1598326200-4q876                        0/1     Completed   0          8h
elasticsearch-cdm-6o5bak27-1-5d7bf795df-7m47r   2/2     Running     0          22h
elasticsearch-cdm-6o5bak27-2-644c565bcb-znbpj   2/2     Running     0          23h
elasticsearch-cdm-6o5bak27-3-b968fdf59-fkpcf    2/2     Running     0          23h
elasticsearch-delete-app-1598355900-rpnqb       0/1     Completed   0          26s
elasticsearch-delete-audit-1598355900-vh5k4     0/1     Completed   0          26s
elasticsearch-delete-infra-1598355900-rhr59     0/1     Completed   0          26s
elasticsearch-rollover-app-1598355900-fxtb7     0/1     Completed   0          26s
elasticsearch-rollover-audit-1598355900-swwr4   0/1     Completed   0          26s
elasticsearch-rollover-infra-1598355900-gnszv   0/1     Completed   0          26s
fluentd-5zhth                                   1/1     Running     0          23h
fluentd-qzsj2                                   1/1     Running     0          23h
fluentd-r2hvz                                   1/1     Running     0          23h
fluentd-tf8f2                                   1/1     Running     0          23h
fluentd-w5rcp                                   1/1     Running     0          23h
fluentd-w6kl7                                   1/1     Running     0          23h
kibana-59f4bbcc4d-szznb                         2/2     Running     0          19m
~~~
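The "Not yet initialized" messages from c.a.o.s.a.BackendRegistry suggest that the Open Distro security plugin could not load its configuration from the .security index. One quick check is to query that index directly with the es_util helper used later in this report; the exact query paths below are my assumption, not taken from the case:

~~~
# Pick any ES pod (same selector as used elsewhere in this report).
_any_es_pod=$(oc get pods --selector 'component in (elasticsearch,es)' -o name -n openshift-logging | head -1)

# Health and size of the .security index -- it should be green and non-empty.
oc -n openshift-logging exec $_any_es_pod -c elasticsearch -- es_util --query="_cat/indices/.security?v"

# Document count -- the security config normally holds a handful of documents.
oc -n openshift-logging exec $_any_es_pod -c elasticsearch -- es_util --query=".security/_count?pretty"
~~~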

Kibana logs say "Waiting for Elasticsearch":
~~~
$ oc logs kibana-59f4bbcc4d-szznb -c kibana
#The following values dynamically added from environment variable overrides:
Using NODE_OPTIONS: '--max_old_space_size=368' Memory setting is in MB
{"type":"log","@timestamp":"2020-08-25T11:26:22Z","tags":["status","plugin:elasticsearch.1","error"],"pid":121,"state":"red","message":"Status changed from yellow to red - Request Timeout after 3000ms","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
~~~

But the Elasticsearch status is green, and all ES pods appear to be working:
~~~
$  oc get Elasticsearch elasticsearch -n openshift-logging -o yaml  | egrep "[ ]status"                                                         
    status: green

$ oc get pods --selector 'component in (elasticsearch,es)' -o wide -n openshift-logging
NAME                                            READY   STATUS    RESTARTS   AGE   IP            NODE                        NOMINATED NODE   READINESS GATES
elasticsearch-cdm-6o5bak27-1-5d7bf795df-7m47r   2/2     Running   0          22h   10.253.2.9    oso2-mjq2s-worker-0-l98tf   <none>           <none>
elasticsearch-cdm-6o5bak27-2-644c565bcb-znbpj   2/2     Running   0          23h   10.255.0.17   oso2-mjq2s-worker-0-549g9   <none>           <none>
elasticsearch-cdm-6o5bak27-3-b968fdf59-fkpcf    2/2     Running   0          23h   10.252.2.10   oso2-mjq2s-worker-0-rvd5v   <none>           <none>

$         oc get deployment --selector 'component in (elasticsearch,es)' -n openshift-logging 
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
elasticsearch-cdm-6o5bak27-1   1/1     1            1           5d4h
elasticsearch-cdm-6o5bak27-2   1/1     1            1           5d4h
elasticsearch-cdm-6o5bak27-3   1/1     1            1           5d4h
~~~

Even querying Elasticsearch with es_util works:
~~~
$   _any_es_pod=$(oc get pods --selector 'component in (elasticsearch,es)' -o name -n openshift-logging | head -1)
$ oc -n openshift-logging exec $_any_es_pod -c elasticsearch -- es_util --query=_cat/indices/_all?v\&s=store.size\&s=store.size:asc                                                                   
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_1    aZ6k1kXCRzqF9awU0GCPzA   1   1          0            0       522b           261b
green  open   audit-000001 3Njhv0UBSeqm45wdrEkW0w   3   1          0            0      1.5kb           783b
green  open   .security    XZoTJhlnRCuHrzaRFVL-YQ   1   1          5            0     48.9kb         29.9kb
green  open   app-000001   DHOnnh01S5e88_NnTlKJLw   3   1    6038274            0      6.1gb          3.1gb
green  open   infra-000002 UU0zzZt6ScmpqBo48mg3jw   3   1   14179580            0     23.8gb         13.5gb
green  open   infra-000001 lbGCEJGuTqOdgFtVRPhgcQ   3   1   51348667            0       65gb         32.5gb
~~~

Comment 1 Jeff Cantrill 2020-08-28 15:47:01 UTC
What happens when you:

1. Access Kibana from a "private/incognito" browser window?
2. Access Kibana from a "private/incognito" browser window after deleting the Kibana pod?
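
Step 2 can be sketched as follows (a minimal sketch, assuming the default openshift-logging namespace and the component=kibana pod label; the Kibana deployment recreates the pod automatically):

~~~
# Delete the Kibana pod; its deployment will schedule a replacement.
oc delete pod -l component=kibana -n openshift-logging

# Watch until the new pod reports 2/2 Ready, then retry in a private window.
oc get pods -l component=kibana -n openshift-logging -w
~~~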

Comment 2 Jeff Cantrill 2020-09-12 01:58:18 UTC
Moving to UpcomingSprint as unlikely to be addressed by EOD

Comment 3 mchebbi@redhat.com 2020-09-14 14:51:15 UTC
Hello,
The customer tried:

1. Accessing Kibana from a "private/incognito" browser, and

2. Accessing Kibana from a "private/incognito" browser after deleting the Kibana pod.

He is still getting the "Kibana server is not ready yet" message.

I have uploaded a screenshot and the logs to the directory available through the link: https://bit.ly/3b46EPW

Comment 4 Brett Jones 2020-09-14 18:08:49 UTC
Could you please delete the Kibana indices and the Kibana pod, then check whether the issue persists?
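
A minimal sketch of that procedure, reusing the selectors and the es_util helper shown earlier in this report (verify the index list before deleting anything; es_util forwards extra arguments to curl, so the -XDELETE usage is an assumption worth confirming):

~~~
_any_es_pod=$(oc get pods --selector 'component in (elasticsearch,es)' -o name -n openshift-logging | head -1)

# List the Kibana indices first (.kibana_1 appears in the index listing above).
oc -n openshift-logging exec $_any_es_pod -c elasticsearch -- es_util --query="_cat/indices/.kibana*?v"

# Delete them; Kibana rebuilds its index on startup.
oc -n openshift-logging exec $_any_es_pod -c elasticsearch -- es_util --query=".kibana*" -XDELETE

# Recreate the Kibana pod.
oc delete pod -l component=kibana -n openshift-logging
~~~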

Comment 5 Brett Jones 2020-09-21 13:28:00 UTC
@mchebbi do you have an update on this?

Comment 11 errata-xmlrpc 2020-10-27 15:10:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.1 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4198

Comment 12 Red Hat Bugzilla 2023-09-18 00:22:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days