Bug 1410694

Summary: Kibana hangs on searching if nothing found, and never completes for non-admin users
Product: OpenShift Container Platform
Reporter: Xia Zhao <xiazhao>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED WONTFIX
QA Contact: Anping Li <anli>
Severity: high
Priority: medium
Version: 3.4.0
CC: aos-bugs, bandrade, erich, jcantril, knakayam, pportant, rbost, rmeggins, rromerom, stwalter, whearn, wsun, xiazhao, zhiwliu
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-10-10 17:33:38 UTC
Type: Bug
Attachments:
  Kibana hangs on empty searches (flags: none)
  Kibana logs when the UI hangs (flags: none)

Description Xia Zhao 2017-01-06 05:53:14 UTC
Created attachment 1237867 [details]
Kibana hangs on empty searches

Description of problem:
Adjust the Kibana time range to a window with no results; the UI hangs on "Searching" and never completes.
Any non-empty search condition brings it back.

Version-Release number of selected component (if applicable):
ops registry:
openshift3/logging-kibana    8a3df528c998
openshift3/logging-elasticsearch    583e04127ed6
openshift3/logging-auth-proxy    d9236074fecb
openshift3/logging-fluentd    43d549beb4c8
openshift3/logging-curator    5aadf9eb6168
openshift3/logging-deployer    7386facde449

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with use_journal = true, then log in as a non-admin user
2. Log in to Kibana and adjust the time range to a window with no log entries
3. Refine to a non-empty search condition

Actual results:
2. UI hangs on "Searching" and never completes. See the attached screenshot.
3. UI comes back with log entries displayed.

Expected results:
2. UI should state "No results found" once the search ends.

Additional info:
The Kibana screenshot is attached.

Comment 1 Xia Zhao 2017-01-06 07:46:50 UTC
Issue did not reproduce for a cluster-admin user.

Comment 2 ewolinet 2017-01-06 15:58:03 UTC
Can you please provide the logs from the ES pod when this happens?

Comment 3 Xia Zhao 2017-01-09 09:05:26 UTC
@ewolinet

Please find attached the logs captured while the Kibana UI hung for a long time on the logging namespace with a "17 minutes" time range.

The logging stack was deployed with "use_journal = true", and Kibana was then accessed as a non-admin user.

Comment 4 Xia Zhao 2017-01-09 09:06:23 UTC
Created attachment 1238587 [details]
Kibana logs when the UI hangs

Comment 7 Peter Portante 2017-03-15 00:09:58 UTC
Not sure where this will lead, but trying to make sense of the information so far.

First, if you notice in the image provided, it reads across the top: "<Warning symbol> Discover: [security_exception] no permissions for indices:data/read/msearch".  Can you find out what clicking on "More Info" gives you?

Second, looking at the logs, and at the code for that SearchGuard warning [1], we see:

 * at time 7:23:06,075 and ,076
   We see that some entity has requested a search for an index pattern:
     project.logging.32b8e1e5-d631-11e6-b989-42010af00025.*
   which does not get created until 7:24:08,324 as
     project.logging.32b8e1e5-d631-11e6-b989-42010af00025.2017.01.09
 * at time 7:23:08,306 and 308
   We see a similar pattern of warnings, but this time for an explicit index
     project.logging.32b8e1e5-d631-11e6-b989-42010af00025.2017.01.09
   which, as above, is not created until ...
 * at time 7:24:08,324
   We finally see the index created as:
     project.logging.32b8e1e5-d631-11e6-b989-42010af00025.2017.01.09
 * then notice that normal index operation logs are displayed for cluster wide
   ES metadata changes at times:
   7:24:08,388
   8:31:24,519
   8:34:21,915
   8:50:33,812

This looks normal, really, except that SearchGuard is a bit pedantic about issuing a warning for index names that don't exist.

What I think we need to do is understand what the index pattern is that is being used in kibana, and how the Elasticsearch kibana plugin munges it into the actual request.  The attached image makes me wonder why we have an explicit project.logging.fb722... index pattern name created, and what that is.

Third, can you fetch the Elasticsearch log file in the ES pod itself, probably located at /var/log/elasticsearch/logging-es.log, or perhaps in the /elasticsearch/ tree?  The elasticsearch.yml file should have a reference to where it is located.
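
As an illustrative sketch only (assuming the standard oc client, that $pod holds the ES pod name, and that the paths are the guesses above rather than anything confirmed in this report), the file could be pulled out of the pod like this:

  # dump the log file from inside the pod, trying the /var/log location first
  oc exec -c elasticsearch $pod -- cat /var/log/elasticsearch/logging-es.log > logging-es.log

  # or, if it lives under the /elasticsearch/ tree, copy the whole logs directory out
  oc rsync --container=elasticsearch $pod:/elasticsearch/logging-es/logs/ ./es-logs/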

Comment 8 Xia Zhao 2017-03-15 05:58:23 UTC
(In reply to Peter Portante from comment #7)

> First, if you notice in the image provided, it reads across the top:
> "<Warning symbol> Discover: [security_exception] no permissions for
> indices:data/read/msearch".  Can you find out what clicking on "More Info"
> gives you?

This issue was addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1408412

> Second, looking at the logs, ...

> What I think we need to do is understand what the index pattern is that is
> being used in kibana, and how the Elasticsearch kibana plugin munges it into
> the actual request.  The attached image makes me wonder why we have an
> explicit project.logging.fb722... index pattern name created, and what that
> is.

The ES log and Kibana screenshot uploaded here came from different test environments.

Comment 9 Xia Zhao 2017-03-15 07:47:38 UTC
(In reply to Peter Portante from comment #7)

> Third, can you fetch the Elasticsearch log file in the ES pod itself,
> probably located at /var/log/elasticsearch/logging-es.log, or perhaps in the
> /elasticsearch/ tree?  The elasticsearch.yml file should have a reference to
> where it is located.

The elasticsearch.yml file pointed me to /elasticsearch/${CLUSTER_NAME}/logs, but I didn't find the log file there:

sh-4.2$ pwd
/etc/elasticsearch
sh-4.2$ ls -al
total 16
drwxrwxrwx.  4 root root         79 Mar 15 06:47 .
drwxr-xr-x. 60 root root       4096 Mar 15 06:47 ..
-rw-rw-rw-.  1 root root       3192 Jan 18 03:11 elasticsearch.yml
-rw-rw-rw-.  1 root root        511 Jan 18 03:11 logging.yml
drwxrwxrwx.  2 root root          6 Jan 18 03:11 scripts
drwxrwsrwt.  3 root 1000050000  240 Mar 15 06:46 secret

Here is the content of elasticsearch.yml:

sh-4.2$ cat /usr/share/java/elasticsearch/config/elasticsearch.yml
cluster:
  name: ${CLUSTER_NAME}

script:
  inline: on
  indexed: on

index:
  number_of_shards: 1
  number_of_replicas: 0
  auto_expand_replicas: 0-2
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb
    flush_threshold_period: 5m

node:
  master: true
  data: true

network:
  host: 0.0.0.0

cloud:
  kubernetes:
    service: ${SERVICE_DNS}
    namespace: ${NAMESPACE}

discovery:
  type: kubernetes
  zen.ping.multicast.enabled: false

gateway:
  expected_master_nodes: ${NODE_QUORUM}
  recover_after_nodes: ${RECOVER_AFTER_NODES}
  expected_nodes: ${RECOVER_EXPECTED_NODES}
  recover_after_time: ${RECOVER_AFTER_TIME}

io.fabric8.elasticsearch.authentication.users: ["system.logging.kibana", "system.logging.fluentd", "system.logging.curator", "system.admin"]

openshift.config:
  use_common_data_model: true
  project_index_prefix: "project"
  time_field_name: "@timestamp"

openshift.searchguard:
  keystore.path: /etc/elasticsearch/secret/admin.jks
  truststore.path: /etc/elasticsearch/secret/searchguard.truststore

openshift.operations.allow_cluster_reader: false

path:
  data: /elasticsearch/persistent/${CLUSTER_NAME}/data
  logs: /elasticsearch/${CLUSTER_NAME}/logs
  work: /elasticsearch/${CLUSTER_NAME}/work
  scripts: /elasticsearch/${CLUSTER_NAME}/scripts

searchguard:
  authcz.admin_dn:
  - CN=system.admin,OU=OpenShift,O=Logging
  config_index_name: ".searchguard.${HOSTNAME}"
  ssl:
    transport:
      enabled: true
      enforce_hostname_verification: false
      keystore_type: JKS
      keystore_filepath: /etc/elasticsearch/secret/searchguard.key
      keystore_password: kspass
      truststore_type: JKS
      truststore_filepath: /etc/elasticsearch/secret/searchguard.truststore
      truststore_password: tspass
    http:
      enabled: true
      keystore_type: JKS
      keystore_filepath: /etc/elasticsearch/secret/key
      keystore_password: kspass
      clientauth_mode: OPTIONAL
      truststore_type: JKS
      truststore_filepath: /etc/elasticsearch/secret/truststore
      truststore_password: tspass

Comment 10 Peter Portante 2017-03-15 11:24:58 UTC
(In reply to Xia Zhao from comment #8)
> (In reply to Peter Portante from comment #7)
> The es log and kibana screenshot uploaded here came from different test envs.

So we are now down to one problem in one BZ.  What is Kibana doing now that the other BZ has been resolved?

(In reply to Xia Zhao from comment #9)
> (In reply to Peter Portante from comment #7)
> 
> sh-4.2$ cat /usr/share/java/elasticsearch/config/elasticsearch.yml
> cluster:
>   name: ${CLUSTER_NAME}
> gateway:
>   expected_master_nodes: ${NODE_QUORUM}
>   recover_after_nodes: ${RECOVER_AFTER_NODES}
>   expected_nodes: ${RECOVER_EXPECTED_NODES}
>   recover_after_time: ${RECOVER_AFTER_TIME}

Can you find out what the environmental parameters for these are?

> 
> path:
>   data: /elasticsearch/persistent/${CLUSTER_NAME}/data
>   logs: /elasticsearch/${CLUSTER_NAME}/logs
>   work: /elasticsearch/${CLUSTER_NAME}/work
>   scripts: /elasticsearch/${CLUSTER_NAME}/scripts

The CLUSTER_NAME variable is likely logging-es or logging-es-ops, so the logs would be at /elasticsearch/logging-es[-ops]/logs/logging-es[-ops].log.
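
For instance, a rough sketch (assuming these are plain container environment variables and that $pod holds the ES pod name; substitute logging-es-ops if that is the cluster name):

  # resolve the ${...} placeholders used in elasticsearch.yml
  oc exec -c elasticsearch $pod -- env | grep -E 'CLUSTER_NAME|NODE_QUORUM|RECOVER_'

  # then check the resolved log directory directly
  oc exec -c elasticsearch $pod -- ls /elasticsearch/logging-es/logs/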

Comment 11 Xia Zhao 2017-03-21 08:46:06 UTC
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1434300

Comment 13 Xia Zhao 2017-03-28 07:55:05 UTC
(In reply to Peter Portante from comment #10)
> (In reply to Xia Zhao from comment #8)
> > (In reply to Peter Portante from comment #7)
> > The es log and kibana screenshot uploaded here came from different test envs.
> 
> So we are now down to one problem in one BZ.  What is kibana doing now that
> the other BZ solved it?
Issue reproduced with the latest 3.5.0 images:
openshift3/logging-elasticsearch   3.5.0               5ff198b5c68d
openshift3/logging-kibana          3.5.0               a6159c640977

> (In reply to Xia Zhao from comment #9)
> > (In reply to Peter Portante from comment #7)
> > 
> > sh-4.2$ cat /usr/share/java/elasticsearch/config/elasticsearch.yml
> > cluster:
> >   name: ${CLUSTER_NAME}
> > gateway:
> >   expected_master_nodes: ${NODE_QUORUM}
> >   recover_after_nodes: ${RECOVER_AFTER_NODES}
> >   expected_nodes: ${RECOVER_EXPECTED_NODES}
> >   recover_after_time: ${RECOVER_AFTER_TIME}
> 
> Can you find out what the environmental parameters for these are?
NODE_QUORUM=1
RECOVER_AFTER_NODES=0
RECOVER_EXPECTED_NODES=1
RECOVER_AFTER_TIME=5m

> > 
> > path:
> >   data: /elasticsearch/persistent/${CLUSTER_NAME}/data
> >   logs: /elasticsearch/${CLUSTER_NAME}/logs
> >   work: /elasticsearch/${CLUSTER_NAME}/work
> >   scripts: /elasticsearch/${CLUSTER_NAME}/scripts
> 
> The CLUSTER_NAME variable likely logging-es or logging-es-ops, so the logs
> would be at /elasticsearch/logging-es[-ops]/logs/logging-es[-ops].log.

From the environment variables, CLUSTER_NAME=logging-es in my env. Attaching /elasticsearch/logging-es/logs/logging-es.log from my 3.5.0 env.

Comment 14 Peter Portante 2017-04-18 17:01:49 UTC
I think the priority of this should be raised, as it is really confusing for customers.

Comment 16 Jeff Cantrill 2017-05-11 15:36:29 UTC
The only way to avoid this, as far as I know, is for fluentd to collect logs from the namespace.  I would suggest investigating whether fluentd is collecting logs.
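
As a rough sketch of how to check that (the component=fluentd label and the admin cert file names under /etc/elasticsearch/secret are assumptions based on a typical logging deployment, not taken from this report):

  # are the collectors running?
  oc get pods -n logging -l component=fluentd

  # any errors shipping records to Elasticsearch?
  oc logs -n logging $fluentd_pod | tail -n 50

  # does an index for the project exist, and is its doc count growing?
  oc exec -n logging -c elasticsearch $es_pod -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/_cat/indices?v' | grep project.

If no project.<namespace>.* index appears, or its document count stays at zero, fluentd is not collecting logs for that namespace.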

Comment 20 Jeff Cantrill 2017-06-28 13:58:49 UTC
This is not a blocker issue, so lowering the priority.  This should be mitigated by the fix provided for bug 1456413, which will inform the user that no data has been collected for the index in question.

Comment 26 Steven Walter 2017-08-24 17:44:06 UTC
I *think* we're good. If any case owners disagree with this decision, please provide data to counter it; otherwise I'm closing for now.

RELIEVED BY: https://bugzilla.redhat.com/show_bug.cgi?id=1456413

Comment 27 Ruben Romero Montes 2017-10-09 07:34:16 UTC
(In reply to Jeff Cantrill from comment #20)
> This is not a blocker issues so lowering the priority.  This should be
> mitigated by the fix provided to bug 1456413 which will inform the user that
> there is not data collected for the index in question.

The mentioned Bugzilla only shows the appropriate "No logs found" message when there is no index at all.

When there is no data in the selected time frame, Kibana keeps behaving the same way: the "Searching" icon keeps loading indefinitely and the following error is shown in the UI (reproduced with the latest 3.6 images):

 Discover: [security_exception] no permissions for indices:data/read/msearch

Error: [security_exception] no permissions for indices:data/read/msearch
ErrorAbstract@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:95501:20
StatusCodeError@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:95663:6
respond@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:96868:16
checkRespForFailure@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:96831:8
[24]</AngularConnector.prototype.request/<@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:95449:8
processQueue@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:42452:29
scheduleProcessQueue/<@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:42468:28
$eval@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:43696:17
$digest@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:43507:16
$apply@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:43804:14
done@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:38253:37
completeRequest@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:38451:8
requestLoaded@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:38392:10

Comment 28 Jeff Cantrill 2017-10-10 17:33:38 UTC
This should be fixed with plugin version 2.4.4.13 [1], which gives users read rights to make that call.  You can confirm the roles and mappings by:

1. oc exec -c elasticsearch $pod -- es_acl get --doc=rolesmapping
2. oc exec -c elasticsearch $pod -- es_acl get --doc=roles
3. oc exec -c elasticsearch $pod -- es_acl get --doc=actiongroups

where:

a role for the user should exist that includes an actionmapping to USER_CLUSTER_OPERATIONS
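
For example, a quick pass over all three documents (simply reusing the es_acl commands above; $pod is the Elasticsearch pod name) might look like:

  for doc in rolesmapping roles actiongroups; do
    oc exec -c elasticsearch $pod -- es_acl get --doc=$doc
  done | grep -i -B2 -A2 USER_CLUSTER_OPERATIONS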

[1] https://github.com/openshift/origin-aggregated-logging/pull/571

If you still see an issue, please create a new one targeted for the 3.6 release.