Bug 1410694
| Field | Value |
| --- | --- |
| Summary | Kibana hangs on searching if nothing found, and never completes for non-admin users |
| Product | OpenShift Container Platform |
| Component | Logging |
| Version | 3.4.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED WONTFIX |
| Severity | high |
| Priority | medium |
| Keywords | Reopened |
| Reporter | Xia Zhao <xiazhao> |
| Assignee | Jeff Cantrill <jcantril> |
| QA Contact | Anping Li <anli> |
| CC | aos-bugs, bandrade, erich, jcantril, knakayam, pportant, rbost, rmeggins, rromerom, stwalter, whearn, wsun, xiazhao, zhiwliu |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2017-10-10 17:33:38 UTC |
Issue did not reproduce for a cluster-admin user.

Can you please provide the logs from the ES pod when this happens?

@ewolinet Please find the attached logs from when the Kibana UI hangs for a long time on the logging namespace with a search range of "17 minutes". The logging stacks are deployed with the parameter "use_journal = true", and Kibana is then logged into with a non-admin user.

Created attachment 1238587 [details]
Kibana logs when the UI hangs
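(For reference, a minimal way to capture those pod logs — a sketch assuming the logging project and the default component labels from the deployer; the pod names are placeholders:)

```
# Assumes the "logging" project and default labels; adjust to your environment.
oc get pods -n logging -l component=es            # find the Elasticsearch pod
oc logs -n logging <es-pod> > es-pod.log          # ES container stdout/stderr
oc get pods -n logging -l component=kibana        # find the Kibana pod
oc logs -n logging <kibana-pod> -c kibana > kibana-pod.log
```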
Not sure where this will lead, but trying to make sense of the information so far.

First, if you notice in the image provided, it reads across the top: "<Warning symbol> Discover: [security_exception] no permissions for indices:data/read/msearch". Can you find out what clicking on "More Info" gives you?

Second, looking at the logs, and at the code for that SearchGuard warning [1], we see:

* At times 7:23:06,075 and ,076, some entity requested a search for an index pattern, project.logging.32b8e1e5-d631-11e6-b989-42010af00025.*, which does not get created until 7:24:08,324 as project.logging.32b8e1e5-d631-11e6-b989-42010af00025.2017.01.09.
* At times 7:23:08,306 and ,308, we see a similar pattern of warnings, but this time for the explicit index project.logging.32b8e1e5-d631-11e6-b989-42010af00025.2017.01.09, which, as above, is not created until ...
* ... 7:24:08,324, when we finally see the index created as project.logging.32b8e1e5-d631-11e6-b989-42010af00025.2017.01.09.
* Then notice that normal index-operation logs are displayed for cluster-wide ES metadata changes at times 7:24:08,388, 8:31:24,519, 8:34:21,915, and 8:50:33,812.

This looks normal, really, with the exception that SearchGuard is a bit pedantic about issuing a warning for index names that don't exist. What I think we need to do is understand what index pattern is being used in Kibana, and how the Elasticsearch Kibana plugin munges it into the actual request. The attached image makes me wonder why we have an explicit project.logging.fb722... index pattern name created, and what that is.

Third, can you fetch the Elasticsearch log file in the ES pod itself, probably located at /var/log/elasticsearch/logging-es.log, or perhaps in the /elasticsearch/ tree? The elasticsearch.yml file should have a reference to where it is located.

(In reply to Peter Portante from comment #7)
> First, if you notice in the image provided, it reads across the top:
> "<Warning symbol> Discover: [security_exception] no permissions for
> indices:data/read/msearch". Can you find out what clicking on "More Info"
> gives you?

This issue was addressed in https://bugzilla.redhat.com/show_bug.cgi?id=1408412

> Second, looking at the logs, ...
> What I think we need to do is understand what the index pattern is that is
> being used in kibana, and how the Elasticsearch kibana plugin munges it into
> the actual request. The attached image makes me wonder why we have an
> explicit project.logging.fb722... index pattern name created, and what that
> is.

The ES log and Kibana screenshot uploaded here came from different test envs.

(In reply to Peter Portante from comment #7)
> Third, can you fetch the Elasticsearch log file in the ES pod itself,
> probably located at /var/log/elasticsearch/logging-es.log, or perhaps in the
> /elasticsearch/ tree? The elasticsearch.yml file should have a reference to
> where it is located.

The elasticsearch.yml file pointed me to /elasticsearch/${CLUSTER_NAME}/logs, but I didn't find it there:

```
sh-4.2$ pwd
/etc/elasticsearch
sh-4.2$ ls -al
total 16
drwxrwxrwx.  4 root root           79 Mar 15 06:47 .
drwxr-xr-x. 60 root root         4096 Mar 15 06:47 ..
-rw-rw-rw-.  1 root root         3192 Jan 18 03:11 elasticsearch.yml
-rw-rw-rw-.  1 root root          511 Jan 18 03:11 logging.yml
drwxrwxrwx.  2 root root            6 Jan 18 03:11 scripts
drwxrwsrwt.  3 root 1000050000    240 Mar 15 06:46 secret
```

Here is the content of elasticsearch.yml:

```
sh-4.2$ cat /usr/share/java/elasticsearch/config/elasticsearch.yml
cluster:
  name: ${CLUSTER_NAME}

script:
  inline: on
  indexed: on

index:
  number_of_shards: 1
  number_of_replicas: 0
  auto_expand_replicas: 0-2
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb
    flush_threshold_period: 5m

node:
  master: true
  data: true

network:
  host: 0.0.0.0

cloud:
  kubernetes:
    service: ${SERVICE_DNS}
    namespace: ${NAMESPACE}

discovery:
  type: kubernetes
  zen.ping.multicast.enabled: false

gateway:
  expected_master_nodes: ${NODE_QUORUM}
  recover_after_nodes: ${RECOVER_AFTER_NODES}
  expected_nodes: ${RECOVER_EXPECTED_NODES}
  recover_after_time: ${RECOVER_AFTER_TIME}

io.fabric8.elasticsearch.authentication.users: ["system.logging.kibana", "system.logging.fluentd", "system.logging.curator", "system.admin"]

openshift.config:
  use_common_data_model: true
  project_index_prefix: "project"
  time_field_name: "@timestamp"

openshift.searchguard:
  keystore.path: /etc/elasticsearch/secret/admin.jks
  truststore.path: /etc/elasticsearch/secret/searchguard.truststore

openshift.operations.allow_cluster_reader: false

path:
  data: /elasticsearch/persistent/${CLUSTER_NAME}/data
  logs: /elasticsearch/${CLUSTER_NAME}/logs
  work: /elasticsearch/${CLUSTER_NAME}/work
  scripts: /elasticsearch/${CLUSTER_NAME}/scripts

searchguard:
  authcz.admin_dn:
    - CN=system.admin,OU=OpenShift,O=Logging
  config_index_name: ".searchguard.${HOSTNAME}"
  ssl:
    transport:
      enabled: true
      enforce_hostname_verification: false
      keystore_type: JKS
      keystore_filepath: /etc/elasticsearch/secret/searchguard.key
      keystore_password: kspass
      truststore_type: JKS
      truststore_filepath: /etc/elasticsearch/secret/searchguard.truststore
      truststore_password: tspass
    http:
      enabled: true
      keystore_type: JKS
      keystore_filepath: /etc/elasticsearch/secret/key
      keystore_password: kspass
      clientauth_mode: OPTIONAL
      truststore_type: JKS
      truststore_filepath: /etc/elasticsearch/secret/truststore
      truststore_password: tspass
```

(In reply to Xia Zhao from comment #8)
> (In reply to Peter Portante from comment #7)
> The ES log and Kibana screenshot uploaded here came from different test envs.

So we are now down to one problem in one BZ. What is Kibana doing now that the other BZ solved it?

(In reply to Xia Zhao from comment #9)
> sh-4.2$ cat /usr/share/java/elasticsearch/config/elasticsearch.yml
> cluster:
>   name: ${CLUSTER_NAME}
> gateway:
>   expected_master_nodes: ${NODE_QUORUM}
>   recover_after_nodes: ${RECOVER_AFTER_NODES}
>   expected_nodes: ${RECOVER_EXPECTED_NODES}
>   recover_after_time: ${RECOVER_AFTER_TIME}

Can you find out what the environmental parameters for these are?

> path:
>   data: /elasticsearch/persistent/${CLUSTER_NAME}/data
>   logs: /elasticsearch/${CLUSTER_NAME}/logs
>   work: /elasticsearch/${CLUSTER_NAME}/work
>   scripts: /elasticsearch/${CLUSTER_NAME}/scripts

The CLUSTER_NAME variable is likely logging-es or logging-es-ops, so the logs would be at /elasticsearch/logging-es[-ops]/logs/logging-es[-ops].log.
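(A quick way to answer both questions from a running pod — a sketch; logging-es-xxxx is a placeholder pod name:)

```
# Sketch: logging-es-xxxx is a placeholder; -c elasticsearch targets the ES
# container in that pod.
oc exec -n logging logging-es-xxxx -c elasticsearch -- \
  env | grep -E 'CLUSTER_NAME|NODE_QUORUM|RECOVER_'    # gateway/recovery settings
oc exec -n logging logging-es-xxxx -c elasticsearch -- \
  cat /elasticsearch/logging-es/logs/logging-es.log    # the ES log file itself
```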
(In reply to Peter Portante from comment #10)
> So we are now down to one problem in one BZ. What is Kibana doing now that
> the other BZ solved it?

Issue reproduced with the latest 3.5.0 images:

openshift3/logging-elasticsearch 3.5.0 5ff198b5c68d
openshift3/logging-kibana 3.5.0 a6159c640977

> (In reply to Xia Zhao from comment #9)
> > sh-4.2$ cat /usr/share/java/elasticsearch/config/elasticsearch.yml
> > cluster:
> >   name: ${CLUSTER_NAME}
> > gateway:
> >   expected_master_nodes: ${NODE_QUORUM}
> >   recover_after_nodes: ${RECOVER_AFTER_NODES}
> >   expected_nodes: ${RECOVER_EXPECTED_NODES}
> >   recover_after_time: ${RECOVER_AFTER_TIME}
>
> Can you find out what the environmental parameters for these are?

NODE_QUORUM=1
RECOVER_AFTER_NODES=0
RECOVER_EXPECTED_NODES=1
RECOVER_AFTER_TIME=5m

> > path:
> >   data: /elasticsearch/persistent/${CLUSTER_NAME}/data
> >   logs: /elasticsearch/${CLUSTER_NAME}/logs
> >   work: /elasticsearch/${CLUSTER_NAME}/work
> >   scripts: /elasticsearch/${CLUSTER_NAME}/scripts
>
> The CLUSTER_NAME variable is likely logging-es or logging-es-ops, so the logs
> would be at /elasticsearch/logging-es[-ops]/logs/logging-es[-ops].log.

From the environmental parameters, it is CLUSTER_NAME=logging-es in my env. Attaching the /elasticsearch/logging-es/logs/logging-es.log from my 3.5.0 env.

I think this should be raised as a priority, as this is really confusing for customers. The only way to avoid it, as far as I know, is for fluentd to collect logs from the namespace. I would suggest investigating whether fluentd is collecting logs.

This is not a blocker issue, so lowering the priority. This should be mitigated by the fix provided for bug 1456413, which will inform the user that there is no data collected for the index in question.

I *think* we're good. If any case owners disagree with this decision, please provide data to counter the argument; otherwise I'm closing for now.

RELIEVED BY: https://bugzilla.redhat.com/show_bug.cgi?id=1456413

(In reply to Jeff Cantrill from comment #20)
> This is not a blocker issue so lowering the priority. This should be
> mitigated by the fix provided for bug 1456413 which will inform the user that
> there is no data collected for the index in question.

The mentioned bugzilla only shows the appropriate "No logs found" message when there is no index at all.
When there is no data in the selected time frame, Kibana keeps acting the same: the "Searching" icon keeps loading indefinitely, showing the following error in the UI (reproduced with the latest 3.6 images):

Discover: [security_exception] no permissions for indices:data/read/msearch

```
Error: [security_exception] no permissions for indices:data/read/msearch
ErrorAbstract@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:95501:20
StatusCodeError@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:95663:6
respond@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:96868:16
checkRespForFailure@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:96831:8
[24]</AngularConnector.prototype.request/<@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:95449:8
processQueue@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:42452:29
scheduleProcessQueue/<@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:42468:28
$eval@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:43696:17
$digest@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:43507:16
$apply@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:43804:14
done@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:38253:37
completeRequest@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:38451:8
requestLoaded@https://kibana.apps36.example.com/bundles/commons.bundle.js?v=10229:38392:10
```

This should be fixed with plugin version 2.4.4.13 [1], which gives users read rights to make that call. You can confirm the roles and mappings by:

1. oc exec -c elasticsearch $pod -- es_acl get --doc=rolesmapping
2. oc exec -c elasticsearch $pod -- es_acl get --doc=roles
3. oc exec -c elasticsearch $pod -- es_acl get --doc=actiongroups

where a role for the user should exist that includes an action mapping to USER_CLUSTER_OPERATIONS.

[1] https://github.com/openshift/origin-aggregated-logging/pull/571

If you still see an issue, please create one that is targeted for the 3.6 release.
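(As a concrete version of the checks above — a sketch, with the pod resolved by its default label; the namespace and label are assumptions based on the default deployment:)

```
# Resolve an ES pod by its default label, then dump the SearchGuard ACL docs.
pod=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging -c elasticsearch "$pod" -- es_acl get --doc=rolesmapping  # a role should be mapped to the user
oc exec -n logging -c elasticsearch "$pod" -- es_acl get --doc=roles         # that role should reference USER_CLUSTER_OPERATIONS
oc exec -n logging -c elasticsearch "$pod" -- es_acl get --doc=actiongroups
```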
Created attachment 1237867 [details]
Kibana hangs on empty searches

Description of problem:
Adjust the Kibana time range to conditions with empty results; the UI hangs on "searching" and never completes. Any non-empty search condition can bring it back.

Version-Release number of selected component (if applicable):
ops registry:
openshift3/logging-kibana 8a3df528c998
openshift3/logging-elasticsearch 583e04127ed6
openshift3/logging-auth-proxy d9236074fecb
openshift3/logging-fluentd 43d549beb4c8
openshift3/logging-curator 5aadf9eb6168
openshift3/logging-deployer 7386facde449

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with use_journal = true, then log in with a non-admin user
2. Log in to Kibana and adjust the time range to a condition with no log entries
3. Refine to a non-empty search condition

Actual results:
2. UI hangs on "searching" and never completes. See attached screenshot.
3. UI comes back with log entries displayed.

Expected results:
2. UI should state "No results found" once the search has ended.

Additional info:
The Kibana screenshot was attached.
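(Related to the earlier suggestion to verify that fluentd is actually collecting: a sketch for listing the indices Elasticsearch has, from inside the ES pod, using the admin client cert mounted from the secret — the admin-cert/admin-key/admin-ca file names are assumptions based on the default secret contents:)

```
# Sketch: list indices from inside the ES pod; cert/key file names are assumed
# from the default logging-elasticsearch secret mount.
pod=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
oc exec -n logging -c elasticsearch "$pod" -- curl -s \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key \
  --cacert /etc/elasticsearch/secret/admin-ca \
  'https://localhost:9200/_cat/indices?v'
# A missing project.<namespace>.<uid>.* index for the project suggests fluentd
# is not shipping logs for it.
```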