Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1530157

Summary: Kibana timeout if viewing many (100M) records
Product: OpenShift Container Platform
Reporter: Shirly Radco <sradco>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED CURRENTRELEASE
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: unspecified
Version: unspecified
CC: aos-bugs, jcantril, pportant, rmeggins, sradco, stwalter, vlaad
Target Milestone: ---
Target Release: 3.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Clones: 1538171 1589905 (view as bug list)
Environment:
Last Closed: 2018-08-28 17:42:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1538171, 1589905    
Attachments:
  logging-20180117_111929.tar.gz (Flags: none)
  The kibana debug logs (Flags: none)

Description Shirly Radco 2018-01-02 07:59:23 UTC
Description of problem:

When I try to view the metrics and logs dashboards for a period longer than about 24 hours, I get the error:
"Visualize: Gateway Timeout More Info OK"

The metrics index contains 175,647,144 docs per day.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Send roughly the same volume of docs to the index for a few days.
2. Try to run a report covering more than 24 hours.

Actual results:


Expected results:


Additional info:

Comment 1 Jeff Cantrill 2018-01-15 20:47:51 UTC
Can you clarify which 'log' and 'metrics' dashboards you are referring to?

Comment 2 Jeff Cantrill 2018-01-15 20:51:28 UTC
Can you provide additional information regarding your environment so we can understand whether your deployment is properly sized for your requirements? Additionally, it would help if you could provide information regarding the OpenShift cluster (e.g. version) on which you are running.

Comment 3 Shirly Radco 2018-01-16 11:29:13 UTC
Rich, Can you please help me to provide the required information?

Comment 4 Rich Megginson 2018-01-16 12:58:51 UTC
(In reply to Jeff Cantrill from comment #2)
> Can you provide additional information regarding your environment so we can
> understand whether your deployment is properly sized for your requirements?
> Additionally, it would help if you could provide information regarding the
> OpenShift cluster (e.g. version) on which you are running.

@shirly - run https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh
tar up the resulting files/dirs and attach to this bz
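
A minimal sketch of the dump-and-attach steps (the output directory naming is an assumption; the only requirements known from this bug are the script itself and an active 'oc login' with access to the logging project):

    git clone https://github.com/openshift/origin-aggregated-logging
    cd origin-aggregated-logging
    ./hack/logging-dump.sh        # dumps logging component state from the cluster
    # Bundle whatever the script produced (directory name is an assumption) and attach it here:
    tar czf bz1530157-logging-dump.tar.gz logging-*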

The real problem is that there is no way to tune the Kibana timeout...

Comment 6 Shirly Radco 2018-01-17 09:32:46 UTC
Created attachment 1382317 [details]
logging-20180117_111929.tar.gz

Comment 7 Shirly Radco 2018-01-17 09:34:50 UTC
I uploaded the report Rich recommended.
Please give this a high priority, since users will not be able to view dashboards based on metrics for a period greater than 12 hours, which is a major issue.

Comment 8 Jeff Cantrill 2018-01-18 13:22:55 UTC
@Rich the posted PR will allow you to modify the request timeout.  Pending review of the attached report, this may or may not resolve the issue.  I suspect there is additional performance tuning required.

Comment 9 openshift-github-bot 2018-01-18 17:21:30 UTC
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/969e1302a071ed3549679240892413a37f349297
bug 1530157. Configure Kibana timeout via env var

https://github.com/openshift/origin-aggregated-logging/commit/6f07813f44d0d1a1e86fea9ab2b0f9cfda280fbb
Merge pull request #905 from jcantrill/bz1530157_config_kibana_timeout

Automatic merge from submit-queue.

bug 1530157. Configure Kibana timeout via env var

This PR allows you to modify the Kibana config via env vars
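
For illustration only, a hedged sketch of how the new variable might be applied once an image containing this change is deployed; the DC name, the value 300000, and the assumption that the value is interpreted in milliseconds are illustrative and not taken from the PR:

    # set the timeout on the kibana container of the logging-kibana DC and let it redeploy
    oc set env dc/logging-kibana -c kibana ELASTICSEARCH_REQUESTTIMEOUT=300000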

Comment 10 Anping Li 2018-01-22 10:42:00 UTC
Created attachment 1384344 [details]
The kibana debug logs

Kibana reported "Payload timeout must be shorter than socket timeout" when I set ELASTICSEARCH_REQUESTTIMEOUT=1 in the DC.
What is the socket timeout value? What value should be used for ELASTICSEARCH_REQUESTTIMEOUT?


image: logging-kibana/images/v3.9.0-0.22.0.0



    spec:
      containers:
      - env:
        - name: ES_HOST
          value: logging-es
        - name: ES_PORT
          value: "9200"
        - name: DEBUG
          value: "true"
        - name: KIBANA_MEMORY_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: kibana
              divisor: "0"
              resource: limits.memory
        - name: ELASTICSEARCH_REQUESTTIMEOUT
          value: "1"

Comment 11 Jeff Cantrill 2018-01-22 14:02:56 UTC
It looks like you attempted to test the same way I did.  The error is explained here [1]; it looks like the value must be greater than 10s.

[1] https://stackoverflow.com/questions/48117400/adding-requesttimeout-causes-kibana-to-fail-at-startup
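
Put differently (the unit is an assumption; this bug only states the value must exceed roughly 10s): if the variable is interpreted as milliseconds, a working entry in the DC snippet from comment 10 would look like:

        - name: ELASTICSEARCH_REQUESTTIMEOUT
          value: "300000"    # assumed milliseconds (5 min); must stay above the ~10s floor from [1]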

Comment 12 Anping Li 2018-01-23 04:36:57 UTC
Verified that ELASTICSEARCH_REQUESTTIMEOUT can be set via the environment in logging-kibana/images/v3.9.0-0.22.0.0.

Comment 13 Jeff Cantrill 2018-02-13 18:33:29 UTC
*** Bug 1509025 has been marked as a duplicate of this bug. ***