Bug 1705026

Summary: Gateway timeout when accessing Kibana dashboard
Product: OpenShift Container Platform
Component: Logging
Version: 3.10.0
Target Milestone: ---
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Severity: unspecified
Priority: unspecified
Status: CLOSED WONTFIX
Reporter: Robert Sandu <rsandu>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Anping Li <anli>
CC: aos-bugs, jcantril, lmartinh, rmeggins
Type: Bug
Last Closed: 2019-07-02 20:15:26 UTC

Description Robert Sandu 2019-05-01 07:20:38 UTC
Description of problem: Kibana returns a fatal error upon login:

```
Fatal Error
Courier Fetch Error: unhandled courier request error: [security_exception] no permissions for indices:data/read/mget
Version: 4.6.4
Build: 10229
Error: unhandled courier request error: [security_exception] no permissions for indices:data/read/mget"
```

Version-Release number of selected component (if applicable): v3.10.34

Actual results: Kibana login returns "Error: unhandled courier request error: [security_exception] no permissions for indices:data/read/mget"


Expected results: Kibana login to work properly.


Additional info:

- Similar issue in [1], solved through an errata in v3.10.15. However, in this case, the customer has v3.10.34 deployed.
- Indices seem healthy.
- Doesn't seem to be related to permissions or the retention time frame: the same error is reproduced by cluster-admin users upon login.

Comment 3 Jeff Cantrill 2019-05-02 15:35:15 UTC
Reviewing the logs I see several instances of ES nodes dropping out of the ES cluster.

* Can you check the connectivity to the ES nodes to verify the latency [2]?
* Maybe also check the connectivity from Kibana [3].

The gateway timeout can be mitigated by adjusting the request timeout [1]; a hedged example is sketched after the links below.

[1] https://github.com/openshift/origin-aggregated-logging/tree/release-3.10/kibana#configuration-modifications
[2] https://github.com/jcantrill/cluster-logging-tools/blob/master/scripts/check-es-cluster-connectivity
[3] https://github.com/jcantrill/cluster-logging-tools/blob/master/scripts/check-kibana-to-es-connectivity
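
As a concrete illustration of [1]: the request timeout is controlled by the ELASTICSEARCH_REQUESTTIMEOUT environment variable on the Kibana deployment. A minimal sketch, assuming the default 3.10 logging namespace and DeploymentConfig name (logging / logging-kibana) and an arbitrary 10-minute value; none of these are confirmed from this bug:

```
# Sketch only: the namespace, DC name, and 600000 ms value are assumptions.
# ELASTICSEARCH_REQUESTTIMEOUT is the variable documented in [1]; the value
# is in milliseconds.
oc set env dc/logging-kibana ELASTICSEARCH_REQUESTTIMEOUT=600000 -n logging

# Force a redeploy so the running Kibana pod picks up the new value
# (a config change on the DC normally triggers this automatically).
oc rollout latest dc/logging-kibana -n logging
```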

Comment 4 Robert Sandu 2019-05-06 09:57:17 UTC
Hi.

Increasing ELASTICSEARCH_REQUESTTIMEOUT seems to make no difference. The customer is still seeing "gateway timeout" errors when accessing the Kibana dashboard.

Attached are the outputs of the ES latency test and the Kibana connectivity check.
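
For completeness, one quick way to confirm the timeout variable is actually set on the running Kibana deployment (same assumed namespace and DC name as in the sketch above):

```
# Sketch only: namespace and DC name are assumptions, as above.
oc set env dc/logging-kibana --list -n logging | grep ELASTICSEARCH_REQUESTTIMEOUT
```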

Comment 13 Jeff Cantrill 2019-07-02 13:25:34 UTC
I believe this will be resolved by https://bugzilla.redhat.com/show_bug.cgi?id=1705589 and subsequent backports to 3.11.  The upstream issues are merged and I am doing the work now to get them built for a production release.

Comment 14 Robert Sandu 2019-07-02 13:28:46 UTC
(In reply to Jeff Cantrill from comment #13)
> I believe this will be resolved by
> https://bugzilla.redhat.com/show_bug.cgi?id=1705589 and subsequent backports
> to 3.11.  The upstream issues are merged and I am doing the work now to get
> them built for a production release.

Hi Jeff.

Which 3.11 z-stream version has this been (or will it be) backported to?

Thank you.

Comment 15 Jeff Cantrill 2019-07-02 20:15:26 UTC
Closing as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1705589. Will backport to 3.11. Please reopen if not resolved.

*** This bug has been marked as a duplicate of bug 1705589 ***

Comment 16 Jeff Cantrill 2019-07-02 20:16:10 UTC
Changing status to WONTFIX as we will resolve this for 3.11 but not 3.10.