Bug 1878305

Summary: Error 504: Internal Server Error on Kibana
Product: OpenShift Container Platform
Reporter: Jatan Malde <jmalde>
Component: Logging
Assignee: ewolinet
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: high
Priority: high
Version: 4.5
CC: anli, aos-bugs, cblecker, dgautam, dkulkarn, ewolinet, fabian.ahbeck, halim.lee, hfukumot, hgomes, jcantril, jeder, jlee, jmalde, joboyer, mabajodu, mharri, mifiedle, ocasalsa, openshift-bugs-escalate, puraut, rcarrier, ssadhale, ychoukse, ykarajag
Target Milestone: ---
Keywords: ServiceDeliveryImpact
Target Release: 4.6.0
Hardware: Unspecified
OS: Linux
Whiteboard: logging-exploration osd-45-logging
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Elasticsearch rejected requests whose HTTP headers exceeded its default maximum size of 8kb.
Consequence: Requests failed in Elasticsearch and appeared as 50x-level errors in Kibana and the proxy.
Fix: The maximum header size allowed by Elasticsearch was increased to 128kb to accommodate the header information passed to it from the Kibana proxy.
Result: Kibana no longer shows 504 errors caused by the maximum header size being exceeded.
Story Points: ---
Clone Of: 1866490
Blocks: 1883357 (view as bug list)
Environment:
Last Closed: 2020-10-27 15:12:50 UTC
Bug Depends On: 1866490, 1883673    
Bug Blocks: 1883357    

Comment 5 ewolinet 2020-09-17 15:28:22 UTC
Jatan,

Can you confirm the following were run as asked from the prior bz?

  from one of the ES nodes:
    es_util --query="_flush/synced" -XPOST
    es_util --query=".kibana_1" -XDELETE
    es_util --query=".security" -XDELETE
    es_util --query="_flush/synced" -XPOST

  then from the cli:
    oc delete pods -l component=elasticsearch -n openshift-logging



The cluster appears to have elected a master, but I still saw errors about users not having permissions.
I did see the following in the elasticsearch proxy... 

  level=info msg="Error processing request in handler authorization: Unable to determine username"


This can be caused by an expired token. Can you try deleting the user's cookie or browsing from an incognito window?

https://github.com/openshift/elasticsearch-proxy/pull/47 should provide a better UX around this in the future.

Comment 16 Jatan Malde 2020-09-28 20:10:46 UTC
Hello Eric, 

Thanks for the help on this bugzilla.

To keep the bugzilla updated with the workaround until the fix is included in an errata:

When we hit the Kibana status page, we saw an HTTP 400 response reporting that the request header size exceeded the allowed maximum, which was the default of 8kb.
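To illustrate the root cause, here is a small sketch (not the actual proxy code; the token value and header names are synthetic stand-ins) showing how a large forwarded bearer token can push a request's header bytes past Elasticsearch's default `http.max_header_size` of 8kb:

```python
# Sketch: why a forwarded OAuth token can exceed Elasticsearch's default
# http.max_header_size of 8kb (8192 bytes). The token below is synthetic;
# real OpenShift bearer tokens and session cookies vary in size.

ES_DEFAULT_MAX_HEADER_SIZE = 8 * 1024  # Elasticsearch default: 8kb

def header_size(headers: dict) -> int:
    """Approximate the on-the-wire size of a set of HTTP headers."""
    return sum(len(f"{name}: {value}\r\n".encode()) for name, value in headers.items())

# A synthetic oversized bearer token standing in for a real OAuth token
# plus per-user session cookies forwarded by the Kibana proxy.
token = "x" * 9000
headers = {
    "Authorization": f"Bearer {token}",
    "Host": "elasticsearch.openshift-logging.svc:9200",
}

print(header_size(headers) > ES_DEFAULT_MAX_HEADER_SIZE)  # True: request is rejected
```

When the computed size crosses the limit, Elasticsearch rejects the request before processing it, which then surfaces as the 400/50x responses described above.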

First we set the elasticsearch CR to unmanaged:

   # oc edit elasticsearch elasticsearch -n openshift-logging

With the CR unmanaged, we edited the configmap in the openshift-logging namespace,
 
  # oc edit cm elasticsearch -n openshift-logging   // added the following line, 
  
   http.max_header_size: 16kb
   
 Once that is done, we need to restart Elasticsearch so that the new pods run with the updated config:
 
  # oc delete pod -l component=elasticsearch 
  
 Wait for the new pods to come up, clear the browser cookies, and then access the Kibana status page.

It should show the index management page; once an index pattern is set up, you should be able to see the Kibana UI.
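For reference, a sketch of where the workaround setting lands in the configmap (the surrounding keys are illustrative; the real elasticsearch.yml contains many more settings, and the permanent fix shipped in the errata raises the limit to 128kb instead of 16kb):

```yaml
# Illustrative fragment of the elasticsearch configmap edited via
# "oc edit cm elasticsearch -n openshift-logging". Only the
# http.max_header_size line is the workaround; other keys are assumed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch
  namespace: openshift-logging
data:
  elasticsearch.yml: |
    http.max_header_size: 16kb   # default is 8kb
```

Note that this edit only survives because the elasticsearch CR was set to unmanaged first; with a managed CR, the operator would reconcile the configmap back to its defaults.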

Thanks.

Comment 33 errata-xmlrpc 2020-10-27 15:12:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.1 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4198