Bug 1845293
| Summary: | Elasticsearch returning too_long_frame_exception when calling endpoints through external route after logging workload |
|---|---|
| Product: | OpenShift Container Platform |
| Reporter: | Eric Matysek <ematysek> |
| Component: | Logging |
| Assignee: | ewolinet |
| Status: | CLOSED ERRATA |
| QA Contact: | Mike Fiedler <mifiedle> |
| Severity: | low |
| Docs Contact: | Rolfe Dlugy-Hegwer <rdlugyhe> |
| Priority: | medium |
| Version: | 4.5 |
| CC: | anli, aos-bugs, ewolinet, jcantril, mifiedle, periklis, rdlugyhe, vimalkum |
| Target Milestone: | --- |
| Target Release: | 4.7.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | logging-exploration |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | Previously, Elasticsearch rejected HTTP requests whose headers exceeded the default maximum header size of 8 KB. The current release fixes this issue by increasing the maximum header size to 128 KB. Elasticsearch no longer rejects HTTP requests for exceeding the maximum header size. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1845293[*BZ#1845293*]) |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2021-02-24 11:21:18 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Attachments: | |
Description
Eric Matysek
2020-06-08 20:31:42 UTC
We can see hitting the service IP from inside the cluster has the same result:

```
$ oc get svc
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
cluster-logging-operator   ClusterIP   172.30.110.230   <none>        8383/TCP    3h35m
elasticsearch              ClusterIP   172.30.72.50     <none>        9200/TCP    3h35m
elasticsearch-cluster      ClusterIP   172.30.67.59     <none>        9300/TCP    3h35m
elasticsearch-metrics      ClusterIP   172.30.150.72    <none>        60001/TCP   3h35m
fluentd                    ClusterIP   172.30.77.147    <none>        24231/TCP   3h35m
kibana                     ClusterIP   172.30.108.203   <none>        443/TCP     3h35m
openshift-olm-test-clo     ClusterIP   172.30.196.98    <none>        50051/TCP   3h35m

$ oc exec elasticsearch-cdm-i0d4932s-1-59c8b4c977-ctlmn -c elasticsearch -- curl -k https://172.30.72.50:9200/_cat/indices?v -H "Authorization: Bearer SOME_BEARER" -s
{"error":{"root_cause":[{"type":"too_long_frame_exception","reason":"HTTP header is larger than 8192 bytes."}],"type":"too_long_frame_exception","reason":"HTTP header is larger than 8192 bytes."},"status":400}
```

(In reply to Eric Matysek from comment #1)

Can you just curl the pod directly instead of going through the service endpoint?

`oc exec -c elasticsearch $pod -- indices`

or

`oc exec -c elasticsearch $pod -- es_util --query=_cat/indices?v`

Yes, I can use internal curls for the most part, but I have automation written for debugging and QoL purposes that relies on the service endpoint. Not to mention that customers could see the same thing when trying to administer Elasticsearch on their end.

I will say that today I was able to run a scale workload without hitting the exception afterwards, so reproducibility isn't 100%.

Interestingly, the other day I hit this error, un-deployed logging, then re-deployed it and immediately started hitting this error again without running a workload, so I wonder if there is some underlying issue.

(In reply to Eric Matysek from comment #4)

400 is a client error, which means it's something on the caller side. Consider adding a verbose flag or something else to dump the headers. Is there maybe an issue with the size of the token you are sending? Have you considered jumping on the pod to run the curl to remove `oc exec` from the equation?

(In reply to Jeff Cantrill from comment #5)

> 400 is a client error which means it's something on the caller side.
> Consider adding a verbose flag or something else to dump the headers.

You can see I am only passing the Authorization header in my curl command. I am using the exact same curl command before and after the workload; it works before but not after.

> Is there maybe an issue with the size of the token you are sending?

If that were the case, I believe my curl before the workload would also fail.

> Have you considered jumping on the pod to run the curl to remove 'oc exec'
> from the equation?

The call works with an oc exec curl hitting localhost:9200 (you can see this in my "expected results" section), so I think that eliminates oc exec from the equation, but I can give rsh a shot.

This seems to suggest that something between my curl command and the Elasticsearch container is altering (adding to) my headers. I am also wondering why I see no exception in the Elasticsearch pod logs.

Putting on Upcoming Sprint as ongoing investigation.

(In reply to Eric Matysek from comment #6)

> This seems to suggest to me that something in between my curl command and
> the Elasticsearch container is altering (adding to) my headers.
> I am also wondering why I see no exception in the Elasticsearch pod logs.

4.5 introduced the elasticsearch-proxy, which fronts ES and sits between the service and the pod. Maybe there is something in how we manipulate the request in the proxy that exhibits the behavior you see.

Pushing target release as this is not a blocker for 4.6.

Moving to UpcomingSprint for future evaluation.

Moving to UpcomingRelease.

Moving to 4.7 to satisfy CF requirements for 4.6.

Marking UpcomingSprint as it will not be merged or addressed by EOD.

Setting UpcomingSprint as unable to resolve before EOD.

Verified on:

Image: registry.ci.openshift.org/ocp/4.7:logging-elasticsearch6
Image ID: registry.ci.openshift.org/ocp/4.7@sha256:25fca4c45ca11e7dc3fd25f35eb5e5f7225ee87942f66b77e8a1ce2b8672dbc3
Image: registry.ci.openshift.org/ocp/4.7:elasticsearch-proxy
Image ID: registry.ci.openshift.org/ocp/4.7@sha256:0272be3313129cf243ab7ec884e91f3168c8ad0e01b357076ac75ecce52ba024
With 30M messages, a curl to the service IP worked correctly.
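For anyone debugging a similar rejection, a quick sanity check is to measure how large the Authorization header actually is before blaming the server. This is a diagnostic sketch only; `SOME_BEARER` is the same placeholder used in the report above, not a real token:

```shell
# Measure the size of the Authorization header that curl would send.
# The too_long_frame_exception fires when a header line exceeds the
# decoder's 8192-byte limit.
TOKEN="SOME_BEARER"                      # placeholder; substitute a real bearer token
HEADER="Authorization: Bearer ${TOKEN}"
HEADER_LEN=$(( ${#HEADER} + 2 ))         # +2 for the trailing CRLF on the wire
LIMIT=8192
echo "header bytes: ${HEADER_LEN} (limit ${LIMIT})"
if [ "${HEADER_LEN}" -gt "${LIMIT}" ]; then
  echo "header would be rejected with too_long_frame_exception"
fi
```

Running `curl -v` against the service endpoint would likewise show every header the client sends, which helps rule out the caller side when an in-path proxy is suspected of appending to the request, as discussed in the comments above.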
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652
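For context on the fix described in the Doc Text: in a stock Elasticsearch deployment the equivalent knob is the `http.max_header_size` setting in `elasticsearch.yml`, which defaults to 8 KB. The fragment below is an illustration under that assumption, not the exact change the operator applies; on OpenShift, the Elasticsearch Operator manages this configuration:

```yaml
# elasticsearch.yml -- illustrative only; do not hand-edit on an
# operator-managed cluster.
# Default is 8kb; raising it accepts requests with large headers,
# e.g. long bearer tokens in the Authorization header.
http.max_header_size: 128kb
```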