Bug 1669223
| Summary: | How to get information on root cause of "400 - Rejected by Elasticsearch" errors | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Pedro Amoedo <pamoedom> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED WONTFIX | QA Contact: | Anping Li <anli> |
| Severity: | medium | Priority: | high |
| Version: | 3.7.1 | CC: | aos-bugs, pamoedom, rmeggins |
| Target Milestone: | --- | Target Release: | 3.7.z |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2019-01-25 16:24:41 UTC | Type: | Bug |
Description
Pedro Amoedo
2019-01-24 16:13:15 UTC
This is most likely because merging of JSON logs is enabled by default [1]. In our pending 4.0 release, we disabled this feature by default because of various issues, of which this is one. The problem is that your applications are likely logging a JSON message payload whose fields are being added to the payload fluentd submits to Elasticsearch. Adding your application fields to fluentd's payload exceeds the maximum number of fields allowed, and ES rejects the messages. The only resolution is to disable this feature so that, when new indices are created and logs are added to them, you will not exceed this threshold.
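To make the effect of merging concrete, here is a minimal Python sketch of what the comment above describes. It is not the fluentd plugin's code: the function name `merge_json_log`, the sample record, and the choice to let application fields overwrite existing keys are all assumptions made for illustration.

```python
import json


def merge_json_log(record):
    """Return a copy of record with top-level JSON fields from 'message' merged in.

    Illustrative only: the real plugin's handling of key collisions may differ.
    """
    merged = dict(record)
    try:
        payload = json.loads(record.get("message", ""))
    except (TypeError, ValueError):
        # The message is not valid JSON, so the record is left unchanged.
        return merged
    if isinstance(payload, dict):
        # Assumption: application fields are added at the top level of the record.
        merged.update(payload)
    return merged


# Hypothetical record, roughly as a collector might build it before merging.
record = {
    "kubernetes": {"namespace_name": "demo", "pod_name": "app-1"},
    "hostname": "node-1",
    "message": '{"level": "info", "sessionId": "abc", "useCase": "demo"}',
}

merged = merge_json_log(record)
added = sorted(set(merged) - set(record))
print(added)  # the application fields that became top-level record fields
```

Every distinct key an application logs becomes another field in the index, so many applications with different JSON layouts can grow the field count quickly.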
We expose disabling via ENV var in later releases but for 3.7:
1. Create a configmap consisting of these files (named fluentd-overrides ?) [2]
2. Edit [3] to include 'merge_json_log false'
3. Edit the DaemonSet to mount the configmap

   add section to 'volumes':

     - name: config-overrides
       configMap:
         name: fluentd-overrides

   add section to 'volumeMounts':

     - name: config-overrides
       mountPath: /etc/fluent/configs.d/openshift
       readOnly: true

Note: The changes to the DaemonSet will need to be reapplied after any upgrades
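As a rough illustration of step 3, the following Python sketch applies the two additions to a DaemonSet represented as a plain dictionary. The helper name `mount_overrides`, the container name, and the minimal DaemonSet shown are hypothetical; only the volume and volumeMount entries come from the instructions above, and the `spec.template.spec` layout is the standard Kubernetes pod template structure.

```python
import copy


def mount_overrides(daemonset, container_index=0):
    """Return a copy of the DaemonSet dict with the config-overrides volume mounted."""
    patched = copy.deepcopy(daemonset)
    pod_spec = patched["spec"]["template"]["spec"]

    # Entry for the 'volumes' section, as given in the instructions.
    pod_spec.setdefault("volumes", []).append(
        {"name": "config-overrides", "configMap": {"name": "fluentd-overrides"}}
    )

    # Entry for the 'volumeMounts' section of the chosen container.
    container = pod_spec["containers"][container_index]
    container.setdefault("volumeMounts", []).append(
        {
            "name": "config-overrides",
            "mountPath": "/etc/fluent/configs.d/openshift",
            "readOnly": True,
        }
    )
    return patched


# Hypothetical, heavily trimmed DaemonSet; a real one has many more fields.
daemonset = {
    "spec": {"template": {"spec": {"containers": [{"name": "fluentd-example"}]}}}
}

patched = mount_overrides(daemonset)
print(patched["spec"]["template"]["spec"]["volumes"])
```

Because the configmap is mounted over the whole configs.d/openshift directory, it has to contain every file from [2], not only the edited one.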
[1] https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/v1.0.1/lib/fluent/plugin/filter_kubernetes_metadata.rb#L51
[2] https://github.com/openshift/origin-aggregated-logging/tree/release-3.7/fluentd/configs.d/openshift
[3] https://github.com/openshift/origin-aggregated-logging/blob/release-3.7/fluentd/configs.d/openshift/filter-k8s-meta.conf
We recommend upgrading to 3.11, which is the long-term support release. Closing this issue as WONTFIX.
(In reply to Jeff Cantrill from comment #3)
Thanks Jeff, I will pass those instructions to the customer to verify the solution.
(In reply to Pedro Amoedo from comment #5)
Hi Jeff, the customer has confirmed the workaround, but they are now facing another problem with indexed fields after deactivating the JSON merging. Here are their exact words:

---------------
Thanks for this information and feedback. The change has been applied on the different clusters. Everything has seemed OK for the last 3 hours and we don't see the "400 - Rejected by Elasticsearch" error anymore. The root cause seems to have been found.

But there is a drawback to deactivating "merge json logs". From what we have seen, there are no more indexes and fields in Kibana based on application data. Let me explain with an example. The applications fill the message field with data like this (it's an example to illustrate):

{"caller":"casaPartnerDAO.go:167","component":"CASA_SENDER","country":"CI","dateTime":"2019-01-28T09:17:04.219272Z","level":"info","msg":"receive error","requestId":"0aae2aee-7e90-489d-9e66-b1d8ad67e824","sessionId":"3d92fcd6-9be5-4605-9289-7ed051b20aa9","subject":"transfers.3d92fcd6-9be5-4605-9289-7ed051b20aa9.error","useCase":"ASyncIRTConfirmRequest","version":"1.5.1-RC10"}

When "merge json log" is active: applications can filter and search in Kibana based on sessionId or any field described inside message.
When "merge json log" is not active: we can only see Kubernetes fields in Kibana to look for information.

So, sorry for insisting, but it is now quite difficult for users to perform queries based only on Kubernetes fields. Is it possible to extend the maximum number of allowed fields in Elasticsearch rather than deactivating this feature?
---------------

Any suggestion/workaround for this? Thanks in advance.

This [1] is the reason we recommend disabling. You are correct that it removes the possibility of simplifying queries for user applications, but there are the trade-offs described. There is no alternative at the moment, but we plan to re-enable it in a future release once we can address the issues identified.
[1] https://github.com/openshift/origin-aggregated-logging/issues/1492

> So, sorry for insisting, but it is now quite difficult for users to perform queries based only on Kubernetes fields.
> Is it possible to extend the maximum number of allowed fields in Elasticsearch rather than deactivating this feature?
I don't think it is just that there is a maximum number of fields; it is that we have no way to control the format of the fields coming out of the application. That is, Elasticsearch does not like it if you write a field named "mydata" like this:
{..., "mydata":"some string value", ....}
then later write "mydata" with a value of a different type, like
{..., "mydata":42.5, ....}
then later
{..., "mydata":{"anotherfield":"some string value"}, ....}
Elasticsearch treats this as a "schema" violation, and returns error 400.
I think this is the source of the error 400 reported by fluentd but we won't know for sure until we get the corroborating fluentd and elasticsearch logs.
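The sequence above can be mimicked with a small Python model. This is only a sketch of the idea, not Elasticsearch's actual behavior: the `StrictIndex` class and its status codes are invented for illustration, and it is deliberately stricter than Elasticsearch, which coerces some mismatched values instead of rejecting them.

```python
def json_kind(value):
    """Classify a parsed JSON value into a coarse type name."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "other"


class StrictIndex:
    """Toy index: the first document to use a field fixes that field's type."""

    def __init__(self):
        self.mapping = {}

    def index(self, doc):
        kinds = {field: json_kind(value) for field, value in doc.items()}
        # Check the whole document first so a rejected document changes nothing.
        for field, kind in kinds.items():
            known = self.mapping.get(field)
            if known is not None and known != kind:
                return 400, f"field '{field}' is mapped as {known}, got {kind}"
        # Accept the document and record the types of any new fields.
        for field, kind in kinds.items():
            self.mapping.setdefault(field, kind)
        return 201, "created"


index = StrictIndex()
docs = [
    {"mydata": "some string value"},
    {"mydata": 42.5},
    {"mydata": {"anotherfield": "some string value"}},
]
results = [index.index(doc) for doc in docs]
for status, reason in results:
    print(status, reason)
```

In this model the first write decides the type of "mydata", so whichever application logs the field first determines which later documents are rejected.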
(In reply to Rich Megginson from comment #10)
Thanks for the graphical explanation, Rich. Please find attached the requested logs.