"JSON payload processing of the log message payload if abused can cause logging to slow to a crawl" The problem stems from a feature of the k8s metadata fluentd filter [0] which will look for a JSON payload in the message field and load the fields found in the JSON document as fields of the log entry itself. If the JSON payload is not well formed, where each log message can contribute a unique field name, Elasticsearch spends all of its time in "cluster state transitions" while it propagates the new files to all the members of the cluster tracking that index. You can see this with INFO messages in the logs like, "". We should either turn this feature off by default, or engineer a way to ensure the gratuitous field generation offered by Elasticsearch does not result unique fields being generated. [0] See "merge_json_log" description at https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter
Workaround:

1) edit the fluentd configmap, e.g.

    oc edit cm/logging-fluentd

2) look for this line:

    @include configs.d/openshift/filter-k8s-meta.conf

3) replace it with this - be sure to preserve the indentation:

    <filter kubernetes.**>
      type kubernetes_metadata
      merge_json_log false
      kubernetes_url "#{ENV['K8S_HOST_URL']}"
      cache_size "#{ENV['K8S_METADATA_CACHE_SIZE'] || '1000'}"
      watch "#{ENV['K8S_METADATA_WATCH'] || 'false'}"
      bearer_token_file /var/run/secrets/kubernetes.io/serviceaccount/token
      ca_file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      include_namespace_metadata true
      use_journal "#{ENV['USE_JOURNAL'] || 'false'}"
      container_name_to_kubernetes_regexp '^(?<name_prefix>[^_]+)_(?<container_name>[^\._]+)(\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$'
    </filter>

That is, add `merge_json_log false` to the 3.6 kubernetes_metadata filter configuration. This is from https://github.com/openshift/origin-aggregated-logging/blob/release-3.6/fluentd/configs.d/openshift/filter-k8s-meta.conf - I'm assuming this is 3.6 because the bug was filed against version 3.6.0.
sorry, one more step - restart fluentd

4) either delete the fluentd pods:

    oc delete pod -l component=fluentd

or scale down and back up by relabeling the nodes:

    oc label node -l logging-infra-fluentd=true --overwrite logging-infra-fluentd=false

then wait for all fluentd pods to terminate, then:

    oc label node -l logging-infra-fluentd=false --overwrite logging-infra-fluentd=true
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/8be71b5f3a5bb7f7d99d43309fdfb7aaab884e22
bug 1569825. Deprecate merge_json_payload

https://github.com/openshift/origin-aggregated-logging/commit/dec1ed51474f1db4ad5dee2f3d27181660dba26d
Merge pull request #1109 from jcantrill/1569825_disable_json_parsing
bug 1569825. Turn off JSON parsing by default
3.9 PR: https://github.com/openshift/origin-aggregated-logging/pull/1131
This change provides the ability to disable JSON parsing by setting an environment variable. The default is to remain on in order to avoid surprising consumers who depend on the functionality.
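A rough sketch of what such an environment-variable switch amounts to, written in Python for illustration only. The real switch lives in the fluentd configuration, not in Python. The variable name `MERGE_JSON_LOG` is an assumption (this thread does not name it), and `merge_enabled` and `process` are hypothetical helpers. The default follows the comment above: merging stays on unless it is explicitly disabled.

```python
import json
import os


def merge_enabled(environ=os.environ, var_name="MERGE_JSON_LOG"):
    """Return True unless the (assumed) variable is set to something
    other than 'true'. Unset means on, matching the stated default."""
    return environ.get(var_name, "true").strip().lower() == "true"


def process(record, environ=os.environ):
    """Merge a JSON message payload into the record only when enabled."""
    if not merge_enabled(environ):
        return dict(record)
    try:
        payload = json.loads(record.get("message", ""))
    except (ValueError, TypeError):
        return dict(record)
    if not isinstance(payload, dict):
        return dict(record)
    # Assumption: fields already on the record win on a name collision.
    return {**payload, **record}


if __name__ == "__main__":
    rec = {"message": '{"user": "alice"}'}
    print(process(rec, environ={}))                           # merging on
    print(process(rec, environ={"MERGE_JSON_LOG": "false"}))  # merging off
```

With the variable unset the `user` key is lifted into the record. With it set to `false` the record passes through unchanged, so no new index fields are created.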
JSON payload parsing is turned off by default in logging-fluentd/images/v3.9.30-2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1796
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/4481abeb2c3faedd363258ca0db199993bb7b091
bug 1569825. Deprecate merge_json_payload

https://github.com/openshift/origin-aggregated-logging/commit/b6147ebc993083f1982872b702be3d465d40898e
Merge pull request #1132 from openshift-cherrypick-robot/cherry-pick-1109-to-es5.x
[es5.x] bug 1569825. Turn off JSON parsing by default