Bug 1593312 - Fluent pipeline stuck because records in request do not equal response
Summary: Fluent pipeline stuck because records in request do not equal response
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.7.1
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.7.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On: 1593297
Blocks: 1562004 1593310
TreeView+ depends on / blocked
 
Reported: 2018-06-20 14:16 UTC by Jeff Cantrill
Modified: 2020-01-31 18:57 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: fluent-plugin-elasticsearch improperly handled book keeping of the records being submitted Consequence: Fluent was stuck processing a chuck even though there was a valid request and response. Fix: Properly account for the records submitted to Elasticsearch. Result: The pipeline is no longer stuck
Clone Of: 1593297
Environment:
Last Closed: 2018-08-09 22:14:04 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github uken fluent-plugin-elasticsearch pull 437 None closed Fix #434 bulk count 2020-09-28 06:41:04 UTC
Red Hat Product Errata RHBA-2018:2337 None None None 2018-08-09 22:15:05 UTC

Description Jeff Cantrill 2018-06-20 14:16:22 UTC
+++ This bug was initially created as a clone of Bug #1593297 +++

The query are still blocked in openshift3/logging-fluentd/images/v3.9.31-2.

Steps:
1. Put the attached buffer file under /var/lib/fluentd
2. oc set env ds/logging-fluentd ENABLE_UTF8_FILTER=true
   oc set env ds/logging-fluentd DEBUG=true VERBOSE=true DEBUG_TRANSPORTER=true TRANSPORTER_LOG_LEVEL=debug
   oc set env ds/logging-fluentd BUFFER_QUEUE_LIMIT=40  FILE_BUFFER_LIMIT=320Mi
3. watch the buffer query size
# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
36

4. watch the log
    
2018-06-19 15:30:32 +0800 [warn]: fluent/root_agent.rb:187:emit_error_event: dump an error event: error_class=ArgumentError error="invalid byte sequence in UTF-8" location="/usr/share/gems/gems/fluent-plugin-elasticsearch-1.16.1/lib/fluent/plugin/out_elasticsearch.rb:383:in `downcase'" tag="output_tag" time=1522079957 record={"systemd"=>{"t"=>{"BOOT_ID"=>"f4becb7fb2ae42b8abb3076654c926cc", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver json-file --log-opt max-size=50m --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.basesize=3G --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"d52c597d0f1a42aeb01b5a7d71e63f24", "PID"=>"2961", "SELINUX_CONTEXT"=>"system_u:system_r:init_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"dockerd-current"}}, "level"=>"info", "message"=>"time=\"2018-03-26T15:59:17.716724325Z\" level=error msg=\"Failed to log msg \\\"  2018-03-26 15:58:49 +0000 [warn]: suppressed same stacktrace\\\" for logger json-file: write /var/lib/docker/containers/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46-json.log: no space left on device\"", "hostname"=>"ip-172-31-31-141.us-west-1.compute.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.131.60.182", "ipaddr6"=>"fe80::c802:5fff:fe7a:629a", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-03-26T16:00:23.009320+00:00", "version"=>"0.12.42 1.6.0"}}, "@timestamp"=>"2018-03-26T15:59:17.000000+00:00", 90=>"\u0018\u05C8\xA7systemd\x82\xA1t\x8E\xA7BOOT_ID\xC4 "}
2018-06-19 15:30:34 +0800 [warn]: fluent/output.rb:381:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2018-06-19 15:35:31 +0800 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="The number of records submitted 6837 do not match the number returned 6835. Unable to process bulk response." plugin_id="object:3fdebe407fa4"
  2018-06-19 15:30:34 +0800 [warn]: suppressed same stacktrace
2018-06-19 15:30:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:31:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:31:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:32:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:32:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:33:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:33:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:34:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:34:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:35:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:35:32 +0800 [warn]: fluent/root_agent.rb:187:emit_error_event: dump an error event: error_class=ArgumentError error="invalid byte sequence in UTF-8" location="/usr/share/gems/gems/fluent-plugin-elasticsearch-1.16.1/lib/fluent/plugin/out_elasticsearch.rb:383:in `downcase'" tag="output_tag" time=1522079957 record={"systemd"=>{"t"=>{"BOOT_ID"=>"f4becb7fb2ae42b8abb3076654c926cc", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver json-file --log-opt max-size=50m --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.basesize=3G --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"d52c597d0f1a42aeb01b5a7d71e63f24", "PID"=>"2961", "SELINUX_CONTEXT"=>"system_u:system_r:init_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"dockerd-current"}}, "level"=>"info", "message"=>"time=\"2018-03-26T15:59:17.716724325Z\" level=error msg=\"Failed to log msg \\\"  2018-03-26 15:58:49 +0000 [warn]: suppressed same stacktrace\\\" for logger json-file: write /var/lib/docker/containers/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46-json.log: no space left on device\"", "hostname"=>"ip-172-31-31-141.us-west-1.compute.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.131.60.182", "ipaddr6"=>"fe80::c802:5fff:fe7a:629a", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-03-26T16:00:23.009320+00:00", "version"=>"0.12.42 1.6.0"}}, "@timestamp"=>"2018-03-26T15:59:17.000000+00:00", 90=>"\u0018\u05C8\xA7systemd\x82\xA1t\x8E\xA7BOOT_ID\xC4 "}
2018-06-19 15:35:34 +0800 [warn]: fluent/output.rb:381:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2018-06-19 15:40:31 +0800 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="The number of records submitted 6837 do not match the number returned 6835. Unable to process bulk response." plugin_id="object:3fdebe407fa4"
  2018-06-19 15:35:34 +0800 [warn]: suppressed same stacktrace
2018-06-19 15:35:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:36:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:36:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:37:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:37:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:38:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3

--- Additional comment from Jeff Cantrill on 2018-06-20 10:12:15 EDT ---

requires an official image build with rubygem-fluent-plugin-elasticsearch-1.17.0-1.el7

Comment 2 Anping Li 2018-06-28 07:48:44 UTC
Waiting the image include the fix.

Comment 6 Junqi Zhao 2018-07-16 05:48:21 UTC
Please change to ON_QA, issue is fixed.

Used buffer file from https://bugzilla.redhat.com/attachment.cgi?id=1417859
and followed steps in Comment 0, all buffer record are read without the no equal response error in fluentd pods logs.

fluentd image version: logging-fluentd-v3.7.58-1

Comment 7 Junqi Zhao 2018-07-16 05:52:47 UTC
# docker run -it brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.7.58-1 rpm -qa | grep rubygem-fluent-plugin-elasticsearch
rubygem-fluent-plugin-elasticsearch-1.17.0-1.el7.noarch

Comment 11 Junqi Zhao 2018-07-18 02:02:39 UTC
Change back to MOFIFIED, the latest logging-fluentd v3.7.59.2 is not built with rubygem-fluent-plugin-elasticsearch-1.17.0-1.el7.noarch.rpm, but with rubygem-fluent-plugin-elasticsearch-1.15.2-1.el7.noarch. 

need correct fluentd image

# docker run -it brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.7.59.2 rpm -qa | grep rubygem-fluent-plugin-elasticsearch
rubygem-fluent-plugin-elasticsearch-1.15.2-1.el7.noarch


REPOSITORY                                                                        TAG                 IMAGE ID            CREATED             SIZE
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd   v3.7                268fc822a76a        10 hours ago        286.1 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd   v3.7.59-2           268fc822a76a        10 hours ago        286.1 MB

Comment 12 Junqi Zhao 2018-07-18 02:04:54 UTC
(In reply to Junqi Zhao from comment #11)
> Change back to MOFIFIED, the latest logging-fluentd v3.7.59.2 is not built

Correction:

the latest logging-fluentd v3.7.59-2

Comment 14 Anping Li 2018-07-19 02:49:25 UTC
The fix is in 3.7.58. But the errata will release v3.7.57. so move bug to modified again.

Comment 17 Qiaoling Tang 2018-08-02 05:21:25 UTC
verified on v3.7.61

Comment 19 errata-xmlrpc 2018-08-09 22:14:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2337


Note You need to log in before you can comment on or make changes to this bug.