Bug 1593297 - Fluent pipeline stuck because records in request do not equal response
Summary: Fluent pipeline stuck because records in request do not equal response
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.10.0
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1562004 1593310 1593312
TreeView+ depends on / blocked
 
Reported: 2018-06-20 13:39 UTC by Jeff Cantrill
Modified: 2018-12-20 21:47 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: fluent-plugin-elasticsearch improperly handled book keeping of the records being submitted Consequence: Fluent was stuck processing a chuck even though there was a valid request and response. Fix: Properly account for the records submitted to Elasticsearch. Result: The pipeline is no longer stuck
Clone Of:
: 1593310 1593312 (view as bug list)
Environment:
Last Closed: 2018-12-20 21:38:01 UTC
Target Upstream Version:


Attachments (Terms of Use)
The current fluentd logs (14.29 KB, text/plain)
2018-06-27 10:48 UTC, Anping Li
no flags Details


Links
System ID Priority Status Summary Last Updated
Github uken fluent-plugin-elasticsearch pull 437 'None' closed Fix #434 bulk count 2020-09-28 06:43:44 UTC
Red Hat Knowledge Base (Solution) 3597751 None None None 2018-08-29 22:47:11 UTC

Description Jeff Cantrill 2018-06-20 13:39:01 UTC
The query are still blocked in openshift3/logging-fluentd/images/v3.9.31-2.

Steps:
1. Put the attached buffer file under /var/lib/fluentd
2. oc set env ds/logging-fluentd ENABLE_UTF8_FILTER=true
   oc set env ds/logging-fluentd DEBUG=true VERBOSE=true DEBUG_TRANSPORTER=true TRANSPORTER_LOG_LEVEL=debug
   oc set env ds/logging-fluentd BUFFER_QUEUE_LIMIT=40  FILE_BUFFER_LIMIT=320Mi
3. watch the buffer query size
# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
36

4. watch the log
    
2018-06-19 15:30:32 +0800 [warn]: fluent/root_agent.rb:187:emit_error_event: dump an error event: error_class=ArgumentError error="invalid byte sequence in UTF-8" location="/usr/share/gems/gems/fluent-plugin-elasticsearch-1.16.1/lib/fluent/plugin/out_elasticsearch.rb:383:in `downcase'" tag="output_tag" time=1522079957 record={"systemd"=>{"t"=>{"BOOT_ID"=>"f4becb7fb2ae42b8abb3076654c926cc", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver json-file --log-opt max-size=50m --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.basesize=3G --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"d52c597d0f1a42aeb01b5a7d71e63f24", "PID"=>"2961", "SELINUX_CONTEXT"=>"system_u:system_r:init_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"dockerd-current"}}, "level"=>"info", "message"=>"time=\"2018-03-26T15:59:17.716724325Z\" level=error msg=\"Failed to log msg \\\"  2018-03-26 15:58:49 +0000 [warn]: suppressed same stacktrace\\\" for logger json-file: write /var/lib/docker/containers/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46-json.log: no space left on device\"", "hostname"=>"ip-172-31-31-141.us-west-1.compute.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.131.60.182", "ipaddr6"=>"fe80::c802:5fff:fe7a:629a", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-03-26T16:00:23.009320+00:00", "version"=>"0.12.42 1.6.0"}}, "@timestamp"=>"2018-03-26T15:59:17.000000+00:00", 90=>"\u0018\u05C8\xA7systemd\x82\xA1t\x8E\xA7BOOT_ID\xC4 "}
2018-06-19 15:30:34 +0800 [warn]: fluent/output.rb:381:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2018-06-19 15:35:31 +0800 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="The number of records submitted 6837 do not match the number returned 6835. Unable to process bulk response." plugin_id="object:3fdebe407fa4"
  2018-06-19 15:30:34 +0800 [warn]: suppressed same stacktrace
2018-06-19 15:30:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:31:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:31:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:32:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:32:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:33:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:33:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:34:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:34:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:35:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:35:32 +0800 [warn]: fluent/root_agent.rb:187:emit_error_event: dump an error event: error_class=ArgumentError error="invalid byte sequence in UTF-8" location="/usr/share/gems/gems/fluent-plugin-elasticsearch-1.16.1/lib/fluent/plugin/out_elasticsearch.rb:383:in `downcase'" tag="output_tag" time=1522079957 record={"systemd"=>{"t"=>{"BOOT_ID"=>"f4becb7fb2ae42b8abb3076654c926cc", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver json-file --log-opt max-size=50m --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.basesize=3G --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"d52c597d0f1a42aeb01b5a7d71e63f24", "PID"=>"2961", "SELINUX_CONTEXT"=>"system_u:system_r:init_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"dockerd-current"}}, "level"=>"info", "message"=>"time=\"2018-03-26T15:59:17.716724325Z\" level=error msg=\"Failed to log msg \\\"  2018-03-26 15:58:49 +0000 [warn]: suppressed same stacktrace\\\" for logger json-file: write /var/lib/docker/containers/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46-json.log: no space left on device\"", "hostname"=>"ip-172-31-31-141.us-west-1.compute.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.131.60.182", "ipaddr6"=>"fe80::c802:5fff:fe7a:629a", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-03-26T16:00:23.009320+00:00", "version"=>"0.12.42 1.6.0"}}, "@timestamp"=>"2018-03-26T15:59:17.000000+00:00", 90=>"\u0018\u05C8\xA7systemd\x82\xA1t\x8E\xA7BOOT_ID\xC4 "}
2018-06-19 15:35:34 +0800 [warn]: fluent/output.rb:381:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2018-06-19 15:40:31 +0800 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="The number of records submitted 6837 do not match the number returned 6835. Unable to process bulk response." plugin_id="object:3fdebe407fa4"
  2018-06-19 15:35:34 +0800 [warn]: suppressed same stacktrace
2018-06-19 15:35:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:36:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:36:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:37:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:37:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3
2018-06-19 15:38:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3

Comment 1 Jeff Cantrill 2018-06-20 14:12:15 UTC
requires an official image build with rubygem-fluent-plugin-elasticsearch-1.17.0-1.el7

Comment 4 Junqi Zhao 2018-06-26 09:58:42 UTC
@Jeff

Did not find the "attached buffer file", do we need to use this file for testing?

Comment 5 Jeff Cantrill 2018-06-27 02:26:11 UTC
The 'attached buffer file' references the bug from which the stack was copied to produce this bug.  It was for an error that in directly related to this one.  Reference the most recent 'utf-8' issue which I do not have at the moment.

Comment 6 Anping Li 2018-06-27 10:45:45 UTC
All buffer record are read without the no equal response. Moved to verified

root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
34
[root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
33
[root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
33
[root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
21
[root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
21
[root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
20
[root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
20
[root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l
1

Comment 7 Anping Li 2018-06-27 10:48:22 UTC
Created attachment 1454998 [details]
The current fluentd logs

Very small fluentd logs


Note You need to log in before you can comment on or make changes to this bug.