Bug 1593297
| Summary: | Fluent pipeline stuck because records in request do not equal response | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jeff Cantrill <jcantril> | ||||
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Anping Li <anli> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.10.0 | CC: | aos-bugs, jcantril, juzhao, rmeggins, xtian | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 3.10.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Cause: fluent-plugin-elasticsearch improperly handled book keeping of the records being submitted
Consequence: Fluent was stuck processing a chuck even though there was a valid request and response.
Fix: Properly account for the records submitted to Elasticsearch.
Result: The pipeline is no longer stuck
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1593310 1593312 (view as bug list) | Environment: | |||||
| Last Closed: | 2018-12-20 21:38:01 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1562004, 1593310, 1593312 | ||||||
| Attachments: |
|
||||||
requires an official image build with rubygem-fluent-plugin-elasticsearch-1.17.0-1.el7 @Jeff Did not find the "attached buffer file", do we need to use this file for testing? The 'attached buffer file' references the bug from which the stack was copied to produce this bug. It was for an error that in directly related to this one. Reference the most recent 'utf-8' issue which I do not have at the moment. All buffer record are read without the no equal response. Moved to verified root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 34 [root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 33 [root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 33 [root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 21 [root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 21 [root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 20 [root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 20 [root@ip-172-18-10-163 fluentd]# ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 1 Created attachment 1454998 [details]
The current fluentd logs
Very small fluentd logs
|
The query are still blocked in openshift3/logging-fluentd/images/v3.9.31-2. Steps: 1. Put the attached buffer file under /var/lib/fluentd 2. oc set env ds/logging-fluentd ENABLE_UTF8_FILTER=true oc set env ds/logging-fluentd DEBUG=true VERBOSE=true DEBUG_TRANSPORTER=true TRANSPORTER_LOG_LEVEL=debug oc set env ds/logging-fluentd BUFFER_QUEUE_LIMIT=40 FILE_BUFFER_LIMIT=320Mi 3. watch the buffer query size # ls /var/lib/fluentd/buffer-output-es-config.output_tag.* |wc -l 36 4. watch the log 2018-06-19 15:30:32 +0800 [warn]: fluent/root_agent.rb:187:emit_error_event: dump an error event: error_class=ArgumentError error="invalid byte sequence in UTF-8" location="/usr/share/gems/gems/fluent-plugin-elasticsearch-1.16.1/lib/fluent/plugin/out_elasticsearch.rb:383:in `downcase'" tag="output_tag" time=1522079957 record={"systemd"=>{"t"=>{"BOOT_ID"=>"f4becb7fb2ae42b8abb3076654c926cc", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver json-file --log-opt max-size=50m --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.basesize=3G --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"d52c597d0f1a42aeb01b5a7d71e63f24", "PID"=>"2961", "SELINUX_CONTEXT"=>"system_u:system_r:init_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"dockerd-current"}}, "level"=>"info", "message"=>"time=\"2018-03-26T15:59:17.716724325Z\" level=error msg=\"Failed to log msg \\\" 2018-03-26 15:58:49 +0000 [warn]: suppressed same stacktrace\\\" for logger json-file: write /var/lib/docker/containers/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46-json.log: no space left on device\"", "hostname"=>"ip-172-31-31-141.us-west-1.compute.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.131.60.182", "ipaddr6"=>"fe80::c802:5fff:fe7a:629a", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-03-26T16:00:23.009320+00:00", "version"=>"0.12.42 1.6.0"}}, "@timestamp"=>"2018-03-26T15:59:17.000000+00:00", 90=>"\u0018\u05C8\xA7systemd\x82\xA1t\x8E\xA7BOOT_ID\xC4 "} 2018-06-19 15:30:34 +0800 [warn]: fluent/output.rb:381:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2018-06-19 15:35:31 +0800 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="The number of records submitted 6837 do not match the number returned 6835. Unable to process bulk response." plugin_id="object:3fdebe407fa4" 2018-06-19 15:30:34 +0800 [warn]: suppressed same stacktrace 2018-06-19 15:30:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:31:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:31:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:32:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:32:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:33:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:33:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:34:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:34:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:35:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:35:32 +0800 [warn]: fluent/root_agent.rb:187:emit_error_event: dump an error event: error_class=ArgumentError error="invalid byte sequence in UTF-8" location="/usr/share/gems/gems/fluent-plugin-elasticsearch-1.16.1/lib/fluent/plugin/out_elasticsearch.rb:383:in `downcase'" tag="output_tag" time=1522079957 record={"systemd"=>{"t"=>{"BOOT_ID"=>"f4becb7fb2ae42b8abb3076654c926cc", "CAP_EFFECTIVE"=>"1fffffffff", "CMDLINE"=>"/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json --selinux-enabled --log-driver json-file --log-opt max-size=50m --signature-verification=False --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true --storage-opt dm.basesize=3G --mtu=8951 --add-registry registry.access.redhat.com --add-registry registry.access.redhat.com", "COMM"=>"dockerd-current", "EXE"=>"/usr/bin/dockerd-current", "GID"=>"0", "MACHINE_ID"=>"d52c597d0f1a42aeb01b5a7d71e63f24", "PID"=>"2961", "SELINUX_CONTEXT"=>"system_u:system_r:init_t:s0", "SYSTEMD_CGROUP"=>"/system.slice/docker.service", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"docker.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"dockerd-current"}}, "level"=>"info", "message"=>"time=\"2018-03-26T15:59:17.716724325Z\" level=error msg=\"Failed to log msg \\\" 2018-03-26 15:58:49 +0000 [warn]: suppressed same stacktrace\\\" for logger json-file: write /var/lib/docker/containers/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46/a3fca2c0a2f4f44321a7d8165b0cd70c041360ae31960e0959ac8cc7f25dae46-json.log: no space left on device\"", "hostname"=>"ip-172-31-31-141.us-west-1.compute.internal", "pipeline_metadata"=>{"collector"=>{"ipaddr4"=>"10.131.60.182", "ipaddr6"=>"fe80::c802:5fff:fe7a:629a", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2018-03-26T16:00:23.009320+00:00", "version"=>"0.12.42 1.6.0"}}, "@timestamp"=>"2018-03-26T15:59:17.000000+00:00", 90=>"\u0018\u05C8\xA7systemd\x82\xA1t\x8E\xA7BOOT_ID\xC4 "} 2018-06-19 15:35:34 +0800 [warn]: fluent/output.rb:381:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2018-06-19 15:40:31 +0800 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="The number of records submitted 6837 do not match the number returned 6835. Unable to process bulk response." plugin_id="object:3fdebe407fa4" 2018-06-19 15:35:34 +0800 [warn]: suppressed same stacktrace 2018-06-19 15:35:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:36:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:36:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:37:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:37:43 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3 2018-06-19 15:38:13 +0800 [info]: plugin/filter_kubernetes_metadata.rb:109:dump_stats: stats - namespace_cache_size: 2, pod_cache_size: 3, namespace_cache_api_updates: 3, pod_cache_api_updates: 3, id_cache_miss: 3