Bug 1756072 - fluentd couldn't forward logs to rsyslog out:syslog: invalid facility value DEVICE
Summary: fluentd couldn't forward logs to rsyslog out:syslog: invalid facility value D...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 4.2.z
Assignee: Rich Megginson
QA Contact: Anping Li
URL:
Whiteboard:
Depends On: 1753936
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-09-26 17:13 UTC by Rich Megginson
Modified: 2020-05-13 11:07 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The data in the journald for facility values is not sanitized, and strange values get in. But this is ok - the plugin will just ignore the strange value and use the configured default value. Consequence: fluentd emits error messages at the wrong level, which cause confusion for customers. Fix: Make fluentd log at the proper log level, debug. Result: fluentd does not complain about invalid facility values.
Clone Of: 1753936
Environment:
Last Closed: 2020-05-13 11:07:17 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin-aggregated-logging pull 1759 0 None closed Bug 1756072: fluentd couldn't forward logs to rsyslog out:syslog: invalid facility value DEVICE 2020-10-06 18:01:24 UTC
Red Hat Product Errata RHBA-2020:2023 0 None None None 2020-05-13 11:07:37 UTC

Description Rich Megginson 2019-09-26 17:13:40 UTC
+++ This bug was initially created as a clone of Bug #1753936 +++

Description of problem:



Version-Release number of selected component (if applicable):
v4.1.17-201909171057-ose-logging-fluentd

How reproducible:
Always

Steps to Reproduce:
1. Set clusterloging managementState: Unmanaged
2. Set Env to send logs to remote rsyslog
3. Check the fluentd logs

Actual results:
route to host - connect(2) for 172.30.230.143:9200 (Errno::EHOSTUNREACH)"
  2019-09-20 09:52:29 +0000 [warn]: suppressed same stacktrace
2019-09-20 09:53:35 +0000 [warn]: [elasticsearch-apps] retry succeeded. chunk_id="592f902cdca8efe5bcc99fc24b39d0b8"
2019-09-20 10:01:54 +0000 [warn]: [elasticsearch-apps] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=107.87953163499697 slow_flush_log_threshold=20.0 plugin_id="elasticsearch-apps"
2019-09-20 10:01:54 +0000 [warn]: [elasticsearch-apps] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=107.8780552820026 slow_flush_log_threshold=20.0 plugin_id="elasticsearch-apps"
2019-09-20 10:02:42 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:02:42 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:02:42 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:08:06 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:08:06 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:08:06 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:12:56 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:12:56 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:12:56 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:18:40 +0000 [warn]: out:syslog: connection error by 172.30.19.219:514 :Broken pipe
2019-09-20 10:18:40 +0000 [warn]: failed to flush the buffer. retry_time=0 next_retry_seconds=2019-09-20 10:18:41 +0000 chunk="592f95447e27a4dfaf485dff1144f968" error_class=Errno::EPIPE error="Broken pipe"
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:176:in `write'
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:176:in `send_to_syslog'
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:90:in `block in write'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/event.rb:323:in `each'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/event.rb:323:in `block in each'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/event.rb:322:in `each'
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:89:in `write'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/compat/output.rb:131:in `write'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

Expected results:


Additional info:

--- Additional comment from Rich Megginson on 2019-09-23 20:52:22 UTC ---

I don't think this is a regression, or at least, it isn't a regression in our code.  Apparently, it is quite common to get non-integer values for the SYSLOG_FACILITY field - on Fedora 30:

>sudo journalctl -F SYSLOG_FACILITY | sort
0
1
10
24
3
4
5
9
AGENTS
AUDIT
BRIDGE
BT
CONCHECK
CORE
DEVICE
etc.

You can see that the troublesome DEVICE is in the list.

I think the fix for this will be to change the error log level from `warn` to `debug`

--- Additional comment from Rich Megginson on 2019-09-23 20:53:42 UTC ---

I'm going to target this bz to 4.3, and clone for 4.2, 4.1, and 3.11

--- Additional comment from Rich Megginson on 2019-09-23 22:26:06 UTC ---

@anli - can you confirm that forward isn't working to send logs to rsyslog?  If it isn't, then I think this is unrelated to warning message:

https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/out_syslog.rb#L88

              begin
                @packet.facility = Integer(record['systemd']['u']['SYSLOG_FACILITY'])
              rescue
                log.warn "out:syslog: invalid facility value #{record['systemd']['u']['SYSLOG_FACILITY']}; reset to default #{@facility}"
                @packet.facility = @facility
              end

That is, it will issue the message, then continue with the default i.e. it should still be forwarding logs, despite the error message.

Comment 6 Anping Li 2020-05-06 12:51:02 UTC
Verified in clusterlogging.4.2.32-202005050921

Comment 8 errata-xmlrpc 2020-05-13 11:07:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2023


Note You need to log in before you can comment on or make changes to this bug.