Bug 1753936

Summary: fluentd couldn't forward logs to rsyslog out:syslog: invalid facility value DEVICE
Product: OpenShift Container Platform Reporter: Anping Li <anli>
Component: LoggingAssignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.zCC: aos-bugs, calfonso, rmeggins
Target Milestone: ---Keywords: Regression
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The data in the journald for facility values is not sanitized, and strange values get in. But this is ok - the plugin will just ignore the strange value and use the configured default value. Consequence: fluentd emits error messages at the wrong level, which cause confusion for customers. Fix: Make fluentd log at the proper log level, debug. Result: fluentd does not complain about invalid facility values.
Story Points: ---
Clone Of:
: 1756072 (view as bug list) Environment:
Last Closed: 2020-01-23 11:06:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1756072    

Description Anping Li 2019-09-20 10:26:12 UTC
Description of problem:



Version-Release number of selected component (if applicable):
v4.1.17-201909171057-ose-logging-fluentd

How reproducible:
Always

Steps to Reproduce:
1. Set clusterloging managementState: Unmanaged
2. Set Env to send logs to remote rsyslog
3. Check the fluentd logs

Actual results:
route to host - connect(2) for 172.30.230.143:9200 (Errno::EHOSTUNREACH)"
  2019-09-20 09:52:29 +0000 [warn]: suppressed same stacktrace
2019-09-20 09:53:35 +0000 [warn]: [elasticsearch-apps] retry succeeded. chunk_id="592f902cdca8efe5bcc99fc24b39d0b8"
2019-09-20 10:01:54 +0000 [warn]: [elasticsearch-apps] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=107.87953163499697 slow_flush_log_threshold=20.0 plugin_id="elasticsearch-apps"
2019-09-20 10:01:54 +0000 [warn]: [elasticsearch-apps] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=107.8780552820026 slow_flush_log_threshold=20.0 plugin_id="elasticsearch-apps"
2019-09-20 10:02:42 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:02:42 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:02:42 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:08:06 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:08:06 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:08:06 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:12:56 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:12:56 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:12:56 +0000 [warn]: out:syslog: invalid facility value DEVICE; reset to default local0
2019-09-20 10:18:40 +0000 [warn]: out:syslog: connection error by 172.30.19.219:514 :Broken pipe
2019-09-20 10:18:40 +0000 [warn]: failed to flush the buffer. retry_time=0 next_retry_seconds=2019-09-20 10:18:41 +0000 chunk="592f95447e27a4dfaf485dff1144f968" error_class=Errno::EPIPE error="Broken pipe"
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:176:in `write'
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:176:in `send_to_syslog'
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:90:in `block in write'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/event.rb:323:in `each'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/event.rb:323:in `block in each'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/buffer/memory_chunk.rb:80:in `open'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/event.rb:322:in `each'
  2019-09-20 10:18:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:89:in `write'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/compat/output.rb:131:in `write'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
  2019-09-20 10:18:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.5.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

Expected results:


Additional info:

Comment 1 Rich Megginson 2019-09-23 20:52:22 UTC
I don't think this is a regression, or at least, it isn't a regression in our code.  Apparently, it is quite common to get non-integer values for the SYSLOG_FACILITY field - on Fedora 30:

>sudo journalctl -F SYSLOG_FACILITY | sort
0
1
10
24
3
4
5
9
AGENTS
AUDIT
BRIDGE
BT
CONCHECK
CORE
DEVICE
etc.

You can see that the troublesome DEVICE is in the list.

I think the fix for this will be to change the error log level from `warn` to `debug`

Comment 2 Rich Megginson 2019-09-23 20:53:42 UTC
I'm going to target this bz to 4.3, and clone for 4.2, 4.1, and 3.11

Comment 3 Rich Megginson 2019-09-23 22:26:06 UTC
@anli - can you confirm that forward isn't working to send logs to rsyslog?  If it isn't, then I think this is unrelated to warning message:

https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/out_syslog.rb#L88

              begin
                @packet.facility = Integer(record['systemd']['u']['SYSLOG_FACILITY'])
              rescue
                log.warn "out:syslog: invalid facility value #{record['systemd']['u']['SYSLOG_FACILITY']}; reset to default #{@facility}"
                @packet.facility = @facility
              end

That is, it will issue the message, then continue with the default i.e. it should still be forwarding logs, despite the error message.

Comment 4 Anping Li 2019-09-27 01:17:56 UTC
Yes, It is not related to the message. the message can be sent to rsyslog in 4.1.18 and in 4.2. How could I deal with the bug in 4.2 and 4.3?

Comment 5 Rich Megginson 2019-09-27 02:42:46 UTC
(In reply to Anping Li from comment #4)
> Yes, It is not related to the message. the message can be sent to rsyslog in
> 4.1.18 and in 4.2. How could I deal with the bug in 4.2 and 4.3?

I intend to clone/backport this fix to 4.2, 4.1, and 3.11.  I've already cloned this bz to 4.2: https://bugzilla.redhat.com/show_bug.cgi?id=1756072

Once this https://bugzilla.redhat.com/show_bug.cgi?id=1753936 bz is verified, I can merge the 4.2 fix, which will move https://bugzilla.redhat.com/show_bug.cgi?id=1756072 to MODIFIED, and then I will backport the fix to 4.1 and clone the bz for 4.1, etc.

Comment 8 Anping Li 2019-11-07 09:16:12 UTC
Verified in openshift/ose-logging-fluentd@sha256:d5852e035184bf7f12e26d7fbd6340737bdabd54bc6d34af8375aabeaa50e3d2

Comment 10 errata-xmlrpc 2020-01-23 11:06:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062