Bug 1858200

Summary: Non-functional fluentd pods when deploying only the collector, configured to send logs to an external syslog
Product: OpenShift Container Platform
Reporter: Oscar Casal Sanchez <ocasalsa>
Component: Logging
Assignee: Periklis Tsirakidis <periklis>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4
CC: anli, aos-bugs, daniel.kucera, periklis
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1861013
Environment:
Last Closed: 2020-10-27 16:15:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1861013    

Description Oscar Casal Sanchez 2020-07-17 08:06:51 UTC
[Description of problem]

Deploying only the collector to send logs to an external syslog server leaves the fluentd pods non-functional.

The CLO was installed by following the documentation in [1] step by step.
The syslog ConfigMap was configured by following the documentation in [2] step by step.


## Define the clusterLogging instance only with the collector
$ cat clo-instance.yaml 
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
$ oc create -f clo-instance.yaml 

## The fluentd pods are created
$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-logging-operator-847d45b4bb-4q4lx   1/1     Running   0          2m29s
fluentd-szr6z                               1/1     Running   0          2m16s
fluentd-xsbqd                               1/1     Running   0          2m16s

## Create syslog cm
$ cat syslog.yml 
kind: ConfigMap
apiVersion: v1
metadata:
  name: syslog
  namespace: openshift-logging
data:
  syslog.conf: |
    <store>
     @type syslog_buffered
     remote_syslog syslogserver.openshift-logging.svc.cluster.local
     port 514
     hostname ${hostname}
     remove_tag_prefix tag
     tag_key ident,systemd.u.SYSLOG_IDENTIFIER
     facility local0
     severity info
     use_record true
     payload_key message
    </store>
$ oc create -f syslog.yml

## List the configmaps
$ oc get cm
NAME                            DATA   AGE
cluster-logging-operator-lock   0      4m
fluentd                         3      3m51s
fluentd-trusted-ca-bundle       1      3m51s
syslog                          1      3s

At this point, two issues occur:

ISSUE 1
#######

Checking the fluentd logs returns only errors:

~~~
$ oc logs fluentd-szr6z 
expr: division by zero
run.sh: line 103: [: too many arguments
expr: syntax error
run.sh: line 108: [: too many arguments
~~~

Trying a different way to read the logs:

~~~
$ oc exec fluentd-szr6z -- logs
ls: cannot access /var/log/fluentd: No such file or directory
~~~

Checking for the /var/log/fluentd directory confirms that it does not exist:

~~~
$ oc rsh fluentd-szr6z ls -ld /var/log/fluentd
ls: cannot access /var/log/fluentd: No such file or directory
command terminated with exit code 2
~~~

ISSUE 2
#######
Inside the fluentd pods, the file /etc/fluent/fluent.conf is empty:

~~~
$ oc rsh fluentd-szr6z 
sh-4.2# cat /etc/fluent/fluent.conf 
sh-4.2# ls -ld /etc/fluent/fluent.conf 
lrwxrwxrwx. 1 root root 38 Jul  6 05:27 /etc/fluent/fluent.conf -> /etc/fluent/configs.d/user/fluent.conf
sh-4.2# ls -ld /etc/fluent/configs.d/user/fluent.conf 
lrwxrwxrwx. 1 root root 18 Jul 17 07:41 /etc/fluent/configs.d/user/fluent.conf -> ..data/fluent.conf
sh-4.2# ls -ld ..data/fluent.conf 
ls: cannot access ..data/fluent.conf: No such file or directory
~~~

Following the symlinks, the last one points to ..data/fluent.conf, which ls reports as missing. As a result, fluentd has an empty configuration file.
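One caveat about the check above: `ls -ld ..data/fluent.conf` was run relative to the shell's working directory, whereas a relative symlink target is resolved against the directory that contains the link. The fact that `cat` printed nothing and raised no error would also be consistent with the final target existing but being empty. The following is a minimal Python sketch of how the chain could be followed hop by hop. The function names and the temporary directory layout are illustrative assumptions, not part of the fluentd image.

```python
import os
import tempfile


def follow_symlinks(path, max_hops=16):
    """Return every path visited while following symlinks from `path`.

    A relative link target is resolved against the directory that holds
    the link, not against the current working directory.
    """
    hops = [path]
    while os.path.islink(path) and len(hops) <= max_hops:
        target = os.readlink(path)
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(path), target)
        path = os.path.normpath(target)
        hops.append(path)
    return hops


def describe(path):
    """Classify the end of a symlink chain as missing, empty, or non-empty."""
    final = follow_symlinks(path)[-1]
    if not os.path.exists(final):
        return "missing"
    return "empty" if os.path.getsize(final) == 0 else "non-empty"


# Demo in a temporary directory that imitates the layout seen in the pod
# (hypothetical paths, created only for this illustration).
with tempfile.TemporaryDirectory() as root:
    user_dir = os.path.join(root, "configs.d", "user")
    data_dir = os.path.join(user_dir, "..data")
    os.makedirs(data_dir)
    open(os.path.join(data_dir, "fluent.conf"), "w").close()  # empty file
    os.symlink("..data/fluent.conf", os.path.join(user_dir, "fluent.conf"))
    top = os.path.join(root, "fluent.conf")
    os.symlink(os.path.join(user_dir, "fluent.conf"), top)

    chain = follow_symlinks(top)
    status = describe(top)
    print(len(chain), status)
```

Inside the pod, the equivalent manual check would be to list `..data/fluent.conf` from within /etc/fluent/configs.d/user rather than from the login directory.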



[Version-Release number of selected component (if applicable)]

$ oc version
Client Version: 4.4.12
Server Version: 4.4.12
Kubernetes Version: v1.17.1+a1af596

$ oc get csv -n openshift-logging
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.4.0-202007060343.p0           Cluster Logging          4.4.0-202007060343.p0              Succeeded
elasticsearch-operator.4.4.0-202007060343.p0   Elasticsearch Operator   4.4.0-202007060343.p0              Succeeded


[How reproducible]

Always

[Steps to Reproduce]

Indicated in the description


[Additional info]
I'll check whether the same happens with other configurations, not only with syslog.


[1] https://docs.openshift.com/container-platform/4.4/logging/cluster-logging-deploying.html
[2] https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-external.html#cluster-logging-collector-syslog_cluster-logging-external

Comment 6 Brett Jones 2020-07-31 19:19:24 UTC
*** Bug 1852341 has been marked as a duplicate of this bug. ***

Comment 7 Anping Li 2020-08-03 11:54:23 UTC
Hit the error below:
</filter> is not used.
2020-08-03 11:49:31 +0000 [warn]: got unrecoverable error in primary and no secondary error_class=ArgumentError error="'Metadata' is not a designated severity"
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/syslog_protocol-0.9.2/lib/syslog_protocol/packet.rb:100:in `severity='
  2020-08-03 11:49:31 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:104:in `send_to_syslog'
  2020-08-03 11:49:31 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:90:in `block in write'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/event.rb:327:in `each'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/event.rb:327:in `block in each'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/buffer/memory_chunk.rb:81:in `open'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/buffer/memory_chunk.rb:81:in `open'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/event.rb:326:in `each'
  2020-08-03 11:49:31 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:89:in `write'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/compat/output.rb:131:in `write'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
  2020-08-03 11:49:31 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-08-03 11:49:31 +0000 [warn]: bad chunk is moved to /tmp/fluent/backup/worker0/object_3f8c0f1a98e0/5abf7b7bd476f90e912dc3d80aeead12.log
2020-08-03 11:50:31 +0000 [warn]: got unrecoverable error in primary and no secondary error_class=ArgumentError error="'Metadata' is not a designated severity"
  2020-08-03 11:50:31 +0000 [warn]: suppressed same stacktrace
2020-08-03 11:50:31 +0000 [warn]: bad chunk is moved to /tmp/fluent/backup/worker0/object_3f8c0f1a98e0/5abf7bb51d33b2e9e08ef6d7bea8ffc5.log


#oc get cm syslog -o yaml
apiVersion: v1
data:
  syslog.conf: |
    <store>
         @type syslog_buffered
         remote_syslog rsyslogserver.openshift-logging.svc.cluster.local
         port 514
         hostname ${hostname}
         remove_tag_prefix tag
         tag_key ident,systemd.u.SYSLOG_IDENTIFIER
         facility local0
         severity Informational
         use_record true
         payload_key message
    </store>
kind: ConfigMap

Comment 8 Periklis Tsirakidis 2020-08-03 12:05:35 UTC
@Anping

This issue is not related to this BZ. It is about how a legacy syslog config map is provided to the user. We should handle it as a separate issue in [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1852341

Comment 9 Anping Li 2020-08-03 12:32:42 UTC
Periklis,
That bug was marked as a duplicate of this one. I think we should fix it here.

Comment 10 Anping Li 2020-08-03 12:36:27 UTC
With the issue in Comment 7, no logs are received by the syslog server.

Comment 11 Periklis Tsirakidis 2020-08-03 12:39:59 UTC
@Anping

Yes, [1] was expected to be fixed by this BZ, but it is not. The issue we are fixing here is deploying the legacy methods independently of LF, and the provided PR does this. We should reopen [1] to enable further investigation. Let's not block the current backport chain for a very different issue. IMHO these two issues are independent.

IMHO the `'Metadata' is not a designated severity` issue reflects the fact that the third-party syslog configmap is currently a copy from the docs and may need updating.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1852341
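The ArgumentError in Comment 7 comes from a severity setter that rejects any value outside the standard syslog severity set. As a rough illustration of that kind of check, here is a minimal Python sketch that normalizes a severity string and computes the syslog priority value (PRI = facility * 8 + severity, per RFC 5424). It is not the code of out_syslog_buffered.rb or the syslog_protocol gem; the alias table and the fallback-to-default behavior are assumptions made for this example.

```python
# Conventional short keywords for the eight syslog severities (RFC 5424 codes).
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

# Assumed aliases for this sketch; real plugins differ in what they accept.
ALIASES = {
    "emergency": "emerg", "critical": "crit", "error": "err",
    "warn": "warning", "informational": "info",
}

FACILITY_LOCAL0 = 16  # numeric code of the local0 facility


def normalize_severity(value, default="info"):
    """Map a free-form severity string to a known keyword.

    Unknown values such as 'Metadata' fall back to `default` instead of
    raising, which is a design choice of this sketch.
    """
    key = str(value).strip().lower()
    key = ALIASES.get(key, key)
    return key if key in SEVERITIES else default


def priority(facility, severity):
    """Compute the syslog PRI value: facility * 8 + severity code."""
    return facility * 8 + SEVERITIES[severity]


pri_info = priority(FACILITY_LOCAL0, normalize_severity("Informational"))
pri_unknown = priority(FACILITY_LOCAL0, normalize_severity("Metadata"))
print(pri_info, pri_unknown)
```

Falling back to a default keeps the chunk flowing instead of moving it to the backup directory, at the cost of hiding the unexpected value. Rejecting the record, as the stack trace in Comment 7 shows, is the stricter alternative.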

Comment 12 Anping Li 2020-08-04 12:25:33 UTC
OK, moving to VERIFIED to unblock the backport.

Comment 14 errata-xmlrpc 2020-10-27 16:15:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196