This bug was initially created as a copy of Bug #1858200

I am copying this bug because:

[Description of problem]

Deploying only the collector to send logs to an external syslog server leaves the fluentd pods non-functional.

The CLO was installed by following the documentation in [1] step by step.
The syslog ConfigMap was configured by following the documentation in [2] step by step.

## Define the ClusterLogging instance with only the collector

$ cat clo-instance.yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}

$ oc create -f clo-instance.yaml

## The fluentd pods are created

$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-logging-operator-847d45b4bb-4q4lx   1/1     Running   0          2m29s
fluentd-szr6z                               1/1     Running   0          2m16s
fluentd-xsbqd                               1/1     Running   0          2m16s

## Create the syslog ConfigMap

$ cat syslog.yml
kind: ConfigMap
apiVersion: v1
metadata:
  name: syslog
  namespace: openshift-logging
data:
  syslog.conf: |
    <store>
    @type syslog_buffered
    remote_syslog syslogserver.openshift-logging.svc.cluster.local
    port 514
    hostname ${hostname}
    remove_tag_prefix tag
    tag_key ident,systemd.u.SYSLOG_IDENTIFIER
    facility local0
    severity info
    use_record true
    payload_key message
    </store>

$ oc create -f syslog.yml

## List the ConfigMaps

$ oc get cm
NAME                            DATA   AGE
cluster-logging-operator-lock   0      4m
fluentd                         3      3m51s
fluentd-trusted-ca-bundle       1      3m51s
syslog                          1      3s

At this point, two issues occur.

ISSUE 1
#######

Checking the fluentd logs returns errors from run.sh instead of the fluentd output:

~~~
$ oc logs fluentd-szr6z
expr: division by zero
run.sh: line 103: [: too many arguments
expr: syntax error
run.sh: line 108: [: too many arguments
~~~

Reading the logs a different way also fails:

~~~
$ oc exec fluentd-szr6z -- logs
ls: cannot access /var/log/fluentd: No such file or directory
~~~

Checking for the /var/log/fluentd directory confirms that it does not exist:

~~~
$ oc rsh fluentd-szr6z ls -ld /var/log/fluentd
ls: cannot access /var/log/fluentd: No such file or directory
command terminated with exit code 2
~~~

ISSUE 2
#######

Inside the fluentd pods, the file /etc/fluent/fluent.conf is empty:

~~~
$ oc rsh fluentd-szr6z
sh-4.2# cat /etc/fluent/fluent.conf
sh-4.2# ls -ld /etc/fluent/fluent.conf
lrwxrwxrwx. 1 root root 38 Jul  6 05:27 /etc/fluent/fluent.conf -> /etc/fluent/configs.d/user/fluent.conf
sh-4.2# ls -ld /etc/fluent/configs.d/user/fluent.conf
lrwxrwxrwx. 1 root root 18 Jul 17 07:41 /etc/fluent/configs.d/user/fluent.conf -> ..data/fluent.conf
sh-4.2# ls -ld ..data/fluent.conf
ls: cannot access ..data/fluent.conf: No such file or directory
~~~

Following the symlinks, the last one points to ..data/fluent.conf, which ls reports as missing. As a result, fluentd has an empty configuration file.

[Version-Release number of selected component (if applicable)]

$ oc version
Client Version: 4.4.12
Server Version: 4.4.12
Kubernetes Version: v1.17.1+a1af596

$ oc get csv -n openshift-logging
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.4.0-202007060343.p0           Cluster Logging          4.4.0-202007060343.p0              Succeeded
elasticsearch-operator.4.4.0-202007060343.p0   Elasticsearch Operator   4.4.0-202007060343.p0              Succeeded

[How reproducible]

Always

[Steps to Reproduce]

Indicated in the description

[Additional info]

I'll check whether the same happens with other configurations, not only with syslog.

[1] https://docs.openshift.com/container-platform/4.4/logging/cluster-logging-deploying.html
[2] https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-external.html#cluster-logging-collector-syslog_cluster-logging-external
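
A note on the symlink check in ISSUE 2: a relative symlink target such as ..data/fluent.conf is resolved against the directory that contains the link, not against the shell's current working directory, so running ls on the relative target from another directory reports it as missing regardless of whether the file exists. The sketch below is one possible way to walk the chain hop by hop and distinguish a missing target from an empty file. The function name check_link_chain and its return codes are hypothetical and are not part of the fluentd image. It assumes readlink and dirname are available in the container.

```shell
#!/bin/sh
# Hypothetical helper (not shipped in the fluentd image): follow a symlink
# chain one hop at a time and report whether the final target is missing,
# empty, or non-empty. Relative targets are resolved against the directory
# that holds the link, which is how the kernel resolves them.
check_link_chain() {
    path=$1
    hops=0
    while [ -L "$path" ]; do
        hops=$((hops + 1))
        if [ "$hops" -gt 40 ]; then
            # Guard against symlink loops.
            echo "LOOP: gave up after 40 hops at $path"
            return 3
        fi
        target=$(readlink "$path")
        echo "$path -> $target"
        case $target in
            /*) path=$target ;;                      # absolute target
            *)  path=$(dirname "$path")/$target ;;   # relative to the link's directory
        esac
    done
    if [ ! -e "$path" ]; then
        echo "BROKEN: $path does not exist"
        return 1
    fi
    if [ ! -s "$path" ]; then
        echo "EMPTY: $path exists but has no content"
        return 2
    fi
    echo "OK: $path exists and is non-empty"
    return 0
}
```

Inside a pod shell (for example after `oc rsh fluentd-szr6z`), the function could be pasted in and called as `check_link_chain /etc/fluent/fluent.conf`. An EMPTY result would point at the generated configuration content rather than at the symlinks themselves.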
Moving to UpcomingSprint for future evaluation
Verified this bug on cluster version 4.4.0-0.nightly-2020-09-14-143910.

There are 2 issues mentioned in the bug description:
ISSUE 1 still exists
ISSUE 2 is fixed

ISSUE 1: "Trying to check the fluentd logs, it's possible to receive one error:"

~~~
$ oc exec fluentd-2v4ql -- logs
ls: cannot access /var/log/fluentd: No such file or directory

$ oc logs fluentd-2v4ql
2020-09-15 14:56:27 +0000 [warn]: out:syslog: failed to open tcp socket syslogserver.openshift-logging.svc.cluster.local:514 :getaddrinfo: Name or service not known
2020-09-15 14:56:40 +0000 [warn]: got unrecoverable error in primary and no secondary error_class=ArgumentError error="'Metadata' is not a designated severity"
2020-09-15 14:56:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/syslog_protocol-0.9.2/lib/syslog_protocol/packet.rb:72:in `severity='
2020-09-15 14:56:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:104:in `send_to_syslog'
2020-09-15 14:56:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:90:in `block in write'
2020-09-15 14:56:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/event.rb:327:in `each'
~~~

(There is a separate bug for this fluentd error log: Bug 1852341)

ISSUE 2: "If I enter to the fluentd pods, it's possible to see that the file /etc/fluent/fluentd.conf is empty:"

The fluentd configuration file is not empty.

Moving this bug back to 'ASSIGNED' state.
Verified this bug on cluster version 4.4.0-0.nightly-2020-09-14-143910. The 'division by zero' error in the fluentd pod logs is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.4.23 extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3717