1861224 – Unfunctional fluentd pods when configuring only the collector and configured to send to an external syslog

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.4.z
Assignee:	Periklis Tsirakidis
QA Contact:	Giriyamma
Docs Contact:
URL:
Whiteboard:	logging-core
Depends On:	1861013
Blocks:

Reported:	2020-07-28 06:04 UTC by Periklis Tsirakidis
Modified:	2020-10-16 09:50 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-09-22 06:51:22 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-logging-operator pull 674	0	None	closed	Bug 1861224: Disable LogFowarding when legacy protocols in use	2021-01-22 08:42:21 UTC
Red Hat Product Errata	RHBA-2020:3717	0	None	None	None	2020-09-22 06:51:25 UTC

Description Periklis Tsirakidis 2020-07-28 06:04:56 UTC

This bug was initially created as a copy of Bug #1858200

I am copying this bug because: 



[Description of problem]

Deploying only the collector to send logs to an external syslog server leaves the fluentd pods nonfunctional.

The CLO was installed by following the documentation at [1] step by step, and the syslog ConfigMap was configured by following the documentation at [2] step by step.


## Define the ClusterLogging instance with only the collector
$ cat clo-instance.yaml 
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
$ oc create -f clo-instance.yaml 

## The fluentd pods are created
$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-logging-operator-847d45b4bb-4q4lx   1/1     Running   0          2m29s
fluentd-szr6z                               1/1     Running   0          2m16s
fluentd-xsbqd                               1/1     Running   0          2m16s

## Create the syslog ConfigMap
$ cat syslog.yml 
kind: ConfigMap
apiVersion: v1
metadata:
  name: syslog
  namespace: openshift-logging
data:
  syslog.conf: |
    <store>
     @type syslog_buffered
     remote_syslog syslogserver.openshift-logging.svc.cluster.local
     port 514
     hostname ${hostname}
     remove_tag_prefix tag
     tag_key ident,systemd.u.SYSLOG_IDENTIFIER
     facility local0
     severity info
     use_record true
     payload_key message
    </store>
$ oc create -f syslog.yml

## List the configmaps
$ oc get cm
NAME                            DATA   AGE
cluster-logging-operator-lock   0      4m
fluentd                         3      3m51s
fluentd-trusted-ca-bundle       1      3m51s
syslog                          1      3s

At this point, two issues occur:

ISSUE 1
#######

Checking the fluentd logs shows the following errors:

~~~
$ oc logs fluentd-szr6z 
expr: division by zero
run.sh: line 103: [: too many arguments
expr: syntax error
run.sh: line 108: [: too many arguments
~~~
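
The contents of run.sh are not shown in this report, so the cause of these messages cannot be confirmed from the log alone. As a hedged illustration only, the sketch below shows how a shell script can guard against the situations that typically produce this class of messages: a value that is expected to be one nonzero integer turns out to be zero, empty, or several words. The function name per_output_limit and its arguments are hypothetical and are not taken from run.sh.

```shell
#!/bin/bash
# Hypothetical helper, NOT taken from run.sh: split a total buffer size
# across a number of outputs, guarding the cases that make an unguarded
# `expr` or `[` fail.
per_output_limit() {
    local total="$1" outputs="$2"
    # Reject empty, non-numeric, or multi-word values before any arithmetic.
    # An unquoted multi-word value inside `[ ... ]` is what usually yields
    # "[: too many arguments".
    case "$outputs" in
        ''|*[!0-9]*) echo "invalid output count: '$outputs'" >&2; return 1 ;;
    esac
    # An unguarded `expr "$total" / 0` would report "division by zero".
    if [ "$outputs" -eq 0 ]; then
        echo "no outputs configured" >&2
        return 1
    fi
    # 10# forces base 10 so a value such as 08 is not read as octal.
    echo $(( total / 10#$outputs ))
}

per_output_limit 1024 4                                  # prints 256
per_output_limit 1024 0  || echo "guarded: zero outputs"
per_output_limit 1024 "" || echo "guarded: empty value"
```

With guards like these, a collector that has no output configured would fail with an explicit message instead of arithmetic and test errors.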

Trying a different way:

~~~
$ oc exec fluentd-szr6z -- logs
ls: cannot access /var/log/fluentd: No such file or directory
~~~

Checking for the /var/log/fluentd directory confirms that it does not exist:

~~~
$ oc rsh fluentd-szr6z ls -ld /var/log/fluentd
ls: cannot access /var/log/fluentd: No such file or directory
command terminated with exit code 2
~~~

ISSUE 2
#######
Inside a fluentd pod, the file /etc/fluent/fluent.conf is empty:

~~~
$ oc rsh fluentd-szr6z 
sh-4.2# cat /etc/fluent/fluent.conf 
sh-4.2# ls -ld /etc/fluent/fluent.conf 
lrwxrwxrwx. 1 root root 38 Jul  6 05:27 /etc/fluent/fluent.conf -> /etc/fluent/configs.d/user/fluent.conf
sh-4.2# ls -ld /etc/fluent/configs.d/user/fluent.conf 
lrwxrwxrwx. 1 root root 18 Jul 17 07:41 /etc/fluent/configs.d/user/fluent.conf -> ..data/fluent.conf
sh-4.2# ls -ld ..data/fluent.conf 
ls: cannot access ..data/fluent.conf: No such file or directory
~~~

Following the symlinks, the last one points to ..data/fluent.conf, which ls reports as missing, so the symlink looks broken and fluentd ends up with an empty configuration file.
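
A relative symlink target such as ..data/fluent.conf is resolved against the directory that contains the link (here /etc/fluent/configs.d/user), not against the shell's current directory. Running ls on the bare relative path from another directory can therefore report a missing file even when the link is intact. The sketch below walks a chain one hop at a time using only readlink, dirname, and test. The function name follow_links and the temporary demo layout are illustrative assumptions, not part of the fluentd image.

```shell
#!/bin/bash
# Illustrative helper: print every hop of a symlink chain and report
# whether the final target exists. Relative targets are joined to the
# directory of the link that holds them.
follow_links() {
    local path="$1" target hops=0
    while [ -L "$path" ]; do
        target="$(readlink "$path")"
        case "$target" in
            /*) ;;                                      # absolute target
            *)  target="$(dirname "$path")/$target" ;;  # relative target
        esac
        echo "$path -> $target"
        path="$target"
        hops=$((hops + 1))
        # Stop on suspiciously long chains (possible loop).
        if [ "$hops" -ge 40 ]; then
            echo "too many links" >&2
            return 2
        fi
    done
    if [ -e "$path" ]; then
        echo "resolved: $path"
    else
        echo "broken: $path"
        return 1
    fi
}

# Demo in a throwaway directory that loosely mimics the layout in this report.
demo="$(mktemp -d)"
mkdir -p "$demo/user/..data"
: > "$demo/user/..data/fluent.conf"                 # empty but present
ln -s ..data/fluent.conf "$demo/user/fluent.conf"   # relative link
ln -s "$demo/user/fluent.conf" "$demo/fluent.conf"  # absolute link
follow_links "$demo/fluent.conf"
```

Inside a pod, the same check could be run against /etc/fluent/fluent.conf to tell a dangling link apart from an empty file that exists.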



[Version-Release number of selected component (if applicable)]

$ oc version
Client Version: 4.4.12
Server Version: 4.4.12
Kubernetes Version: v1.17.1+a1af596

$ oc get csv -n openshift-logging
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.4.0-202007060343.p0           Cluster Logging          4.4.0-202007060343.p0              Succeeded
elasticsearch-operator.4.4.0-202007060343.p0   Elasticsearch Operator   4.4.0-202007060343.p0              Succeeded


[How reproducible]

Always

[Steps to Reproduce]

Indicated in the description


[Additional info]
I'll check whether the same happens with other configurations, not only with syslog.


[1] https://docs.openshift.com/container-platform/4.4/logging/cluster-logging-deploying.html
[2] https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-external.html#cluster-logging-collector-syslog_cluster-logging-external

Comment 2 Jeff Cantrill 2020-08-21 14:11:07 UTC

Moving to UpcomingSprint for future evaluation

Comment 5 Giriyamma 2020-09-15 16:44:31 UTC

Verified this bug on cluster version 4.4.0-0.nightly-2020-09-14-143910.

There are 2 issues mentioned in the bug description:

ISSUE 1 still exists
ISSUE 2 is fixed

ISSUE 1 (errors when checking the fluentd logs):

~~~
$ oc exec fluentd-2v4ql -- logs
ls: cannot access /var/log/fluentd: No such file or directory

$ oc logs fluentd-2v4ql
2020-09-15 14:56:27 +0000 [warn]: out:syslog: failed to open tcp socket  syslogserver.openshift-logging.svc.cluster.local:514 :getaddrinfo: Name or service not known
2020-09-15 14:56:40 +0000 [warn]: got unrecoverable error in primary and no secondary error_class=ArgumentError error="'Metadata' is not a designated severity"
2020-09-15 14:56:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/syslog_protocol-0.9.2/lib/syslog_protocol/packet.rb:72:in `severity='
2020-09-15 14:56:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:104:in `send_to_syslog'
2020-09-15 14:56:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:90:in `block in write'
2020-09-15 14:56:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/event.rb:327:in `each'
~~~

(There is a separate bug for this fluentd error log: Bug 1852341)

ISSUE 2 (empty /etc/fluent/fluent.conf inside the fluentd pods):

The fluentd configuration file is not empty.

Moving this bug back to 'ASSIGNED' state.
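
The ArgumentError in the fluentd log in this comment is raised because the value passed as the syslog severity ('Metadata') is not an accepted severity keyword. As a hedged sketch, the function below maps an arbitrary level string to one of the conventional keywords for the eight RFC 5424 severity levels and falls back to info for anything else. The function name normalize_severity, the accepted aliases, and the fallback policy are assumptions for illustration. They do not describe out_syslog_buffered.rb or the fix tracked in Bug 1852341.

```shell
#!/bin/bash
# Hypothetical helper: normalize a free-form level string to a syslog
# severity keyword. Unknown values fall back to "info" (an assumed policy).
normalize_severity() {
    local level
    # Compare case-insensitively.
    level="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$level" in
        emerg|alert|crit|err|warning|notice|info|debug) echo "$level" ;;
        panic) echo "emerg" ;;      # common alias
        error) echo "err" ;;        # common alias
        warn)  echo "warning" ;;    # common alias
        *)     echo "info" ;;       # anything else, e.g. "Metadata"
    esac
}

normalize_severity "Metadata"   # prints info
normalize_severity "WARN"       # prints warning
```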

Comment 6 Giriyamma 2020-09-16 08:32:06 UTC

Verified this bug on cluster version 4.4.0-0.nightly-2020-09-14-143910.

The 'division by zero' error log in the fluentd pods is fixed.

Comment 8 errata-xmlrpc 2020-09-22 06:51:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.23 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3717