Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1861224

Summary: Non-functional fluentd pods when only the collector is deployed and configured to send logs to an external syslog
Product: OpenShift Container Platform
Reporter: Periklis Tsirakidis <periklis>
Component: Logging
Assignee: Periklis Tsirakidis <periklis>
Status: CLOSED ERRATA
QA Contact: Giriyamma <gkarager>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4
CC: aos-bugs, jcantril, oarribas
Target Milestone: ---
Target Release: 4.4.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: logging-core
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-22 06:51:22 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1861013
Bug Blocks:

Description Periklis Tsirakidis 2020-07-28 06:04:56 UTC
This bug was initially created as a copy of Bug #1858200

I am copying this bug because: 



[Description of problem]

Deploying only the collector and configuring it to send logs to an external syslog server leaves the fluentd pods non-functional.

The Cluster Logging Operator (CLO) was installed by following the documentation step by step [1].
The syslog ConfigMap was configured by following the documentation step by step [2].


## Define the ClusterLogging instance with only the collector
$ cat clo-instance.yaml 
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
$ oc create -f clo-instance.yaml 

## The fluentd pods are created
$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-logging-operator-847d45b4bb-4q4lx   1/1     Running   0          2m29s
fluentd-szr6z                               1/1     Running   0          2m16s
fluentd-xsbqd                               1/1     Running   0          2m16s

## Create the syslog ConfigMap
$ cat syslog.yml 
kind: ConfigMap
apiVersion: v1
metadata:
  name: syslog
  namespace: openshift-logging
data:
  syslog.conf: |
    <store>
     @type syslog_buffered
     remote_syslog syslogserver.openshift-logging.svc.cluster.local
     port 514
     hostname ${hostname}
     remove_tag_prefix tag
     tag_key ident,systemd.u.SYSLOG_IDENTIFIER
     facility local0
     severity info
     use_record true
     payload_key message
    </store>
$ oc create -f syslog.yml

## List the ConfigMaps
$ oc get cm
NAME                            DATA   AGE
cluster-logging-operator-lock   0      4m
fluentd                         3      3m51s
fluentd-trusted-ca-bundle       1      3m51s
syslog                          1      3s

At this point, two issues occur:

ISSUE 1
#######

When checking the fluentd logs, the following errors appear:

~~~
$ oc logs fluentd-szr6z 
expr: division by zero
run.sh: line 103: [: too many arguments
expr: syntax error
run.sh: line 108: [: too many arguments
~~~
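The report does not include run.sh, so the exact cause at its lines 103 and 108 cannot be read from it. As a hedged illustration only, the bash sketch below reproduces the same three message shapes with hypothetical variables (`divisor`, `empty`, `value`): a zero divisor passed to `expr`, an empty unquoted operand that leaves `expr` with an incomplete expression, and an unquoted multi-word value inside `[ ]`. Newer GNU coreutils versions append extra detail after `expr: syntax error`.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: run.sh is not reproduced in this report, so
# these variables are stand-ins, not the script's real variable names.

# "expr: division by zero": the divisor evaluates to 0.
divisor=0
div_msg=$(expr 100 / "$divisor" 2>&1)
div_status=$?
echo "case 1 (exit $div_status): $div_msg"

# "expr: syntax error": an empty, unquoted operand vanishes during word
# splitting (intentionally unquoted here), so expr only receives "/ 2".
empty=""
syn_msg=$(expr $empty / 2 2>&1)
syn_status=$?
echo "case 2 (exit $syn_status): $syn_msg"

# "[: too many arguments": an unquoted value containing a space expands to two
# words, so [ receives more operands than a -gt comparison expects.
value="10 20"
test_msg=$( { [ $value -gt 0 ]; } 2>&1 )
test_status=$?
echo "case 3 (exit $test_status): $test_msg"
```

In this illustration every message comes from a value that is already zero, empty, or multi-word by the time it reaches `expr` or `[`, so the fix belongs where that value is computed rather than in the comparison itself.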

Reading the logs a different way:

~~~
$ oc exec fluentd-szr6z -- logs
ls: cannot access /var/log/fluentd: No such file or directory
~~~

Checking for the /var/log/fluentd directory confirms that it does not exist:

~~~
$ oc rsh fluentd-szr6z ls -ld /var/log/fluentd
ls: cannot access /var/log/fluentd: No such file or directory
command terminated with exit code 2
~~~

ISSUE 2
#######
Inside the fluentd pods, the file /etc/fluent/fluent.conf is empty:

~~~
$ oc rsh fluentd-szr6z 
sh-4.2# cat /etc/fluent/fluent.conf 
sh-4.2# ls -ld /etc/fluent/fluent.conf 
lrwxrwxrwx. 1 root root 38 Jul  6 05:27 /etc/fluent/fluent.conf -> /etc/fluent/configs.d/user/fluent.conf
sh-4.2# ls -ld /etc/fluent/configs.d/user/fluent.conf 
lrwxrwxrwx. 1 root root 18 Jul 17 07:41 /etc/fluent/configs.d/user/fluent.conf -> ..data/fluent.conf
sh-4.2# ls -ld ..data/fluent.conf 
ls: cannot access ..data/fluent.conf: No such file or directory
~~~

Following the symlinks, the last one points to ..data/fluent.conf, which ls reports as missing, so it is a broken symlink. As a result, fluentd has an empty configuration file.
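Because ..data/fluent.conf is a relative target, it resolves against the directory that holds the link (/etc/fluent/configs.d/user/), not against the working directory from which `ls -ld ..data/fluent.conf` was run. In a Kubernetes ConfigMap volume, ..data is normally itself a symlink to a timestamped directory written by the kubelet. Also, `cat` printed nothing rather than a "No such file or directory" error, which suggests the chain may resolve to an existing but empty file. The sketch below uses a hypothetical helper, `walk_links`, built only on `readlink`, `dirname` and `wc`. It follows a chain one hop at a time and reports either the final file with its size or the first missing target.

```shell
#!/usr/bin/env bash
# Sketch: follow a symlink chain one hop at a time. Relative targets are
# resolved against the directory of the link that contains them, which is
# how the kernel resolves them. walk_links is a hypothetical helper name.
walk_links() {
  local path=$1 target hops=0
  while [ -L "$path" ]; do
    target=$(readlink "$path")
    case $target in
      /*) ;;                                     # absolute target: use as-is
      *) target="$(dirname "$path")/$target" ;;  # relative: anchor at the link's directory
    esac
    echo "$path -> $target"
    path=$target
    hops=$((hops + 1))
    if [ "$hops" -gt 40 ]; then                  # guard against symlink loops
      echo "too many levels of symbolic links at $path" >&2
      return 2
    fi
  done
  if [ -e "$path" ]; then
    # Final target exists; print its size so an empty config file is obvious.
    echo "final: $path ($(( $(wc -c < "$path") )) bytes)"
  else
    echo "broken: $path does not exist" >&2
    return 1
  fi
}

# Example (inside the pod, assuming bash and coreutils readlink are present):
#   walk_links /etc/fluent/fluent.conf
```

A "final: ... (0 bytes)" result would point to an empty file delivered through the ConfigMap, while a "broken:" line would confirm a dangling link.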



[Version-Release number of selected component (if applicable)]

$ oc version
Client Version: 4.4.12
Server Version: 4.4.12
Kubernetes Version: v1.17.1+a1af596

$ oc get csv -n openshift-logging
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.4.0-202007060343.p0           Cluster Logging          4.4.0-202007060343.p0              Succeeded
elasticsearch-operator.4.4.0-202007060343.p0   Elasticsearch Operator   4.4.0-202007060343.p0              Succeeded


[How reproducible]

Always

[Steps to Reproduce]

Indicated in the description


[Additional info]
I will check whether the same happens with other configurations, not only with syslog.


[1] https://docs.openshift.com/container-platform/4.4/logging/cluster-logging-deploying.html
[2] https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-external.html#cluster-logging-collector-syslog_cluster-logging-external

Comment 2 Jeff Cantrill 2020-08-21 14:11:07 UTC
Moving to UpcomingSprint for future evaluation

Comment 5 Giriyamma 2020-09-15 16:44:31 UTC
Verified this bug on cluster version 4.4.0-0.nightly-2020-09-14-143910.

There are 2 issues mentioned in the bug description:

ISSUE 1 still exists
ISSUE 2 is fixed

ISSUE 1: 
"Trying to check the fluentd logs, it's possible to receive one error:"
~~~
$ oc exec fluentd-2v4ql -- logs
ls: cannot access /var/log/fluentd: No such file or directory

$ oc logs fluentd-2v4ql
2020-09-15 14:56:27 +0000 [warn]: out:syslog: failed to open tcp socket  syslogserver.openshift-logging.svc.cluster.local:514 :getaddrinfo: Name or service not known
2020-09-15 14:56:40 +0000 [warn]: got unrecoverable error in primary and no secondary error_class=ArgumentError error="'Metadata' is not a designated severity"
2020-09-15 14:56:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/syslog_protocol-0.9.2/lib/syslog_protocol/packet.rb:72:in `severity='
2020-09-15 14:56:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:104:in `send_to_syslog'
2020-09-15 14:56:40 +0000 [warn]: /etc/fluent/plugin/out_syslog_buffered.rb:90:in `block in write'
2020-09-15 14:56:40 +0000 [warn]: /opt/rh/rh-ruby25/root/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/event.rb:327:in `each'
~~~
(There is a separate bug for this fluentd error: Bug 1852341.)

ISSUE 2: 
"If I enter to the fluentd pods, it's possible to see that the file /etc/fluent/fluentd.conf is empty:"

The fluentd configuration file is not empty.

Moving this bug back to the 'ASSIGNED' state.

Comment 6 Giriyamma 2020-09-16 08:32:06 UTC
Verified this bug on cluster version 4.4.0-0.nightly-2020-09-14-143910.

'division by zero' error log in fluentd pods is fixed.

Comment 8 errata-xmlrpc 2020-09-22 06:51:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.23 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3717