Bug 1978699

Summary: External log forwarding for syslog is not working over TLS
Product: OpenShift Container Platform
Component: Logging
Version: 4.8
Hardware: s390x
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED INSUFFICIENT_DATA
Reporter: Nishant Chauhan <nishantchauhan>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Anping Li <anli>
CC: alklein, aos-bugs, brueckner, Holger.Wolf, jschinta, periklis, wolfgang.voesch
Target Milestone: ---
Target Release: ---
Last Closed: 2021-07-05 08:18:24 UTC
Type: Bug
Bug Blocks: 1934148    
Attachments: fluentd conf file

Description Nishant Chauhan 2021-07-02 14:17:11 UTC
Created attachment 1797206 [details]
fluentd conf file

Description of problem:

We are not able to send logs to an external syslog server over TLS using a ClusterLogForwarder instance. The same configuration worked, and the server received logs, with the 5.0.x version of the operator.

When I switched to plain TCP (without TLS), the buffered logs came through, including records queued while TLS was configured; these are identifiable because I used a different label in the ClusterLogForwarder for the TLS configuration.

Version-Release number of selected component (if applicable):

[root@bastion rsyslog]# oc version
Client Version: 4.8.0-rc.1
Server Version: 4.8.0-rc.1
Kubernetes Version: v1.21.0-rc.0+766a5fe
[root@bastion rsyslog]#

[root@bastion ~]# oc get csv
NAME                              DISPLAY                            VERSION    REPLACES   PHASE
cluster-logging.5.1.0-53          Cluster Logging                    5.1.0-53              Succeeded
elasticsearch-operator.5.1.0-74   OpenShift Elasticsearch Operator   5.1.0-74              Succeeded


How reproducible:

Configuration of external syslog server:

# TLS (gtls) driver certificate material
global(
DefaultNetstreamDriverCAFile="/root/log-forward-test/rsyslog/tls/ca.pem"
DefaultNetstreamDriverCertFile="/root/log-forward-test/rsyslog/tls/server.crt"
DefaultNetstreamDriverKeyFile="/root/log-forward-test/rsyslog/tls/server.key"
)

# TCP input module using the gtls stream driver; Mode "1" enforces TLS,
# AuthMode "anon" accepts clients without a certificate
module( load="imtcp"

        StreamDriver.Name = "gtls"
        StreamDriver.Mode = "1"
        StreamDriver.AuthMode = "anon"
)

# Listen for TLS-wrapped syslog on port 6514
input(  type="imtcp"
        port="6514"
)
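
As a quick sanity check (a sketch, assuming the rsyslog host 192.168.79.1:6514 from the CLF below is reachable from where the command is run), the TLS listener can be probed directly with openssl:

# Hypothetical probe: a successful handshake ("Verify return code: 0 (ok)")
# would rule out a broken rsyslog/TLS setup on the server side.
openssl s_client -connect 192.168.79.1:6514 \
        -CAfile /root/log-forward-test/rsyslog/tls/ca.pem </dev/null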


Same logs from all fluentd pods:
[root@bastion rsyslog.d]# oc get po
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-65467484b9-jbhwt       1/1     Running            0          25h
elasticsearch-cdm-znqpbf3d-1-567f57d66f-75s5w   2/2     Running            0          25h
elasticsearch-cdm-znqpbf3d-2-bcd665fb-qkltr     2/2     Running            0          25h
elasticsearch-cdm-znqpbf3d-3-5558df5b79-j8dw9   2/2     Running            0          25h
elasticsearch-im-app-27087210-8ddgl             0/1     Completed          0          12m
elasticsearch-im-audit-27087210-wl862           0/1     Completed          0          12m
elasticsearch-im-infra-27087210-4brk4           0/1     Completed          0          12m
fluentd-5wmr5                                   1/1     Running            0          51m
fluentd-l9p7l                                   1/1     Running            0          51m
fluentd-m9vsz                                   1/1     Running            0          50m
fluentd-mgwmw                                   1/1     Running            0          51m
fluentd-pchjs                                   1/1     Running            0          51m
fluentd-wt4md                                   1/1     Running            0          50m
kibana-57968d8769-9jz2k                         2/2     Running            0          25h
stress-test-cpustresstest-cd4b5699f-62zp5       0/1     ImagePullBackOff   0          8h
[root@bastion rsyslog.d]# oc logs fluentd-5wmr5
Setting each total_size_limit for 3 buffers to 2126773248 bytes
Setting queued_chunks_limit_size for each buffer to 253
Setting chunk_limit_size for each buffer to 8388608
[root@bastion rsyslog.d]# oc logs fluentd-pchjs
Setting each total_size_limit for 3 buffers to 2126773248 bytes
Setting queued_chunks_limit_size for each buffer to 253
Setting chunk_limit_size for each buffer to 8388608
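
The collector pods log only the buffer sizing and no TLS error. One way to see what was actually rendered for the syslog output is to grep the generated fluentd config inside a collector pod (a sketch; the pod name is from the listing above, while the config path and the remote_syslog plugin name are assumptions based on the fluentd syslog output plugin the operator ships):

# Hypothetical: dump the generated syslog output stanza from a running collector
oc exec fluentd-5wmr5 -n openshift-logging -- \
   grep -B 2 -A 20 remote_syslog /etc/fluent/fluent.conf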


[root@bastion rsyslog]# oc describe secret tls-secret
Name:         tls-secret
Namespace:    openshift-logging
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
ca-bundle.crt:  3099 bytes
tls.crt:        1277 bytes
tls.key:        1704 bytes
[root@bastion rsyslog]#
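
For reference, a secret with exactly these keys can be created as follows (a sketch; the local file names are hypothetical, but the key names match the secret shown above: tls.crt, tls.key, ca-bundle.crt):

# Hypothetical re-creation of the TLS secret referenced by the CLF below
oc create secret generic tls-secret -n openshift-logging \
   --from-file=tls.crt=./tls/client.crt \
   --from-file=tls.key=./tls/client.key \
   --from-file=ca-bundle.crt=./tls/ca.pem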


TLS - CLF file
----------------------
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
   - name: rsyslog-west
     type: syslog
     syslog:
      rfc: RFC5424
      severity: informational
     url: 'tls://192.168.79.1:6514'
     secret:
        name: tls-secret
  pipelines:
   - name: syslog-west
     inputRefs:
     - infrastructure
     - application
     - audit
     outputRefs:
     - rsyslog-west
     - default
     labels:
       syslog: westtls
------------------------
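
After applying a CLF, the operator's view of the outputs can be checked via the instance status (a sketch; clf-tls.yaml is a hypothetical file name holding the manifest above):

# Apply the forwarder and inspect the status conditions, where a rejected
# or misconfigured output would be reported.
oc apply -f clf-tls.yaml
oc get clusterlogforwarder instance -n openshift-logging \
   -o jsonpath='{.status.conditions}'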

TCP - CLF file
-------------------
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
   - name: rsyslog-west
     type: syslog
     syslog:
      rfc: RFC5424
      severity: informational
     url: 'udp://192.168.79.1:514'
  pipelines:
   - name: syslog-west
     inputRefs:
     - infrastructure
     outputRefs:
     - rsyslog-west
     - default
     labels:
       syslog: west
------------------------


Logs received after switching to the TCP ClusterLogForwarder:

---------------
Jul  2 10:43:28 master-2.m13lp83ocp.lnxne.boe fluentd kind:Event#011apiVersion:audit.k8s.io/v1#011level:info#011auditID:7a4dabae-85f5-4ecd-8c64-faa8e27c2300#011stage:ResponseComplete#011requestURI:/apis/local.storage.openshift.io/v1/namespaces/openshift-local-storage/localvolumes/lv-mon#011verb:get#011user:{"username"=>"system:serviceaccount:openshift-local-storage:local-storage-admin", "uid"=>"7140fda0-0b09-49fa-bba7-06b3db04c3d8", "groups"=>["system:serviceaccounts", "system:serviceaccounts:openshift-local-storage", "system:authenticated"], "extra"=>{"authentication.kubernetes.io/pod-name"=>["lv-mon-local-diskmaker-x5njt"], "authentication.kubernetes.io/pod-uid"=>["a756a6b0-b9cc-49c3-8251-9eb0fe6857be"]}}#011sourceIPs:["192.168.79.20"]#011userAgent:diskmaker/v0.0.0 (linux/s390x) kubernetes/$Format#011objectRef:{"resource"=>"localvolumes", "namespace"=>"openshift-local-storage", "name"=>"lv-mon", "apiGroup"=>"local.storage.openshift.io", "apiVersion"=>"v1"}#011responseStatus:{"code"=>200}#011requestReceivedTimestamp:2021-07-02T10:29:26.473737Z#011stageTimestamp:2021-07-02T10:29:26.493566Z#011annotations:{"authorization.k8s.io/decision"=>"allow", "authorization.k8s.io/reason"=>"RBAC: allowed by RoleBinding \"local-storage-operator.4.6.0-202103010126.p0-local-s-785d857cbd/openshift-local-storage\" of Role \"local-storage-operator.4.6.0-202103010126.p0-local-s-785d857cbd\" to ServiceAccount \"local-storage-admin/openshift-local-storage\""}#011k8s_audit_level:Metadata#011message:#011hostname:master-2.m13lp83ocp.lnxne.boe#011pipeline_metadata:{"collector"=>{"ipaddr4"=>"192.168.79.23", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2021-07-02T10:29:26.500461+00:00", "version"=>"1.7.4 1.6.0"}}#011@timestamp:2021-07-02T10:29:26.473737+00:00#011viaq_index_name:audit-write#011viaq_msg_id:NWNkNGM3NzQtMjc2ZC00ZjZmLTk1YmMtNTlkNjcxNDNlMzlm#011openshift:{"labels"=>{"syslog"=>"westtls"}}
.
.
.
.
.
.
Jul  2 10:44:08 worker-1.m13lp83ocp.lnxne.boe fluentd _STREAM_ID:f508be6ad5694ed7ad246dd65ecba6da#011_SYSTEMD_INVOCATION_ID:b2b57df83bc541a099b53663118139bc#011systemd:{"t"=>{"BOOT_ID"=>"e6a07c4fb99542aea11274f36341278e", "CAP_EFFECTIVE"=>"ffffffffff", "CMDLINE"=>"kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --node-ip=192.168.79.25 --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider= --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c2a1c5f73eb01d39a5584338c1988098d71b1bc51ca2d002e80751419180f54 --system-reserved=cpu=500m,memory=1Gi --v=2", "COMM"=>"kubelet", "EXE"=>"/usr/bin/kubelet", "GID"=>"0", "MACHINE_ID"=>"43f75d2b3b62419b8367ba46d2a2a007", "PID"=>"1678", "SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "STREAM_ID"=>"f508be6ad5694ed7ad246dd65ecba6da", "SYSTEMD_CGROUP"=>"/system.slice/kubelet.service", "SYSTEMD_INVOCATION_ID"=>"b2b57df83bc541a099b53663118139bc", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"kubelet.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"hyperkube"}}#011level:info#011message:W0702 10:44:06.861641    1678 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease#011hostname:worker-1.m13lp83ocp.lnxne.boe#011pipeline_metadata:{"collector"=>{"ipaddr4"=>"192.168.79.25", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2021-07-02T10:44:07.259577+00:00", "version"=>"1.7.4 1.6.0"}}#011@timestamp:2021-07-02T10:44:06.866710+00:00#011viaq_index_name:infra-write#011viaq_msg_id:NTk5YmM3MzktNWZhOC00ZmQ3LTkxNDEtN2UyYzM2ZWFiZWQw#011openshift:{"labels"=>{"syslog"=>"west"}}
----------------

Attached is the fluentd.conf file, extracted with the command below:
oc extract cm/fluentd


Additional info:
Please let me know if must-gather data is required.

Comment 1 wolfgang.voesch 2021-07-02 14:25:07 UTC
From a discussion with Anping: There is a potential fix in flight: https://github.com/openshift/cluster-logging-operator/pull/1083/files

Comment 2 Periklis Tsirakidis 2021-07-05 08:18:24 UTC
Closing this: per policy, 5.x issues should be opened as bug tickets on https://issues.redhat.com/browse/LOG. Please use Bugzilla only if the affected version is 4.6.z.