Bug 1978699 - external log forwarding for syslog is not working over tls
Summary: external log forwarding for syslog is not working over tls
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.8
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: ocp-48-z-tracker
 
Reported: 2021-07-02 14:17 UTC by Nishant Chauhan
Modified: 2021-07-05 08:18 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-05 08:18:24 UTC
Target Upstream Version:
Embargoed:


Attachments
fluentd conf file (18.85 KB, text/plain), 2021-07-02 14:17 UTC, Nishant Chauhan

Description Nishant Chauhan 2021-07-02 14:17:11 UTC
Created attachment 1797206 [details]
fluentd conf file

Description of problem:

We are not able to send logs to an external syslog server over TLS using a ClusterLogForwarder instance, while the same configuration was working and logs were received with the 5.0.x version of the operator.

When I switched to a plain, non-TLS transport, I received the buffered logs, including those produced during the TLS attempt; they are distinguishable because I used different labels in the ClusterLogForwarder file for the TLS configuration.
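
One way to isolate the failure (a sketch, not part of the original report; the endpoint and CA path are taken from the configs below) is to test the rsyslog TLS listener directly with openssl, bypassing fluentd entirely:

# Connect to the TLS listener; a successful handshake prints the server certificate chain.
openssl s_client -connect 192.168.79.1:6514 -CAfile /root/log-forward-test/rsyslog/tls/ca.pem </dev/null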

Version-Release number of selected component (if applicable):

[root@bastion rsyslog]# oc version
Client Version: 4.8.0-rc.1
Server Version: 4.8.0-rc.1
Kubernetes Version: v1.21.0-rc.0+766a5fe
[root@bastion rsyslog]#

[root@bastion ~]# oc get csv
NAME                              DISPLAY                            VERSION    REPLACES   PHASE
cluster-logging.5.1.0-53          Cluster Logging                    5.1.0-53              Succeeded
elasticsearch-operator.5.1.0-74   OpenShift Elasticsearch Operator   5.1.0-74              Succeeded


How reproducible:

Configuration of external syslog server:

global(
DefaultNetstreamDriverCAFile="/root/log-forward-test/rsyslog/tls/ca.pem"
DefaultNetstreamDriverCertFile="/root/log-forward-test/rsyslog/tls/server.crt"
DefaultNetstreamDriverKeyFile="/root/log-forward-test/rsyslog/tls/server.key"
)

module( load="imtcp"

        StreamDriver.Name = "gtls"
        StreamDriver.Mode = "1"
        StreamDriver.AuthMode = "anon"
)

input(  type="imtcp"
        port="6514"
)
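
A quick sanity check after (re)starting rsyslog, to confirm the TLS listener is actually up (a sketch, not from the original report):

systemctl restart rsyslog
# Verify rsyslog is listening on the TLS port 6514.
ss -tlnp | grep 6514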


All fluentd pods show the same startup logs:
[root@bastion rsyslog.d]# oc get po
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-65467484b9-jbhwt       1/1     Running            0          25h
elasticsearch-cdm-znqpbf3d-1-567f57d66f-75s5w   2/2     Running            0          25h
elasticsearch-cdm-znqpbf3d-2-bcd665fb-qkltr     2/2     Running            0          25h
elasticsearch-cdm-znqpbf3d-3-5558df5b79-j8dw9   2/2     Running            0          25h
elasticsearch-im-app-27087210-8ddgl             0/1     Completed          0          12m
elasticsearch-im-audit-27087210-wl862           0/1     Completed          0          12m
elasticsearch-im-infra-27087210-4brk4           0/1     Completed          0          12m
fluentd-5wmr5                                   1/1     Running            0          51m
fluentd-l9p7l                                   1/1     Running            0          51m
fluentd-m9vsz                                   1/1     Running            0          50m
fluentd-mgwmw                                   1/1     Running            0          51m
fluentd-pchjs                                   1/1     Running            0          51m
fluentd-wt4md                                   1/1     Running            0          50m
kibana-57968d8769-9jz2k                         2/2     Running            0          25h
stress-test-cpustresstest-cd4b5699f-62zp5       0/1     ImagePullBackOff   0          8h
[root@bastion rsyslog.d]# oc logs fluentd-5wmr5
Setting each total_size_limit for 3 buffers to 2126773248 bytes
Setting queued_chunks_limit_size for each buffer to 253
Setting chunk_limit_size for each buffer to 8388608
[root@bastion rsyslog.d]# oc logs fluentd-pchjs
Setting each total_size_limit for 3 buffers to 2126773248 bytes
Setting queued_chunks_limit_size for each buffer to 253
Setting chunk_limit_size for each buffer to 8388608
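
The startup messages above contain no TLS errors; to check whether any collector pod logs handshake or delivery failures, something like the following could be used (a sketch, assuming the collector pods carry the component=fluentd label):

for pod in $(oc get pods -n openshift-logging -l component=fluentd -o name); do
  # Scan each collector pod's log for TLS/SSL or delivery errors.
  oc logs -n openshift-logging "$pod" | grep -iE 'error|ssl|tls'
done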


[root@bastion rsyslog]# oc describe secret tls-secret
Name:         tls-secret
Namespace:    openshift-logging
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
ca-bundle.crt:  3099 bytes
tls.crt:        1277 bytes
tls.key:        1704 bytes
[root@bastion rsyslog]#
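
The key names shown above (tls.crt, tls.key, ca-bundle.crt) are the ones the forwarder expects in an output secret; a secret in that shape would typically be created along these lines (a sketch, with hypothetical local file names):

oc create secret generic tls-secret -n openshift-logging \
  --from-file=tls.crt=client.crt \
  --from-file=tls.key=client.key \
  --from-file=ca-bundle.crt=ca.pem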


TLS - CLF file
----------------------
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
   - name: rsyslog-west
     type: syslog
     syslog:
      rfc: RFC5424
      severity: informational
     url: 'tls://192.168.79.1:6514'
     secret:
        name: tls-secret
  pipelines:
   - name: syslog-west
     inputRefs:
     - infrastructure
     - application
     - audit
     outputRefs:
     - rsyslog-west
     - default
     labels:
       syslog: westtls
------------------------
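
After applying the instance, the operator reports validation results in the resource status; inspecting the conditions can rule out a rejected output (a sketch; clf-tls.yaml is a hypothetical file name for the manifest above):

oc apply -f clf-tls.yaml
# Check the status conditions written by the cluster-logging-operator.
oc get clusterlogforwarder instance -n openshift-logging -o yaml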

Non-TLS (UDP) - CLF file
-------------------
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
   - name: rsyslog-west
     type: syslog
     syslog:
      rfc: RFC5424
      severity: informational
     url: 'udp://192.168.79.1:514'
  pipelines:
   - name: syslog-west
     inputRefs:
     - infrastructure
     outputRefs:
     - rsyslog-west
     - default
     labels:
       syslog: west
------------------------
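
To confirm that traffic actually reaches the server in each mode, the two ports can be watched on the rsyslog host (a sketch, not from the original report):

# Capture TLS (6514/tcp) and plain syslog (514/udp) traffic.
tcpdump -nn -i any 'port 6514 or port 514'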


Logs received after I switched to the non-TLS ClusterLogForwarder:

---------------
Jul  2 10:43:28 master-2.m13lp83ocp.lnxne.boe fluentd kind:Event#011apiVersion:audit.k8s.io/v1#011level:info#011auditID:7a4dabae-85f5-4ecd-8c64-faa8e27c2300#011stage:ResponseComplete#011requestURI:/apis/local.storage.openshift.io/v1/namespaces/openshift-local-storage/localvolumes/lv-mon#011verb:get#011user:{"username"=>"system:serviceaccount:openshift-local-storage:local-storage-admin", "uid"=>"7140fda0-0b09-49fa-bba7-06b3db04c3d8", "groups"=>["system:serviceaccounts", "system:serviceaccounts:openshift-local-storage", "system:authenticated"], "extra"=>{"authentication.kubernetes.io/pod-name"=>["lv-mon-local-diskmaker-x5njt"], "authentication.kubernetes.io/pod-uid"=>["a756a6b0-b9cc-49c3-8251-9eb0fe6857be"]}}#011sourceIPs:["192.168.79.20"]#011userAgent:diskmaker/v0.0.0 (linux/s390x) kubernetes/$Format#011objectRef:{"resource"=>"localvolumes", "namespace"=>"openshift-local-storage", "name"=>"lv-mon", "apiGroup"=>"local.storage.openshift.io", "apiVersion"=>"v1"}#011responseStatus:{"code"=>200}#011requestReceivedTimestamp:2021-07-02T10:29:26.473737Z#011stageTimestamp:2021-07-02T10:29:26.493566Z#011annotations:{"authorization.k8s.io/decision"=>"allow", "authorization.k8s.io/reason"=>"RBAC: allowed by RoleBinding \"local-storage-operator.4.6.0-202103010126.p0-local-s-785d857cbd/openshift-local-storage\" of Role \"local-storage-operator.4.6.0-202103010126.p0-local-s-785d857cbd\" to ServiceAccount \"local-storage-admin/openshift-local-storage\""}#011k8s_audit_level:Metadata#011message:#011hostname:master-2.m13lp83ocp.lnxne.boe#011pipeline_metadata:{"collector"=>{"ipaddr4"=>"192.168.79.23", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2021-07-02T10:29:26.500461+00:00", "version"=>"1.7.4 1.6.0"}}#011@timestamp:2021-07-02T10:29:26.473737+00:00#011viaq_index_name:audit-write#011viaq_msg_id:NWNkNGM3NzQtMjc2ZC00ZjZmLTk1YmMtNTlkNjcxNDNlMzlm#011openshift:{"labels"=>{"syslog"=>"westtls"}}
.
.
.
.
.
.
Jul  2 10:44:08 worker-1.m13lp83ocp.lnxne.boe fluentd _STREAM_ID:f508be6ad5694ed7ad246dd65ecba6da#011_SYSTEMD_INVOCATION_ID:b2b57df83bc541a099b53663118139bc#011systemd:{"t"=>{"BOOT_ID"=>"e6a07c4fb99542aea11274f36341278e", "CAP_EFFECTIVE"=>"ffffffffff", "CMDLINE"=>"kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --node-ip=192.168.79.25 --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider= --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c2a1c5f73eb01d39a5584338c1988098d71b1bc51ca2d002e80751419180f54 --system-reserved=cpu=500m,memory=1Gi --v=2", "COMM"=>"kubelet", "EXE"=>"/usr/bin/kubelet", "GID"=>"0", "MACHINE_ID"=>"43f75d2b3b62419b8367ba46d2a2a007", "PID"=>"1678", "SELINUX_CONTEXT"=>"system_u:system_r:container_runtime_t:s0", "STREAM_ID"=>"f508be6ad5694ed7ad246dd65ecba6da", "SYSTEMD_CGROUP"=>"/system.slice/kubelet.service", "SYSTEMD_INVOCATION_ID"=>"b2b57df83bc541a099b53663118139bc", "SYSTEMD_SLICE"=>"system.slice", "SYSTEMD_UNIT"=>"kubelet.service", "TRANSPORT"=>"stdout", "UID"=>"0"}, "u"=>{"SYSLOG_FACILITY"=>"3", "SYSLOG_IDENTIFIER"=>"hyperkube"}}#011level:info#011message:W0702 10:44:06.861641    1678 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease#011hostname:worker-1.m13lp83ocp.lnxne.boe#011pipeline_metadata:{"collector"=>{"ipaddr4"=>"192.168.79.25", "inputname"=>"fluent-plugin-systemd", "name"=>"fluentd", "received_at"=>"2021-07-02T10:44:07.259577+00:00", "version"=>"1.7.4 1.6.0"}}#011@timestamp:2021-07-02T10:44:06.866710+00:00#011viaq_index_name:infra-write#011viaq_msg_id:NTk5YmM3MzktNWZhOC00ZmQ3LTkxNDEtN2UyYzM2ZWFiZWQw#011openshift:{"labels"=>{"syslog"=>"west"}}
----------------

The attached fluentd.conf file was extracted with the command below.
oc extract cm/fluentd
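
For reference, the same extraction scoped to the logging namespace and written to the current directory would look like this (a sketch):

oc extract cm/fluentd -n openshift-logging --to=. --confirm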


Additional info:
Please let me know if must-gather data is required.

Comment 1 wolfgang.voesch 2021-07-02 14:25:07 UTC
From a discussion with Anping: There is a potential fix in flight: https://github.com/openshift/cluster-logging-operator/pull/1083/files

Comment 2 Periklis Tsirakidis 2021-07-05 08:18:24 UTC
Closing this: per policy, 5.x issues should be opened as bug tickets on https://issues.redhat.com/browse/LOG. Please use Bugzilla only if the affected version is 4.6.z.

