Bug 1891886 - LF GA syslog ( for RFC 3164 ) implementation incompatible with supported legacy feature
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Vimal Kumar
QA Contact: Giriyamma
Docs Contact: Rolfe Dlugy-Hegwer
URL:
Whiteboard: logging-core
Depends On:
Blocks: 1971003
 
Reported: 2020-10-27 15:30 UTC by Vimal Kumar
Modified: 2024-03-25 16:50 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, because of the LF GA syslog implementation for RFC 3164, logs sent to remote syslog were not compatible with the legacy behavior. The current release fixes this issue: AddLogSource adds details about the log's source to the "message" field. Now, logs sent to remote syslog are compatible with the legacy behavior. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891886[*BZ#1891886*])
Clone Of:
Environment:
Last Closed: 2021-02-24 11:21:19 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-logging-operator pull 790 0 None closed Bug 1891886: Support Adding log source info to log message 2021-02-15 10:55:03 UTC
Red Hat Product Errata RHBA-2021:0652 0 None None None 2021-02-24 11:22:12 UTC

Description Vimal Kumar 2020-10-27 15:30:57 UTC
Description of problem:

Legacy syslog supported a flag named "use_record". The behaviour of this flag is to:
1. take the value of the syslog hostname field from record['hostname']

2. pick the values of facility/severity from the record, e.g. systemd.u.SYSLOG_IDENTIFIER

3. if "use_record" is enabled AND "payload_key" is set AND the record contains kubernetes metadata, prepend "namespace_name=<name>, container_name=<name>, pod_name=<name>, message=record[@payload_key]" to the syslog payload;
   otherwise, send the value of record[@payload_key] as the syslog payload (see the illustrative sketch below)
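
For illustration, a minimal Ruby sketch of behaviour 3 above; the method and variable names are hypothetical and this is not the actual out_syslog_buffered.rb code:

# Hypothetical sketch of legacy behaviour 3 (not the actual plugin code).
# Prepend the kubernetes source details when use_record is enabled, payload_key
# is set, and the record carries kubernetes metadata; otherwise send the
# payload_key value unchanged.
def legacy_payload(record, payload_key:, use_record:)
  payload = record[payload_key]
  if use_record && payload_key && record['kubernetes']
    k8s = record['kubernetes']
    "namespace_name=#{k8s['namespace_name']}, " \
      "container_name=#{k8s['container_name']}, " \
      "pod_name=#{k8s['pod_name']}, " \
      "message=#{payload}"
  else
    payload
  end
end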


The LF GA (for RFC3164) supports 1 and 2, but does not support 3.




Version-Release number of selected component (if applicable): 4.6


How reproducible: Always


Steps to Reproduce:
1. setup remote syslog server
2. setup logging
3. add log forwarder to send logs to remote syslog server

Actual results:
The logs sent to remote syslog are not compatible with the legacy behaviour as described above.


Expected results:
The logs sent to remote syslog should be compatible with the legacy behaviour as described above.

Additional info:

https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/out_syslog_buffered.rb#L96
https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/out_syslog_buffered.rb#L162

Comment 1 Michael Burke 2020-10-28 12:50:23 UTC
Is this expected behavior that a typical syslog user would understand and expect, or should this be noted in the docs?

f "use_record" is enabled AND "payload_key" is set AND record contains kubernetes metadata, prepend "namespace_name=<name>, container_name=<name>, pod_name=<name>, message=record[@payload_key]" to the syslog payload
   else send value of record[@payload_key] as syslog payload

Comment 2 Vimal Kumar 2020-11-16 15:23:18 UTC
Hello Keiichi-san,

We have updated the solution to add log source information.



Using the LF spec as:
spec:
  outputs:
  - name: syslogout
    syslog:
      addLogSource: true
      facility: user
      payloadKey: message
      rfc: RFC3164
      severity: debug
      tag: mytag
    type: syslog
    url: tls://syslog-receiver.openshift-logging.svc:24224
  pipelines:
  - inputRefs:
    - application
    name: test-app
    outputRefs:
    - syslogout

The remote syslog server receives the message as:


<15>1 2020-11-16T14:35:11+00:00 crc-j55b9-master-0 mytag - - -  namespace_name=clo-test-13396,pod_name=log-generator-cb7645689-jjkjl,container_name=log-generator,message={"msgcontent":"My life is my message","timestamp":"2020-11-16 14:35:10","tag_key":"rec_tag","index":66}




Can you confirm this question for me:
Q. Is the order of namespace_name, pod_name, container_name ok? or do you need them in a specific order?

Comment 3 Keiichi Kii (NEC) 2020-11-16 19:20:22 UTC
Vimal-san,

Thank you for your help.

> Q. Is the order of namespace_name, pod_name, container_name ok? or do you need them in a specific order?

Please give me some time to confirm. I will send it back soon.

I have a question about why adding log source information is limited to RFC3164.

Christian explained to me that:

> That fix will not be available for RFC5424 since it does not include the same limitation as RFC3164 (e.g. message size). Meaning, if you  configure the log forwarding using RFC3164 you should see the same output as in previous OCP versions so that there is no impact on your existing customers.

I understand RFC5424 doesn't have the message size limitation that RFC3164 has.
But it looks like this log format issue does not depend on the RFC.
Is there any reason it is limited to RFC3164?
As OCP 3.x did, can we switch on adding the additional k8s information regardless of RFC3164 or RFC5424?

Comment 4 Keiichi Kii (NEC) 2020-11-17 16:54:24 UTC
Hello, 

Thank you for your work.

> Can you confirm me this question:
> Q. Is the order of namespace_name, pod_name, container_name ok? or do you
> need them in a specific order?

Can you unify the log format between the legacy way and the new way?
For the legacy way in OCP3.X/OCP4.Y, the log format is the following:

o Pod log

<host name(not fluentd pod name)> output_tag: namespace_name=<namespace_name>, container_name=<container_name>, pod_name=<pod_name>, message=<message>

ex)
Oct 15 02:17:31 openshift3-node1 output_tag: namespace_name= openshift-sdn, container_name=sdn, pod_name=sdn-xgv88, message=I1013 07:12:38.276282    2178 pod.go:541] CNI_DEL mtest/brdsamplepod1-7874cd5dbd-mxjv2

o Host log

<host name(not fluentd pod name)> <daemon name>[<PID>]: <message>

ex)
Oct 13 17:10:27 host-192-168-69-55 crio[1473]: time="2020-10-13 17:10:27.629042921Z" level=info msg="Running pod sandbox: mtest/brdsamplepod1-7874cd5dbd-mxjv2/POD" id=f8564162-30e9-42ef-95bd-bfa23e15b661 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox

Thanks,
Keiichi

Comment 5 Christian Heidenreich 2020-11-17 17:53:04 UTC
Keiichi-san,

Let me try to clarify why we only add that to the older RFC. As you already highlighted, it is indeed independent of the RFC but the reasoning behind our decision is more on the backwards compatibility and future maintenance side. Let me explain.

1. We do not believe that the change done for "use_record" should have gone in as-is in the first place. It changes the upstream behaviour and definition of that field massively and was not intended to work the way it does today. That's also a big reason why we missed this for a long time, since all the documentation we found gave us a different impression of this field.
2. We understand your situation and that this is a big issue for you with existing configuration and customers. Therefore, Vimal did everything in his power to at least provide the same information you had before inside the message field. I don't know if we can really get the exact same format, but at least the information is there and the changes should be minimal now. We hope that this is enough for your customers.
3. RFC3164 is the "legacy" forwarding mechanism and we want to keep backwards compatibility as much as possible, so your customers who used the prior versions can continue to use that option and keep the same message content. This RFC had limitations on message size and K8s metadata, and merging some information into the message field makes sense here. But we do not see a reason to add a similar behaviour to RFC5424, for the reason you stated - it does not have these limitations; and you can still use RFC3164.
4. Most importantly, we want to keep backwards compatibility for now, and we have achieved that with RFC3164. In general, people should use RFC5424, where all the information is available and the original message is not touched.

Hope that makes sense.

Thanks,
Christian

Comment 6 Alan Conway 2020-11-23 18:58:29 UTC
@kkii I want to restore this feature but in a less ad-hoc way, since there are many possible outputs and formats that may want to forward additional metadata fields.
Questions

- do you require the data to be included in the syslog MSG field
  - OR would it be acceptable to add equivalent USERDATA sections
- do you require the exact same message format - comma separated prefix to the record.message text.
  - OR would JSON be acceptable? 
    - If so would sending the full logging JSON (with kubernetes.namespace_name etc.) as syslog MSG record be acceptable?
    - If not why not?

We can provide the exact same feature as before, but I am trying to determine the simplest solution that will also be flexible enough to be applicable to other formats and outputs.
Thanks!
Alan.

Comment 7 Keiichi Kii (NEC) 2020-11-23 22:54:41 UTC
Hello,

Thank you for your help.

> - do you require the data to be included in the syslog MSG field

Yes, we need the data in the syslog MSG field, the same as the legacy way.
Our customers expect the "<#{pri}>#{generate_timestamp_rfc3164} #{@hostname} #{@tag}: #{@content}" format
(Quoted from assemble_rfc3164() in fluentd/lib/syslog_protocol/lib/syslog_protocol of origin-aggregated-logging).

And they also expect the additional k8s metadata to be included in the syslog MSG field ("#{@content}").

We hope the new way (the remote_syslog plugin) will have the same feature to add the k8s metadata as the syslog and syslog_buffered plugins.

> - do you require the exact same message format - comma separated prefix to the record.message text.

Yes, our customers require the exact same message format.
Their syslog-based log aggregation system depends on the log format and uses text pattern-matching to extract the k8s metadata it monitors (see the illustrative sketch below).
It's difficult to ask them to change their system, because to them it looks like the log format compatibility is broken.
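
(For illustration only: a hypothetical Ruby snippet of the kind of pattern-matching such a system might perform on the legacy-format MSG field. The regex and sample line are made up, not the customers' actual tooling, but they show why the prefix order and separators matter.)

# Hypothetical extraction of k8s metadata from the legacy-format MSG field.
line = 'namespace_name=mtest, container_name=sdn, pod_name=sdn-xgv88, message=CNI_DEL done'
pattern = /namespace_name=(?<ns>[^,]+), container_name=(?<ctr>[^,]+), pod_name=(?<pod>[^,]+), message=(?<msg>.*)/
if (m = line.match(pattern))
  puts "namespace=#{m[:ns]} container=#{m[:ctr]} pod=#{m[:pod]} message=#{m[:msg]}"
end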

>   - OR would JSON be acceptable? 

It's not acceptable at this time.
Our customers have an external monitoring system to manage both their existing (legacy) systems and their new systems (OpenShift clusters).
The legacy systems use the syslog protocol, and the syslog protocol is required to monitor them.
They can't throw away the syslog protocol and their legacy systems immediately.

>     - If so would sending the full logging JSON (with kubernetes.namespace_name etc.) as syslog MSG record be acceptable?
>     - If not why not?

The full logging JSON would be too large for the legacy syslog server, which has no streaming parser.
It would force our customers to build a new monitoring system that parses the logs in a different way.

Thanks,
Keiichi

Comment 8 Alan Conway 2020-11-25 21:11:06 UTC
Thanks for the detailed info, will take it into account in the fix.

Comment 10 Giriyamma 2021-02-05 14:48:12 UTC
Verified using cluster-logging.5.0.0-36, elasticsearch-operator.5.0.0-41.

Comment 12 Vimal Kumar 2021-02-08 15:19:42 UTC
AddLogSource adds details about the log's source to the "message" field. The source details added are the pod name, container name, and namespace name.

e.g. an application in namespace=TestNamespace, pod=TestPod, container=TestContainer writes a log line "This is a test message".
Logging would capture this log in the following record:

{
  "docker": {
    "container_id": "9694d9dcde514f37fe865bb4752114939490f28080f630231223953eb18fff57"
  },
  "kubernetes": {
    "container_name": "TestContainer",
    "namespace_name": "TestNamespace",
    "pod_name": "TestPod",
    "container_image": "....",
    "container_image_id": "....",
    "pod_id": "...",
    "host": "...",
    "master_url": "https://kubernetes.default.svc",
    "namespace_id": "86e19672-1671-472e-9b2e-6ca8c304f32b",
    "flat_labels": [
     ...
    ]
  },
  "message": "This is a test message",
  "level": "unknown",
  "hostname": "crc-j55b9-master-0",
  "pipeline_metadata": {
    ...
  },
  "@timestamp": "2020-11-27T18:32:57.600159+00:00",
  "viaq_index_name": "app-write",
  "viaq_msg_id": "M2QxNzM0MmQtMmVmMy00NjM1LWE1YzAtYjE1MWMxOWE5MTM2"
}

With "AddLogSource", the "message" fields will be changed to include more details: 
   namespace_name=TestNamespace, container_name=TestContainer, pod_name=TestPod, message="This is a test message"
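
A minimal Ruby sketch of that rewrite, assuming the record layout shown above (illustrative only, not the cluster-logging-operator's actual implementation):

# Illustrative sketch of the AddLogSource rewrite (not the actual implementation).
def add_log_source(record)
  k8s = record['kubernetes'] || {}
  record['message'] = "namespace_name=#{k8s['namespace_name']}, " \
                      "container_name=#{k8s['container_name']}, " \
                      "pod_name=#{k8s['pod_name']}, " \
                      "message=#{record['message'].inspect}"
  record
end

record = {
  'kubernetes' => {
    'namespace_name' => 'TestNamespace',
    'container_name' => 'TestContainer',
    'pod_name'       => 'TestPod'
  },
  'message' => 'This is a test message'
}
puts add_log_source(record)['message']
# => namespace_name=TestNamespace, container_name=TestContainer, pod_name=TestPod, message="This is a test message"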

Comment 14 Vimal Kumar 2021-02-09 16:28:48 UTC
@rdlugyhe The Doc looks good.
/lgtm

Comment 17 errata-xmlrpc 2021-02-24 11:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652

