Bug 1816836

Summary: Fluentd inserts \\'\\' into some log messages ingested from container logs
Product: OpenShift Container Platform Reporter: Sergey Yedrikov <syedriko>
Component: LoggingAssignee: Sergey Yedrikov <syedriko>
Status: CLOSED ERRATA QA Contact: Qiaoling Tang <qitang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.4CC: anli, aos-bugs, qitang
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Incorrect line separator configured in Fluentd for container log lines split across multiple lines in CRI-O logs. Consequence: Fluentd inserts \\'\\' into some log messages ingested from container logs. Fix: Configure the correct (empty) line separator for Fluentd. Result: Fluentd no longer inserts \\'\\' into some log messages ingested from container logs.
Story Points: ---
Clone Of: 1816770 Environment:
Last Closed: 2020-05-04 11:47:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1816770    
Bug Blocks: 1816837    

Description Sergey Yedrikov 2020-03-24 21:05:16 UTC
+++ This bug was initially created as a clone of Bug #1816770 +++

Description of problem:

CRI-O sometimes splits log lines that come out of containers into two lines in its own log:

2020-03-22T04:25:26.859339384+00:00 stdout F loader seq - 8edc9164e37340bb8b5608e5fea61cc8 - 0000104795 - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2020-03-22T04:25:26.859339384+00:00 stdout P loader seq 
2020-03-22T04:25:26.859419323+00:00 stdout F - 8edc9164e37340bb8b5608e5fea61cc8 - 0000104796 - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

F stands for a Full and P stands for a Partial container log line. Partial log lines are continued on the next line of CRI-O log.

Fluentd reassembles partial CRI-O log lines into full ones with the help of the https://github.com/fluent-plugins-nursery/fluent-plugin-concat plugin: https://github.com/openshift/cluster-logging-operator/blob/master/pkg/generators/forwarding/fluentd/templates.go#L69:L75 . The separator for the two parts of the reassembled line is

separator \'\'

and Fluentd inserts that into the log messages that it forwards to Elastic; for example:

{"_index":".operations.2020.03.22","_type":"com.redhat.viaq.common","_id":"ZDY5ZjUwYTEtNGVlNi00MjQzLTgxZjgtNWMzOWIzNDYxNDIw","_score":2.0310788,"_source":{"pipeline_metadata":{"collector":{"received_at":"2020-03-22T04:25:26.859419323+00:00"}},"message":"loader seq \\'\\'- 8edc9164e37340bb8b5608e5fea61cc8 - 0000104796 - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always, with enough logs generated by a container to trigger a split line output from CRI-o

Steps to Reproduce:
1. Run the following job
apiVersion: batch/v1
kind: Job
metadata:
  name: logging-load-driver
spec:
  template:
    spec:
      containers:
      - name: logging-load-driver
        image: quay.io/syedriko/logging-load-driver:latest
        command: ["/loader", "--msgpersec", "100", "--payload-gen", "fixed", "--distribution", "fixed", "100", "--report", "stderr", "--total-size", "10"]
      restartPolicy: Never
  backoffLimit: 1

, which wraps the loader from https://github.com/ViaQ/logging-load-driver

2. In Kibana, search for "message: \\'\\'"


Actual results:
loader seq \\'\\'- 8edc9164e37340bb8b5608e5fea61cc8 - 0000104796 - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Expected results:
loader seq - 8edc9164e37340bb8b5608e5fea61cc8 - 0000104796 - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Additional info:

--- Additional comment from Sergey Yedrikov on 2020-03-24 20:39:17 UTC ---

More details on how to repro this issue:

Once the logging-load-driver job has completed

2. Create a route for Elasticsearch as described in https://docs.openshift.com/container-platform/4.3/logging/config/cluster-logging-elasticsearch.html#cluster-logging-elasticsearch-exposing_cluster-logging-elasticsearch.

3. Install https://github.com/taskrabbit/elasticsearch-dump as described in https://github.com/taskrabbit/elasticsearch-dump#installing.

4. Dump the logs generated by the logging-load-driver job, substituting the kubernetes.pod_name with the pod name of the completed job and your authorization token from "oc whoami -t":

./node_modules/elasticdump/bin/elasticdump --input=https://elasticsearch-openshift-logging.apps-crc.testing/.operations.* --output=$ --searchBody='{"_source": ["pipeline_metadata.collector.received_at", "message"], "query":{"bool": { "must": [{ "term": { "kubernetes.pod_name": "logging-load-driver-phwcz"}}]}}}' --headers='{"Authorization": "Bearer 1u9oYgxIhzQOYxoHYfjPXEROshmanQNjROCz3jNZ-Yg"}' > out

5. Run (this is not a joke) 
$ grep \\\\\\\\\'\\\\\\\\\' out
You're looking for entries likes this (note "loader seq \\'\\'"):

{"_index":".operations.2020.03.22","_type":"com.redhat.viaq.common","_id":"YmI2YTAyZTktYjY5OC00ZTQxLTg3YzQtNmQzODA0NmZiZGMz","_score":2.0310788,"_source":{"pipeline_metadata":{"collector":{"received_at":"2020-03-22T04:25:00.058292+00:00"}},"message":"loader seq \\'\\'- 8edc9164e37340bb8b5608e5fea61cc8 - 0000102090 - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}

whereas normal ones look like this

{"_index":".operations.2020.03.22","_type":"com.redhat.viaq.common","_id":"OTIwNDI1ZmYtODdlYy00ZjhiLWIzZDEtOWU5N2ZmMjEzMjg2","_score":2.0310788,"_source":{"pipeline_metadata":{"collector":{"received_at":"2020-03-22T04:11:40.262683+00:00"}},"message":"loader seq - 8edc9164e37340bb8b5608e5fea61cc8 - 0000021977 - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}

Comment 3 Qiaoling Tang 2020-04-03 06:35:23 UTC
Verified with ose-cluster-logging-operator-v4.4.0-202004020439

Comment 5 errata-xmlrpc 2020-05-04 11:47:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581