Bug 1552304
Summary: CRI-O pod log lines are occasionally split mid-line

Product: OpenShift Container Platform
Component: Logging
Reporter: Mike Fiedler <mifiedle>
Assignee: Noriko Hosoi <nhosoi>
QA Contact: Mike Fiedler <mifiedle>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Version: 3.9.0
CC: amurdaca, aos-bugs, decarr, jcantril, jokerman, mmccomas, mpatel, nhosoi, pportant, rmeggins, sople, tibrahim
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openshift3/ose-logging-fluentd:v3.11.23-1
Doc Type: Bug Fix
Doc Text:
Cause: Log messages from CRI-O pods can be split mid-line by design.
Consequence: Partial log messages are indexed in Elasticsearch.
Fix: Newer versions of fluent-plugin-concat support merging the CRI-O style split messages into one, but that support is not available for fluentd v0.12, which OpenShift logging 3.11 uses. The functionality was backported to work with fluentd v0.12.
Result: The CRI-O style split log messages are merged back into the original full message.
Story Points: ---
Last Closed: 2018-11-20 03:10:43 UTC
Type: Bug
Regression: ---
Bug Depends On: 1642547
Created attachment 1405046 [details]: pod log showing line splits

This could be considered high severity for applications that produce log lines requiring parsing by business applications. In the attached file, see lines 164 and 165 for the split happening. Lines 645 and 646 show the split happening at the same place in the payload as line 164; lines 837 and 838 show it happening at a different position in the payload.

(Mrunal Patel, comment #3) All of the split lines have the P annotation as designed. The log parser should look for P as a partial line and continue reading until it hits the next F in the output to complete the line.

Looks like this should be moved to the Logging team according to Mrunal's comment. Moving it; move it back otherwise.

(In reply to Mrunal Patel from comment #3)
> All of the split lines have the P annotation as designed. The log parser
> should look for P as a partial line and continue reading till it hits the
> next F in the output to complete the line.

What is the default maximum size of a log entry that CRI-O writes? Is this configurable?

Note that this is a huge feature, i.e. 3.11 at the earliest. We will need to figure out a way to mitigate this problem in the meantime. Moving from POST back to NEW so this does not get lost.

@mrunal - What is the default maximum size of a log entry that crio writes? Is this configurable?

Rich, the line size today is 8 KB. We are making some changes and might make it configurable. I will let you know if/when that goes in.

This gets to the broken foundations of byte-capture console logging: the interpretation of the byte stream is performed too early. Mrunal, where is the proper upstream code base for CRI-O so that I might be able to make a suggested PR to address this?

@portante, cri-o upstream is here: https://github.com/kubernetes-incubator/cri-o

Confirming that the issue still exists in OCP 3.10 with cri-o.x86_64 1.10.2-1.rhaos3.10.git095e88c.el7. Saw this while verifying https://bugzilla.redhat.com/show_bug.cgi?id=1554407 is no longer an issue.

Verified on OCP 3.11.31 + CRI-O 1.11.7, but I hit https://bugzilla.redhat.com/show_bug.cgi?id=1642547 along the way. Holding off on final verification until that is fixed.
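The parsing rule Mrunal describes above (accumulate P-tagged chunks until the next F) can be sketched as a small merger. This is an illustrative sketch, not the actual fluentd fix; it assumes the CRI-O on-disk line format `<timestamp> <stream> <P|F> <message>`:

```python
def merge_crio_lines(lines):
    """Yield full log messages, joining P (partial) chunks until an F (full) chunk."""
    buffers = {}  # accumulate per stream, since stdout/stderr entries can interleave
    for line in lines:
        timestamp, stream, tag, message = line.split(" ", 3)
        buffers.setdefault(stream, []).append(message)
        if tag == "F":  # F terminates the logical line
            yield "".join(buffers.pop(stream))

# Synthetic sample in CRI-O format: one whole line, then one line split in two.
sample = [
    "2018-03-01T12:00:00.0Z stdout F hello",
    "2018-03-01T12:00:01.0Z stdout P par",
    "2018-03-01T12:00:01.1Z stdout F tial",
]
print(list(merge_crio_lines(sample)))  # ['hello', 'partial']
```

Any consumer that instead counts raw file lines (as the reproduction steps below do with wc -l) will see the inflated count this bug reports.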
Docs in progress for https://bugzilla.redhat.com/show_bug.cgi?id=1642547.

Marking this bz as verified on OCP 3.11.31 + CRI-O 1.11.7.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537

We will not backport this fix to 3.9.
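The fix described in the Doc Text relies on fluent-plugin-concat's partial-line support. A minimal sketch of the relevant filter configuration, using the plugin's documented `partial_key`/`partial_value` options; the match pattern and the field name `logtag` are assumptions here, and the exact keys in the shipped OpenShift fluentd config may differ:

```
<filter kubernetes.**>
  @type concat
  key log
  partial_key logtag
  partial_value P
  separator ""
</filter>
```

With this in place, fluentd buffers records whose `logtag` field is P and emits a single merged record when the F-tagged chunk arrives, so Elasticsearch indexes whole messages instead of fragments.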
Description of problem:

Running OpenShift logging tests, we use a tool to create flexible pod logs. The tool controls rate, payload size, payload content, etc. (see the SVT logtest README).

When logging with Docker json-file logging and the Docker 1.12 runtime, the pod logs under /var/log/containers always contain exactly the number of lines emitted by the pod. There is no splitting of lines: 10000 lines out of the tool = 10000 lines in the pod log = 10000 lines in Elasticsearch.

When logging with the CRI-O runtime (registry.reg-aws.openshift.com:443/openshift3/cri-o v3.9.0 1617b0aab7ce) there are always a few split lines. The number is seemingly random and appears to happen slightly more often with larger message payloads. Recent runs:

10000 lines with 512 byte payload: 10006 lines in the pod log
10000 lines with 1024 byte payload: 10022 lines in the pod log (attached)
10000 lines with 4096 byte payload: 10573 lines in the pod log

Version-Release number of selected component (if applicable):
OCP 3.9.2
registry.reg-aws.openshift.com:443/openshift3/cri-o v3.9.0 1617b0aab7ce

How reproducible:
Always when running enough messages, usually > 5000 or so.

Steps to Reproduce:
1. With a CRI-O runtime, run the SVT logtest replication controller (see README above)
2. Use the following input yaml:

projects:
  - num: 2
    basename: logtest
    ifexists: delete
    tuning: default
    templates:
      - num: 1
        file: ./content/logtest/logtest-rc.json
        parameters:
          - REPLICAS: "1"
          - INITIAL_FLAGS: "--num-lines 10000 --line-length 1024 --word-length 9 --rate 6000 --fixed-line\n"

This will run the logging pod for about 2 minutes and produce 10000 lines of fixed content. Every line is the same with the exception of the sequence number.

3. Go to the node, find the pod log in /var/log/containers, and run wc -l

Actual results:
> 10000 lines in the pod log (split lines inflate the count)

Expected results:
10000 lines in the pod log

Additional info:
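Rather than only comparing wc -l counts in step 3, the pod log can be checked for partial-line entries directly. A sketch, assuming the CRI-O on-disk format `<timestamp> <stream> <P|F> <message>`; the demo writes a synthetic log, and in practice you would point count_tags at the real file under /var/log/containers:

```python
import tempfile

def count_tags(path):
    """Count partial (P) and full (F) entries in a CRI-O format pod log."""
    counts = {"P": 0, "F": 0}
    with open(path) as f:
        for line in f:
            parts = line.split(" ", 3)  # timestamp, stream, tag, message
            if len(parts) >= 3 and parts[2] in counts:
                counts[parts[2]] += 1
    return counts

# Demo on a synthetic three-line log: one whole line plus one split line.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("2018-03-01T12:00:00.0Z stdout F whole line\n")
    tmp.write("2018-03-01T12:00:01.0Z stdout P first ha\n")
    tmp.write("2018-03-01T12:00:01.1Z stdout F lf\n")
    path = tmp.name

print(count_tags(path))  # {'P': 1, 'F': 2}
```

Any nonzero P count confirms splitting occurred; the F count equals the number of logical lines, which should match the number emitted by the tool.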