Description of problem:
Running OpenShift logging tests, we use a tool to create flexible pod logs. The tool controls rate, payload size, payload content, etc.; see the tool's README.

When logging with Docker json-file logging and the Docker 1.12 runtime, the pod logs under /var/log/containers always contain exactly the number of lines emitted by the pod. There is no splitting of lines: 10000 lines out of the tool = 10000 lines in the pod log = 10000 lines in Elasticsearch.

When logging with the CRI-O runtime (registry.reg-aws.openshift.com:443/openshift3/cri-o v3.9.0 1617b0aab7ce) there are always a few split lines. The number is seemingly random, and splitting appears to happen slightly more often with larger message payloads. Recent runs:

10000 lines with 512 byte payload: 10006 lines in the pod log
10000 lines with 1024 byte payload: 10022 lines in the pod log (attached)
10000 lines with 4096 byte payload: 10573 lines in the pod log

Version-Release number of selected component (if applicable):
OCP 3.9.2
registry.reg-aws.openshift.com:443/openshift3/cri-o v3.9.0 1617b0aab7ce

How reproducible:
Always when running enough messages, usually > 5000 or so.

Steps to Reproduce:
1. With a CRI-O runtime, run the SVT logtest replication controller (see README above).
2. Use the following input yaml:

projects:
  - num: 2
    basename: logtest
    ifexists: delete
    tuning: default
    templates:
      - num: 1
        file: ./content/logtest/logtest-rc.json
        parameters:
        - REPLICAS: "1"
        - INITIAL_FLAGS: "--num-lines 10000 --line-length 1024 --word-length 9 --rate 6000 --fixed-line\n"

This will run the logging pod for about 2 minutes and produce 10000 lines of fixed content. Every line is the same with the exception of the sequence number.
3. Go to the node, find the pod log in /var/log/containers, and run wc -l on it.

Actual results:
> 10000 lines in the pod log

Expected results:
10000 lines in the pod log

Additional info:
This could be considered high severity for applications which produce log lines requiring parsing by business applications.
Created attachment 1405046
pod log showing line splits

In the attached file, see lines 164 and 165 for the split happening. Lines 645 and 646 show the split happening at the same place in the payload as line 164. Lines 837 and 838 have it happening in a different position in the payload.
All of the split lines have the P annotation, as designed. The log parser should treat P as a partial line and continue reading until it hits the next F in the output to complete the line.
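To make that parsing rule concrete, here is a minimal sketch (Go, written for this comment rather than taken from any shipped collector) that reassembles logical lines from a file under /var/log/containers, assuming the CRI log layout of "<timestamp> <stream> <P|F> <payload>" per raw line; the command-line path argument is just whichever log file is being inspected.

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// Reassemble logical lines from a CRI-formatted container log.
// Each raw line looks like: "<timestamp> <stream> <P|F> <payload>".
// "P" marks a partial chunk; "F" terminates the logical line.
func main() {
    f, err := os.Open(os.Args[1]) // a file under /var/log/containers
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer f.Close()

    var pending strings.Builder
    raw, logical := 0, 0

    sc := bufio.NewScanner(f)
    sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long raw lines
    for sc.Scan() {
        raw++
        fields := strings.SplitN(sc.Text(), " ", 4)
        if len(fields) < 4 {
            continue // not a CRI-formatted line
        }
        tag, payload := fields[2], fields[3]
        pending.WriteString(payload)
        if tag == "F" {
            logical++
            // a complete application line is now in pending.String()
            pending.Reset()
        }
    }
    if err := sc.Err(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Printf("raw lines: %d, logical lines: %d\n", raw, logical)
}

With the reproducer above, the raw count matches wc -l (> 10000), while the logical count should come back to exactly 10000.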
Looks like this should be moved to the Logging team according to Mrunal's comment. Moving it; move it back if that's not correct.
(In reply to Mrunal Patel from comment #3)
> All of the split lines have the P annotation as designed. The log parser
> should look for P as a partial line and continue reading till it hits the
> next F in the output to complete the line.

What is the default maximum size of a log entry that cri-o writes? Is this configurable?

Note that this is a huge feature, i.e. 3.11 at the earliest. We will need to figure out a way to mitigate this problem in the meantime.
Moving from POST back to NEW so this does not get lost.
@mrunal - What is the default maximum size of a log entry that crio writes? Is this configurable?
Rich, the line size today is 8 KB. We are making some changes and might make it configurable. I will let you know if/when that goes in.
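For illustration only (this is not conmon's actual code, and the payload length is made up): a hard per-chunk limit like that turns any application line longer than the limit into one or more P-tagged chunks followed by an F-tagged remainder, which is why wc -l on the pod log overcounts.

package main

import (
    "fmt"
    "strings"
)

// Illustration of how a fixed per-chunk limit produces the P/P/.../F
// pattern. The 8192-byte limit is the 8 KB figure mentioned above; the
// 20000-byte input line is hypothetical.
func splitLine(line string, limit int) []string {
    var chunks []string
    for len(line) > limit {
        chunks = append(chunks, "P "+line[:limit])
        line = line[limit:]
    }
    return append(chunks, "F "+line)
}

func main() {
    for i, c := range splitLine(strings.Repeat("x", 20000), 8192) {
        fmt.Printf("chunk %d: tag=%c payload=%d bytes\n", i, c[0], len(c)-2)
    }
}

This does not by itself explain splits of lines shorter than 8 KB (as seen with the 512 and 1024 byte payloads above), but in either case the P/F reassembly rule from comment 3 recovers the original lines.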
This gets to the broken foundations of byte-capture console logging: the interpretation of the byte stream is performed too early. Mrunal, where is the proper upstream code base for CRI-O so that I might be able to make a suggested PR to address this?
@portante, cri-o upstream is here: https://github.com/kubernetes-incubator/cri-o

Confirming that the issue still exists in OCP 3.10 with cri-o.x86_64 1.10.2-1.rhaos3.10.git095e88c.el7. Saw this while verifying https://bugzilla.redhat.com/show_bug.cgi?id=1554407 is no longer an issue.
Fixed in https://github.com/openshift/origin-aggregated-logging/pull/1386
Verified on OCP 3.11.31 + CRIO 1.11.7 but I hit https://bugzilla.redhat.com/show_bug.cgi?id=1642547 along the way. Holding off on final verification until that is fixed.
Docs in progress for https://bugzilla.redhat.com/show_bug.cgi?id=1642547.

Marking this bz as verified on OCP 3.11.31 + CRIO 1.11.7.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3537
We will not backport this fix to 3.9