Bug 1996599 - Extra newlines in logged container output #242
Summary: Extra newlines in logged container output #242
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: All
OS: All
medium
medium
Target Milestone: ---
: ---
Assignee: Peter Hunt
QA Contact: pmali
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-08-23 09:40 UTC by David Hernández Fernández
Modified: 2021-10-13 19:33 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-13 19:33:57 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github containers conmon issues 242 0 None None None 2021-08-23 09:41:16 UTC

Internal Links: 2108228

Description David Hernández Fernández 2021-08-23 09:40:40 UTC
Description of problem: We do see log messages emitted from some pods  that are being split into fragments (i.e. single log message broken into two log messages).  

When does the behavior occur? Frequency? Repeatedly? At certain times?
We have not identified the exact circumstances required for this to occur.  It happens for only a small number of the log messages generated in our cluster and seems somewhat random.

It doesn't happen all of the time but it has happened repeatedly.  This does not happen when the same pods (e.g. same image, same code, etc.) on other cloud providers (e.g. Azure, AWS, GCP, "vanilla" K8s running on OpenStack).  In those other cloud environments, the container runtime has been either docker or containerd.  This led me to look at the cri-o container runtime as the source.  I believe we may be running into Issue #242 in the conmon GitHub repo (https://github.com/containers/conmon/issues/242).  The symptoms are the same and, as I said, we haven't seen the same issue on clusters running other container runtimes.

What information can you provide around timeframes and the business impact?
While this problem doesn't prevent our application from working, it does cause problem for our monitoring process.  We generate structured (JSON) log messages and parse those messages to help our users/administrators monitor the health of our application.  This problem "breaks" that process: properly formatted JSON is broken into partial fragments that cannot be processed properly by the log monitoring components.  Since, so far, the problem is infrequent and random, it isn't a show-stopper.  However, it does mean there is the potential for some critical message (or messages) to be missed by administrators or the automated alerting they may have in place.


Version-Release number of selected component (if applicable): 4.8

How reproducible: Explained in https://github.com/containers/conmon/issues/242 


Actual results: single log message broken into two log messages


Expected results: Single log message to continuos to be single.


Additional info: https://github.com/containers/conmon/issues/242

Comment 1 Peter Hunt 2021-09-09 15:30:57 UTC
So I am guessing you're reading the log files from disk in the CRI log format.  I can imagine a reason CRI-O differs from other cloud providers because until recently other cloud providers use docker, as opposed to a CRI implementation. The CRI log format intentionally splits up log lines and does not buffer. In each log line, there is a character "P" (for partial) or "F" (for full) that signifies whether the line is complete or not. Any parser will need to interpret those characters to reconstruct the log line. I do not think this is the responsibility of CRI-O or conmon.

The issue you've tagged is with the journald log driver, which is separate from the CRI log format. I don't believe we recommend anyone uses the journald log driver for openshift.

Which log format is your tool parsing?

Comment 6 Peter Hunt 2021-10-13 19:33:57 UTC
related: https://github.com/cri-o/cri-o/issues/5392

I don't believe we can do anything for this bug. It's up to the client to reconstruct the logs taking into account whether the line is partial or not. This is an important piece of the CRI log specification and not something CRi-O is responsible for working around. 

as such, I am closing this.


Note You need to log in before you can comment on or make changes to this bug.