Description of problem:
We see log messages emitted from some pods being split into fragments (i.e., a single log message broken into two log messages).

When does the behavior occur? Frequency? Repeatedly? At certain times?
We have not identified the exact circumstances required for this to occur. It affects only a small number of the log messages generated in our cluster and seems somewhat random. It does not happen all of the time, but it has happened repeatedly. It does not happen when the same pods (same image, same code, etc.) run on other cloud providers (e.g. Azure, AWS, GCP, "vanilla" Kubernetes running on OpenStack). In those other cloud environments, the container runtime has been either Docker or containerd, which led me to look at the CRI-O container runtime as the source. I believe we may be running into issue #242 in the conmon GitHub repo (https://github.com/containers/conmon/issues/242). The symptoms are the same and, as I said, we have not seen this issue on clusters running other container runtimes.

What information can you provide around timeframes and the business impact?
While this problem doesn't prevent our application from working, it does cause problems for our monitoring process. We generate structured (JSON) log messages and parse those messages to help our users/administrators monitor the health of our application. This problem breaks that process: properly formatted JSON is split into partial fragments that cannot be processed by the log monitoring components. Since the problem is, so far, infrequent and random, it isn't a show-stopper. However, it does mean there is the potential for some critical message (or messages) to be missed by administrators or the automated alerting they may have in place.
Version-Release number of selected component (if applicable): 4.8

How reproducible:
Explained in https://github.com/containers/conmon/issues/242

Actual results:
A single log message is broken into two log messages.

Expected results:
A single log message remains a single log message.

Additional info:
https://github.com/containers/conmon/issues/242
So I am guessing you're reading the log files from disk in the CRI log format. I can see why CRI-O differs from other cloud providers: until recently, other cloud providers used Docker, as opposed to a CRI implementation. The CRI log format intentionally splits up long log lines and does not buffer. Each log line carries a tag, "P" (for partial) or "F" (for full), that signifies whether the line is complete or not. Any parser needs to interpret those tags to reconstruct the log line; I do not think this is the responsibility of CRI-O or conmon. The issue you've tagged is with the journald log driver, which is separate from the CRI log format, and I don't believe we recommend anyone use the journald log driver for OpenShift. Which log format is your tool parsing?
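To illustrate the reconstruction described above, here is a minimal sketch of a client-side parser that joins CRI-format log lines using the "P"/"F" tag. The line layout ("timestamp stream tag content") follows the CRI logging format; the sample lines and function name are hypothetical, and a real tool would also handle malformed lines and per-container state.

```python
def reconstruct_cri_logs(lines):
    """Yield (timestamp, stream, message) tuples from CRI log lines,
    joining "P" (partial) fragments until an "F" (full) line arrives."""
    buffers = {}  # per-stream accumulator for partial fragments
    for line in lines:
        # CRI log line: "<timestamp> <stream> <P|F> <content>"
        timestamp, stream, tag, content = line.split(" ", 3)
        buffers.setdefault(stream, []).append(content)
        if tag == "F":  # final fragment: emit the reassembled message
            yield timestamp, stream, "".join(buffers.pop(stream))

# Hypothetical sample: one JSON message split across two lines, one whole.
cri_lines = [
    '2021-10-06T00:17:09.669794202Z stdout P {"level":"info",',
    '2021-10-06T00:17:09.669794999Z stdout F "msg":"healthy"}',
    '2021-10-06T00:17:10.000000000Z stdout F {"level":"warn","msg":"slow"}',
]

for ts, stream, msg in reconstruct_cri_logs(cri_lines):
    print(msg)  # the first two lines come back out as one JSON document
```

A log shipper that applies this kind of joining before JSON parsing would see whole documents regardless of how the runtime chunked the writes.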
related: https://github.com/cri-o/cri-o/issues/5392

I don't believe we can do anything for this bug. It is up to the client to reconstruct the logs, taking into account whether each line is partial or not. This is an important piece of the CRI log specification and not something CRI-O is responsible for working around. As such, I am closing this.