Description of problem:
We see log messages emitted from some pods being split into fragments (i.e., a single log message broken into two log messages).

When does the behavior occur? Frequency? Repeatedly? At certain times?
We have not identified the exact circumstances required for this to occur. It affects only a small number of the log messages generated in our cluster and seems somewhat random. It does not happen all of the time, but it has happened repeatedly. It does not happen when the same pods (same image, same code, etc.) run on other cloud providers (e.g. Azure, AWS, GCP, "vanilla" Kubernetes running on OpenStack). In those other cloud environments, the container runtime has been either Docker or containerd, which led me to look at the CRI-O container runtime as the source. I believe we may be running into issue #242 in the conmon GitHub repo (https://github.com/containers/conmon/issues/242). The symptoms are the same and, as I said, we have not seen this issue on clusters running other container runtimes.

What information can you provide around timeframes and the business impact?
While this problem doesn't prevent our application from working, it does cause problems for our monitoring process. We generate structured (JSON) log messages and parse those messages to help our users/administrators monitor the health of our application. This problem breaks that process: properly formatted JSON is split into partial fragments that cannot be processed by the log monitoring components. Since the problem is, so far, infrequent and random, it isn't a show-stopper. However, it does mean there is the potential for some critical message (or messages) to be missed by administrators or the automated alerting they may have in place.
Version-Release number of selected component (if applicable): 4.8

How reproducible:
Explained in https://github.com/containers/conmon/issues/242

Actual results:
A single log message is broken into two log messages.

Expected results:
A single log message remains a single log message.

Additional info:
https://github.com/containers/conmon/issues/242
So I am guessing you're reading the log files from disk in the CRI log format. I can see why CRI-O differs from other cloud providers: until recently, other cloud providers used Docker, as opposed to a CRI implementation. The CRI log format intentionally splits up long log lines and does not buffer. Each log line carries a tag, "P" (for partial) or "F" (for full), that signifies whether the line is complete or not. Any parser needs to interpret those tags to reconstruct the log line; I do not think this is the responsibility of CRI-O or conmon. The issue you've tagged is with the journald log driver, which is separate from the CRI log format, and I don't believe we recommend anyone use the journald log driver for OpenShift. Which log format is your tool parsing?
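To illustrate the reconstruction described above, here is a minimal sketch of a client-side parser that joins CRI-format log lines using the "P"/"F" tag. The line layout ("timestamp stream tag content") follows the CRI logging format; the sample lines and function name are hypothetical, and a real tool would also handle malformed lines and per-container state.

```python
def reconstruct_cri_logs(lines):
    """Yield (timestamp, stream, message) tuples from CRI log lines,
    joining "P" (partial) fragments until an "F" (full) line arrives."""
    buffers = {}  # per-stream accumulator for partial fragments
    for line in lines:
        # CRI log line: "<timestamp> <stream> <P|F> <content>"
        timestamp, stream, tag, content = line.split(" ", 3)
        buffers.setdefault(stream, []).append(content)
        if tag == "F":  # final fragment: emit the reassembled message
            yield timestamp, stream, "".join(buffers.pop(stream))

# Hypothetical sample: one JSON message split across two lines, one whole.
cri_lines = [
    '2021-10-06T00:17:09.669794202Z stdout P {"level":"info",',
    '2021-10-06T00:17:09.669794999Z stdout F "msg":"healthy"}',
    '2021-10-06T00:17:10.000000000Z stdout F {"level":"warn","msg":"slow"}',
]

for ts, stream, msg in reconstruct_cri_logs(cri_lines):
    print(msg)  # the first two lines come back out as one JSON document
```

A log shipper that applies this kind of joining before JSON parsing would see whole documents regardless of how the runtime chunked the writes.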
related: https://github.com/cri-o/cri-o/issues/5392

I don't believe we can do anything for this bug. It is up to the client to reconstruct the logs, taking into account whether each line is partial or not. This is an important piece of the CRI log specification and not something CRI-O is responsible for working around. As such, I am closing this.