Bug 1741955
| Summary: | Increase OOTB kubeletConfig containerLogMaxSize | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.2.0 | CC: | aos-bugs, jokerman, rmeggins |
| Target Milestone: | --- | ||
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-10-16 06:36:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Mike Fiedler
2019-08-16 14:11:08 UTC
I suspect that increasing the size to 50Mi only delays the issue. It is not a fix. We need to figure out the root cause of the dropped messages on the rotation boundary.

Deferring to 4.3. There is more at play here than just bumping the tunable to cover up symptoms. Need to work with the logging team in 4.3 to figure out the right solution.

Since any 4.1/4.2 customer that encounters this can change the tunable on their own, we have a workaround.

Changing this tunable potentially increases disk usage due to logs by 5x. For 100 pods on a node, that is potentially 5Gi vs 1Gi now.

(In reply to Seth Jennings from comment #2)
> Deferring to 4.3. There is more at play here than just bumping the tunable
> to cover up symptoms. Need to work with the logging team in 4.3 to figure
> out the right solution.

The 10MB limit - is this the same limit that was used in OCP 3.x? If not, why was it changed? If it is the same, then I'm not sure why we are seeing problems with EFK logging in 4.x that we did not see in 3.x, unless it also has something to do with cri-o logging and the cri-o log file format, which is different from docker json-file.

What is the number 10MB based on? Is it a number that was designed to work with log scrapers such as fluentd, rsyslog, loki promtail? Or is it designed to optimize the disk space for log files?

> Since any 4.1/4.2 customer that encounters this can change the tunable on
> their own, we have a workaround.
>
> Changing this tunable potentially increases disk usage due to logs by 5x.
> For 100 pods on a node, that is potentially 5Gi vs 1Gi now.
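The disk-usage estimate quoted above (5Gi vs 1Gi for 100 pods) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, it assumes one container per pod, and it treats the result as a rough upper bound that ignores any compression of rotated files. The quoted figures correspond to one log file per container; if every container also kept the kubelet default of 5 files mentioned in this bug, the bound would be five times larger at either size.

```python
MI = 1024 ** 2  # bytes in one mebibyte
GI = 1024 ** 3  # bytes in one gibibyte


def max_log_bytes(containers, max_size_mi, max_files=1):
    """Rough upper bound on container log disk usage for one node.

    containers:  number of containers writing logs (assumed one per pod)
    max_size_mi: containerLogMaxSize in Mi
    max_files:   number of log files kept per container (assumption: 1
                 reproduces the figures quoted in the bug)
    """
    return containers * max_files * max_size_mi * MI


def to_gi(n_bytes):
    """Convert a byte count to gibibytes."""
    return n_bytes / GI


if __name__ == "__main__":
    for size in (10, 50):
        single = to_gi(max_log_bytes(100, size))
        rotated = to_gi(max_log_bytes(100, size, max_files=5))
        print(f"{size}Mi: {single:.2f}Gi with 1 file, "
              f"{rotated:.2f}Gi with 5 files")
```

Either way the ratio between the two settings stays at 5x, which is the point made in the comment.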
CRI Log Rotation backstory

PR in 1.10
https://github.com/kubernetes/kubernetes/pull/59898

Backstory:
https://github.com/kubernetes/kubernetes/issues/58823
https://github.com/kubernetes/enhancements/issues/552
https://docs.google.com/document/d/1oQe8dFiLln7cGyrRdholMsgogliOtpAzq6-K3068Ncg/edit#

CRIContainerLogRotation feature gate
alpha (disabled by default) in 1.10
beta (enabled by default) in 1.11

kubelet flags:
--container-log-max-size (default 10Mi)
--container-log-max-files (default 5)

default size probably cribbed from json-file example in docker documentation
https://docs.docker.com/config/containers/logging/json-file/

we started setting log-opts max-size=50m for docker in 3.11
https://github.com/openshift/openshift-ansible/commit/5e57addcb1bc88d36015e6f06c209985d1e0dbc7

that might be justification for changing our default to 50Mi in 4.x

Can we target this bug for 4.2.0?

Done. Retargeted for 4.2.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
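For clusters that need the workaround mentioned in this bug (changing the tunable themselves), one possible approach is a KubeletConfig custom resource. The sketch below only builds such a manifest and prints it as JSON; it does not talk to a cluster. The resource name `set-container-log-size` and the pool label `custom-kubelet: large-logs` are hypothetical placeholders, and the helper `kubelet_config` is invented for this example.

```python
import json


def kubelet_config(name, pool_label, max_size="50Mi", max_files=5):
    """Build a KubeletConfig manifest as a plain dict.

    name:       metadata.name of the resource (hypothetical in this example)
    pool_label: (key, value) label expected on the target MachineConfigPool
                (hypothetical; the pool must actually carry this label)
    max_size:   value for containerLogMaxSize
    max_files:  value for containerLogMaxFiles
    """
    key, value = pool_label
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "KubeletConfig",
        "metadata": {"name": name},
        "spec": {
            "machineConfigPoolSelector": {"matchLabels": {key: value}},
            "kubeletConfig": {
                "containerLogMaxSize": max_size,
                "containerLogMaxFiles": max_files,
            },
        },
    }


if __name__ == "__main__":
    manifest = kubelet_config(
        "set-container-log-size",          # hypothetical resource name
        ("custom-kubelet", "large-logs"),  # hypothetical pool label
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `oc apply -f`, assuming the target MachineConfigPool has been given the matching label first.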