Description of problem:

For https://bugzilla.redhat.com/show_bug.cgi?id=1380455 we did not catch the critical messages which were being logged at the time the master controller failed over - don't know if there was a panic, systemd activity, etc. All that was in the log was:

Sep 28 18:01:23 svt-m-1 atomic-openshift-master-controllers: I0928 18:01:23.450298 74228 priorities.go:39] Combined requested resources 2000 from existing pods exceeds capacity 1000 on node 192.1.2.45
Sep 28 18:01:23 svt-m-1 rsyslogd-2177: imjournal: begin to drop messages due to rate-limiting
Sep 28 18:08:52 svt-m-1 rsyslogd-2177: imjournal: 17351 messages lost due to rate-limiting
Sep 28 18:08:52 svt-m-1 atomic-openshift-node: I0928 18:08:52.873723 46320 roundrobin.go:273] LoadBalancerRR: Setting endpoints for default/kubernetes:https to [192.1.0.49:8443 192.1.0.51:8443]

The failover happened at ~18:02.

We should set a wider window/less aggressive throttle for rsyslogd on masters/etcd/load balancers.

Version-Release number of selected component (if applicable): 3.30.32

How reproducible: Always, if message bursts are high enough
Peter, is this an rsyslog or journald configuration issue?
PSA: This has happened to us on a number of occasions and has two causes. 1. We still have hot-loop logging in some places. 2. Throttles on logs need to be opened up.
@Rich, this is specifically an rsyslog default ratelimit setting. We need to keep in mind that systemd rate-limits per service, while rsyslog rate-limits all logs read from the journal. So we need to bump up the imjournal settings so that the aggregate rate at which all services can log via the journal is handled. See ratelimit.interval and ratelimit.burst at http://www.rsyslog.com/doc/v7-stable/configuration/modules/imjournal.html

Note that the historical reason for this limit was corrupted systemd journals causing rsyslog to see an unlimited stream of messages that were not there.

However, if one service is doing most of the messaging, we still need to be sure we raise the journald rate limits in the systemd configuration as well. That said, hopefully systemd is configured with persistent logs, meaning the /var/log/journal directory is present. Otherwise, with an in-memory journal, it might be easy to hit the 4GB in-memory limit and the logs get lost. Last I heard, OpenShift uses persistent logging, so this might not be an issue. If you need the proper settings, I can help track down the information and add it here.
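For illustration only, a minimal sketch of what raising the imjournal limits could look like in /etc/rsyslog.conf, using the ratelimit.interval and ratelimit.burst parameters from the page linked above. The burst value of 100000 is an assumed placeholder, not a tested recommendation; the rsyslog documentation lists the defaults as 600 seconds and 20000 messages. A module can only be loaded once, so this line would replace the existing imjournal load line rather than be added next to it.

```
# Sketch, assuming RainerScript syntax is in use for loading imjournal.
# ratelimit.interval: length of the rate-limit window in seconds.
# ratelimit.burst: messages accepted from the journal within one window
#                  (100000 is an illustrative value only).
module(load="imjournal"
       StateFile="imjournal.state"
       ratelimit.interval="600"
       ratelimit.burst="100000")
```

rsyslog has to be restarted for the change to take effect. Because this limit applies to the aggregate of everything read from the journal, the burst value needs to cover all services on the host together.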
xref: https://github.com/kubernetes/kubernetes/issues/33935
Yeah, let's get the suggested settings. RHEL7 by default doesn't persist the journal and we're not doing anything with that.
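As a rough sketch of the journald side mentioned above, the fragment below shows the relevant options in /etc/systemd/journald.conf on a RHEL7-era systemd. The numeric values are assumptions chosen for illustration, not recommended settings; check journald.conf(5) on the host for the option names and defaults that apply to the installed systemd version.

```
# /etc/systemd/journald.conf (sketch; values are illustrative assumptions)
[Journal]
# Keep the journal on disk under /var/log/journal instead of in memory.
Storage=persistent
# Per-service rate limit: at most RateLimitBurst messages
# per RateLimitInterval from any one service.
RateLimitInterval=30s
RateLimitBurst=10000
```

systemd-journald has to be restarted to pick up the change. This limit is applied per service, so it addresses the case where a single service is doing most of the logging.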
Who can provide suggested settings? Rich or Peter, have any resources? I know we can disable the throttling with

    $SystemLogRateLimitInterval 0
    $SystemLogRateLimitBurst 0

But that seems like it might not be the right answer.
I (originator of this bz) am going to backtrack on this a bit. I, as system admin, would not expect or want the OpenShift installer to modify any customization I've made in this area and certainly would not want throttling disabled unless I've done it myself. I think this should be a documentation update or knowledge base article on OpenShift troubleshooting. I am happy to help provide content.
Moving to documentation component based on comment 15.
(In reply to Mike Fiedler from comment #15)
> I (originator of this bz) am going to backtrack on this a bit. I, as
> system admin, would not expect or want the OpenShift installer to modify any
> customization I've made in this area and certainly would not want throttling
> disabled unless I've done it myself.
>
> I think this should be a documentation update or knowledge base article on
> OpenShift trouble shooting. I am happy to help provide content.

Mike, are you able to push a PR [1] and tag me? I can then get someone from the docs team to clean it up and find the right place for it. If not a PR, some text to me via email would work as well.

[1] https://github.com/openshift/openshift-docs
I should be able to do that - likely next week. If you prefer, you can assign the bz to me until I have a PR for you.
Mike - did you create a PR?
The more I think about this one, the more I believe it is normal system admin activity - documenting the Linux config for rsyslog/journald in OpenShift documentation feels wrong. If no one objects, I am going to close this one out. If you feel strongly otherwise, please re-open.