+++ This bug was initially created as a clone of Bug #1294911 +++

Description of problem:

We think that nova-conductor.log, nova-api.log has no "backoff" for throttling logging of errors? If the OpenStack cluster is stressed, errors are written every 1 ms to nova-conductor.log; here is an example:

...snip, see original BZ for ugly unreadable logs examples...

This ends up in "No space on disk left." and the OpenStack environment is unusable.

Could you please check if there is any implementation for throttling error logs?

--- Additional comment from Lars Kellogg-Stedman on 2016-01-19 16:55:54 EST ---

What if we were to (a) stop logging to syslog, (b) stop logging to files, and (c) just log to stdout/stderr (so all log messages would be handled by journald, and could eventually end up in syslog anyway)? Then we could take advantage of the rate limiting support in journald:

    RateLimitInterval=, RateLimitBurst=
        Configures the rate limiting that is applied to all messages
        generated on the system. If, in the time interval defined by
        RateLimitInterval=, more messages than specified in
        RateLimitBurst= are logged by a service, all further messages
        within the interval are dropped until the interval is over. A
        message about the number of dropped messages is generated. This
        rate limiting is applied per-service, so that two services which
        log do not interfere with each other's limits. Defaults to 1000
        messages in 30s. The time specification for RateLimitInterval=
        may be specified in the following units: "s", "min", "h", "ms",
        "us". To turn off any kind of rate limiting, set either value
        to 0.
This is something I might be interested in picking up, but my guess is that it needs _much_ wider input, as it will be a big change for lots of people.

Oslo supports systemd logging integration:
https://docs.openstack.org/developer/oslo.log/journal.html

There's also an argument that, now that we have integrated availability and performance monitoring, operators have no excuse not to monitor disk space, etc.

Comments?
> We think that nova-conductor.log, nova-api.log has no "backoff" for throttling logging of errors?

I implemented rate limiting in Oslo Log for bz#1294911:
https://docs.openstack.org/oslo.log/latest/configuration/index.html#DEFAULT.rate_limit_interval

rate_limit_interval
    Type: integer
    Default: 0
    Interval, number of seconds, of log rate limiting.

rate_limit_burst
    Type: integer
    Default: 0
    Maximum number of logged messages per rate_limit_interval.

rate_limit_except_level
    Type: string
    Default: CRITICAL
    Log level name used by rate limiting: CRITICAL, ERROR, INFO, WARNING, DEBUG or empty string. Logs with level greater or equal to rate_limit_except_level are not filtered. An empty string means that all levels are filtered.

Upstream issue (merged 1 year 9 months ago):
https://review.openstack.org/#/c/322263/

Rate limiting means dropping logs, which has an impact on security and debugging, so it's disabled by default.
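To make the interval/burst/except-level semantics concrete, here is a minimal sketch of a rate-limiting filter built on Python's standard `logging` module. It is not the oslo.log implementation: the class name `RateLimitFilter`, the injectable `clock` argument, and the `dropped` counter are assumptions made for illustration. Only the three parameter meanings are taken from the option descriptions above.

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Hypothetical sketch: pass at most `burst` records per `interval` seconds.

    Records whose level is >= `except_level` are never dropped.
    Passing except_level=None means every level is rate limited.
    """

    def __init__(self, interval, burst, except_level=logging.CRITICAL,
                 clock=time.monotonic):
        super().__init__()
        self.interval = interval          # window length in seconds
        self.burst = burst                # records allowed per window
        self.except_level = except_level  # levels at/above this bypass the limit
        self.clock = clock                # injectable so the sketch is testable
        self.window_start = clock()
        self.count = 0                    # records passed in the current window
        self.dropped = 0                  # records dropped so far

    def filter(self, record):
        # Exempt levels always pass and do not consume the burst budget.
        if self.except_level is not None and record.levelno >= self.except_level:
            return True

        # Start a new window once the current one has expired.
        now = self.clock()
        if now - self.window_start >= self.interval:
            self.window_start = now
            self.count = 0

        if self.count < self.burst:
            self.count += 1
            return True

        self.dropped += 1
        return False
```

Such a filter would be attached to a handler with `handler.addFilter(RateLimitFilter(interval=30, burst=1000))`. The fixed-window approach is the simplest choice for a sketch; a production filter would likely also report how many messages were dropped, as journald does.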
With the move to containerized deployments, logging is under container management control and cannot be solved in packaging, so further enhancements should be made in the deployment framework, TripleO.