Bug 1301751 - Move all logging to stdout/err to allow systemd throttling logging of errors [NEEDINFO]
Status: NEW
Product: RDO
Classification: Community
Component: distribution
Version: trunk
Hardware: Unspecified, OS: Linux
Priority: high, Severity: high
Target Milestone: Milestone3
Target Release: trunk
Assigned To: Lars Kellogg-Stedman
Shai Revivo
Depends On:
Blocks:
Reported: 2016-01-25 17:11 EST by Alan Pevec
Modified: 2017-06-17 14:51 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1294911
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
chris.brown: needinfo? (apevec)


Description Alan Pevec 2016-01-25 17:11:32 EST
+++ This bug was initially created as a clone of Bug #1294911 +++

Description of problem:

It appears that logging to nova-conductor.log and nova-api.log has no "backoff" mechanism for throttling error messages.

If the OpenStack cluster is stressed, errors are written to nova-conductor.log every 1 ms. Here is an example:

...snip, see original BZ for ugly unreadable logs examples...

This ends in a "No space on disk left." error, and the OpenStack environment becomes unusable.

Could you please check if there is any implementation for throttling error logs?


--- Additional comment from Lars Kellogg-Stedman on 2016-01-19 16:55:54 EST ---

What if we were to (a) stop logging to syslog, (b) stop logging to files, and (c) just log to stdout/stderr (so all log messages would be handled by journald and could eventually end up in syslog anyway)?
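
As a rough illustration of option (c), here is a minimal sketch using only the Python standard library `logging` module. It is not Nova's or oslo.log's actual configuration; the function name `configure_stderr_logging` and the logger name `demo-service` are hypothetical. It shows a service that writes log records only to stderr, so a supervisor such as systemd can capture them.

```python
import logging
import sys


def configure_stderr_logging(name="demo-service", level=logging.INFO):
    """Return a logger whose only handler writes to stderr.

    Hypothetical helper for illustration; no file or syslog handlers
    are attached, so the process supervisor owns log storage.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Drop any handlers attached earlier (e.g. file handlers).
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    # Do not pass records on to the root logger's handlers.
    logger.propagate = False
    return logger


log = configure_stderr_logging()
log.error("example error message")
```

When a process like this runs as a systemd unit, its stderr is normally collected by journald, which is where the rate limiting discussed in this comment would apply.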

Then we could take advantage of the rate limiting support in journald:

RateLimitInterval=, RateLimitBurst=

    Configures the rate limiting that is applied to all messages generated on the system. If, in the time interval defined by RateLimitInterval=, more messages than specified in RateLimitBurst= are logged by a service, all further messages within the interval are dropped until the interval is over. A message about the number of dropped messages is generated. This rate limiting is applied per-service, so that two services which log do not interfere with each other's limits. Defaults to 1000 messages in 30s. The time specification for RateLimitInterval= may be specified in the following units: "s", "min", "h", "ms", "us". To turn off any kind of rate limiting, set either value to 0.
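
To make the interval/burst behaviour quoted above concrete, here is a small Python sketch of a per-service limiter that follows the same description. It is a simplified model for discussion, not journald's implementation; the class name `IntervalBurstLimiter` and its method are hypothetical, and the caller passes in timestamps explicitly.

```python
class IntervalBurstLimiter:
    """Simplified model of interval/burst rate limiting (illustrative only)."""

    def __init__(self, interval_s=30.0, burst=1000):
        self.interval_s = interval_s  # length of one interval, in seconds
        self.burst = burst            # messages accepted per interval
        self.window_start = None
        self.count = 0
        self.dropped = 0

    def allow(self, now):
        """Return (accepted, dropped_report).

        dropped_report is the number of messages dropped in the previous
        interval, reported once when a new interval starts, else None.
        """
        # Setting either value to 0 turns rate limiting off.
        if self.interval_s == 0 or self.burst == 0:
            return True, None
        report = None
        if self.window_start is None or now - self.window_start >= self.interval_s:
            # Interval is over: report drops (if any) and start a new one.
            if self.dropped:
                report = self.dropped
            self.window_start = now
            self.count = 0
            self.dropped = 0
        if self.count < self.burst:
            self.count += 1
            return True, report
        self.dropped += 1
        return False, report


limiter = IntervalBurstLimiter(interval_s=30.0, burst=3)
for t in (0, 1, 2, 3, 4, 30):
    print(t, limiter.allow(t))
```

With a burst of 3, the messages at t=3 and t=4 are dropped, and the first message of the next interval carries a report of two dropped messages. A service logging an error every 1 ms would hit the quoted default of 1000 messages in about a second and then be silenced for the rest of the 30 s interval.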

Comment 2 Christopher Brown 2017-06-17 14:51:03 EDT
This is something I might be interested in picking up, but my guess is that it needs _much_ wider input, as it will be a big change for lots of people.

Oslo supports systemd logging integration:

https://docs.openstack.org/developer/oslo.log/journal.html

There's also an argument that, now that we have integrated availability and performance monitoring, operators have no excuse not to monitor disk space and the like.

Comments?
