1301751 – Move all logging to stdout/err to allow systemd throttling logging of errors


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	RDO
Classification:	Community
Component:	distribution
Sub Component:
Version:	trunk
Hardware:	Unspecified
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	Milestone3
Target Release:	trunk
Assignee:	Alan Pevec
QA Contact:	Shai Revivo
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-01-25 22:11 UTC by Alan Pevec
Modified:	2018-11-21 00:37 UTC
CC List:	13 users
Fixed In Version:
Clone Of:	1294911
Environment:
Last Closed:	2018-11-21 00:37:17 UTC
Embargoed:
Dependent Products:


Description Alan Pevec 2016-01-25 22:11:32 UTC

+++ This bug was initially created as a clone of Bug #1294911 +++

Description of problem:

We think that nova-conductor.log and nova-api.log have no "backoff" for throttling the logging of errors.

When the OpenStack cluster is under stress, errors are written to nova-conductor.log every 1 ms. Here is an example:

...snip, see original BZ for ugly unreadable logs examples...

This eventually results in "No space on disk left." and the OpenStack environment becomes unusable.

Could you please check if there is any implementation for throttling error logs?


--- Additional comment from Lars Kellogg-Stedman on 2016-01-19 16:55:54 EST ---

What if we were to (a) stop logging to syslog, (b) stop logging to files, and (c) just log to stdout/stderr (so all log messages would be handled by journald, and could eventually end up in syslog anyway)?
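As a rough illustration of option (c), the sketch below configures a Python logger that writes only to stderr, with no file or syslog handler. The function name `configure_stream_logging` and the logger name `nova-conductor` are illustrative assumptions, not taken from any OpenStack code. When a process like this runs as a systemd service, its stdout/stderr are typically captured by journald.

```python
import logging
import sys


def configure_stream_logging(name: str = "nova-conductor") -> logging.Logger:
    """Return a logger that writes only to stderr (no files, no syslog)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    if not logger.handlers:
        # Attach a single stream handler; calling this twice adds nothing new.
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = configure_stream_logging()
log.error("example error written to stderr")
```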

Then we could take advantage of the rate limiting support in journald:

RateLimitInterval=, RateLimitBurst=

    Configures the rate limiting that is applied to all messages generated on the system. If, in the time interval defined by RateLimitInterval=, more messages than specified in RateLimitBurst= are logged by a service, all further messages within the interval are dropped until the interval is over. A message about the number of dropped messages is generated. This rate limiting is applied per-service, so that two services which log do not interfere with each other's limits. Defaults to 1000 messages in 30s. The time specification for RateLimitInterval= may be specified in the following units: "s", "min", "h", "ms", "us". To turn off any kind of rate limiting, set either value to 0.
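The semantics quoted above can be sketched as a fixed-window counter. This is only a model of the documented behaviour, not journald's actual implementation; the class name `WindowRateLimiter` and its attributes are hypothetical. The usage lines replay the scenario from the description (one message every 1 ms) against the documented defaults of 1000 messages in 30 s.

```python
class WindowRateLimiter:
    """Fixed-window limiter: at most `burst` messages per `interval` seconds."""

    def __init__(self, interval: float, burst: int) -> None:
        self.interval = interval
        self.burst = burst
        self.window_start = None  # start time of the current interval
        self.count = 0            # messages kept in the current interval
        self.dropped = 0          # messages dropped in the current interval

    def allow(self, now: float) -> bool:
        """Return True if a message arriving at time `now` should be kept."""
        if self.interval == 0 or self.burst == 0:
            return True  # setting either value to 0 turns rate limiting off
        if self.window_start is None or now - self.window_start >= self.interval:
            # A new interval begins: reset the counters.
            self.window_start = now
            self.count = 0
            self.dropped = 0
        if self.count < self.burst:
            self.count += 1
            return True
        self.dropped += 1
        return False


# One message every 1 ms for 30 s, with the documented defaults.
limiter = WindowRateLimiter(interval=30.0, burst=1000)
kept = sum(limiter.allow(now=i / 1000) for i in range(30_000))
print(kept, limiter.dropped)
```

In this model the first 1000 messages of the interval are kept and the remaining 29000 are dropped, which is the kind of bound on log volume the proposal is after.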

Comment 2 Christopher Brown 2017-06-17 18:51:03 UTC

This is something I might be interested in picking up, but my guess is that it needs _much_ wider input as it will be a big change for lots of people.

Oslo supports systemd logging integration:

https://docs.openstack.org/developer/oslo.log/journal.html

There's also an argument that, now that we have integrated availability and performance monitoring, operators have no excuse not to monitor disk space, etc.

Comments?

Comment 3 Victor Stinner 2018-06-20 13:15:42 UTC

> We think that nova-conductor.log and nova-api.log have no "backoff" for throttling the logging of errors.

I implemented rate limiting in Oslo Log for bz#1294911:

https://docs.openstack.org/oslo.log/latest/configuration/index.html#DEFAULT.rate_limit_interval

rate_limit_interval
    Type:	integer
    Default:	0

    Interval, number of seconds, of log rate limiting.

rate_limit_burst
    Type:	integer
    Default:	0

    Maximum number of logged messages per rate_limit_interval.

rate_limit_except_level
    Type:	string
    Default:	CRITICAL

    Log level name used by rate limiting: CRITICAL, ERROR, INFO, WARNING, DEBUG or empty string. Logs with level greater or equal to rate_limit_except_level are not filtered. An empty string means that all levels are filtered.
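To make the interaction of the three options concrete, here is a minimal sketch using only the standard `logging` module. It is not the oslo.log implementation: the class `RateLimitFilter` and its `clock` parameter are hypothetical, and treating an interval or burst of 0 as "disabled" is an assumption based on the defaults listed above.

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Hypothetical filter mirroring the three options described above."""

    def __init__(self, interval: int, burst: int,
                 except_level: str = "CRITICAL", clock=time.monotonic) -> None:
        super().__init__()
        self.interval = interval
        self.burst = burst
        # Empty string: no level is exempt, so every record is rate limited.
        self.except_levelno = (
            getattr(logging, except_level) if except_level else None
        )
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def filter(self, record: logging.LogRecord) -> bool:
        if self.interval <= 0 or self.burst <= 0:
            return True  # assumption: 0 (the default) means no rate limiting
        if (self.except_levelno is not None
                and record.levelno >= self.except_levelno):
            return True  # at or above the exempt level: never dropped
        now = self.clock()
        if now - self.window_start >= self.interval:
            # A new interval begins: reset the counter.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.burst


# Example: at most 2 records per 30 s, but ERROR and above always pass.
handler = logging.StreamHandler()
handler.addFilter(RateLimitFilter(interval=30, burst=2, except_level="ERROR"))
```

Attaching the filter to a handler rather than to a logger means the limit applies to everything that handler emits, regardless of which logger produced the record.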


Upstream issue (merged 1 year 9 months ago): https://review.openstack.org/#/c/322263/


Rate limiting means dropping logs, which has an impact on security and debugging, so it's disabled by default.

Comment 4 Alan Pevec 2018-11-21 00:37:17 UTC

With the move to containerized deployments, logging is under the control of container management and cannot be solved in packaging, so further enhancements should be made in the deployment framework, TripleO.
