Bug 1303035

Summary: Backport error_logger improvements related to output throttling
Product: Red Hat OpenStack Reporter: Peter Lemenkov <plemenko>
Component: erlangAssignee: Peter Lemenkov <plemenko>
Status: CLOSED NEXTRELEASE QA Contact: yeylon <yeylon>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty)CC: apevec, dcadzow, jeckersb, lhh, srevivo, yeylon
Target Milestone: ---Keywords: FeatureBackport
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: erlang-R16B-03.10min.6.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-04 13:11:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Peter Lemenkov 2016-01-29 11:41:45 UTC
Erlang uses a generic error_logger library for logging. Currently only files and tty output is available (no syslog or Journald support so far).

Usually it looks something like that:

error_logger:error_msg("ERROR. Current State is: ~p~n", [ComplexErrorStructureToDump]).

ComplexErrorStructureToDump is an arbitrary object to print. "ERROR. Current State is: ~p~n" - is a format string. This string is almost identical to one using in a generic ANSI C printf implementations, except one interesting modifier - "~p" (sometimes "~w" which is almost the same in the context of this ticket). It tells Erlang VM that we don't know anything about the exact ComplexErrorStructureToDump structure, and we ask Erlang VM to decode and print ComplexErrorStructureToDump according to its actual structure.

There is one major issue in the example above. If ComplexErrorStructureToDump contains only bytes, is a string or an array of integers, then it's easy to print it. On the contrary if it has a complex structure, then it takes a lot of time to recognize it, create a corresponding format structure (and substitute "~p" with it in runtime), and format ComplexErrorStructureToDump according to this format structure.

Unfortunately this latter situation is more common than we want (mostly due to dynamically typed nature if the language, but that's another heated and political debate more suited for Slashdot / Reddit). For example if a critical error occurs in one of the Erlang threads, then the entire call trace, containing the full message flow led to this state is sent into error_logger for further printing. if it happens not so frequent it's not an issue, but if a situation continues to deteriorate, then error_logger got filled with messages waiting for printing.

In our case we see cases where RabbitMQ ate ~32 GByte of memory from which more than 31 goes to error_logger.

A proposed experimental solution is to limit a depth of a recursive parsing while recognizing an object's structure with a configuration switch "~p" or "~w":

http://erlang.org/doc/man/kernel_app.html#error_logger_format_depth

This experimental feature is available since Erlang 18.1 via the following commits:

* https://github.com/erlang/otp/commit/3c3a5b08589a6e54b0d2d81e470fcffaaa74c15c
* https://github.com/erlang/otp/commit/7ef23e3122467434300f340ef149f69113b617b4

In order to enable support for this feature one has to add the following kernel parameter to the application config-file (/etc/rabbitmq/rabbitmq.config in case of RabbitMQ):

...
{kernel,
  [{error_logger_format_depth, 11}, ...]
}
...

Value can't be lower than "10", and default value is "unlimited" (w/o quotes).

Comment 2 Peter Lemenkov 2016-01-29 11:57:02 UTC
Another one patch which was backported:

* https://github.com/erlang/otp/commit/7c0333ff7a559c2d0f138b3156c4a5b73e15051b