Bug 1469859 - Starting 30 fluentd pods with mux service enabled and 3 logging-mux pods running pegs logging-mux pod CPU
Summary: Starting 30 fluentd pods with mux service enabled and 3 logging-mux pods running pegs logging-mux pod CPU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.6.z
Assignee: Rich Megginson
QA Contact: Mike Fiedler
URL:
Whiteboard: aos-scalability-36
Depends On: 1482002 1498999
Blocks: 1413147 1534588
 
Reported: 2017-07-12 02:36 UTC by Mike Fiedler
Modified: 2018-01-15 14:14 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Technology Preview
Doc Text:
Package(s) providing the Technology Preview: OpenShift logging
Description of the Technology Preview: Using Fluentd as a component for aggregating logs from Fluentd node agents, called "mux", is a Technology Preview for 3.6.0.
Clone Of:
Environment:
Last Closed: 2017-10-25 13:02:19 UTC
Target Upstream Version:
Embargoed:


Attachments
logging-mux logs during high cpu episode (11.15 KB, application/x-gzip)
2017-07-12 02:36 UTC, Mike Fiedler


Links
Red Hat Product Errata RHBA-2017:3049 (SHIPPED_LIVE): OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update, last updated 2017-10-25 15:57:15 UTC

Description Mike Fiedler 2017-07-12 02:36:09 UTC
Created attachment 1296662 [details]
logging-mux logs during high cpu episode

Description of problem:

On the scalability cluster with 3 logging-mux pods running, starting 30 fluentd pods by labeling nodes with a 0.5 second delay between labels pegs the logging-mux pod CPU at 100% of one core, and it does not recover until the fluentd pods are stopped.

Illustration of the issue:  http://pbench.perf.lab.eng.bos.redhat.com/results/EC2::svt-j-1/20170711_3mux_30_fluentd_idle_a/tools-default/svt_infra_4:172.16.0.5/pidstat/cpu_usage.html

The fluentd process in the logging-mux pod jumps from 0% CPU utilization to a full core and never abates.

Environment overview:

3 masters / 3 etcd
3 infra nodes (40 vCPU / 110 GB RAM) where all logging components run
3-node ES cluster (3 ES deployment configs), 1 ES per infra node
3 logging-mux replicas, 1 logging-mux per infra node
30 fluentd pods (10 per logging-mux), running on compute nodes with 4 vCPU and 8 GB memory
There are a total of 460 nodes in the cluster, but only 30 are running fluentd in this scenario.

ES deployment config memory limit raised to 31 GB and cpu limit removed
logging-mux memory and cpu limits removed
logging-fluentd memory and cpu limits removed
no logging activity apart from ops logging on the logging-fluentd nodes

logging-mux and logging-fluentd are running the v3.6.140-3 fluentd image:

<brew-pulp internal hostname>:8888/openshift3/logging-fluentd    v3.6.140-3          40f006cee71d        2 hours ago         232.2 MB


Version-Release number of selected component (if applicable): 3.6.140-3 fluentd image


How reproducible: Always when starting the fluentd pods by labeling nodes with a 0.5 second sleep between labels
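
For reference, a minimal sketch of this labeling loop, assuming the node names are listed in a local file and that logging-infra-fluentd=true is the label the fluentd daemonset's nodeSelector matches (the file name and label name are assumptions, not recorded in this bug):

#!/bin/bash
# Hypothetical reproduction sketch: label 30 nodes for fluentd, pausing 0.5s
# between labels. The node list file and label name are assumptions.
while read -r node; do
  oc label node "$node" logging-infra-fluentd=true --overwrite
  sleep 0.5
done < fluentd-nodes.txt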


Additional info:

pidstat for logging-mux fluentd when the logging-fluentd pods start:

09:35:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
09:38:32 PM     0    120500    0.10    0.00    0.00    0.10    33  fluentd 
09:38:42 PM     0    120500    0.00    0.00    0.00    0.00    33  fluentd 
09:38:52 PM     0    120500   77.50    1.40    0.00   78.90     4  fluentd 
09:39:02 PM     0    120500   97.70    2.20    0.00   99.90     0  fluentd 
09:39:12 PM     0    120500   98.00    1.70    0.00   99.70     4  fluentd 
09:39:22 PM     0    120500   98.30    1.70    0.00  100.00    34  fluentd 
09:39:32 PM     0    120500   96.10    3.80    0.00   99.90     6  fluentd 
09:39:42 PM     0    120500   94.30    5.30    0.00   99.60     3  fluentd 
09:39:52 PM     0    120500   95.10    4.40    0.00   99.50    24  fluentd 
09:40:02 PM     0    120500   96.80    3.00    0.00   99.80     8  fluentd 
09:40:12 PM     0    120500   97.50    1.90    0.00   99.40     2  fluentd 
09:40:22 PM     0    120500   93.40    5.40    0.00   98.80    17  fluentd 
09:40:32 PM     0    120500   95.00    4.70    0.00   99.70    26  fluentd 
09:40:42 PM     0    120500   97.90    1.50    0.00   99.40     2  fluentd 
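
The samples above are consistent with sysstat's per-process CPU report at a 10-second interval; the exact invocation is not recorded in this bug, so the following is an assumption:

# Assumed invocation: CPU usage (-u) for the mux fluentd PID, sampled every 10 seconds
pidstat -u -p 120500 10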


logging-fluentd logs show repeated slow flush messages (logs attached)

2017-07-11 21:51:34 -0400 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:1ab62b4" elapsed_time=21.516913934 slow_flush_log_threshold=20.0
2017-07-11 21:52:16 -0400 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:1ab62b4" elapsed_time=20.114825274 slow_flush_log_threshold=20.0
2017-07-11 21:52:38 -0400 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:1ab62b4" elapsed_time=20.525696254 slow_flush_log_threshold=20.0
2017-07-11 21:53:20 -0400 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:1ab62b4" elapsed_time=20.196766113 slow_flush_log_threshold=20.0
2017-07-11 21:54:02 -0400 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:1ab62b4" elapsed_time=20.765578436 slow_flush_log_threshold=20.0
2017-07-11 22:01:35 -0400 [warn]: buffer flush took longer time than slow_flush_log_threshold: plugin_id="object:1ab62b4" elapsed_time=20.002420649 slow_flush_log_threshold=20.0
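
The 20.0 second value in these warnings is the fluentd output parameter slow_flush_log_threshold. A hypothetical snippet showing where that threshold would be raised; the plugin type and match pattern are placeholders, not the configuration shipped with OpenShift logging:

# Hypothetical fluentd output block; '@type elasticsearch' and the match
# pattern are placeholders, not the shipped OpenShift logging configuration.
<match **>
  @type elasticsearch
  # only warn when a single buffer flush takes longer than 60 seconds
  # (the logs above show a 20.0 second threshold)
  slow_flush_log_threshold 60.0
</match>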

Comment 1 Mike Fiedler 2017-07-12 02:44:05 UTC
The logging-mux deployment config and logging-fluentd daemonset were both updated to use the v3.6.140-3 container images.

Comment 2 Mike Fiedler 2017-07-12 02:54:42 UTC
The logging-mux deployment config and logging-fluentd daemonset were both updated to use the v3.6.140-3 container images.

Comment 3 Peter Portante 2017-07-12 12:35:48 UTC
This appears to be a bug in fluentd, or one of its plugins, where something triggers a spin loop moving the process to use 100% of the CPU continually, no matter what load or connection rate is placed on it.  No logs from fluentd indicate any problems.  Further debugging is needed.

Comment 4 Rich Megginson 2017-07-12 14:51:36 UTC
This looks like the same thing Shirly is seeing in the RHV lab.

Comment 7 Mike Fiedler 2017-07-12 22:12:33 UTC
The 232 MB in #c0 is the docker image size - not a runtime value.

For CPU, fluentd is limited to 1 core on both the mux and the collector; fluentd can't use more than a core without the multiprocess plugin (see the sketch below).

So, the only actual "unlimited" resource would be memory.
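
As background for the one-core limit mentioned above: fluentd is a single Ruby process, so spreading work across cores requires the fluent-plugin-multiprocess plugin. A rough sketch, with child configuration and log paths that are purely illustrative:

# Hypothetical fluent-plugin-multiprocess snippet; the child configuration and
# log paths are illustrative, not part of the OpenShift logging deployment.
<source>
  @type multiprocess
  <process>
    cmdline -c /etc/fluent/child1.conf --log /var/log/fluent/child1.log
  </process>
  <process>
    cmdline -c /etc/fluent/child2.conf --log /var/log/fluent/child2.log
  </process>
</source>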

Comment 12 Peter Portante 2017-07-18 00:40:23 UTC
It seems we can move most of the log processing from the mux side to the individual fluentd collectors to reduce the CPU load.  Then the mux will only receive, enhance with kube metadata, and send to ES.

That might be a simple change we can make and verify rather quickly.

Comment 13 Rich Megginson 2017-08-18 18:05:24 UTC
Fixed by

https://github.com/openshift/openshift-ansible/commit/a6ed38676ad48a35911d449682cf2120016335c8

and

https://github.com/openshift/origin-aggregated-logging/commit/d0c4f0817d192f9bd503cd83be2aa9bf22c5aee0

When deploying logging to use mux, configure ansible like this:

openshift_logging_use_mux=True
openshift_logging_mux_client_mode=maximal

This will configure the fluentd running on each node to do as much of the processing as possible, except for the k8s metadata processing, which will be done by mux. This should greatly offload CPU usage from mux onto the fluentd running on each node.
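
A partial inventory sketch showing where these two variables would sit; [OSEv3:vars] is the usual group for openshift_logging_* settings, and all other inventory content is omitted:

# Partial ansible inventory sketch; only the mux-related variables from this
# bug are shown, everything else in a real inventory is omitted.
[OSEv3:vars]
openshift_logging_use_mux=True
openshift_logging_mux_client_mode=maximal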

These fixes should already be in the latest openshift-ansible and logging image 3.6.1 candidates.

Comment 14 Mike Fiedler 2017-09-27 12:50:50 UTC
Verified on 3.7.0-0.126.4. Single logging-mux pod scalability now depends on message rates. When the scale limits of logging-mux are reached, it exhibits high CPU utilization but no longer throws errors or suffers message loss.

Comment 17 errata-xmlrpc 2017-10-25 13:02:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

