Bug 1469859
| Field | Value |
|---|---|
| Summary | Starting 30 fluentd pods with mux service enabled and 3 logging-mux pods running pegs logging-mux pod CPU |
| Product | OpenShift Container Platform |
| Reporter | Mike Fiedler <mifiedle> |
| Component | Logging |
| Assignee | Rich Megginson <rmeggins> |
| Status | CLOSED ERRATA |
| QA Contact | Mike Fiedler <mifiedle> |
| Severity | urgent |
| Docs Contact | |
| Priority | unspecified |
| Version | 3.6.0 |
| CC | aos-bugs, bmcelvee, jcantril, nhosoi, pportant, rmeggins, sradco, vlaad, wabouham, xtian |
| Target Milestone | --- |
| Keywords | TestBlocker |
| Target Release | 3.6.z |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | aos-scalability-36 |
| Fixed In Version | |
| Doc Type | Technology Preview |
| Doc Text | Package(s) providing the Technology Preview: OpenShift logging. Description of the Technology Preview: Using Fluentd as a component for aggregating logs from Fluentd node agents, called "mux", is a Technology Preview for 3.6.0. |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2017-10-25 13:02:19 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | 1482002, 1498999 |
| Bug Blocks | 1413147, 1534588 |
| Attachments | |
Description
Mike Fiedler, 2017-07-12 02:36:09 UTC
The logging-mux deployment config and logging-fluentd daemonset were both updated to use the v3.6.140-3 container images.

This appears to be a bug in fluentd, or one of its plugins, where something triggers a spin loop that drives the process to 100% CPU continuously, no matter what load or connection rate is placed on it. No fluentd logs indicate any problems; further debugging is needed. This looks like the same thing Shirly is seeing in the RHV lab.

The 232 MB in #c0 is the docker image size, not a runtime value. For CPU, fluentd is limited to 1 core on both the mux and the collector, since fluentd can't use more than a core without the multiprocess plugin. So the only actual "unlimited" resource would be memory.

It seems we can move most of the log processing from the mux side to the individual fluentd collectors to reduce the CPU load. The mux would then only receive, enhance with kube metadata, and send to ES. That might be a simple change we can make easily and verify rather quickly.

Fixed by https://github.com/openshift/openshift-ansible/commit/a6ed38676ad48a35911d449682cf2120016335c8 and https://github.com/openshift/origin-aggregated-logging/commit/d0c4f0817d192f9bd503cd83be2aa9bf22c5aee0

When deploying logging to use mux, configure ansible like this (see the inventory sketch at the end of this report):

openshift_logging_use_mux=True
openshift_logging_mux_client_mode=maximal

This will configure the fluentd running on each node to do as much of the processing as possible, except for the k8s metadata processing, which will be done by mux. This should greatly offload the CPU usage from mux onto the fluentd running on each node. These fixes should already be in the latest openshift-ansible and logging image 3.6.1 candidates.

Verified on 3.7.0-0.126.4. Single logging-mux pod scalability now depends on message rates. When the scale limits of logging-mux are reached, it exhibits high CPU utilization but no longer throws errors or suffers from message loss.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049
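For reference, a minimal openshift-ansible inventory sketch that sets the mux variables described above. Only the two openshift_logging_* variables come from this report; the [OSEv3:vars] group name and the surrounding inventory layout are assumptions based on a typical openshift-ansible installation.

```ini
# Hypothetical inventory fragment (sketch only); merge into your existing inventory.
[OSEv3:vars]
# Deploy the mux service (Fluentd log aggregator, Technology Preview in 3.6).
openshift_logging_use_mux=True
# "maximal" client mode: per-node fluentd collectors do as much processing as
# possible, leaving only the k8s metadata enrichment (and the send to ES) to mux.
openshift_logging_mux_client_mode=maximal
```

Re-running the logging playbook with these variables set should apply the maximal client mode behavior described in the comments above.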