Description of problem:
The log throttling settings in fluentd did not take effect: we got very similar log entry counts after narrowing read_lines_limit down from 100 to 1, with no obvious effect visible in the Kibana UI when querying the hit numbers per 15, 10 and 5 minutes.

Version-Release number of selected component (if applicable):
https://github.com/openshift/openshift-ansible, -b master
head commit: 4c6d052e913c8a033c0198f4199b0f37f697bbe7
Images tested with ops registry:
openshift3/logging-kibana           a6159c640977
openshift3/logging-elasticsearch    686f330468c3
openshift3/logging-fluentd          32a4ac0a3e18
openshift3/logging-curator          8cfcb23f26b6
openshift3/logging-auth-proxy       139f7943475e

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.5.0 stacks using ansible on a json-file log driver env
2. Create a user project which continuously outputs logging
$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT
3. Note down the log arrival rate by querying log entries for the javaj index per 15, 10 and 5 minutes on the kibana-ops UI: 894, 589, 290
4. Edit the fluentd configmap, set read_lines_limit to 1 for the javaj index, then restart the fluentd daemonset to make it take effect
$ oc edit configmap logging-fluentd -o yaml
$ oc label node $node logging-infra-fluentd=false --overwrite
$ oc label node $node logging-infra-fluentd=true --overwrite
5. Log in to the fluentd pod, visit /etc/fluent/configs.d/user/throttle-config.yaml and make sure the throttling settings exist:
# oc rsh logging-fluentd-04jtg
sh-4.2# cd /etc/fluent/configs.d/user/
sh-4.2# cat throttle-config.yaml
# Logging example fluentd throttling config file
javaj:
  read_lines_limit: 1
#.operations:
#  read_lines_limit: 100
6. Log in to the kibana-ops UI, check the log arrival rate per 15, 10 and 5 minutes after throttling: 896, 589, 291

Actual results:
Log throttling settings did not take effect.
For index javaj:
Before throttling: the log entries arrived per 15, 10 and 5 minutes were: 894, 589, 290
After throttling: the log entries arrived per 15, 10 and 5 minutes were: 896, 589, 291

Expected results:
Log throttling settings should take effect

Additional info:
fluentd log:
# oc logs -f logging-fluentd-04jtg
2017-03-15 05:04:20 -0400 [info]: reading config file path="/etc/fluent/fluent.conf"
The other logging components' logs looked fine.
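For context, read_lines_limit is a parameter of fluentd's in_tail input plugin, so the throttle-config.yaml entry above is expected to end up in the generated <source> block that tails the project's container logs. A sketch of what such a generated source block might look like (the path, pos_file and tag values here are illustrative assumptions, not copied from the actual image):

    <source>
      @type tail
      path /var/log/containers/javaj-*.log      # illustrative path
      pos_file /var/log/es-containers.log.pos   # illustrative pos file
      tag kubernetes.*
      format json
      read_lines_limit 1                        # value injected from throttle-config.yaml
    </source>

Checking the generated config under /etc/fluent/configs.d/ inside the pod would confirm whether the value from throttle-config.yaml actually reached an in_tail source.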
Please post the configmap for fluentd and any additional configuration info from the pod.
Created attachment 1263620 [details] The output of fluentd configmap
@Jeff, I uploaded the fluentd configmap after adding the log throttling settings; I didn't have any other additional settings for the fluentd pod. I also uploaded the inventory file; it only contains the most basic parameters.
Created attachment 1263621 [details] inventory file used for logging deployment to let this bz repro
https://github.com/openshift/origin-aggregated-logging/pull/350 Moving target release to 3.6 unless we can get an image for 3.5.
The workaround for 3.5 would be:
1. Update the fluentd ds to have ENV[THROTTLE_CONF_LOCATION]=/etc/fluent/configs.d/user/
2. Modify the logging-fluentd configmap to change the 'throttle-config.yaml' key to 'settings'
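Concretely, the two steps above might look like the following manifest excerpts (a sketch only; the container name and exact manifest layout are assumptions, so verify against your actual daemonset and configmap):

    # 1) Hypothetical excerpt of the edited logging-fluentd daemonset,
    #    adding THROTTLE_CONF_LOCATION to the fluentd container:
    spec:
      template:
        spec:
          containers:
          - name: fluentd-elasticsearch   # container name is an assumption
            env:
            - name: THROTTLE_CONF_LOCATION
              value: /etc/fluent/configs.d/user/

    # 2) Hypothetical excerpt of the logging-fluentd configmap after
    #    renaming the 'throttle-config.yaml' key to 'settings':
    data:
      settings: |
        javaj:
          read_lines_limit: 1

After both changes, relabel the node (logging-infra-fluentd=false, then =true) as in the reproduction steps so the daemonset pod restarts and picks up the new env var and key.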
Additional changes to get more debug logging: https://github.com/openshift/origin-aggregated-logging/pull/351
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1439451
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1458652
Changed QA contact to juzhao since he's the owner of fluentd sub-component. @xtian
Tested with the latest v3.6 images on OCP 3.6.0; the log throttling settings still seem to have no effect. I monitored this user project which continuously outputs logging:
$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT
Logged in to the fluentd pod, visited /etc/fluent/configs.d/user/throttle-config.yaml and made sure the throttling settings exist. Here are the comparisons:
For index javaj:
Before throttling: the log entries arrived per 15, 10 and 5 minutes were: 891, 592, 291
After throttling: the log entries arrived per 15, 10 and 5 minutes were: 889, 593, 293
My test env details are:
# openshift version
openshift v3.6.133
kubernetes v1.6.1+5115d708d7
etcd 3.2.1
# rpm -qa | grep ansible
openshift-ansible-callback-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-docs-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-lookup-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-filter-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-playbooks-3.6.133-1.git.0.950bb48.el7.noarch
ansible-2.2.3.0-1.el7.noarch
openshift-ansible-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-roles-3.6.133-1.git.0.950bb48.el7.noarch
Image tested with:
logging-fluentd   v3.6   ba4e34d67dbe   24 hours ago   231.5 MB
Please ignore comment #12, since that test was actually done on a journald log driver env. The test on the json-file env was blocked here: https://bugzilla.redhat.com/show_bug.cgi?id=1466152
Tested with the latest v3.6 images on OCP 3.6.0; the log throttling settings still seem to have no effect. I monitored this user project which continuously outputs logging:
$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT
Logged in to the fluentd pod, visited /etc/fluent/configs.d/user/throttle-config.yaml and made sure the throttling settings exist. Here are the comparisons:
For index javaj:
Before throttling: the log entries arrived per 15, 10 and 5 minutes were: 892, 593, 292
After throttling: the log entries arrived per 15, 10 and 5 minutes were: 894, 590, 294
My test env details are:
# openshift version
openshift v3.6.126.14
kubernetes v1.6.1+5115d708d7
etcd 3.2.0
# rpm -qa | grep ansible
openshift-ansible-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-roles-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-docs-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-lookup-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-callback-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-playbooks-3.6.139-1.git.0.4ff49c6.el7.noarch
ansible-2.2.3.0-1.el7.noarch
openshift-ansible-filter-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
Image tested with:
logging-fluentd   v3.6   a2ea005ef4f6   3 hours ago   231.5 MB
I will rephrase your original statement to see if I understand it well; please correct me if you find my assumptions wrong. You monitor the number of logs that arrived after 15, 10, and 5 minutes. Given that the javaj application outputs 1 log per second, the precise counts should be 900, 600, and 300 logs per respective interval. I think the counts you mentioned support this theory; the small variance is acceptable.

`read_lines_limit` controls the number of lines read at each IO operation [1]. This variable tunes the size of the buffer used per IO rather than providing fine control of throttling. I suspect there are far more IO operations per measured time interval than there are log lines, so no difference would be visible regardless of the value of `read_lines_limit`.

A good test for throttling could be to modify javaj to make its log output non-uniform: deterministically generate a different number of logs per particular short interval. Then setting `read_lines_limit` could have an effect on how the logs-per-second function looks (setting it lower should reduce the spikes visible in the function, possibly even making it linear).

[1] http://docs.fluentd.org/v0.12/articles/in_tail#readlineslimit
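The reasoning above can be illustrated with a toy model (this is not fluentd itself; all names are illustrative). One IO operation happens per tick, and each IO reads at most `limit` buffered lines. For a steady ~1 line/sec source the total per interval is identical whatever the limit, matching the observed ~900/600/300 counts, while for a bursty source a low limit flattens the spikes without changing the total:

```python
def ingest_profile(arrivals, limit):
    """Return lines consumed per tick, reading at most `limit` lines per IO.

    `arrivals[t]` is the number of new log lines written during tick t;
    one IO (read of up to `limit` lines) is performed per tick.
    """
    buffered = 0
    profile = []
    for new_lines in arrivals:
        buffered += new_lines          # new log lines land in the file
        batch = min(limit, buffered)   # one IO reads at most `limit` lines
        buffered -= batch
        profile.append(batch)
    return profile

# Steady source (~1 line/sec, like the javaj app): totals are identical
# whether the limit is 1 or 100.
steady = [1] * 900
assert sum(ingest_profile(steady, 1)) == sum(ingest_profile(steady, 100)) == 900

# Bursty source (10 lines every 10 ticks): same total, but a low limit
# flattens the spikes -- the non-uniform test proposed above.
bursty = [10 if t % 10 == 0 else 0 for t in range(900)]
assert sum(ingest_profile(bursty, 1)) == sum(ingest_profile(bursty, 100))
assert max(ingest_profile(bursty, 1)) == 1
assert max(ingest_profile(bursty, 100)) == 10
```

This is consistent with the measurements in this bug: with a uniform log source slower than the IO rate, `read_lines_limit` cannot change the observed arrival counts.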
I've tried that -- the thing is, if I happened across a period when fewer application logs were output and took that smaller number as the log count before the throttling settings, then happened across a period when more application logs were output and took that larger number as the log count after the throttling settings, the test result would be indeterminable. That is why I used a linear/uniform test application. Now I understand that this may not be applicable when it comes to IO-level data manipulation. Considering that the aggregated logging solution provided by OpenShift is only responsible for passing the log throttling settings to fluentd, I'm going to limit the test scope to checking that the log throttling settings are passed into the fluentd container, instead of testing fluentd's native log throttling function here. Please let me know if there are any further thoughts. In the meantime, please also feel free to transfer back to ON_QA for closure. Thanks!
Verified on openshift v3.6.144 that the log throttling settings can be passed into the fluentd container successfully by editing the configmap. Set to VERIFIED.
Image tested with:
logging-fluentd   v3.6   b9eeeec142af   17 hours ago   231.7 MB
ansible version:
openshift-ansible-playbooks-3.6.144-1.git.0.50e12bf.el7.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716