Description of problem:
The log throttling settings in fluentd did not take effect: we got very similar log entry counts after narrowing read_lines_limit down from 100 to 1, with no obvious effect visible in the Kibana UI when querying the hit numbers per 15, 10 and 5 minutes.

Version-Release number of selected component (if applicable):
https://github.com/openshift/openshift-ansible, -b master
head commit: 4c6d052e913c8a033c0198f4199b0f37f697bbe7
Images tested with ops registry:
openshift3/logging-kibana           a6159c640977
openshift3/logging-elasticsearch    686f330468c3
openshift3/logging-fluentd          32a4ac0a3e18
openshift3/logging-curator          8cfcb23f26b6
openshift3/logging-auth-proxy       139f7943475e

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.5.0 stacks using ansible on a json-file log driver env
2. Create a user project which continuously outputs logging
$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT
3. Note down the log arrival rate by querying log entries for the javaj index per 15, 10 and 5 minutes on the kibana-ops UI: 894, 589, 290
4. Edit the fluentd configmap, set read_lines_limit to 1 for the javaj index, then restart the fluentd daemonset to make it take effect
$ oc edit configmap logging-fluentd -o yaml
$ oc label node $node logging-infra-fluentd=false --overwrite
$ oc label node $node logging-infra-fluentd=true --overwrite
5. Log in to the fluentd pod, visit /etc/fluent/configs.d/user/throttle-config.yaml and make sure the throttling settings exist:
# oc rsh logging-fluentd-04jtg
sh-4.2# cd /etc/fluent/configs.d/user/
sh-4.2# cat throttle-config.yaml
# Logging example fluentd throttling config file
javaj:
  read_lines_limit: 1
#.operations:
#  read_lines_limit: 100
6. Log in to the kibana-ops UI, check the log arrival rate per 15, 10 and 5 minutes after throttling: 896, 589, 291

Actual results:
Log throttling settings did not take effect.
For index javaj:
Before throttling: the log entries arrived per 15, 10 and 5 minutes were: 894, 589, 290
After throttling: the log entries arrived per 15, 10 and 5 minutes were: 896, 589, 291

Expected results:
Log throttling settings should take effect

Additional info:
fluentd log:
# oc logs -f logging-fluentd-04jtg
2017-03-15 05:04:20 -0400 [info]: reading config file path="/etc/fluent/fluent.conf"
The other logging components' logs looked fine.
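For context, read_lines_limit is a parameter of fluentd's in_tail input plugin, so the throttle-config.yaml entry above is expected to end up in the generated <source> block that tails the project's container logs. A sketch of what such a generated source block might look like (the path, pos_file and tag values here are illustrative assumptions, not copied from the actual image):

    <source>
      @type tail
      path /var/log/containers/javaj-*.log      # illustrative path
      pos_file /var/log/es-containers.log.pos   # illustrative pos file
      tag kubernetes.*
      format json
      read_lines_limit 1                        # value injected from throttle-config.yaml
    </source>

Checking the generated config under /etc/fluent/configs.d/ inside the pod would confirm whether the value from throttle-config.yaml actually reached an in_tail source.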
Please post the configmap for fluentd and any additional configuration info from the pod.
Created attachment 1263620 [details] The output of fluentd configmap
@Jeff, I uploaded the fluentd configmap after adding the log throttling settings; I didn't have any other additional settings for the fluentd pod. I also uploaded the inventory file; it only contains the most basic parameters.
Created attachment 1263621 [details] inventory file used for logging deployment to let this bz repro
https://github.com/openshift/origin-aggregated-logging/pull/350 Moving target release to 3.6 unless we can get an image for 3.5.
The workaround for 3.5 would be:
1. Update the fluentd ds to have ENV[THROTTLE_CONF_LOCATION]=/etc/fluent/configs.d/user/
2. Modify the logging-fluentd configmap to change the 'throttle-config.yaml' key to 'settings'
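Concretely, the two steps above might look like the following manifest excerpts (a sketch only; the container name and exact manifest layout are assumptions, so verify against your actual daemonset and configmap):

    # 1) Hypothetical excerpt of the edited logging-fluentd daemonset,
    #    adding THROTTLE_CONF_LOCATION to the fluentd container:
    spec:
      template:
        spec:
          containers:
          - name: fluentd-elasticsearch   # container name is an assumption
            env:
            - name: THROTTLE_CONF_LOCATION
              value: /etc/fluent/configs.d/user/

    # 2) Hypothetical excerpt of the logging-fluentd configmap after
    #    renaming the 'throttle-config.yaml' key to 'settings':
    data:
      settings: |
        javaj:
          read_lines_limit: 1

After both changes, relabel the node (logging-infra-fluentd=false, then =true) as in the reproduction steps so the daemonset pod restarts and picks up the new env var and key.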
Additional changes to get more debug logging: https://github.com/openshift/origin-aggregated-logging/pull/351
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1439451
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1458652
Changed QA contact to juzhao since he's the owner of fluentd sub-component. @xtian
Tested with the latest v3.6 images on OCP 3.6.0; the log throttling settings still seem to have no effect. I monitored this user project which continuously outputs logging:
$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT
Logged in to the fluentd pod, visited /etc/fluent/configs.d/user/throttle-config.yaml and made sure the throttling settings exist. Here are the comparisons:
For index javaj:
Before throttling: the log entries arrived per 15, 10 and 5 minutes were: 891, 592, 291
After throttling: the log entries arrived per 15, 10 and 5 minutes were: 889, 593, 293
My test env details are:
# openshift version
openshift v3.6.133
kubernetes v1.6.1+5115d708d7
etcd 3.2.1
# rpm -qa | grep ansible
openshift-ansible-callback-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-docs-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-lookup-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-filter-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-playbooks-3.6.133-1.git.0.950bb48.el7.noarch
ansible-2.2.3.0-1.el7.noarch
openshift-ansible-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-roles-3.6.133-1.git.0.950bb48.el7.noarch
Image tested with:
logging-fluentd   v3.6   ba4e34d67dbe   24 hours ago   231.5 MB
Please ignore comment #12, since that test was actually done on a journald log driver env. The test on the json-file env was blocked here: https://bugzilla.redhat.com/show_bug.cgi?id=1466152
Tested with the latest v3.6 images on OCP 3.6.0; the log throttling settings still seem to have no effect. I monitored this user project which continuously outputs logging:
$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT
Logged in to the fluentd pod, visited /etc/fluent/configs.d/user/throttle-config.yaml and made sure the throttling settings exist. Here are the comparisons:
For index javaj:
Before throttling: the log entries arrived per 15, 10 and 5 minutes were: 892, 593, 292
After throttling: the log entries arrived per 15, 10 and 5 minutes were: 894, 590, 294
My test env details are:
# openshift version
openshift v3.6.126.14
kubernetes v1.6.1+5115d708d7
etcd 3.2.0
# rpm -qa | grep ansible
openshift-ansible-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-roles-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-docs-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-lookup-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-callback-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-playbooks-3.6.139-1.git.0.4ff49c6.el7.noarch
ansible-2.2.3.0-1.el7.noarch
openshift-ansible-filter-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
Image tested with:
logging-fluentd   v3.6   a2ea005ef4f6   3 hours ago   231.5 MB
I will rephrase your original statement to see if I understand it well; please correct me if you find my assumptions wrong. You monitor the number of logs that arrived after 15, 10, and 5 minutes. Given that the javaj application outputs 1 log per second, the precise counts should be 900, 600, and 300 logs per respective interval. I think the counts you mentioned support this theory; the small variance is acceptable.

`read_lines_limit` controls the number of lines read at each IO operation [1]. This variable tunes the size of the buffer used per IO rather than providing fine control of throttling. I suspect there are far more IO operations per measured time interval than there are log lines, so no difference would be visible regardless of the value of `read_lines_limit`.

A good test for throttling could be to modify javaj to make its log output non-uniform: deterministically generate a different number of logs per particular short interval. Then setting `read_lines_limit` could have an effect on how the logs-per-second function looks (setting it lower should reduce the spikes visible in the function, possibly even making it linear).

[1] http://docs.fluentd.org/v0.12/articles/in_tail#readlineslimit
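The reasoning above can be illustrated with a toy model (this is not fluentd itself; all names are illustrative). One IO operation happens per tick, and each IO reads at most `limit` buffered lines. For a steady ~1 line/sec source the total per interval is identical whatever the limit, matching the observed ~900/600/300 counts, while for a bursty source a low limit flattens the spikes without changing the total:

```python
def ingest_profile(arrivals, limit):
    """Return lines consumed per tick, reading at most `limit` lines per IO.

    `arrivals[t]` is the number of new log lines written during tick t;
    one IO (read of up to `limit` lines) is performed per tick.
    """
    buffered = 0
    profile = []
    for new_lines in arrivals:
        buffered += new_lines          # new log lines land in the file
        batch = min(limit, buffered)   # one IO reads at most `limit` lines
        buffered -= batch
        profile.append(batch)
    return profile

# Steady source (~1 line/sec, like the javaj app): totals are identical
# whether the limit is 1 or 100.
steady = [1] * 900
assert sum(ingest_profile(steady, 1)) == sum(ingest_profile(steady, 100)) == 900

# Bursty source (10 lines every 10 ticks): same total, but a low limit
# flattens the spikes -- the non-uniform test proposed above.
bursty = [10 if t % 10 == 0 else 0 for t in range(900)]
assert sum(ingest_profile(bursty, 1)) == sum(ingest_profile(bursty, 100))
assert max(ingest_profile(bursty, 1)) == 1
assert max(ingest_profile(bursty, 100)) == 10
```

This is consistent with the measurements in this bug: with a uniform log source slower than the IO rate, `read_lines_limit` cannot change the observed arrival counts.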
I've tried that -- the thing is, if I happened across a period when fewer application logs were output and took that smaller number as the log count before the throttling settings, then happened across a period when more application logs were output and took that larger number as the log count after the throttling settings, the test result would be indeterminable. That is why I used a linear/uniform test application. Now I understand that this may not be applicable when it comes to IO-level data manipulation. Considering that the aggregated logging solution provided by OpenShift is only responsible for passing the log throttling settings to fluentd, I'm going to limit the test scope to checking that the log throttling settings are passed into the fluentd container, instead of testing fluentd's native log throttling function here. Please let me know if there are any further thoughts. In the meantime, please also feel free to transfer back to ON_QA for closure. Thanks!
Verified on openshift v3.6.144 that the log throttling settings can be passed into the fluentd container successfully by editing the configmap. Set to VERIFIED.
Image tested with:
logging-fluentd   v3.6   b9eeeec142af   17 hours ago   231.7 MB
ansible version:
openshift-ansible-playbooks-3.6.144-1.git.0.50e12bf.el7.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716