Bug 1432389
| Summary: | The log throttling settings didn't take effect | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xia Zhao <xiazhao> |
| Component: | Logging | Assignee: | Jan Wozniak <jwozniak> |
| Status: | CLOSED ERRATA | QA Contact: | Xia Zhao <xiazhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.5.0 | CC: | aos-bugs, rmeggins, tdawson, xiazhao, xtian |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: The script that evaluates the log settings ignored the throttling file. Consequence: No throttling was configured. Fix: Modify the throttling script to process the throttling file. Result: Users configuring docker with the json-file log driver should be able to throttle projects. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-10 05:18:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Xia Zhao
2017-03-15 09:53:01 UTC
Please post the configmap for fluentd and any additional configuration info from the pod.

Created attachment 1263620 [details]
The output of fluentd configmap
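The attachment itself is not reproduced here; for context, the project throttling settings are carried under the `throttle-config.yaml` key of the `logging-fluentd` configmap, and a minimal sketch of that key's content looks roughly like this (the project name and limit below are placeholders, not values from the attachment):

```yaml
# Sketch of the throttle-config.yaml key in the logging-fluentd configmap
# (placeholder project name and limit; adjust to the project to be throttled)
your-project-name:
  read_lines_limit: 10
```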
@Jeff, I uploaded the fluentd configmap after adding the log throttling settings; I didn't have any other additional settings for the fluentd pod. I also uploaded the inventory file, which only contains the most basic parameters.

Created attachment 1263621 [details]
inventory file used for logging deployment to reproduce this bz
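The attached inventory is likewise not reproduced here; a basic inventory for such a logging deployment would look roughly like the sketch below (hostnames and values are placeholders, and the variable names assume the openshift-ansible logging role of the 3.5/3.6 era):

```ini
# Illustrative minimal inventory enabling the logging stack
# (placeholder hosts; openshift_logging_install_logging enables the logging role)
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=openshift-enterprise
openshift_logging_install_logging=true

[masters]
master.example.com

[nodes]
master.example.com
node1.example.com
```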
https://github.com/openshift/origin-aggregated-logging/pull/350

Moving target release to 3.6 unless we can get an image for 3.5. The workaround for 3.5 would be:
1. Update the fluentd ds to have ENV[THROTTLE_CONF_LOCATION]=/etc/fluent/configs.d/user/
2. Modify the logging-fluentd configmap to change the 'throttle-config.yaml' key to be 'settings'

Additional changes to get more debug logging: https://github.com/openshift/origin-aggregated-logging/pull/351

Changed QA contact to juzhao since he's the owner of the fluentd sub-component.

@xtian Tested with the latest v3.6 images on OCP 3.6.0; the log throttling settings still seem to have no effect. I monitored this user project, which continuously emits logs:

$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT

I logged in to the fluentd pod, visited /etc/fluent/configs.d/user/throttle-config.yaml, and made sure the throttling settings exist. Here is the comparison for index javaj:

Before throttling: the log entries arrived per 15, 10, and 5 minutes are 891, 592, and 291.
After throttling: the log entries arrived per 15, 10, and 5 minutes are 889, 593, and 293.

My test env details are:

# openshift version
openshift v3.6.133
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

# rpm -qa | grep ansible
openshift-ansible-callback-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-docs-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-lookup-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-filter-plugins-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-playbooks-3.6.133-1.git.0.950bb48.el7.noarch
ansible-2.2.3.0-1.el7.noarch
openshift-ansible-3.6.133-1.git.0.950bb48.el7.noarch
openshift-ansible-roles-3.6.133-1.git.0.950bb48.el7.noarch

Images tested with:
logging-fluentd v3.6 ba4e34d67dbe 24 hours ago 231.5 MB

Please ignore comment #12, since it was actually done on a journald log driver env. The test on the json-file env was blocked here: https://bugzilla.redhat.com/show_bug.cgi?id=1466152

Tested with the latest v3.6 images on OCP 3.6.0; the log throttling settings still seem to have no effect. I monitored this user project, which continuously emits logs:

$ oc new-project javaj
$ oc new-app chunyunchen/java-mainclass:2.3-SNAPSHOT

I logged in to the fluentd pod, visited /etc/fluent/configs.d/user/throttle-config.yaml, and made sure the throttling settings exist. Here is the comparison for index java:

Before throttling: the log entries arrived per 15, 10, and 5 minutes are 892, 593, and 292.
After throttling: the log entries arrived per 15, 10, and 5 minutes are 894, 590, and 294.

My test env details are:

# openshift version
openshift v3.6.126.14
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

# rpm -qa | grep ansible
openshift-ansible-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-roles-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-docs-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-lookup-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-callback-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch
openshift-ansible-playbooks-3.6.139-1.git.0.4ff49c6.el7.noarch
ansible-2.2.3.0-1.el7.noarch
openshift-ansible-filter-plugins-3.6.139-1.git.0.4ff49c6.el7.noarch

Images tested with:
logging-fluentd v3.6 a2ea005ef4f6 3 hours ago 231.5 MB

I will rephrase your original statement to see if I understand it well; please correct me if you find my assumptions wrong. You monitor the number of logs that arrived after 15, 10, and 5 minutes. Given that the javaj application outputs 1 log per second, the precise count should be 900, 600, and 300 logs per respective interval.
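As a quick sanity check of those expected totals (just the arithmetic at 1 log line per second, not part of the original comment):

```sh
# Expected line counts for 15-, 10-, and 5-minute windows at 1 log/second
$ for m in 15 10 5; do echo "$m min -> $((m * 60)) lines"; done
15 min -> 900 lines
10 min -> 600 lines
5 min -> 300 lines
```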
I think the counts you mentioned support this theory; small variance is acceptable. `read_lines_limit` controls the number of lines read at each IO [1]. This variable tunes the size of the buffer used per IO rather than providing fine-grained control of throttling. I suspect there are far more IO operations per measured time interval, so that no difference is visible regardless of the value of `read_lines_limit`. A good test for throttling could be to modify javaj to make its log output non-uniform: deterministically generate a different number of logs per particular short interval. Then setting `read_lines_limit` could have an effect on how the function of logs per second looks (setting it lower should reduce the spikes visible in the function, possibly even making it linear). A minimal sketch of such a non-uniform log generator appears after the closing comment below.

[1] http://docs.fluentd.org/v0.12/articles/in_tail#readlineslimit

I've tried that -- the thing is, if I happened to take a period when fewer application logs were output as the log count before the throttling settings, and a period when more application logs were output as the log count after the throttling settings, the test result turned out to be indeterminable. This is why I used a linear/uniform test application. Now I understand that this may also not be applicable when it comes to I/O-level data manipulation. Considering that the aggregated logging solution provided by OpenShift is only aimed at passing log throttling settings to fluentd, I'm going to limit the test scope to checking only that the log throttling settings are passed into the fluentd container, instead of testing fluentd's native log throttling function here. Please let me know if you have any further thoughts. In the meantime, please also feel free to move this back to ON_QA for closure. Thanks!

Verified on openshift v3.6.144 that the log throttling settings can be passed into the fluentd container successfully by editing the configmap. Set to verified.

Image tested with:
logging-fluentd v3.6 b9eeeec142af 17 hours ago 231.7 MB

ansible version:
openshift-ansible-playbooks-3.6.144-1.git.0.50e12bf.el7.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
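As referenced above, here is a minimal sketch of the kind of non-uniform log generator suggested for re-testing throttling; the burst sizes and sleep intervals are arbitrary illustrative choices, not anything used in the verification of this bug:

```sh
#!/bin/bash
# Emit a deterministic, non-uniform log pattern: alternate a large burst with
# a small one, so that effective throttling would visibly smooth the rate.
while true; do
  for i in $(seq 1 100); do echo "burst log line $i $(date +%s)"; done   # large burst
  sleep 5
  for i in $(seq 1 5);   do echo "quiet log line $i $(date +%s)"; done   # small burst
  sleep 5
done
```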