Bug 1513254 - logging-fluentd performance regression between 3.7.0-0.158.0 and 3.7.7 + message loss.
Summary: logging-fluentd performance regression between 3.7.0-0.158.0 and 3.7.7 + message loss.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.z
Assignee: Rich Megginson
QA Contact: Mike Fiedler
URL:
Whiteboard: aos-scalability-37
Depends On:
Blocks:
 
Reported: 2017-11-15 02:42 UTC by Mike Fiedler
Modified: 2018-10-08 12:09 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Every field in every record is parsed and converted to utf-8 format.
Consequence: There is at least a 20% performance hit, or more, depending on data size, number of records, etc.
Fix: Disable the utf-8 filtering by default.
Result: There is a noticeable performance improvement. If you need to make sure all records/fields are properly converted to utf-8, and you can tolerate the performance hit, you can enable it like this: oc set env ds/logging-fluentd ENABLE_UTF8_FILTER=true
Clone Of:
Environment:
Last Closed: 2018-10-08 12:09:00 UTC
Target Upstream Version:
Embargoed:


Attachments
ES and fluentd logs from anomalous runs (1.81 KB, application/x-gzip)
2017-11-15 02:42 UTC, Mike Fiedler


Links
GitHub openshift/origin-aggregated-logging pull 787 (status: closed): Bug 1513254 - logging-fluentd performance regression between 3.7.0-0.158.0 and 3.7.7 + message loss.  Last updated: 2020-12-29 07:52:14 UTC

Description Mike Fiedler 2017-11-15 02:42:22 UTC
Created attachment 1352321 [details]
ES and fluentd logs from anomalous runs

Description of problem:

Somewhere between logging-fluentd 3.7.0-0.158.0 (OCP 3.7 scale lab version) and 3.7.7 (SVT regression test version) there has been a regression in the performance of logging-fluentd in a non-mux configuration.   I have not tried to bisect it down to the exact build.

fluentd CPU % numbers below are out of 1 core.

3.7.0-0.158.0:

250 1K messages/second per fluentd  fluentd CPU avg = 24%   Message loss = 0%
500 1K messages/second per fluentd  fluentd CPU avg = 46%   Message loss = 3%
750 1K messages/second per fluentd  fluentd CPU avg = 68%   Message loss = 5%

No errors in logging-fluentd pod logs.  

3.7.7:

250 1K messages/second per fluentd  fluentd CPU avg = 38%   Message loss = 4%
500 1K messages/second per fluentd  fluentd CPU avg = 73%   Message loss = 4%
750 1K messages/second per fluentd  fluentd CPU avg = 88%   Message loss = 13%


No error messages in the logging-fluentd log and no retry messages.  ES and fluentd logs will be attached for the runs with higher CPU/higher message loss.

Repeated this twice on each build with very consistent results.

The message loss is possibly related to re-opened bz https://bugzilla.redhat.com/show_bug.cgi?id=1486473 which also had message loss.

Version-Release number of selected component (if applicable): logging-fluentd 3.7.7


How reproducible: Always


Steps to Reproduce:
1.  Deploy logging-fluentd 3.7.0-0.158.0
2.  Run a message rate of 500 messages per second through the logging-fluentd node.  Record fluentd CPU usage with pidstat -C fluentd 30 and average the samples (see the sketch after these steps)
3.  Edit the logging-fluentd daemonset and update the logging-fluentd image to 3.7.7 and re-start the fluentd pods
4.  Re-run the test
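
A minimal sketch of the sampling/averaging in step 2, assuming sysstat's pidstat is available on the node; the %CPU value is taken as the third-from-last field, which holds for recent sysstat output formats but may need adjusting:

    pidstat -C fluentd 30 20 | awk '$NF == "fluentd" && $1 != "Average:" { sum += $(NF-2); n++ } END { if (n) printf "fluentd CPU avg = %.1f%%\n", sum / n }'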

Actual results:

29% - 58% higher CPU for the 3.7.7 build, plus higher message loss.   The message loss itself appears to be a regression from even earlier builds: experiments in 3.6.z and early 3.7 showed no message loss at single-fluentd rates in excess of 750 1K messages per second per fluentd.

Expected results:

No message loss.   CPU utilization which allows 750 - 1000 1K messages per second per fluentd.  No performance regression

Additional info:

OOTB memory and cpu requests/limits in effect.

[OSEv3:children]
masters
etcd

[masters]
ec2-54-190-57-184.us-west-2.compute.amazonaws.com

[etcd]
ec2-54-190-57-184.us-west-2.compute.amazonaws.com

[OSEv3:vars]
openshift_deployment_type=openshift-enterprise
openshift_release=v3.7.0

openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-54-190-57-184.us-west-2.compute.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-54-190-57-184.us-west-2.compute.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.1107-6c3.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.7.0-0.158.0
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=50Gi
openshift_logging_fluentd_use_journal=false
openshift_logging_use_mux=false
openshift_logging_use_ops=false

Comment 1 Rich Megginson 2017-11-15 03:26:15 UTC
What do these say?

oc exec -c elasticsearch $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices?v

oc exec -c elasticsearch $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/thread_pool?v

Comment 2 Mike Fiedler 2017-11-15 03:41:07 UTC
root@ip-172-31-52-115: ~ # oc exec -c elasticsearch $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices?v
health status index                                                           pri rep docs.count docs.deleted store.size pri.store.size 
green  open   project.logging.258aabdb-c87a-11e7-9459-0210c4607772.2017.11.15   1   0       2388            0        1mb            1mb 
green  open   .operations.2017.11.15                                            1   0      17027            0     10.9mb         10.9mb 
green  open   .searchguard.logging-es-data-master-t1b082or                      1   0          5            0     33.5kb         33.5kb 
green  open   project.mff0.24aa2e93-c9a6-11e7-9459-0210c4607772.2017.11.15      1   0     595099            0    240.5mb        240.5mb 
green  open   .kibana                                                           1   0          1            0      3.1kb          3.1kb 
green  open   .operations.2017.11.14                                            1   0      59500            0     25.3mb         25.3mb 


project.mff0 should have 675000 documents

root@ip-172-31-52-115: ~ # oc exec -c elasticsearch $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/thread_pool?v
host        ip          bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
172.23.0.46 172.23.0.46           0          0             0            0           0              0             0            0               0 


bulk.rejected never has a non-zero value, and bulk.active/bulk.queue never go above 1 during periodic sampling.
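
A minimal sketch of such periodic sampling (interval and iteration count are arbitrary; same admin certs as above):

    for i in $(seq 1 20); do
      oc exec -c elasticsearch $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/thread_pool?v
      sleep 30
    done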

Comment 3 Rich Megginson 2017-11-15 03:54:23 UTC
There are three main changes:

1) docker audit logs

This should only be enabled if you set env AUDIT_CONTAINER_ENGINE=true

oc exec $fluentd_pod -- ls -alrtF /etc/fluent/configs.d/openshift

If you have files input-pre-audit-log.conf and filter-pre-a-audit-exclude.conf that are not 0 length, then that is a bug.

If this is disabled, I would not expect much CPU impact or message loss.
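
A quick way to check the file sizes (a sketch; file names are the ones named above, and stat is assumed to be present in the fluentd image):

    for f in input-pre-audit-log.conf filter-pre-a-audit-exclude.conf; do
      oc exec $fluentd_pod -- stat -c '%n: %s bytes' /etc/fluent/configs.d/openshift/$f
    done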

Comment 4 Rich Megginson 2017-11-15 03:55:29 UTC
2) configs.d/openshift/filter-pre-force-utf8.conf

+<filter **>
+  @type record_modifier
+  char_encoding utf-8
+</filter>

This affects every field of every record.  This may contribute to CPU usage, but I would not expect record loss.

Comment 5 Rich Megginson 2017-11-15 03:57:48 UTC
3) the k8s meta cache and watches tuning

 <filter kubernetes.**>
   type kubernetes_metadata
   kubernetes_url "#{ENV['K8S_HOST_URL']}"
+  cache_size "#{ENV['K8S_METADATA_CACHE_SIZE'] || '1000'}"
+  watch "#{ENV['K8S_METADATA_WATCH'] || 'false'}"

If there are a lot of pods/namespaces used during the test, this may have a significant CPU impact and may lead to message "loss" (but you should see the "missing kubernetes field - cannot construct index name" message, and the records should go into the fluentd index or possibly be rerouted to some other index).
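
If tuning is needed, the env vars referenced in the fragment above can be overridden on the daemonset, for example (values here are illustrative, not recommendations):

    oc set env ds/logging-fluentd K8S_METADATA_CACHE_SIZE=5000 K8S_METADATA_WATCH=true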

Comment 6 Rich Megginson 2017-11-15 04:00:12 UTC
Are you sure that those missing records didn't end up in the .operations.* indices?  If not, and there are no fluentd errors, I'm at a loss to explain what happened to the missing records.

Comment 7 Rich Megginson 2017-11-15 04:02:20 UTC
rhpkg co logging-fluentd-docker ; cd logging-fluentd-docker

git diff ba68308a0e45eac247e7b70be6b7c9673c431b80..aa03149eab5388827ee82b7c757fc43a9edbab98

Comment 8 Rich Megginson 2017-11-15 04:07:15 UTC
Perhaps the records were buffered and not sent to elasticsearch?

ls -alrtF /var/lib/fluentd

to list the buffers

There is a tool called msgpack-inspect which will dump those buffers: 

gem install msgpack-inspect

msgpack-inspect /var/lib/fluentd/some-file.log

Comment 9 Rich Megginson 2017-11-15 04:08:10 UTC
(In reply to Rich Megginson from comment #8)
> There is a tool called msgpack-inspect which will dump those buffers: 
> 
> gem install msgpack-inspect
> 
> msgpack-inspect /var/lib/fluentd/some-file.log

probably easier to do this inside the fluentd pod if still running
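
For example (a sketch; assumes the pod can reach rubygems.org, and <buffer-file> is a placeholder for a name from the directory listing):

    oc exec $fluentd_pod -- ls -alrtF /var/lib/fluentd
    oc exec $fluentd_pod -- gem install msgpack-inspect
    oc exec $fluentd_pod -- msgpack-inspect /var/lib/fluentd/<buffer-file>.log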

Comment 10 Walid A. 2017-11-15 19:08:40 UTC
In response to comment 6 and comment 8 :

Test was repeated: 675,000 messages were logged from the logtest pod, but only 612,872 documents made it into Elasticsearch for the project index "project.walid-0".  62,128 documents are missing.  There were no errors in the fluentd or elasticsearch pod logs.

The .operations.<today's date> index was deleted before the test.  After the test, there were only 4291 documents in .operations.2017.11.15.

root@ip-172-31-52-115: # oc exec -n logging $espod -c elasticsearch -- curl --connect-timeout 2 -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://logging-es:9200/_cat/indices?v 
health status index                                                           pri rep docs.count docs.deleted store.size pri.store.size 
green  open   project.logging.258aabdb-c87a-11e7-9459-0210c4607772.2017.11.15   1   0      12628            0      3.6mb          3.6mb 
green  open   .operations.2017.11.15                                            1   0       4291            0      2.1mb          2.1mb 
green  open   .searchguard.logging-es-data-master-t1b082or                      1   0          5            0     33.5kb         33.5kb 
green  open   .kibana                                                           1   0          1            0      3.1kb          3.1kb 
green  open   .operations.2017.11.14                                            1   0      59500            0     25.3mb         25.3mb 
green  open   project.walid-0.fd037d94-ca28-11e7-9459-0210c4607772.2017.11.15   1   0     612872            0    248.1mb        248.1mb 
green  open   project.mff0.24aa2e93-c9a6-11e7-9459-0210c4607772.2017.11.15      1   0     595099            0    240.5mb        240.5mb 
root@ip-172-31-52-115: ~/svt/openshift_scalability #
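
For a quick cross-check of the shortfall, the document count for a single project index can also be queried directly (a sketch using the same admin certs; the index pattern is taken from the listing above):

    oc exec -n logging $espod -c elasticsearch -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key "https://logging-es:9200/project.walid-0.*/_count?pretty"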


In the fluentd pod, in /var/lib/fluentd, there were two transient buffer files present at any given time during the logging test.  No buffer files were left in that directory after the test completed.

Sample transient files:
total 3892
drwxr-xr-x. 1 root root      35 Nov 15 01:39 ../
-rw-r--r--. 1 root root 1952309 Nov 15 17:33 buffer-output-es-config.output_tag.q55e08e7c50b99100.log
drwxr-xr-x. 2 root root     134 Nov 15 17:33 ./
-rw-r--r--. 1 root root 1954255 Nov 15 17:33 buffer-output-es-config.output_tag.b55e08e7da298a677.log


bulk.rejected was 0:
oc exec -c elasticsearch $espod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/thread_pool?v
host        ip          bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
172.23.0.46 172.23.0.46

Comment 11 Rich Megginson 2017-11-16 20:32:50 UTC
For the performance, the primary culprit seems to be the utf8 record modifier.  With it in the pipeline, fluentd is roughly twice as slow.

Comment 12 Rich Megginson 2017-11-16 22:44:50 UTC
This is just for the performance part, and may not address all of the performance issues.  This does not address the record loss issue.

Comment 20 openshift-github-bot 2017-11-30 20:35:32 UTC
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/6b74f2f9c16dec7ee90681ee57234f696d2b553a
Bug 1513254 - logging-fluentd performance regression between 3.7.0-0.158.0 and 3.7.7 + message loss.

https://bugzilla.redhat.com/show_bug.cgi?id=1513254
The utf8 filter causes the performance to be cut in half.  Disable
it by default.  If you need to use it, and can pay the performance
penalty,

    oc set env ds/logging-fluentd ENABLE_UTF8_FILTER=true

https://github.com/openshift/origin-aggregated-logging/commit/9f87406267ce7e7b2eca6b00d90c9996dc65267c
Merge pull request #787 from richm/bug-1513254

Automatic merge from submit-queue.

Bug 1513254 - logging-fluentd performance regression between 3.7.0-0.158.0 and 3.7.7 + message loss.

https://bugzilla.redhat.com/show_bug.cgi?id=1513254
The utf8 filter causes the performance to be cut in half.  Disable
it by default.  If you need to use it, and can pay the performance
penalty,

    oc set env ds/logging-fluentd ENABLE_UTF8_FILTER=true

see also https://github.com/openshift/origin-aggregated-logging/pull/622

/test

Comment 22 Mike Fiedler 2017-12-13 21:20:10 UTC
The latest v3.7 logging-fluentd image does not contain the fix.   I checked run.sh and the code in https://github.com/openshift/origin-aggregated-logging/pull/806/files  is not present.

Moving this defect back to MODIFIED until a build with the fix is available. We're testing with 3.7.14-2:

registry.reg-aws.openshift.com:443/openshift3/logging-fluentd             v3.7.14-2           772622353a46        5 days ago          235.5 MB


5 days ago is as of 13 December.  Since PR #806 merged on 30 November, maybe it is a build issue?

Comment 27 Walid A. 2018-01-05 07:12:29 UTC
Retested with logging-fluentd:v3.7.16-1, with the same out-of-the-box memory and cpu requests/limits for logging-fluentd and ES.  Used the same docker json-file log driver configuration of max-file 3 and max-size 10 MB.  The utf8-filter should be disabled by default.  I also confirmed that the run.sh script in the fluentd pods contains the code update from PR https://github.com/openshift/origin-aggregated-logging/pull/787/files, and that oc exec $fluentd_pod -- ls -alrtF /etc/fluent/configs.d/openshift shows a file size of 0 for filter-pre-force-utf8.conf.

When the same logging tests at 250 and 500 messages per second were repeated, message loss in Elasticsearch was still observed at essentially the same percentage: no meaningful improvement in ES message loss compared to the previously reported results with the utf8-filter enabled.  No errors in fluentd or ES pod logs were observed.  ES bulk.rejected was always 0, and bulk.active/bulk.queue never went above 1 during the test.  The only improvement observed was in fluentd CPU utilization on the compute nodes:

- 250 1 KB messages per second test:  666451 documents were indexed in ES for the pod's project index out of the 675000 messages logged.  8549 messages were lost (1.3%).  fluentd CPU avg = 25%.  Compared with previous tests with the utf8-filter enabled, we had 8891 messages lost (1.4%) in ES and fluentd CPU avg at 34% of 1 core.

- 500 1 KB messages per second test:  657909 documents were indexed in ES for the pod's project index out of the 675000 messages logged.  17091 messages were lost (2.5%).  fluentd CPU avg = 49%.  Compared with previous tests with the utf8-filter enabled, we had 17774 messages lost (2.6%) in ES and fluentd CPU avg at 76% of 1 core.

Another interesting observation: when the tests were repeated, exactly the same number of messages was lost on the subsequent runs.

Finally, when I increased the fluentd daemonset config variables BUFFER_QUEUE_LIMIT from 32 to 1024, BUFFER_SIZE_LIMIT from 8m to 16m, and FILE_BUFFER_LIMIT from 256Mi to 1Gi, I observed no message loss in ES: all 675000 documents were indexed at a 500 messages per second logging rate with the utf8-filter disabled.  This suggests the message loss in ES is mainly related to fluentd buffering capacity/performance.
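
A sketch of that daemonset change (variable names and values are the ones described above; the fluentd pods restart when the env changes):

    oc set env ds/logging-fluentd BUFFER_QUEUE_LIMIT=1024 BUFFER_SIZE_LIMIT=16m FILE_BUFFER_LIMIT=1Gi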

Moving back to Assigned state.

Comment 28 Rich Megginson 2018-01-05 16:06:27 UTC
So we still have a regression from the baseline 3.7 release.  The fix in https://bugzilla.redhat.com/show_bug.cgi?id=1513254#c20 wasn't meant to fix all of the problems reported in this bz, but it appears to have helped.

We have "fixed" the utf8 filter https://bugzilla.redhat.com/show_bug.cgi?id=1513254#c4

I'm pretty sure it isn't related to the audit log: https://bugzilla.redhat.com/show_bug.cgi?id=1513254#c3

That leaves the following:

- k8s filter changes: https://bugzilla.redhat.com/show_bug.cgi?id=1513254#c5

- something else - not sure what that would be - perhaps changes in fluentd itself, some dependency, etc.

Comment 29 Walid A. 2018-01-10 04:51:47 UTC
We are now consistently seeing no message loss in Elasticsearch at 500 and even 750 messages per second logging rates with the following settings:

- utf8-filter disabled (default)
- out-of-the-box logging settings
- docker on the compute nodes configured with --log-opt max-size=100M (instead of 10M) for the json-file log driver, keeping the same max-file=3 (see the sketch after this list)
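
A sketch of that docker configuration on a compute node, assuming the daemon options live in /etc/sysconfig/docker as on a typical OCP 3.x RHEL node (docker must be restarted afterwards; "..." stands for whatever other options are already set):

    OPTIONS="... --log-driver=json-file --log-opt max-size=100M --log-opt max-file=3"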

This was confirmed on two separate OCP 3.7 clusters (openshift version 3.7.14-1 and 3.7.18-1) and running logging images v3.7.16-1.

In each test, 1500000 1K messages were logged from a logtest pod at a rate of 750 messages per second.  No loss was observed.  fluentd CPU utilization averaged around 72% when logging at 750 messages per second.

It is clear that the size of the json-file log files has a significant impact on ES message loss when pods are logging at high rates of 300 1K messages per second or more.   This was also mentioned in private comment 16.

Comment 31 Rich Megginson 2018-05-11 14:49:08 UTC
@Mike @Walid - what do we need to do to close this bug?

Comment 32 Mike Fiedler 2018-05-11 15:10:40 UTC
I think we should do one more test pass on the latest 3.7 errata and document the current performance.  QE SVT will queue that up.  You can move the bz to ON_QA if there are no further planned code changes targeted for this bz.

Comment 36 Mike Fiedler 2018-06-04 14:36:45 UTC
Disabling the utf8 filter (by default) addressed this issue.

Verified on 3.7.44

