Created attachment 1322667 [details]
logging-fluentd daemonset

Description of problem:

When JOURNAL_READ_FROM_HEAD is set to false in the logging-fluentd daemonset and the nodes are tagged to start logging-fluentd, the full contents of pre-existing pod logs are indexed. The expectation is that only messages logged after logging-fluentd starts are indexed.

We saw very high CPU utilization on logging-mux for a long period when logging-fluentd pods were started with a large number of pre-existing pods with large logs.

It appeared that JOURNAL_READ_FROM_HEAD should default to false, but to be sure I set it explicitly (see attached daemonset). When the nodes were tagged with logging-infra-fluentd and the pods started, I verified:

- JOURNAL_READ_FROM_HEAD="false" in the pod ENV
- no "read_from_head true" entry existed in /etc/fluent/configs.d/dynamic/input-syslog-default-syslog.conf (attached)

Version-Release number of selected component (if applicable):

All logging components are v3.6.173.0.27

How reproducible:

Always

Steps to Reproduce:
1. Stop logging-fluentd: oc label nodes --all --overwrite logging-infra-fluentd=false
2. Start pods which perform a known amount of logging. I used ocp-logtest pods (https://github.com/openshift/svt/blob/master/openshift_scalability/content/logtest/ocp_logtest-README.md) to log 90K total lines across 10 pods.
3. Wait for the logging pods to be quiet - all messages completed (verify with oc logs for the pods)
4. oc edit ds logging-fluentd and set the JOURNAL_READ_FROM_HEAD value to "false"
5. Restart logging-fluentd: oc label nodes --all --overwrite logging-infra-fluentd=true

Actual results:

90K (in my case) messages indexed in Elasticsearch; very high CPU on my logging-mux pod until it was able to ingest all messages

Expected results:

Only messages created after fluentd startup are indexed

Additional detail:

From the documentation:
https://docs.openshift.com/container-platform/3.6/install_config/aggregate_logging.html

openshift_logging_fluentd_journal_read_from_head
Set to true if Fluentd should read from the head of the journal when first starting up; using this may cause a delay in ES receiving current log records.
Created attachment 1322668 [details] syslog fluentd config
The default value is false:
https://github.com/reevoo/fluent-plugin-systemd/blob/v0.0.8/lib/fluent/plugin/in_systemd.rb#L12

  config_param :read_from_head, :bool, default: false

Due to an earlier bug in fluent-plugin-systemd, if you set JOURNAL_READ_FROM_HEAD=false, we just omit the `read_from_head` setting from the config and let it use the default, which is false:
https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/generate_throttle_configs.rb#L136

  # workaround for https://github.com/reevoo/fluent-plugin-systemd/issues/19
  if ENV['JOURNAL_READ_FROM_HEAD'] == "true"
    file.write(<<-CONF2)
      read_from_head true
    CONF2
  end

which is why you don't see the setting.

So, either this is a bug in fluent-plugin-systemd, or . . .

Are you reconfiguring fluentd after the initial deployment? If so, there may be an old /var/log/journal.pos file - if this file exists, fluentd will take it as the starting point for reading from the journal and will ignore the JOURNAL_READ_FROM_HEAD setting.
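The workaround quoted above can be condensed into a small standalone sketch (a simplified stand-in for the logic in generate_throttle_configs.rb, not the actual function — the helper name is hypothetical):

```ruby
# Simplified sketch of the workaround for
# https://github.com/reevoo/fluent-plugin-systemd/issues/19:
# emit the read_from_head setting only when it is explicitly "true",
# and otherwise omit it so the plugin's own default (false) applies.
def read_from_head_snippet(env)
  if env['JOURNAL_READ_FROM_HEAD'] == 'true'
    "read_from_head true\n"
  else
    ''  # setting omitted entirely; plugin default is false
  end
end
```

This is why JOURNAL_READ_FROM_HEAD="false" produces a config with no read_from_head line at all, as observed in the attached input-syslog-default-syslog.conf.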
Originally I was reconfiguring after deployment. I deployed a new cluster with openshift_logging_fluentd_read_from_head=false in the inventory and ended up in the same state.

sh-4.2# env | grep HEAD
JOURNAL_READ_FROM_HEAD=
sh-4.2# pwd
/etc/fluent/configs.d/dynamic
sh-4.2# cat input-syslog-default-syslog.conf
<source>
  @type systemd
  @label @INGRESS
  path "/var/log/journal"
  pos_file /var/log/journal.pos
  tag journal
</source>
To reiterate, this looks like it is behaving as described. If /var/log/journal.pos exists, then it just reads from that position in the journal.
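The precedence described here can be modeled roughly as follows (a simplified model of how fluent-plugin-systemd chooses a start position, not the plugin's actual code; the function name is hypothetical, and `cursor` stands for the saved journal cursor read from the pos file, or nil when no pos file exists):

```ruby
# Rough model: a saved cursor in the pos file always wins;
# read_from_head is only consulted on a truly fresh start.
def journal_start_position(cursor, read_from_head)
  if cursor && !cursor.strip.empty?
    :saved_cursor   # resume where the previous run left off
  elsif read_from_head
    :head           # index everything already in the journal
  else
    :tail           # only messages logged from now on
  end
end
```

Under this model, a fluentd pod restarted with an existing /var/log/journal.pos resumes from the saved cursor regardless of JOURNAL_READ_FROM_HEAD, which matches the behavior reported above.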
It seems that since version v3.6.173.0.27, the fluentd configuration files have been modified, including the JOURNAL_READ_FROM_HEAD handling.

+++ b/configs.d/openshift/input-pre-systemd.conf
@@ -0,0 +1,9 @@
+<source>
+  @type systemd
+  @label @INGRESS
+  path "#{ENV['JOURNAL_SOURCE'] || '/run/log/journal'}"
+  pos_file "#{ENV['JOURNAL_POS_FILE'] || '/var/log/journal.pos'}"
+  filters "#{ENV['JOURNAL_FILTERS_JSON'] || '[]'}"
+  tag journal
+  read_from_head "#{ENV['JOURNAL_READ_FROM_HEAD'] || 'false'}"
+</source>

Please note that the latest version in brew is: logging-fluentd-docker-v3.6.173.0.56-1

Mike, I guess you have had a chance to upgrade your system since early September. Do you still have the problem? Thanks!
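The read_from_head line in the new config relies on Ruby's `||` fallback for the environment variable. The fallback behaves like this (a minimal sketch of that one expression; the helper name is hypothetical):

```ruby
# Fallback used by the new config line:
#   read_from_head "#{ENV['JOURNAL_READ_FROM_HEAD'] || 'false'}"
# An unset variable yields 'false'. Note that an *empty* value
# (JOURNAL_READ_FROM_HEAD= as seen in comment #4's env output) is
# not nil, so it passes through as "" rather than falling back.
def resolve_read_from_head(env)
  env['JOURNAL_READ_FROM_HEAD'] || 'false'
end
```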
I do not see the problem on logging-fluentd 3.7.0-0.146.0. Not sure what that corresponds to for 3.6 as I have not tested there for a bit. Let me know if I should start up a 3.6 cluster and test there as well.
Thanks for your quick response, Mike!! I think the QE team is verifying bugs as part of the next 3.6.x release. Since this is a unit test, can we ask someone on the QE team assigned to those tasks to verify with the latest 3.6.x? Do you happen to know whom we could ask?

@jcantril, in that case, can we change the status of this bug to ON_QA directly, or should we set it to MODIFIED first, add this bz to some errata, and let the errata process change the status to ON_QA? Thanks!
I can take QE for this one on 3.6. I'll let Jeff answer how to get it into the errata flow. I think you either need to target this one for 3.6.z or create a clone targeted for 3.6.z, but better to let an expert answer.
(In reply to Mike Fiedler from comment #8)
> I can take QE for this one on 3.6.

Once again, thanks a lot, Mike!
@anli is this something you can verify is resolved?
Jeff, I will test it and add a comment.
@richm, do you think this 'JOURNAL_READ_FROM_HEAD="false" not honored' issue existed before your implementation of "Allow using both json-file and journald"? I would rather think the problem appeared temporarily with just that image due to some mismatch, e.g., between O_A and O_A_L. The reason I'm asking is the doc... If this bug was never released, we should not document it, I guess.
(In reply to Noriko Hosoi from comment #12)
> @richm, do you think this 'JOURNAL_READ_FROM_HEAD="false" not honored' issue
> had existed before your implementation to "Allow using both json-file and
> journald"?

Yes, for quite some time.

> I'd rather think the problem appeared temporarily just with the
> image due to some mismatch, e.g., O_A and O_A_L. The reason I'm asking is
> the Doc... If this bug was not released, we should not doc it, I guess.

I'm not sure if customers have run into this. I think we should document it, just in case some customer has. It may also have been a bug in fluent-plugin-systemd or the systemd-journal Ruby library that has since been fixed, because we are now using a newer version.
With logging-fluentd/v3.7.0-0.188.0.0 the test passes, so moving the bug to verified.

When JOURNAL_READ_FROM_HEAD is not set, the journal logs were not read from head. When JOURNAL_READ_FROM_HEAD="false" is set, the journal logs were not read from head.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188