Bug 1488941 - JOURNAL_READ_FROM_HEAD="false" not honored - pre-existing pod logs indexed on logging-fluentd start
Summary: JOURNAL_READ_FROM_HEAD="false" not honored - pre-existing pod logs indexed on logging-fluentd start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Noriko Hosoi
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-06 13:39 UTC by Mike Fiedler
Modified: 2017-11-28 22:09 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In the configuration for the fluentd systemd input plugin, the read_from_head parameter was not set based on the JOURNAL_READ_FROM_HEAD environment variable or its corresponding Ansible parameter, openshift_logging_fluentd_journal_read_from_head. As a consequence, when no pos_file existed (which happens when the logging system is first deployed or when a pos_file is deleted), the full contents of pre-existing logs were indexed instead of only the latest logs read from the tail. With this bug fix, the parameter is set correctly: if JOURNAL_READ_FROM_HEAD=true, all logs are indexed; if JOURNAL_READ_FROM_HEAD=false and no pos_file exists, indexing starts from the tail of the journal.
Clone Of:
Environment:
Last Closed: 2017-11-28 22:09:56 UTC
Target Upstream Version:
Embargoed:


Attachments
logging-fluentd daemonset (4.98 KB, text/plain), 2017-09-06 13:39 UTC, Mike Fiedler
syslog fluentd config (125 bytes, text/plain), 2017-09-06 13:39 UTC, Mike Fiedler


Links
Red Hat Product Errata RHSA-2017:3188 (normal, SHIPPED_LIVE): Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update. Last Updated: 2017-11-29 02:34:54 UTC

Description Mike Fiedler 2017-09-06 13:39:05 UTC
Created attachment 1322667 [details]
logging-fluentd daemonset

Description of problem:


When JOURNAL_READ_FROM_HEAD is set to false in the logging-fluentd daemonset and the nodes are tagged to start logging-fluentd, the full contents of pre-existing pod logs are indexed.

The expectation is that only messages logged after logging-fluentd start will be indexed.

We were seeing very high CPU utilization for logging-mux for a long period of time when logging-fluentd pods were started with a large number of pre-existing pods with large logs.  It appeared that JOURNAL_READ_FROM_HEAD should default to false, but to be sure I set it explicitly (see attached daemonset).  When the nodes were tagged with logging-infra-fluentd and the pods started, I verified:

 - JOURNAL_READ_FROM_HEAD="false"  in the pod ENV
 - no "read_from_head true" entry existed in /etc/fluent/configs.d/dynamic/input-syslog-default-syslog.conf (attached)

Version-Release number of selected component (if applicable):  All logging components are v3.6.173.0.27


How reproducible: Always


Steps to Reproduce:
1.  Stop logging-fluentd:   oc label --all --overwrite logging-infra-fluentd=false
2.  Start pods which perform a known amount of logging.  I used ocp-logtest pods (https://github.com/openshift/svt/blob/master/openshift_scalability/content/logtest/ocp_logtest-README.md) to log 90K total lines across 10 pods.
3.  Wait for the logging pods to be quiet - all messages completed (verify with oc logs for the pods)
4.  oc edit ds logging-fluentd and set JOURNAL_READ_FROM_HEAD value to "false"
5. restart logging-fluentd:  oc label --all --overwrite logging-infra-fluentd=true

Actual results:

90K (in my case) messages indexed in Elasticsearch
very high CPU on my logging-mux pod until it was able to ingest all messages

Expected results:

Only messages created after fluentd startup would be indexed

Additional detail:

From the documentation:

https://docs.openshift.com/container-platform/3.6/install_config/aggregate_logging.html

openshift_logging_fluentd_journal_read_from_head

Set to true if Fluentd should read from the head of Journal when first starting up, using this may cause a delay in ES receiving current log records.
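
For reference, the corresponding inventory entry at deployment time is a single line (a sketch; the variable name is taken from the documentation above):

  openshift_logging_fluentd_journal_read_from_head=false

This is the Ansible parameter that ends up as JOURNAL_READ_FROM_HEAD in the daemonset environment.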

Comment 1 Mike Fiedler 2017-09-06 13:39:31 UTC
Created attachment 1322668 [details]
syslog fluentd config

Comment 2 Rich Megginson 2017-09-06 15:04:38 UTC
The default value is false: https://github.com/reevoo/fluent-plugin-systemd/blob/v0.0.8/lib/fluent/plugin/in_systemd.rb#L12

 config_param :read_from_head, :bool, default: false

Due to an earlier bug in fluent-plugin-systemd, if you set JOURNAL_READ_FROM_HEAD=false, we just omit the `read_from_head` setting from the config and let it use the default, which is false:  https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/generate_throttle_configs.rb#L136

      # workaround for https://github.com/reevoo/fluent-plugin-systemd/issues/19
      if ENV['JOURNAL_READ_FROM_HEAD'] == "true"
        file.write(<<-CONF2)
  read_from_head true
      CONF2
      end

which is why you don't see the setting.

So, either this is a bug in fluent-plugin-systemd, or . . .

Are you reconfiguring fluentd after the initial deployment?  If so, there may be an old /var/log/journal.pos file - if that file exists, fluentd uses it as the starting point for reading from the journal and ignores the JOURNAL_READ_FROM_HEAD setting.
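
A minimal sketch of checking for (and clearing) a stale position file; the pod name is illustrative and the path assumes the default shown in the configs here:

  # if this file exists, fluentd resumes from its recorded offset and
  # ignores the JOURNAL_READ_FROM_HEAD setting
  oc exec <logging-fluentd-pod> -- ls -l /var/log/journal.pos
  # to start reading from the tail again, remove it before restarting fluentd
  oc exec <logging-fluentd-pod> -- rm -f /var/log/journal.pos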

Comment 3 Mike Fiedler 2017-09-07 14:47:11 UTC
Originally I was reconfiguring after deployment.  

I deployed a new cluster with 

openshift_logging_fluentd_read_from_head=false

in the inventory and ended up in the same state.  

sh-4.2# env | grep HEAD
JOURNAL_READ_FROM_HEAD=

sh-4.2# pwd
/etc/fluent/configs.d/dynamic

sh-4.2# cat input-syslog-default-syslog.conf 
<source>
  @type systemd
  @label @INGRESS
  path "/var/log/journal"
  pos_file /var/log/journal.pos
  tag journal
</source>

Comment 4 Peter Portante 2017-09-09 19:44:06 UTC
To reiterate, this looks like it is behaving as described.  If /var/log/journal.pos exists, then it just reads from that position in the journal.

Comment 5 Noriko Hosoi 2017-10-18 17:33:25 UTC
It seems that since version v3.6.173.0.27, the fluentd configuration files have been modified, including the JOURNAL_READ_FROM_HEAD handling.

+++ b/configs.d/openshift/input-pre-systemd.conf
@@ -0,0 +1,9 @@
+<source>
+  @type systemd
+  @label @INGRESS
+  path "#{ENV['JOURNAL_SOURCE'] || '/run/log/journal'}"
+  pos_file "#{ENV['JOURNAL_POS_FILE'] || '/var/log/journal.pos'}"
+  filters "#{ENV['JOURNAL_FILTERS_JSON'] || '[]'}"
+  tag journal
+  read_from_head "#{ENV['JOURNAL_READ_FROM_HEAD'] || 'false'}"
+</source>
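
With JOURNAL_READ_FROM_HEAD unset (or set to "false"), the template above should render to a source block like this (a sketch using the defaults from the template):

  <source>
    @type systemd
    @label @INGRESS
    path "/run/log/journal"
    pos_file "/var/log/journal.pos"
    filters "[]"
    tag journal
    read_from_head "false"
  </source>

so a fresh deployment with no pos_file starts indexing from the tail of the journal.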

Please note that the latest version in brew is:
logging-fluentd-docker-v3.6.173.0.56-1

Mike, I guess you had a chance to upgrade your system since early September.  Do you still have the problem?  Thanks!

Comment 6 Mike Fiedler 2017-10-18 18:25:12 UTC
I do not see the problem on logging-fluentd 3.7.0-0.146.0.   Not sure what that corresponds to for 3.6 as I have not tested there for a bit.   Let me know if I should start up a 3.6 cluster and test there as well.

Comment 7 Noriko Hosoi 2017-10-18 19:08:54 UTC
Thanks for your quick response, Mike!!

I think the QE team is verifying bugs as part of the next 3.6.x release.  Since this is a unit test, can we ask someone on the QE team assigned to those tasks to verify with the latest 3.6.x?  Do you happen to know whom we could ask?

@jcantril, in such a case, can we change the status of this bug to ON_QA directly, or should we set it to MODIFIED first, add this bz to some errata, and let the errata process change the status to ON_QA?  Thanks!

Comment 8 Mike Fiedler 2017-10-18 19:52:41 UTC
I can take QE for this one on 3.6.   I'll let Jeff answer how to get it into the errata flow.   I think you either need to target this one for 3.6.z or create a clone targeted for 3.6.z, but better to let an expert answer.

Comment 9 Noriko Hosoi 2017-10-18 23:05:57 UTC
(In reply to Mike Fiedler from comment #8)
> I can take QE for this one on 3.6.  
Once again, thanks a lot, Mike!

Comment 10 Jeff Cantrill 2017-10-25 15:07:37 UTC
@anli is this something you can verify is resolved?

Comment 11 Anping Li 2017-10-26 02:36:50 UTC
Jeff, I will test it and add a comment.

Comment 12 Noriko Hosoi 2017-10-31 20:36:51 UTC
@richm, do you think this 'JOURNAL_READ_FROM_HEAD="false" not honored' issue existed before your implementation of "Allow using both json-file and journald"?  I'd rather think the problem appeared temporarily just with the image due to some mismatch, e.g., between O_A and O_A_L.  The reason I'm asking is the doc text...  If this bug was never released, we should not document it, I guess.

Comment 13 Rich Megginson 2017-10-31 20:51:38 UTC
(In reply to Noriko Hosoi from comment #12)
> @richm, do you think this 'JOURNAL_READ_FROM_HEAD="false" not honored' issue
> existed before your implementation of "Allow using both json-file and
> journald"?

Yes, for quite some time.

> I'd rather think the problem appeared temporarily just with the
> image due to some mismatch, e.g., between O_A and O_A_L.  The reason I'm asking is
> the doc text...  If this bug was never released, we should not document it, I guess.

I'm not sure if customers have run into this.  I think we should document it, just in case some customer has run into this.

It also may have been a bug in fluent-plugin-systemd or systemd-journal ruby library, which has been fixed because we are now using a newer version.

Comment 14 Anping Li 2017-11-02 07:36:43 UTC
With logging-fluentd/v3.7.0-0.188.0.0, the tests pass, so moving the bug to verified.
When JOURNAL_READ_FROM_HEAD is not set, the journal logs are not read from the head.
When JOURNAL_READ_FROM_HEAD="false" is set, the journal logs are not read from the head.

Comment 18 errata-xmlrpc 2017-11-28 22:09:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

