Bug 1478613 - Fluentd appears unable or unwilling to index certain logs into Elasticsearch
Status: CLOSED INSUFFICIENT_DATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.5.0
Hardware/OS: All / All
Priority: unspecified
Severity: high
Target Release: 3.9.0
Assigned To: Noriko Hosoi
QA Contact: Anping Li
Keywords: NeedsTestCase
Duplicates: 1458418
Reported: 2017-08-04 23:43 EDT by Peter Portante
Modified: 2018-02-19 13:56 EST

Last Closed: 2018-01-19 11:20:34 EST
Type: Bug


Attachments
Missing log entries (551 bytes, text/plain), attached 2017-09-05 09:35 EDT by jchevret

Description Peter Portante 2017-08-04 23:43:53 EDT
It appears that some part of the log collection pipeline is unable or unwilling to index certain log lines in a file.

The data contains sensitive information, so it won't be posted publicly in this description.

The OpenShift Dedicated team is managing a cluster recently upgraded to 3.5 exhibiting this behavior.  We have checked the operational logs of all fluentd pods and Elasticsearch without seeing any errors.
Comment 1 Jeff Cantrill 2017-08-24 10:25:17 EDT
Can you provide more information for us to investigate?  Can you possibly add them as a private comment or ref an attachment and redact the sensitive info?
Comment 2 jchevret 2017-09-05 09:35 EDT
Created attachment 1322238 [details]
Missing log entries

Attaching two log entries which can be found in the OpenShift console under the pod's Log tab. However, the same two entries cannot be found in Kibana (search term "PHR91lbiNh-116033").

Other log entries with the same JSON format & fields can be found, so it's likely not an issue with the JSON object or format per se.
Comment 3 Rich Megginson 2017-09-05 09:48:36 EDT
Are there any errors in the fluentd or elasticsearch pod logs related to these messages?
Comment 4 Rich Megginson 2017-09-05 11:04:03 EDT
I think one of the problems is that the final version of the record has no "message" or "log" field:

https://github.com/openshift/origin-aggregated-logging/blob/release-1.5/fluentd/configs.d/openshift/filter-k8s-record-transform.conf#L28

I'm assuming these messages are coming in from a pod/container log.  I'm also assuming that the attached messages are the unescaped JSON value from the "log" field from the actual container json-file, or the MESSAGE field for the journal record.

In this case - the record will come in with a "log" or "MESSAGE" field, and will be parsed here: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L335

This will parse the embedded JSON string value, set all of those fields to be top level fields in the record, and remove the "log" or "MESSAGE" field from the record.

The record ends up with no message field.
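The merge-and-delete behavior described above can be sketched in a few lines of Ruby. This is a hypothetical illustration, not the actual plugin code (`merge_json_log` is an invented helper name); it only shows how promoting embedded JSON keys to the top level can leave a record with no message field at all:

```ruby
require 'json'

# Hypothetical sketch of the step described above: an embedded JSON string in
# "log" or "MESSAGE" is parsed, its keys are promoted to top-level record
# fields, and the original field is deleted.
def merge_json_log(record)
  %w[log MESSAGE].each do |key|
    value = record[key]
    next unless value.is_a?(String) && value.lstrip.start_with?('{')
    begin
      parsed = JSON.parse(value)
      next unless parsed.is_a?(Hash)
      record.delete(key)             # the original field is dropped here...
      record = parsed.merge(record)  # ...and its keys become top-level fields
    rescue JSON::ParserError
      # value was not valid JSON; leave the record untouched
    end
  end
  record  # nothing guarantees a "message" key survives
end

# A record whose embedded JSON has no "message" key ends up with none at all:
rec = merge_json_log('log' => '{"level":"info","trace_id":"abc"}')
# rec has "level" and "trace_id", but no "log" and no "message"
```

Non-JSON payloads pass through unchanged, which is why plain-text container logs keep their "log" field and index fine while these structured records do not.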
Comment 5 jchevret 2017-09-05 11:43:12 EDT
Rich you are correct, the attached log entries are raw STDOUT logs from a pod/container. I got them from the OpenShift web console under the pod log tab.

I don't have access to the node or infra namespaces so I can't provide more info on the fluentd configs or logs.
Comment 6 Jeff Cantrill 2017-09-11 14:52:05 EDT
Rich,

Are you suggesting we are getting a log entry with the message removed or we get no log entry at all?
Comment 7 Rich Megginson 2017-09-11 15:11:53 EDT
(In reply to Jeff Cantrill from comment #6)
> Rich,
> 
> Are you suggesting we are getting a log entry with the message removed or we
> get no log entry at all?

We get a log entry with no "MESSAGE", "message", or "log" field.  We need to figure out what to do in that case, since "message" is required as part of the common data model.

- we could stuff the original un-JSON-parsed message in that field (but there is a field in pipeline_metadata for that)
- we could stuff a value like "nil" or "empty" or "None" or whatever into that field
- we could relax the constraint that the "message" field is required, and allow storage of log records with no "message" field
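The second option above could be sketched as a small guard run on each record before it is sent to Elasticsearch. This is a hypothetical helper, not the shipped fluentd config, and the sentinel string is a placeholder choice:

```ruby
# Hypothetical guard for option two: if JSON parsing consumed the "log" /
# "MESSAGE" field and left no message behind, stuff a sentinel value so the
# common-data-model requirement is still met. "empty" is a placeholder.
def ensure_message(record, sentinel = 'empty')
  has_message = %w[message MESSAGE log].any? { |k| record.key?(k) }
  has_message ? record : record.merge('message' => sentinel)
end
```

Records that already carry any of the three fields pass through untouched; only the problematic records from comment 4 would be modified.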
Comment 8 Peter Portante 2017-09-12 07:10:50 EDT
(In reply to Rich Megginson from comment #7)
> We get a log entry with no "MESSAGE", "message", or "log" field.  We need to
> figure out what to do in that case, since "message" is required as part of
> the common data model.

I am not sure that there should be a requirement for a field to be present, except @timestamp.  If we have parts of the system that "require" a field, it shouldn't be Elasticsearch, and we should work to make sure that requirement for a field is removed.
 
> - we could stuff the original un-JSON-parsed message in that field (but
> there is a field in pipeline_metadata for that)

-1

> - we could stuff a value like "nil" or "empty" or "None" or whatever into
> that field

+1

I believe Elasticsearch recognizes "nil" JSON values and properly handles them.
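(For reference, JSON itself has no "nil" literal; the closest equivalents are the literal string "nil" or a JSON null, which Ruby's parser maps to nil while keeping the key present. A quick check, not specific to Elasticsearch:)

```ruby
require 'json'

# A JSON null parses to Ruby's nil; the "message" key is still present in
# the parsed document, just with a nil value.
doc = JSON.parse('{"message": null, "level": "info"}')
doc.key?('message')  # the key exists
doc['message']       # its value is nil
```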

> - we could relax the constraint that the "message" field is required, and
> allow storage of log records with no "message" field

+1
Comment 9 Jeff Cantrill 2017-09-21 18:31:38 EDT
*** Bug 1458418 has been marked as a duplicate of this bug. ***
Comment 10 jchevret 2017-09-29 09:54:28 EDT
(In reply to Peter Portante from comment #8)
> (In reply to Rich Megginson from comment #7)
> > We get a log entry with no "MESSAGE", "message", or "log" field.  We need to
> > figure out what to do in that case, since "message" is required as part of
> > the common data model.
> 
> I am not sure that there should be a requirement for a field to be present,
> except @timestamp.  If we have parts of the system that "require" a field,
> it shouldn't be Elasticsearch, and we should work to make sure that
> requirement for a field is removed.

I'm not sure I understand why we came to the conclusion that this is caused by a log entry with no "message" or "log" field. In my case, some JSON log entries are indexed fine and others are not. All entries are of the same format, as in they are JSON objects with the same structure. None of them have a 'message' or 'log' field.
Comment 11 Rich Megginson 2017-09-29 11:15:30 EDT
(In reply to jchevret from comment #10)
> (In reply to Peter Portante from comment #8)
> > (In reply to Rich Megginson from comment #7)
> > > We get a log entry with no "MESSAGE", "message", or "log" field.  We need to
> > > figure out what to do in that case, since "message" is required as part of
> > > the common data model.
> > 
> > I am not sure that there should be a requirement for a field to be present,
> > except @timestamp.  If we have parts of the system that "require" a field,
> > it shouldn't be Elasticsearch, and we should work to make sure that
> > requirement for a field is removed.
> 
> I'm not sure I understand why we came to the conclusion this is caused by a
> log entry with no "message" or "log field. In my case, some JSON log entries
> are indexed fine and others are not. All entries are of the same format as
> in they are JSON objects with the same structure. None of them have a
> 'message' or 'log' field.

The bug says that some of the records are not indexed, namely the records posted in the attachment (https://bugzilla.redhat.com/attachment.cgi?id=1322238). What the records in the attachment have in common is that they have no "message", "MESSAGE", or "log" field, which fluentd looks for.

If this is not the problem - then what is in common between the records that are indexed and stored in Elasticsearch vs. the records that are not?
Comment 22 Jeff Cantrill 2018-01-19 11:20:34 EST
The configurations have changed in the current release and this may no longer be an issue.  We are unable to reproduce.  Please reopen if the issue arises again.
