Bug 1478613

Summary: Fluentd appears unable or unwilling to index certain logs into Elasticsearch
Product: OpenShift Container Platform
Reporter: Peter Portante <pportant>
Component: Logging
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Anping Li <anli>
Severity: high
Priority: unspecified
Version: 3.5.0
CC: aos-bugs, jcantril, jchevret, msrb, nhosoi, pportant, rfoyle, rmeggins, wsun
Target Milestone: ---
Target Release: 3.9.0
Hardware: All
OS: All
Last Closed: 2018-01-19 16:20:34 UTC
Type: Bug
Attachments: Missing log entries (no flags)

Description Peter Portante 2017-08-05 03:43:53 UTC
It appears that some part of the log collection pipeline is unable or unwilling to index certain log lines in a file.

The data contains sensitive information, so it won't be posted publicly in this description.

The OpenShift Dedicated team manages a cluster, recently upgraded to 3.5, that exhibits this behavior.  We have checked the operational logs of all fluentd pods and of Elasticsearch without seeing any errors.

Comment 1 Jeff Cantrill 2017-08-24 14:25:17 UTC
Can you provide more information for us to investigate?  Could you add the log entries as a private comment, or attach them with the sensitive info redacted?

Comment 2 jchevret 2017-09-05 13:35:42 UTC
Created attachment 1322238 [details]
Missing log entries

Attaching two log entries that can be found in the OpenShift console under the pod's Log tab. However, the same two entries cannot be found in Kibana (search term "PHR91lbiNh-116033").

Other log entries with the same JSON format and fields can be found, so it's likely not an issue with the JSON object or format per se.

Comment 3 Rich Megginson 2017-09-05 13:48:36 UTC
Are there any errors in the fluentd or elasticsearch pod logs related to these messages?

Comment 4 Rich Megginson 2017-09-05 15:04:03 UTC
I think one of the problems is that the final version of the record has no "message" or "log" field:

https://github.com/openshift/origin-aggregated-logging/blob/release-1.5/fluentd/configs.d/openshift/filter-k8s-record-transform.conf#L28

I'm assuming these messages are coming in from a pod/container log.  I'm also assuming that the attached messages are the unescaped JSON value of the "log" field from the actual container json-file, or of the MESSAGE field for the journal record.

In this case, the record will come in with a "log" or "MESSAGE" field and will be parsed here: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L335

This will parse the embedded JSON string value, set all of those fields to be top level fields in the record, and remove the "log" or "MESSAGE" field from the record.

The record ends up with no message field.
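
To make the effect concrete, here is a minimal Ruby sketch of the behavior described above (this is not the plugin's actual code; the helper name and record contents are illustrative):

    require 'json'

    # Illustrative only: the embedded JSON payload is parsed, its fields are
    # promoted to top-level fields, and the original "log"/"MESSAGE" field is
    # removed, so no message-like field survives.
    def merge_json_log(record)
      payload_key = record.key?('log') ? 'log' : 'MESSAGE'
      parsed = JSON.parse(record[payload_key]) rescue nil
      return record unless parsed.is_a?(Hash)

      merged = record.merge(parsed)  # payload fields become top-level fields
      merged.delete(payload_key)     # original "log"/"MESSAGE" field is dropped
      merged                         # note: no "message" field remains
    end

    record = { 'log' => '{"level":"info","request_id":"abc123"}', 'stream' => 'stdout' }
    p merge_json_log(record)
    # => {"stream"=>"stdout", "level"=>"info", "request_id"=>"abc123"}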

Comment 5 jchevret 2017-09-05 15:43:12 UTC
Rich, you are correct: the attached log entries are raw stdout logs from a pod/container. I got them from the OpenShift web console under the pod's Log tab.

I don't have access to the node or infra namespaces, so I can't provide more information on the fluentd configs or logs.

Comment 6 Jeff Cantrill 2017-09-11 18:52:05 UTC
Rich,

Are you suggesting we are getting a log entry with the message removed or we get no log entry at all?

Comment 7 Rich Megginson 2017-09-11 19:11:53 UTC
(In reply to Jeff Cantrill from comment #6)
> Rich,
> 
> Are you suggesting we are getting a log entry with the message removed or we
> get no log entry at all?

We get a log entry with no "MESSAGE", "message", or "log" field.  We need to figure out what to do in that case, since "message" is required as part of the common data model.

- we could stuff the original un-JSON-parsed message in that field (but there is a field in pipeline_metadata for that)
- we could stuff a value like "nil" or "empty" or "None" or whatever into that field
- we could relax the constraint that the "message" field is required, and allow storage of log records with no "message" field
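
A minimal sketch of the second option above, written as plain Ruby rather than as actual fluentd configuration (the helper name, placeholder value, and record contents are illustrative):

    # Illustrative only: if none of the message-like fields are present,
    # fill in a placeholder so the common data model's "message" field exists.
    MESSAGE_KEYS = %w[message MESSAGE log].freeze

    def ensure_message_field(record, placeholder = '')
      return record if MESSAGE_KEYS.any? { |k| record.key?(k) }
      record.merge('message' => placeholder)
    end

    record = { 'level' => 'info', 'request_id' => 'abc123' }
    p ensure_message_field(record)
    # => {"level"=>"info", "request_id"=>"abc123", "message"=>""}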

Comment 8 Peter Portante 2017-09-12 11:10:50 UTC
(In reply to Rich Megginson from comment #7)
> We get a log entry with no "MESSAGE", "message", or "log" field.  We need to
> figure out what to do in that case, since "message" is required as part of
> the common data model.

I am not sure that there should be a requirement for a field to be present, except @timestamp.  If we have parts of the system that "require" a field, it shouldn't be Elasticsearch, and we should work to make sure that requirement for a field is removed.
 
> - we could stuff the original un-JSON-parsed message in that field (but
> there is a field in pipeline_metadata for that)

-1

> - we could stuff a value like "nil" or "empty" or "None" or whatever into
> that field

+1

I believe Elasticsearch recognizes JSON null values and handles them properly (a small sketch at the end of this comment illustrates this).

> - we could relax the constraint that the "message" field is required, and
> allow storage of log records with no "message" field

+1
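
For illustration, a record carrying a nil message would serialize like this (field values are made up; the serialization shown is standard Ruby JSON behavior):

    require 'json'

    # Illustrative only: a nil "message" serializes to JSON null. Elasticsearch
    # generally accepts null field values; they are simply not indexed for that
    # document.
    record = { '@timestamp' => '2017-09-12T11:10:50Z', 'message' => nil }
    puts record.to_json
    # => {"@timestamp":"2017-09-12T11:10:50Z","message":null}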

Comment 9 Jeff Cantrill 2017-09-21 22:31:38 UTC
*** Bug 1458418 has been marked as a duplicate of this bug. ***

Comment 10 jchevret 2017-09-29 13:54:28 UTC
(In reply to Peter Portante from comment #8)
> (In reply to Rich Megginson from comment #7)
> > We get a log entry with no "MESSAGE", "message", or "log" field.  We need to
> > figure out what to do in that case, since "message" is required as part of
> > the common data model.
> 
> I am not sure that there should be a requirement for a field to be present,
> except @timestamp.  If we have parts of the system that "require" a field,
> it shouldn't be Elasticsearch, and we should work to make sure that
> requirement for a field is removed.

I'm not sure I understand why we came to the conclusion this is caused by a log entry with no "message" or "log" field. In my case, some JSON log entries are indexed fine and others are not. All entries are of the same format as in they are JSON objects with the same structure. None of them have a 'message' or 'log' field.

Comment 11 Rich Megginson 2017-09-29 15:15:30 UTC
(In reply to jchevret from comment #10)
> (In reply to Peter Portante from comment #8)
> > (In reply to Rich Megginson from comment #7)
> > > We get a log entry with no "MESSAGE", "message", or "log" field.  We need to
> > > figure out what to do in that case, since "message" is required as part of
> > > the common data model.
> > 
> > I am not sure that there should be a requirement for a field to be present,
> > except @timestamp.  If we have parts of the system that "require" a field,
> > it shouldn't be Elasticsearch, and we should work to make sure that
> > requirement for a field is removed.
> 
> I'm not sure I understand why we came to the conclusion this is caused by a
> log entry with no "message" or "log" field. In my case, some JSON log entries
> are indexed fine and others are not. All entries are of the same format as
> in they are JSON objects with the same structure. None of them have a
> 'message' or 'log' field.

The bug says that some of the records are not indexed, namely the records posted in the attachment (https://bugzilla.redhat.com/attachment.cgi?id=1322238). What the records in the attachment have in common is that they have none of the "message", "MESSAGE", or "log" fields that fluentd looks for.

If this is not the problem, then what do the records that are indexed and stored in Elasticsearch have in common, versus the records that are not?
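
For what it's worth, one way to check that would be a quick standalone script run against the raw container json-file log (hypothetical, not part of the logging stack):

    require 'json'

    # Hypothetical diagnostic sketch: scan a docker json-file log and report
    # which embedded JSON payloads lack a message-like field, so they can be
    # compared against the records missing from Elasticsearch.
    MESSAGE_KEYS = %w[message MESSAGE log].freeze

    ARGF.each_line do |line|
      wrapper = JSON.parse(line) rescue nil            # json-file wrapper line
      next unless wrapper.is_a?(Hash) && wrapper['log']
      payload = JSON.parse(wrapper['log']) rescue nil  # embedded application JSON
      next unless payload.is_a?(Hash)
      if (payload.keys & MESSAGE_KEYS).empty?
        puts "no message-like field: #{payload.keys.sort.join(', ')}"
      end
    end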

Comment 22 Jeff Cantrill 2018-01-19 16:20:34 UTC
The configurations have changed in the current release and this may no longer be an issue.  We are unable to reproduce it.  Please reopen if the issue arises again.