Bug 1685243

Summary: Allow MERGE_JSON_LOG=true for indexing of JSON payload fields
Product: OpenShift Container Platform
Reporter: Daniel Del Ciancio <ddelcian>
Component: Logging
Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: aos-bugs, jgoulding, pstrick, qitang, rmeggins
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ose-logging-fluentd:v3.11.97-1
Doc Type: Bug Fix
Doc Text:
Cause: Using MERGE_JSON_LOG=true can create fields in the record which will cause schema and syntax violations in Elasticsearch. It can also create too many fields for Elasticsearch to handle without severe performance problems.
Consequence: Fluentd reports error 400 sending records to Elasticsearch. Elasticsearch performance degrades.
Fix: Allow users who experience these problems to tune their Fluentd to accommodate their log record fields.
Result: Logs are ingested into Fluentd with no errors. Elasticsearch performance does not degrade.
Story Points: ---
Clone Of:
: 1686946
Environment:
Last Closed: 2019-04-11 05:38:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1686946, 1686947    

Description Daniel Del Ciancio 2019-03-04 18:18:26 UTC
Description of problem:

Since the 3.11 upgrade, MERGE_JSON_LOG has been disabled to avoid the issues described here:  https://github.com/openshift/origin-aggregated-logging/issues/1492.
With it disabled, the customer can no longer index on fields contained in the JSON_PAYLOAD.
I've been informed that re-enabling it can cause unpredictable results.
Is there another option available to the customer?  They are looking to have this functionality restored.


Version-Release number of selected component (if applicable):
OCP 3.11-43

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

Customer should be able to select the fields contained in the JSON_PAYLOAD and filter on these.

Additional info:

Comment 3 Rich Megginson 2019-03-04 18:40:42 UTC
We have a partial solution in https://github.com/ViaQ/fluent-plugin-viaq_data_model/commit/8b5ef11cedec4c372b2cb082afc7f9cc08473654 and https://github.com/ViaQ/fluent-plugin-viaq_data_model/commit/d204d2fe732fc26201302c8fa466f20cc335e517

I would encourage you to read the README about undefined field handling and why we had to set MERGE_JSON_LOG=false by default, because of the serious problems which result.

The shared environments such as Online and Dedicated are difficult to handle.  Not only are those the environments where we see the most problems with MERGE_JSON_LOG=true because of the many shared applications logging conflicting fields, but also those are the most difficult to remediate.  Setting MERGE_JSON_LOG affects _all_ customers and _all_ applications.  What we really need is the ability to set MERGE_JSON_LOG on a per-customer _and_ per-application basis.  We also need the ability to configure the viaq undefined field handling on a per-customer _and_ per-application basis.  The problem is that we have no way to tell fluentd about customers and applications.
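
To illustrate the kind of conflict described above, here is a minimal sketch in Python. It is not the fluentd or viaq plugin code; the function names, the `message` field, and the two sample records are assumptions made up for the example. It merges a JSON payload into the top level of a record and then reports fields that arrive with more than one value type, which is the situation that can lead to schema violations when the records are indexed.

```python
import json


def merge_json_log(record):
    """Return a copy of record with a JSON object found in 'message' merged in.

    Illustrative stand-in for what MERGE_JSON_LOG=true does conceptually;
    the real behaviour lives in the fluentd configuration and plugins.
    """
    merged = dict(record)
    try:
        payload = json.loads(record.get("message", ""))
    except (TypeError, ValueError):
        return merged  # message is not JSON: leave the record untouched
    if isinstance(payload, dict):
        merged.update(payload)  # payload fields become top-level fields
    return merged


def find_type_conflicts(records):
    """Return {field: [type names]} for fields seen with more than one type."""
    seen = {}
    for rec in records:
        for key, value in rec.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


# Two hypothetical applications that log the same field name with different types.
app_a = {"message": '{"status": 200, "user": "alice"}'}
app_b = {"message": '{"status": {"code": 200}, "user": "bob"}'}

merged = [merge_json_log(app_a), merge_json_log(app_b)]
conflicts = find_type_conflicts(merged)
print(conflicts)
```

In this sketch `status` is a number in one record and an object in the other, so a single index cannot hold both under one field definition.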

One possible solution is that the customer would be responsible for labeling or annotating their pods and namespaces with MERGE_JSON_LOG, and then with specific viaq filter settings for MERGE_JSON_LOG=true.  This would require quite a bit of work in the viaq filter and in the logging fluentd configuration, and would likely have a negative impact on performance.
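
A rough sketch of how such a per-pod and per-namespace decision could look, written in Python for illustration only. Nothing like this exists in the viaq filter as far as this bug states; the annotation key `example.com/merge-json-log` and the function `should_merge` are hypothetical names invented for the example.

```python
# Hypothetical annotation key; not defined by OpenShift or the viaq plugin.
ANNOTATION = "example.com/merge-json-log"


def should_merge(pod_annotations, namespace_annotations, default=False):
    """Decide whether to merge JSON payload fields for one pod.

    The pod annotation wins over the namespace annotation, which wins
    over the cluster-wide default.
    """
    for annotations in (pod_annotations, namespace_annotations):
        value = annotations.get(ANNOTATION)
        if value is not None:
            return value.strip().lower() == "true"
    return default


# A pod opts in although the cluster-wide default is off.
print(should_merge({ANNOTATION: "true"}, {}))
# A namespace opts out although the cluster-wide default is on.
print(should_merge({}, {ANNOTATION: "false"}, default=True))
```

The lookup itself is cheap; the performance concern mentioned above would come from having to resolve pod and namespace metadata for every record before the merge decision can be made.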

Comment 4 Daniel Del Ciancio 2019-03-04 20:31:24 UTC
Hi Rich,
I realize your proposed solution involves installing a fluentd plugin and configuring specific ViaQ filter settings.  However, I'm not sure the SRE team will allow this, as they generally discourage managing snowflake clusters.

That being said, what other options exist for the customer aside from enabling the MERGE_JSON_LOG parameter and running into data and/or performance issues?  They are running on a dedicated cluster and are the only consumers of it.  All applications running on that cluster belong to the same customer.

Comment 5 Rich Megginson 2019-03-04 21:36:40 UTC
(In reply to Daniel Del Ciancio from comment #4)

Then I think it should be safe to use MERGE_JSON_LOG=true in this case.  But the customer will need to closely monitor their fluentd logs to make sure they don't see any strange errors, which would be caused by the undefined fields created by MERGE_JSON_LOG=true.

Comment 6 Daniel Del Ciancio 2019-03-04 22:11:32 UTC
(In reply to Rich Megginson from comment #5)

As you're aware, I've raised this to the OSD BU (and CC'ed you and Jeff on that email).  I suspect that if this parameter were to be re-enabled, the SRE team could implement a monitoring script that scans the fluentd logs for any "undefined field" errors.  There would also need to be a procedure in place to deal with these errors should they arise.
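
A minimal sketch of what such a monitoring script might look like, in Python. The patterns are assumptions: the Doc Text for this bug mentions error 400 from Elasticsearch, and `mapper_parsing_exception` is an Elasticsearch error type, but the exact wording of the fluentd log lines is not given in this bug. The sample lines are placeholders, not real fluentd output.

```python
import re

# Assumed indicators of rejected records; adjust to the real fluentd log text.
SUSPECT_PATTERNS = [
    re.compile(r"\b400\b"),
    re.compile(r"mapper_parsing_exception"),
]


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Placeholder input; a real script would iterate over the fluentd log file.
sample = [
    "placeholder: record sent ok\n",
    "placeholder: rejected with status 400\n",
    "placeholder: mapper_parsing_exception on field 'status'\n",
]

hits = find_suspect_lines(sample)
for number, text in hits:
    print(f"line {number}: {text}")
```

Because `find_suspect_lines` accepts any iterable of lines, the same function can be fed an open file object instead of the sample list.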

I'll have the BU decide on the next steps.

Thanks!

Comment 9 Rich Megginson 2019-03-13 02:00:58 UTC
The fix is implemented in rubygem-fluent-plugin-viaq_data_model-0.0.18-1.el7 - please verify that the fluentd image has this version (or later).
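
One way to script that check is sketched below in Python. It assumes the package string has the name-version-release form shown above (for example as reported by an RPM query inside the image, which is an assumption about how the string is obtained) and it compares only the dotted numeric version, which is simpler than full RPM version ordering. The names `plugin_version` and `has_fix` are made up for the example.

```python
# Minimum plugin version named in this comment.
MINIMUM = (0, 0, 18)


def plugin_version(nvr):
    """Extract the numeric version tuple from a name-version-release string."""
    _name, version, _release = nvr.rsplit("-", 2)
    return tuple(int(part) for part in version.split("."))


def has_fix(nvr):
    """True if the package version is at least the minimum fixed version."""
    return plugin_version(nvr) >= MINIMUM


print(has_fix("rubygem-fluent-plugin-viaq_data_model-0.0.18-1.el7"))
print(has_fix("rubygem-fluent-plugin-viaq_data_model-0.0.17-1.el7"))
```

Splitting from the right keeps the hyphens inside the package name intact, so only the last two hyphen-separated parts are treated as version and release.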

Comment 17 errata-xmlrpc 2019-04-11 05:38:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636