Bug 1685243 - Allow MERGE_JSON_LOG=true for indexing of JSON payload fields
Summary: Allow MERGE_JSON_LOG=true for indexing of JSON payload fields
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.11.z
Assignee: Rich Megginson
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1686946 1686947
Reported: 2019-03-04 18:18 UTC by Daniel Del Ciancio
Modified: 2019-06-11 15:08 UTC (History)

Fixed In Version: ose-logging-fluentd:v3.11.97-1
Doc Type: Bug Fix
Doc Text:
Cause: Using MERGE_JSON_LOG=true can create fields in the record that cause schema and syntax violations in Elasticsearch. It can also create too many fields for Elasticsearch to handle without severe performance problems.
Consequence: Fluentd reports error 400 when sending records to Elasticsearch. Elasticsearch performance degrades.
Fix: Allow users who experience these problems to tune their Fluentd to accommodate their log record fields.
Result: Logs are ingested into Fluentd with no errors. Elasticsearch performance does not degrade.
Clone Of:
: 1686946 (view as bug list)
Environment:
Last Closed: 2019-04-11 05:38:34 UTC
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift origin-aggregated-logging pull 1554 | 'None' | closed | Bug 1685243 - Allow MERGE_JSON_LOG=true for indexing of JSON payload fields | 2020-02-27 11:12:44 UTC
Red Hat Product Errata | RHBA-2019:0636 | None | None | None | 2019-04-11 05:38:43 UTC

Description Daniel Del Ciancio 2019-03-04 18:18:26 UTC
Description of problem:

Since the 3.11 upgrade, MERGE_JSON_LOG has been disabled to avoid the issues described here: https://github.com/openshift/origin-aggregated-logging/issues/1492.
With it disabled, the customer can no longer index on fields contained in the JSON_PAYLOAD.
I've been informed that re-enabling it can cause unpredictable results.
Is there another option available to the customer? They are looking to have this functionality restored.


Version-Release number of selected component (if applicable):
OCP 3.11-43

Expected results:

Customer should be able to select the fields contained in the JSON_PAYLOAD and filter on these.

Additional info:

Comment 3 Rich Megginson 2019-03-04 18:40:42 UTC
We have a partial solution in https://github.com/ViaQ/fluent-plugin-viaq_data_model/commit/8b5ef11cedec4c372b2cb082afc7f9cc08473654 and https://github.com/ViaQ/fluent-plugin-viaq_data_model/commit/d204d2fe732fc26201302c8fa466f20cc335e517

I would encourage you to read the README about undefined field handling and why we had to set MERGE_JSON_LOG=false by default, because of the serious problems which result.

The shared environments such as Online and Dedicated are difficult to handle.  Not only are those the environments where we see the most problems with MERGE_JSON_LOG=true because of the many shared applications logging conflicting fields, but also those are the most difficult to remediate.  Setting MERGE_JSON_LOG affects _all_ customers and _all_ applications.  What we really need is the ability to set MERGE_JSON_LOG on a per-customer _and_ per-application basis.  We also need the ability to configure the viaq undefined field handling on a per-customer _and_ per-application basis.  The problem is that we have no way to tell fluentd about customers and applications.
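To illustrate the conflicting-fields problem described above, here is a minimal Python sketch. It is not the Fluentd or viaq plugin implementation: `merge_json_log` and `find_type_conflicts` are hypothetical helper names, and the merge is simplified to copying top-level JSON keys from a `log` field into the record. It only shows how two applications that log the same field name with different value types end up with a conflict once their payloads are merged.

```python
import json

def merge_json_log(record):
    """Return a copy of record with JSON fields from record["log"] merged in.

    Simplified stand-in for what MERGE_JSON_LOG=true does; the real merge
    logic lives in Fluentd and is not reproduced here.
    """
    merged = dict(record)
    try:
        payload = json.loads(record.get("log", ""))
    except (ValueError, TypeError):
        return merged  # not JSON: leave the record unchanged
    if isinstance(payload, dict):
        merged.update(payload)
    return merged

def find_type_conflicts(records):
    """Map each field name to the value types seen, keeping only conflicts."""
    seen = {}
    for rec in records:
        for key, value in rec.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}

# Two applications log a "status" field with different types.
app_a = merge_json_log({"log": '{"status": 200, "user": "alice"}'})
app_b = merge_json_log({"log": '{"status": "OK", "user": "bob"}'})
conflicts = find_type_conflicts([app_a, app_b])
print(conflicts)  # {'status': ['int', 'str']}
```

In a shared cluster, every additional application increases the chance of such a collision, which is why a single global setting is hard to make safe.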

One possible solution is that the customer would be responsible for labeling or annotating their pods and namespaces with MERGE_JSON_LOG, and then with specific viaq filter settings for MERGE_JSON_LOG=true.  This would require quite a bit of work in the viaq filter and in the logging fluentd configuration, and would likely have a negative impact on performance.
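A rough Python sketch of what such a per-pod and per-namespace opt-in could look like. Everything here is an assumption for illustration: the annotation key, the `kubernetes`, `annotations`, and `namespace_annotations` record fields, and the `should_merge` helper are hypothetical and are not defined by this bug or by the viaq filter.

```python
# Hypothetical annotation key; nothing in this bug report defines a real one.
MERGE_ANNOTATION = "example.com/merge-json-log"

def should_merge(record, default=False):
    """Decide per record whether to merge the JSON payload.

    A pod annotation takes precedence over a namespace annotation;
    if neither is set, fall back to the cluster-wide default.
    """
    k8s = record.get("kubernetes", {})
    for scope in ("annotations", "namespace_annotations"):
        value = k8s.get(scope, {}).get(MERGE_ANNOTATION)
        if value is not None:
            return str(value).lower() == "true"
    return default

record = {"kubernetes": {"namespace_annotations": {MERGE_ANNOTATION: "true"}}}
print(should_merge(record))  # True
```

A lookup like this would run for every log record, which is one reason to expect the performance cost mentioned above.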

Comment 4 Daniel Del Ciancio 2019-03-04 20:31:24 UTC
Hi Rich,
I realize your proposed solution involves installing a fluentd plugin and configuring specific ViaQ filter settings. However, I'm not sure the SRE team will allow this, as they generally discourage managing snowflake clusters.

That being said, what other options exist for the customer aside from enabling the MERGE_JSON_LOG parameter and running into data and/or performance issues? They are running on a dedicated cluster and are the only consumers of it. All applications running on that cluster belong to the same customer.

Comment 5 Rich Megginson 2019-03-04 21:36:40 UTC
(In reply to Daniel Del Ciancio from comment #4)

Then I think it should be safe to use MERGE_JSON_LOG=true in this case.  But the customer will need to closely monitor their fluentd logs to make sure they don't see any strange errors, which would be caused by the undefined fields created by MERGE_JSON_LOG=true.

Comment 6 Daniel Del Ciancio 2019-03-04 22:11:32 UTC
(In reply to Rich Megginson from comment #5)

As you're aware, I've raised this with the OSD BU (and CC'ed you and Jeff on that email). I suspect that if this parameter were re-enabled, the SRE team could implement a monitoring script that scans the fluentd logs for any "undefined field" errors. There would also need to be a procedure in place to deal with these errors should they arise.
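A minimal sketch of the kind of log-scanning script suggested here, in Python. The patterns and the sample lines are assumptions made up for illustration; they are not real Fluentd output, and the patterns would need to be adjusted to whatever error text the fluentd pods actually emit.

```python
import re

# Hypothetical patterns: adjust to the error text your Fluentd actually emits.
ERROR_PATTERNS = [
    re.compile(p, re.IGNORECASE) for p in (r"\b400\b", r"undefined field")
]

def scan_fluentd_log(lines, patterns=ERROR_PATTERNS):
    """Return (line_number, line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in patterns):
            hits.append((number, line.rstrip("\n")))
    return hits

# Made-up sample lines, not real Fluentd log output.
sample = [
    "INFO: started\n",
    "WARN: error 400 sending records to Elasticsearch\n",
]
hits = scan_fluentd_log(sample)
for number, line in hits:
    print(f"line {number}: {line}")
```

The function accepts any iterable of lines, so it can be fed an open file object or captured pod log output.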

I'll have the BU decide on the next steps.

Thanks!

Comment 9 Rich Megginson 2019-03-13 02:00:58 UTC
The fix is implemented in rubygem-fluent-plugin-viaq_data_model-0.0.18-1.el7 - please verify that the fluentd image has this version (or later).
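One way to script that check, sketched in Python. It assumes you have already obtained the installed package name from the fluentd image by whatever means you prefer; `plugin_version` and `has_fix` are hypothetical helper names, and only the minimum version 0.0.18 comes from this comment.

```python
import re

MINIMUM = (0, 0, 18)  # version named in the comment above

def plugin_version(package_name):
    """Extract (major, minor, patch) from an RPM-style package name."""
    match = re.search(r"viaq_data_model-(\d+)\.(\d+)\.(\d+)", package_name)
    if match is None:
        raise ValueError(f"no viaq_data_model version in {package_name!r}")
    return tuple(int(part) for part in match.groups())

def has_fix(package_name):
    """True if the package is at or above the version that carries the fix."""
    return plugin_version(package_name) >= MINIMUM

print(has_fix("rubygem-fluent-plugin-viaq_data_model-0.0.18-1.el7"))  # True
```

Comparing integer tuples avoids the string-comparison pitfall where "0.0.9" would sort after "0.0.18".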

Comment 17 errata-xmlrpc 2019-04-11 05:38:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636

