Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1591468

Summary: fluentd unknown property id fills up buffer
Product: OpenShift Container Platform Reporter: Steven Walter <stwalter>
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED CURRENTRELEASE QA Contact: Anping Li <anli>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.9.0 CC: aos-bugs, jack.ottofaro, jcantril, jmalde, mjahangi, rmeggins, stwalter
Target Milestone: ---   
Target Release: 3.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-29 17:59:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Steven Walter 2018-06-14 19:29:13 UTC
Description of problem:
Fluentd fills up buffer when it receives logs from a specific application. It spits errors about unknown property "id".

Version-Release number of selected component (if applicable):
openshift3-logging-fluentd-v3.9.27-1

How reproducible:
Unconfirmed

2018-06-14 14:33:21 +0200 [debug]: Elasticsearch errors returned, retrying:  {"took"=>2, "errors"=>true, "items"=>[{"create"=>{"_index"=>"project.test-prod.286e429c-8efd-11e6-b4e5-0050569a3c52.2018.06.14", "_type"=>"com.redhat.viaq.common", "_id"=>"ZmU3YTE5NzYtMzIwZi00Njg1LTlmZTMtOTViN2Q1YjU4NzVi", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [res.body]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [id]"}}}}, . . . "status"=>409, "error"=>{"type"=>"document_already_exists_exception", "reason"=>"[com.redhat.viaq.common][NTdhMGVhY2QtYTVmYy00MzQyLTk3YTQtZjJjMWU2YzQxYjBi]: document already exists", "shard"=>"0", . . .

2018-06-14 14:33:21 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2018-06-14 14:38:21 +0200 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="Elasticsearch returned errors, retrying. Add '@log_level debug' to your config to see the full response" plugin_id="object:3ff3d4816f0c"
  2018-06-14 14:33:21 +0200 [warn]: suppressed same stacktrace
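The warning above says to add '@log_level debug' to the config to see the full Elasticsearch response. A minimal sketch of where that directive goes, assuming a standard fluent-plugin-elasticsearch output block (the host and port values here are placeholders, not taken from this case):

```
<match **>
  @type elasticsearch
  @log_level debug   # log the full Elasticsearch error response for this plugin
  host logging-es    # placeholder Elasticsearch service name
  port 9200
</match>
```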

Comment 3 Rich Megginson 2018-06-14 20:06:25 UTC
I don't see any field "id" anywhere. We need to see the ES logs for more details.

Can we get the log files from inside the container?  Or did you get an error message like this when you ran the logging-dump.sh command:

--- Unable to get ES logs from pod $pod

If so, you'll have to collect them manually.

Comment 7 Jeff Cantrill 2018-06-15 14:45:10 UTC
Looking through the attachments, I find no trace of any significant errors related to #c1. There are places where it looks like the networking dropped or the pods were restarted, as there are several "no master detected" errors.

Also, openshift3-logging-fluentd-v3.9.27-1 does NOT include several changes we made:

* Fix bulk insert error handling
* Fix message file buffer deserialization error handling
* Retry failed messages

This may explain the 'document already exists' exception if it's retrying a record it already pushed.

logging-fluentd-container-v3.9.30-1, available in the container catalog, is the first image with these changes. I suggest pulling the new image and going from there.

Comment 9 Jeff Cantrill 2018-06-15 15:03:55 UTC
3.9.30-2 is a later build of 3.9.30-1, which is sufficient.

Comment 10 Rich Megginson 2018-06-15 15:59:15 UTC
The error is this:

2018-06-14 14:33:21 +0200 [debug]: Elasticsearch errors returned, retrying:  {"took"=>2, "errors"=>true, "items"=>[{"create"=>{"_index"=>"project.test-prod.286e429c-8efd-11e6-b4e5-0050569a3c52.2018.06.14", "_type"=>"com.redhat.viaq.common", "_id"=>"ZmU3YTE5NzYtMzIwZi00Njg1LTlmZTMtOTViN2Q1YjU4NzVi", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [res.body]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [id]"}}}}

Based on https://github.com/elastic/elasticsearch/issues/2898, it would appear that the user is sending data containing an "id" field that does not map to a String value, e.g. a JSON int:

"id": 12345,

or some other compound value, e.g.

"id": {"field1":"val1","field2":"val2"},
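For illustration, here is a sketch of that kind of type conflict (the records and field values are hypothetical, not taken from the customer's logs): Elasticsearch dynamic mapping types a field from the first document indexed, so once "id" has been mapped from a string, a later record where "id" is an int or an object is rejected with HTTP 400.

```python
import json

# Hypothetical records illustrating the conflict. Elasticsearch infers the
# mapping of "id" from the first document; later documents whose "id" has a
# different JSON type then fail with a mapper_parsing_exception (status 400).
first_indexed = {"res": {"body": {"id": "abc-123"}}}   # "id" mapped as string
int_conflict = {"res": {"body": {"id": 12345}}}        # later int value -> 400
object_conflict = {"res": {"body": {"id": {"field1": "val1", "field2": "val2"}}}}

for doc in (first_indexed, int_conflict, object_conflict):
    # Print the JSON type of "id" alongside the serialized record.
    print(type(doc["res"]["body"]["id"]).__name__, json.dumps(doc))
```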

Fluentd doesn't log the actual request, though, so we can't tell from the pod logs; we'll need to look at the source (journald or json-file) logs.

If using docker --log-driver=json-file, try this:

sudo grep '\\"id\\"' /var/log/containers/test-prod*

If using --log-driver=journald, then perhaps try

sudo journalctl -o export | grep '\\"id\\"'

Comment 11 Steven Walter 2018-06-20 18:11:29 UTC
The first case was closed since the issue is no longer occurring with the latest images. I imagine the fixes in later 3.9 images resolved it.

Comment 13 Jeff Cantrill 2018-06-29 17:59:28 UTC
Closing this issue per #c11. Regarding #c12, we will likely not be backporting to 3.6, but the changes do exist in 3.7.56 or later.