Bug 1591468
| Summary: | fluentd unknown property id fills up buffer | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Steven Walter <stwalter> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, jack.ottofaro, jcantril, jmalde, mjahangi, rmeggins, stwalter |
| Target Milestone: | --- | | |
| Target Release: | 3.9.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-06-29 17:59:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I don't see any field "id" anywhere. Need to see the ES logs for more details. Can we get the log files from inside the container? Or did you get an error message like this when you ran the logging-dump.sh command:

```
Unable to get ES logs from pod $pod
```

If so, you'll have to collect these manually.

Looking through the attachments, I find no trace of any significant errors related to #c1. There are places where it looks like maybe the networking dropped or the pods were restarted, as there are several errors of no detected master.

Also, openshift3-logging-fluentd-v3.9.27-1 does NOT include various changes we made to:

* Fix bulk insert error handling
* Fix message file buffer deserialization error handling
* Retry failed messages

This may explain the 'document already exists' exception if it is retrying a record it already pushed. logging-fluentd-container-v3.9.30-1 is the first image with these changes that is in the container catalog. I suggest pulling the new image and going from there. 3.9.30-2 is a second or later build of 3.9.30-1, which is sufficient.

The error is this:
```
2018-06-14 14:33:21 +0200 [debug]: Elasticsearch errors returned, retrying: {"took"=>2, "errors"=>true, "items"=>[{"create"=>{"_index"=>"project.test-prod.286e429c-8efd-11e6-b4e5-0050569a3c52.2018.06.14", "_type"=>"com.redhat.viaq.common", "_id"=>"ZmU3YTE5NzYtMzIwZi00Njg1LTlmZTMtOTViN2Q1YjU4NzVi", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [res.body]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [id]"}}}}
```
From what Google tells me (https://github.com/elastic/elasticsearch/issues/2898), it would appear that the user is sending data containing an "id" field that does not map to a String value, e.g. a JSON int:

```
"id": 12345,
```

or some other compound value, e.g.:

```
"id": {"field1":"val1","field2":"val2"},
```
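To make the failure mode concrete, here is a small illustrative Python sketch (not part of fluentd or Elasticsearch; the `res.body` path is taken from the error above, and the helper name is hypothetical). It flags records whose value at a previously string-mapped path is not a string — the kind of document that makes Elasticsearch throw a `mapper_parsing_exception` on insert:

```python
import json

def find_type_conflicts(record, mapped_string_fields):
    """Return (path, value) pairs where `record` holds a non-string value
    at a dotted path that Elasticsearch had already mapped as a string."""
    conflicts = []
    for path in mapped_string_fields:
        node = record
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                node = None
                break
        if node is not None and not isinstance(node, str):
            conflicts.append((path, node))
    return conflicts

# A record like the one implied by the error: res.body arrives as an
# object containing "id", while the index mapped res.body as a string.
bad = json.loads('{"res": {"body": {"id": 12345}}}')
ok = json.loads('{"res": {"body": "plain text body"}}')

print(find_type_conflicts(bad, ["res.body"]))  # → [('res.body', {'id': 12345})]
print(find_type_conflicts(ok, ["res.body"]))   # → []
```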
fluentd doesn't log the actual request, though, so we can't tell from the pod logs; we'll need to look at the source (journald or json-file) logs.

If using docker --log-driver=json-file, try this:

```
sudo grep '\\"id\\"' /var/log/containers/test-prod*
```

If using --log-driver=journald, then perhaps try:

```
sudo journalctl -o export | grep '\\"id\\"'
```
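As a slightly more targeted alternative to grep, here is a Python sketch (illustrative only; it assumes the docker json-file format where each line is a JSON object carrying the raw application message under `"log"`, and uses inline sample lines rather than a real `/var/log/containers` path):

```python
import json

def scan_line(raw):
    """Given one json-file log line, return the embedded `id` value if the
    application's message is JSON with a non-string `id`, else None."""
    try:
        entry = json.loads(raw)                 # docker json-file wrapper
        msg = json.loads(entry.get("log", ""))  # the application's own output
    except (json.JSONDecodeError, TypeError):
        return None
    val = msg.get("id") if isinstance(msg, dict) else None
    return val if val is not None and not isinstance(val, str) else None

# sample lines standing in for the contents of /var/log/containers/test-prod*
samples = [
    '{"log": "{\\"id\\": 12345, \\"msg\\": \\"hello\\"}", "stream": "stdout"}',
    '{"log": "{\\"id\\": \\"abc\\", \\"msg\\": \\"fine\\"}", "stream": "stdout"}',
    '{"log": "not json at all", "stream": "stderr"}',
]
for line in samples:
    hit = scan_line(line)
    if hit is not None:
        print("non-string id:", hit)  # → non-string id: 12345
```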
First case closed, as the issue is no longer occurring with the latest images. I imagine the fixes in later 3.9 images fix the issue.

Closing this issue per #c11. Regarding #c12, we will likely not be backporting to 3.6, but the changes do exist in 3.7.56 or better.
Description of problem: Fluentd fills up its buffer when it receives logs from a specific application. It spits errors about unknown property "id".

Version-Release number of selected component (if applicable): openshift3-logging-fluentd-v3.9.27-1

How reproducible: Unconfirmed

```
2018-06-14 14:33:21 +0200 [debug]: Elasticsearch errors returned, retrying: {"took"=>2, "errors"=>true, "items"=>[{"create"=>{"_index"=>"project.test-prod.286e429c-8efd-11e6-b4e5-0050569a3c52.2018.06.14", "_type"=>"com.redhat.viaq.common", "_id"=>"ZmU3YTE5NzYtMzIwZi00Njg1LTlmZTMtOTViN2Q1YjU4NzVi", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [res.body]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [id]"}}}}
. . .
"status"=>409, "error"=>{"type"=>"document_already_exists_exception", "reason"=>"[com.redhat.viaq.common][NTdhMGVhY2QtYTVmYy00MzQyLTk3YTQtZjJjMWU2YzQxYjBi]: document already exists", "shard"=>"0",
. . .
2018-06-14 14:33:21 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2018-06-14 14:38:21 +0200 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="Elasticsearch returned errors, retrying. Add '@log_level debug' to your config to see the full response" plugin_id="object:3ff3d4816f0c"
2018-06-14 14:33:21 +0200 [warn]: suppressed same stacktrace
```
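The pairing of 400 and 409 errors in the log above can be modeled with a toy Python sketch (pure illustration, not fluentd or fluent-plugin-elasticsearch code): if the handler retries the whole buffered chunk whenever any item fails, the good documents are re-sent on every retry (yielding 409 document_already_exists) while the bad one fails with 400 forever, so the chunk never drains and the buffer fills up:

```python
def bulk_insert(index, docs):
    """Toy Elasticsearch bulk: 400 for a doc with a non-string id,
    409 for a doc already indexed, 201 for a successful create."""
    statuses = []
    for doc in docs:
        if not isinstance(doc.get("id", ""), str):
            statuses.append(400)
        elif doc["id"] in index:
            statuses.append(409)
        else:
            index.add(doc["id"])
            statuses.append(201)
    return statuses

# Naive handler: retry the whole chunk whenever anything in it failed.
index, buffer = set(), [{"id": "a"}, {"id": 12345}, {"id": "b"}]
for attempt in range(3):
    statuses = bulk_insert(index, buffer)
    print(attempt, statuses)
    if all(s < 400 for s in statuses):
        buffer.clear()
        break
# The buffer never drains: attempt 0 → [201, 400, 201],
# attempts 1 and 2 → [409, 400, 409]
```

Retrying only the failed items (one of the fixes listed in the comments above) lets the good documents leave the buffer instead of piling up behind the bad record.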