Bug 1482002 - Can't collect log entries due to fluentd error
Summary: Can't collect log entries due to fluentd error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assignee: Rich Megginson
QA Contact: Xia Zhao
URL:
Whiteboard:
Duplicates: 1482532
Depends On:
Blocks: 1469859 1498999
Reported: 2017-08-16 09:16 UTC by Xia Zhao
Modified: 2017-10-25 13:04 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Fluentd could not write the files it uses to buffer records because of a problem converting values from ASCII-8BIT to UTF-8.
Consequence: Fluentd emits many errors and cannot add records to Elasticsearch.
Fix: The patch that forced the UTF-8 conversion was removed.
Result: Fluentd can write ASCII-8BIT-encoded files for its buffer.
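The failure mode in the Doc Text can be illustrated outside fluentd. The following is a minimal Python analogy, not fluentd's code (fluentd is Ruby). In Ruby, a string tagged ASCII-8BIT has no defined mapping for bytes above 0x7F, so forcing it to UTF-8 raises `Encoding::UndefinedConversionError`, as in the log lines below. The closest Python equivalent is a strict ASCII decode, which raises `UnicodeDecodeError` on `0x92`. The record bytes, the file name, and the helpers `force_utf8` and `write_buffer_chunk` are all hypothetical.

```python
import os
import tempfile

# Hypothetical journal record bytes. 0x92 is outside 7-bit ASCII
# (it is a right single quotation mark in Windows-1252).
raw = b"It\x92s a journal message"


def force_utf8(data: bytes) -> str:
    """Hypothetical stand-in for the removed patch: a strict text conversion.

    Ruby's ASCII-8BIT -> UTF-8 encode defines only bytes 0x00-0x7F, so this
    analogy decodes as strict ASCII, which raises UnicodeDecodeError on 0x92.
    """
    return data.decode("ascii")


def write_buffer_chunk(path: str, data: bytes) -> int:
    """Hypothetical stand-in for the fix: write the record bytes unchanged."""
    with open(path, "wb") as f:
        return f.write(data)


# Forcing the conversion fails on the 0x92 byte.
try:
    force_utf8(raw)
    conversion_failed = False
except UnicodeDecodeError:
    conversion_failed = True

# Writing the raw bytes to a binary buffer file succeeds and round-trips.
with tempfile.TemporaryDirectory() as tmp:
    chunk_path = os.path.join(tmp, "buffer.chunk")
    written = write_buffer_chunk(chunk_path, raw)
    with open(chunk_path, "rb") as f:
        round_trip = f.read()

print(conversion_failed, written == len(raw), round_trip == raw)  # True True True
```

The point of the sketch is that the buffer file only needs to store bytes. Treating them as opaque binary data avoids any encoding decision at write time, which matches the Result line above.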
Clone Of:
Clones: 1498999
Environment:
Last Closed: 2017-10-25 13:04:36 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
inventory file used for logging deployment (714 bytes, text/plain)
2017-08-16 09:20 UTC, Xia Zhao
fluentd log (4.82 MB, text/plain)
2017-08-16 09:23 UTC, Xia Zhao


Links
System: Red Hat Product Errata
ID: RHBA-2017:3049
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update
Last Updated: 2017-10-25 15:57:15 UTC

Description Xia Zhao 2017-08-16 09:16:35 UTC
Description of problem:
After deploying the logging 3.6 stack on OCP 3.6, log entries can't be collected because of the following fluentd error:
2017-08-16 04:52:01 -0400 [error]: Exception emitting record: "\x92" from ASCII-8BIT to UTF-8
2017-08-16 04:52:01 -0400 [warn]: emit transaction failed: error_class=Encoding::UndefinedConversionError error="\"\\x92\" from ASCII-8BIT to UTF-8" tag="journal.system"

The issue reproduces regardless of whether this parameter is specified in the inventory:
openshift_logging_fluentd_journal_read_from_head=true

[error]: Exception emitting record: "\x92" from ASCII-8BIT to UTF-8

Version-Release number of selected component (if applicable):
logging-fluentd         v3.6.173.0.5-6      95dede9f3cb2        9 hours ago         235.1 MB

# openshift version
openshift v3.6.173.0.5
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy the logging 3.6 stack on OCP 3.6 with the attached inventory file
2. Wait until the EFK pods are running
3. Check the fluentd logs

Actual results:
Can't collect log entries due to fluentd error

Expected results:
fluentd should collect log entries without errors

Additional info:
Full fluentd log attached
Inventory file used for the logging deployment attached

Comment 1 Xia Zhao 2017-08-16 09:20:07 UTC
Created attachment 1314021
inventory file used for logging deployment

Comment 2 Xia Zhao 2017-08-16 09:23:09 UTC
Created attachment 1314027
fluentd log

Comment 3 Xia Zhao 2017-08-16 09:23:55 UTC
This bz is currently blocking logging tests on OCP 3.6.0 envs.

Comment 4 Jan Wozniak 2017-08-16 14:45:53 UTC
The fluentd image in brew looks suspiciously small compared to the one freshly built from the rhaos-3.6-rhel-7 branch. Perhaps an incorrect build got pushed to brew?

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd         v3.6                95dede9f3cb2        15 hours ago        235.1 MB
local-reg:5000/openshift/logging-fluentd                                                <none>              0ac973960bdb        4 hours ago         360.6 MB

Comment 5 Rich Megginson 2017-08-16 15:09:27 UTC
Can we log into the system? I want to look at the journal and see if I can find which record is causing this problem.

Comment 10 Rich Megginson 2017-08-17 13:51:11 UTC
*** Bug 1482532 has been marked as a duplicate of this bug. ***

Comment 11 Mike Fiedler 2017-08-17 14:21:12 UTC
Installing logging with openshift_logging_image_version=v3.6.173.0.5 - this problem is seen.

Installing logging with openshift_logging_image_version=v3.6.171 - this problem is NOT seen.

Comment 12 Noriko Hosoi 2017-08-17 16:37:06 UTC
(In reply to Mike Fiedler from comment #11)
> Installing logging with openshift_logging_image_version=v3.6.173.0.5 - this
> problem is seen.
> 
> Installing logging with openshift_logging_image_version=v3.6.171 - this
> problem is NOT seen.

Right. Switching the buffer_type from "memory" to "file" happened after the version was bumped to v3.6.171.

Comment 14 Rich Megginson 2017-08-18 02:24:50 UTC
This is easy to reproduce with flexy.

I'm thinking that the fluentd dependencies are conflicting - they are not up to date - once the 3.6 puddle is rebuilt I can build and test a new fluentd image.

Comment 16 Rich Megginson 2017-08-18 18:09:20 UTC
(In reply to Rich Megginson from comment #14)
> This is easy to reproduce with flexy.
> 
> I'm thinking that the fluentd dependencies are conflicting - they are not up
> to date - once the 3.6 puddle is rebuilt I can build and test a new fluentd
> image.

This did not help :-(

Now resorting to debugging the Ruby code...

Comment 17 Rich Megginson 2017-08-18 21:44:00 UTC
The bug was introduced in logging-fluentd:v3.6.173.0.5-6 - logging-fluentd:v3.6.173.0.5-5 and earlier work.

These are the commits between -5 and -6:

http://pkgs.devel.redhat.com/cgit/rpms/logging-fluentd-docker/log/?h=rhaos-3.6-rhel-7

Impl fluentd file buffer.
remove USE_MUX_CLIENT; mux service always check for k8s metadata
fluentd 0.12.39; k8s filter 0.28.0; viaq 0.0.5

Comment 18 Rich Megginson 2017-08-19 01:08:33 UTC
The error doesn't appear to be related to the systemd input or the Elasticsearch output. I tried fluentd (secure_forward output with a file buffer) -> mux (Elasticsearch output with a file buffer).

In both fluentd and mux I see the conversion error, so it must have something to do with the file buffer, but I don't yet know what it could be.

Comment 22 Xia Zhao 2017-08-24 02:45:45 UTC
Thanks, fluentd is working again. I checked with the fluentd:v3.6.173.0.5-10 image that log entries are collected and show up in Kibana. Setting to VERIFIED.

Image verified with:
logging-fluentd   v3.6.173.0.5-10     58ab4badc0b7        6 hours ago         235.1 MB

Comment 24 errata-xmlrpc 2017-10-25 13:04:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

