Bug 1482002 - Can't collect log entries due to fluentd error
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assigned To: Rich Megginson
QA Contact: Xia Zhao
Keywords: Regression
Duplicates: 1482532
Depends On:
Blocks: 1469859 1498999
 
Reported: 2017-08-16 05:16 EDT by Xia Zhao
Modified: 2017-10-25 09:04 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Fluentd could not write the files it uses for buffering records due to a problem converting values from ASCII-8BIT to UTF-8.
Consequence: Fluentd emits a lot of errors and cannot add records to Elasticsearch.
Fix: Remove the patch that forced the UTF-8 conversion.
Result: Fluentd can write ASCII-8BIT encoded files for its buffer.
Story Points: ---
Clone Of:
Clones: 1498999
Environment:
Last Closed: 2017-10-25 09:04:36 EDT
Type: Bug
Regression: ---


Attachments (Terms of Use)
inventory file used for logging deployment (714 bytes, text/plain), 2017-08-16 05:20 EDT, Xia Zhao
fluentd log (4.82 MB, text/plain), 2017-08-16 05:23 EDT, Xia Zhao


External Trackers
Tracker ID: Red Hat Product Errata RHBA-2017:3049
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update
Last Updated: 2017-10-25 11:57:15 EDT

Description Xia Zhao 2017-08-16 05:16:35 EDT
Description of problem:
Deployed the logging 3.6 stack on OCP 3.6; log entries can't be collected due to the following fluentd error:
2017-08-16 04:52:01 -0400 [error]: Exception emitting record: "\x92" from ASCII-8BIT to UTF-8
2017-08-16 04:52:01 -0400 [warn]: emit transaction failed: error_class=Encoding::UndefinedConversionError error="\"\\x92\" from ASCII-8BIT to UTF-8" tag="journal.system"

The issue reproduces regardless of whether this parameter is specified in the inventory:
openshift_logging_fluentd_journal_read_from_head=true

[error]: Exception emitting record: "\x92" from ASCII-8BIT to UTF-8
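
(For reference, a minimal Ruby sketch of the failure, not fluentd's actual buffer code: converting an ASCII-8BIT string that contains a byte such as 0x92 to UTF-8 raises exactly this error.)

  # Minimal sketch, not fluentd's buffer implementation:
  record = "caf\x92 latte".force_encoding(Encoding::ASCII_8BIT)
  begin
    record.encode(Encoding::UTF_8)   # roughly the conversion the buffering code attempts
  rescue Encoding::UndefinedConversionError => e
    puts e.message                   # => "\x92" from ASCII-8BIT to UTF-8
  end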

Version-Release number of selected component (if applicable):
logging-fluentd         v3.6.173.0.5-6      95dede9f3cb2        9 hours ago         235.1 MB

# openshift version
openshift v3.6.173.0.5
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.6 stacks on OCP 3.6 with the attached inventory file
2. Wait until EFK pods are running
3. Check fluentd logs

Actual results:
Can't collect log entries due to fluentd error

Expected results:
Fluentd should collect log entries and send them to Elasticsearch without errors.

Additional info:
full log of fluentd attached
inventory file of logging deployment attached
Comment 1 Xia Zhao 2017-08-16 05:20 EDT
Created attachment 1314021 [details]
inventory file used for logging deployment
Comment 2 Xia Zhao 2017-08-16 05:23 EDT
Created attachment 1314027 [details]
fluentd log
Comment 3 Xia Zhao 2017-08-16 05:23:55 EDT
This bz is currently blocking logging tests on OCP 3.6.0 envs.
Comment 4 Jan Wozniak 2017-08-16 10:45:53 EDT
The fluentd image in brew looks suspiciously small compared to one freshly built from the rhaos-3.6-rhel-7 branch. Perhaps an incorrect build got pushed to brew?

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd         v3.6                95dede9f3cb2        15 hours ago        235.1 MB
local-reg:5000/openshift/logging-fluentd                                                <none>              0ac973960bdb        4 hours ago         360.6 MB
Comment 5 Rich Megginson 2017-08-16 11:09:27 EDT
Can we log into the system? I want to look at the journal and see if I can find which record is causing this problem.
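
Something like the following Ruby sketch (hypothetical, not attached to this BZ; assumes journalctl is available on the node) could flag the journal entries whose raw bytes are not valid UTF-8, such as ones containing 0x92:

  # Hypothetical helper: walk the journal export and report entries that are
  # not valid UTF-8. Splitting on "\n\n" is good enough for eyeballing, even
  # though export-format binary fields can in theory contain blank lines.
  IO.popen(["journalctl", "-o", "export", "--no-pager"], "rb") do |io|
    io.each_line("\n\n").with_index do |entry, idx|
      next if entry.dup.force_encoding(Encoding::UTF_8).valid_encoding?
      puts "entry #{idx} is not valid UTF-8: #{entry[/^__CURSOR=.*/]}"
    end
  end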
Comment 10 Rich Megginson 2017-08-17 09:51:11 EDT
*** Bug 1482532 has been marked as a duplicate of this bug. ***
Comment 11 Mike Fiedler 2017-08-17 10:21:12 EDT
Installing logging with openshift_logging_image_version=v3.6.173.0.5 - this problem is seen.

Installing logging with openshift_logging_image_version=v3.6.171 - this problem is NOT seen.
Comment 12 Noriko Hosoi 2017-08-17 12:37:06 EDT
(In reply to Mike Fiedler from comment #11)
> Installing logging with openshift_logging_image_version=v3.6.173.0.5 - this
> problem is seen.
> 
> Installing logging with openshift_logging_image_version=v3.6.171 - this
> problem is NOT seen.

Right.  Switching the buffer_type from "memory" to "file" happened after the version was bumped to v3.6.171.
Comment 14 Rich Megginson 2017-08-17 22:24:50 EDT
This is easy to reproduce with flexy.

I'm thinking that the fluentd dependencies are conflicting - they are not up to date - once the 3.6 puddle is rebuilt I can build and test a new fluentd image.
Comment 16 Rich Megginson 2017-08-18 14:09:20 EDT
(In reply to Rich Megginson from comment #14)
> This is easy to reproduce with flexy.
> 
> I'm thinking that the fluentd dependencies are conflicting - they are not up
> to date - once the 3.6 puddle is rebuilt I can build and test a new fluentd
> image.

This did not help :-(

Now resorting to debugging the ruby code . . .
Comment 17 Rich Megginson 2017-08-18 17:44:00 EDT
The bug was introduced in logging-fluentd:v3.6.173.0.5-6; logging-fluentd:v3.6.173.0.5-5 and earlier work.

These are the commits between -5 and -6:

http://pkgs.devel.redhat.com/cgit/rpms/logging-fluentd-docker/log/?h=rhaos-3.6-rhel-7

Impl fluentd file buffer.
remove USE_MUX_CLIENT; mux service always check for k8s metadata
fluentd 0.12.39; k8s filter 0.28.0; viaq 0.0.5
Comment 18 Rich Megginson 2017-08-18 21:08:33 EDT
The error doesn't appear to be related to the systemd input or the Elasticsearch output. I tried fluentd secure_forward with a file buffer -> mux -> Elasticsearch with a file buffer, and in both fluentd and mux I see the conversion error. So it must have something to do with the file buffer, but I just don't know what it could be.
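
Roughly, and very much simplified (this is not fluentd's implementation), the difference comes down to how the chunk bytes get written out:

  # Simplified sketch: the same ASCII-8BIT chunk is fine when appended to an
  # in-memory buffer or written in binary mode, but fails if it is first
  # forced through a UTF-8 conversion.
  chunk = "MESSAGE=caf\x92\n".force_encoding(Encoding::ASCII_8BIT)

  memory_buffer = "".b
  memory_buffer << chunk                      # ok: stays ASCII-8BIT

  File.open("/tmp/es.buffer.chunk", "wb") do |f|
    f.write(chunk)                            # ok: binary write, no conversion
    # f.write(chunk.encode(Encoding::UTF_8))  # raises Encoding::UndefinedConversionError
  end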
Comment 22 Xia Zhao 2017-08-23 22:45:45 EDT
Thanks, fluentd is now back. Checked with the fluentd:v3.6.173.0.5-10 image that log entries can be collected and are reflected in Kibana. Set to verified.

Image verified with:
logging-fluentd   v3.6.173.0.5-10     58ab4badc0b7        6 hours ago         235.1 MB
Comment 24 errata-xmlrpc 2017-10-25 09:04:36 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049
