Bug 1726639

Summary:	RFE: change level to 'unknown' for container logs
Product:	OpenShift Container Platform	Reporter:	Saurabh Sadhale <ssadhale>
Component:	Logging	Assignee:	Jeff Cantrill <jcantril>
Status:	CLOSED ERRATA	QA Contact:	Anping Li <anli>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	3.11.0	CC:	aos-bugs, jcantril, rdiscala, rmeggins, vjaypurk, wsun
Target Milestone:	---
Target Release:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The viaq plugin relied upon the stream to set level. Consequence: Anything other than error or info was improperly represented. Fix: Do not set level based on stream. Result: The level is set to 'unknown' if it cannot be properly determined.	Story Points:	---
Clone Of:
Clones:	1732542 (view as bug list)		Environment:
Last Closed:	2019-10-16 06:33:03 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1732542

Comment 3 Rigel Di Scala 2019-07-15 15:19:20 UTC

This problem is not really due to a defect in Fluentd.

When OpenShift logs to Fluentd, the default behaviour is to consider
log messages written to stdout as INFO and stderr as ERROR.

The problem I see with this behaviour is that many programs log
diagnostic information, which would have log level INFO or lower, to
stderr, following the guidelines set out in the POSIX specification[1]
or the C stdlib[2]. Stdout is reserved for the normal output of the
program, which I believe would not contain any log messages, even in
the case of a web server or any other network daemon.

This is not an easy problem. To properly handle the log stream coming 
from OpenShift we would need to implement a custom parser for *each* 
type of application. However, the current situation generates a large
amount of noise that makes monitoring and interpreting the logs 
correctly quite hard. Anything at the ERROR level should be cause for
concern and addressed accordingly. A health probe succeeding is
clearly not something that should show up when scanning for these problems.
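
To make the difference concrete, here is a minimal sketch of the two
behaviours. It is illustrative only and is not the viaq plugin code: the
function names, the record layout, and the sample health-probe message are
all assumptions made for this example.

```python
# Illustrative sketch only; not the fluent-plugin-viaq_data_model code.
# Function names and the record layout are hypothetical.

def level_from_stream(stream: str) -> str:
    """Behaviour described above: the stream alone decides the level."""
    return "error" if stream == "stderr" else "info"


def level_without_stream(record: dict) -> str:
    """Requested behaviour: keep an explicit level, else report 'unknown'."""
    level = record.get("level")
    return level if level else "unknown"


# A successful health probe that the application wrote to stderr.
probe = {"stream": "stderr", "message": "GET /healthz 200"}

print(level_from_stream(probe["stream"]))  # error
print(level_without_stream(probe))         # unknown
```

With the stream-based mapping the successful probe is reported as an error;
without it, the record carries 'unknown' unless the application supplied a
level itself.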


1. http://pubs.opengroup.org/onlinepubs/9699919799/functions/stderr.html
2. https://www.gnu.org/software/libc/manual/html_node/Standard-Streams.html

Comment 7 Rich Megginson 2019-07-18 15:46:18 UTC

fluentd - fix is in https://github.com/ViaQ/fluent-plugin-viaq_data_model 0.0.19 https://rubygems.org/downloads/fluent-plugin-viaq_data_model-0.0.19.gem
next steps - update 4.2 source code with new gem, rebuild image
backport 4.2 changes to 4.1, rebuild 4.1 image
3.11 - update gem version in upstream Dockerfile - rebuild rubygem rpm downstream - rebuild downstream image

Comment 9 Rich Megginson 2019-07-23 15:59:49 UTC

The downstream rpm work has not been done for 3.11

Comment 10 Rich Megginson 2019-07-23 16:47:27 UTC

new rpm built and tagged for 3.11

brew tag-build rhaos-3.11-rhel-7-candidate rubygem-fluent-plugin-viaq_data_model-0.0.19-1.el7

Need 3.11 compose built and fluentd image rebuild

Comment 11 Rich Megginson 2019-07-23 17:17:27 UTC

sorry, wrong bz - this bz confused me because it is marked Version: 3.11.0 but Target Release: 4.2.0 - will clone for 3.11

Comment 12 Anping Li 2019-07-24 03:34:24 UTC

'unknown' is now used as the level. Tested using openshift/ose-logging-fluentd:201907222219 and openshift/ose-logging-rsyslo:201907222219

Comment 14 errata-xmlrpc 2019-10-16 06:33:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922