Bug 1514110
| Summary: | [3.5] Aggregated Logging replacing all log levels with '3' and '6' after upgrade to 3.5 from 3.4 |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Version: | 3.5.0 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | unspecified |
| Reporter: | antti.niskanen |
| Assignee: | Rich Megginson <rmeggins> |
| QA Contact: | Anping Li <anli> |
| CC: | antti.niskanen, aos-bugs, dcaldwel, erich, george.goh, jcantril, nhosoi, rmeggins, smunilla, spaul, tkimura, trankin |
| Target Milestone: | --- |
| Target Release: | 3.5.z |
| Fixed In Version: | openshift3/logging-fluentd:3.5.0-51 |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| : | 1515447 (view as bug list) |
| Last Closed: | 2018-04-12 05:59:59 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
| Bug Blocks: | 1515447, 1515448 |

Doc Text:

Cause: Fluentd was adding the `level` field with a value of 3 or 6, overwriting any existing `level` field.

Consequence: The `level` field set by the application was being removed, and the '3' or '6' value was not useful.

Fix: If there is already a `level` field in the record, check whether it is a "close" match to one of the canonical `level` field values at https://github.com/ViaQ/elasticsearch-templates/blob/master/namespaces/_default_.yml#L63 - e.g. if `level` is "CRITICAL", convert it to "crit"; if `level` is "WARN", convert it to "warning"; etc. Otherwise, if the value cannot be used directly or normalized, convert it to its string representation (Ruby `to_s` method) and store the string value in the `level` field.

Result: If the record already has a `level` field, the value is normalized or preserved; otherwise, a value like `info` or `err` is used.
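The Fix described above can be sketched in Ruby. This is a minimal illustration under assumed names (the alias table and the `normalize_level` helper are hypothetical), not the actual fluent-plugin-viaq_data_model code:

```ruby
# Sketch of the level normalization described in the Doc Text.
# The alias table and method name are illustrative assumptions, not
# the actual fluent-plugin-viaq_data_model implementation.

CANONICAL_LEVELS = %w[alert crit debug emerg err info notice trace unknown warning].freeze

# Map "close" matches to their canonical spellings.
LEVEL_ALIASES = {
  'critical' => 'crit',
  'error'    => 'err',
  'panic'    => 'emerg',
  'warn'     => 'warning'
}.freeze

def normalize_level(record)
  if record.key?('level')
    value = record['level'].to_s.downcase
    if CANONICAL_LEVELS.include?(value)
      record['level'] = value
    elsif LEVEL_ALIASES.key?(value)
      record['level'] = LEVEL_ALIASES[value]
    else
      # Cannot use or normalize the value: store its string representation.
      record['level'] = record['level'].to_s
    end
  else
    # No level field at all: derive one from the container stream.
    record['level'] = record['stream'] == 'stdout' ? 'info' : 'err'
  end
  record
end

# Example: a record with level "WARN" is normalized to "warning".
puts normalize_level({ 'level' => 'WARN', 'stream' => 'stdout' })
```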
Description antti.niskanen 2017-11-16 16:43:36 UTC
In 3.6 and later we are strictly enforcing the use of the `level` field as defined in our data model: https://github.com/ViaQ/elasticsearch-templates/blob/master/namespaces/_default_.yml#L63

"Logging level as provided by: rsyslog(severitytext property), python's logging module, etc. Possible values are as listed here: http://sourceware.org/git/?p=glibc.git;a=blob;f=misc/sys/syslog.h;h=ee01478c4b19a954426a96448577c5a76e6647c0;hb=HEAD#l74 plus `trace` and `unknown`. That is: alert crit debug emerg err info notice trace unknown warning"

systemd PRIORITY levels are mapped here: https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model_systemd.rb#L107 - basically, this maps 1, 2, 3, ... to alert, crit, err, ...

Container json-file levels are mapped here: https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model.rb#L247 - this just maps stream=stdout to level=info and stream=stderr to level=err. It will overwrite any existing level field.

The problem is that we have to strictly enforce the data model, which means we must force the `level` field to be one of the values listed above. If we don't strictly enforce the `level` field, it just becomes a dumping ground for any value, and correlation of the `level` field across different applications becomes impossible.

We could do this: if the `level` field is set, convert it to lower case. If it matches one of the canonical values, use it. We can also convert "error" to "err", "panic" to "emerg", and "warn" to "warning" - that is, do some simple normalization. If the value does not match, we just use it as provided by the application, and the user is responsible for either doing some sort of post-processing or otherwise determining the meaning of the `level` field across different applications.

Interesting. I also found this line in our fluentd configuration (/etc/fluent/configs.d/openshift/filter-k8s-record-transform.conf):
level ${record['stream'] == 'stdout' ? 6 : 3}
which seems to relate to the problem at hand.
At the moment we have no plans for using custom values in the 'level' field. Getting the usual container logging levels working would be nice, and the matching/conversion sounds good. What is the proposed method and timeline for the fix?
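For context, the problematic line above sits inside a fluentd record_transformer filter. A rough sketch of what the surrounding 3.5 configuration looks like - the match pattern and surrounding directives here are abbreviated assumptions, not the verbatim shipped filter-k8s-record-transform.conf:

```
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    # 3.5 behavior: unconditionally set a numeric level from the stream,
    # clobbering any level field the application already provided.
    level ${record['stream'] == 'stdout' ? 6 : 3}
  </record>
</filter>
```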
(In reply to antti.niskanen from comment #2)
> Interesting. I also found this line in our fluentd configuration
> (/etc/fluent/configs.d/openshift/filter-k8s-record-transform.conf):
>
> level ${record['stream'] == 'stdout' ? 6 : 3}
>
> which seems to relate to the problem at hand.

Yes. That is the problematic code in 3.5, which was replaced with https://bugzilla.redhat.com/show_bug.cgi?id=1514110#c1 in 3.6 and later.

> At the moment we have no plans for using custom values in the 'level'
> field. Getting the usual container logging levels working would be nice,
> and the matching/conversion sounds good. What is the proposed method and
> timeline for the fix?

The only level data we get for container logs is the "stream" field - "stdout" means "info" and "stderr" means "err". There is no other log level information. Are you ok if we did something like this?:

level ${record['stream'] == 'stdout' ? 'info' : 'err'}

(In reply to Rich Megginson from comment #3)
> The only level data we get for container logs is the "stream" field -
> "stdout" means "info" and "stderr" means "err". There is no other log
> level information.

Now I kind of understand what you mean. For example, our users' applications write logs like this:

{"log":"{\"@timestamp\":\"2017-11-08T21:43:16.802+00:00\",\"@version\":1,\"message\":\"Checking mq connection failed: java.util.concurrent.TimeoutException\",\"logger_name\":\"checkActiveMQ\",\"thread_name\":\"qtp1681838063-40\",\"level\":\"WARN\",\"level_value\":30000}\n","stream":"stdout","time":"2017-11-08T21:43:16.803202093Z"}

Previously, the 'level' inside the message has been shown in Kibana (how...?), but now the 'stream' is the only defining factor.

Using 'stream' as you suggested would still not work, as the warnings/errors seem to be mostly stdout.

Would it be possible to check the message contents (JSON) for 'level' and use that as-is, or convert/normalize it if certain conditions are met?

(In reply to antti.niskanen from comment #4)
> Previously, the 'level' inside the message has been shown in Kibana
> (how...?)

Because the kubernetes metadata filter has a side effect - if the value of the "log" field looks like a JSON string, it will parse it and put the fields into the record:

https://github.com/openshift/origin-aggregated-logging/blob/release-1.5/fluentd/configs.d/openshift/filter-k8s-meta.conf
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter#configuration

merge_json_log - merge logs in JSON format as top level keys (default: true)
preserve_json_log - preserve JSON logs in raw form in the log key, only used if the previous option is true (default: true)

In the above case, this means the fields @timestamp, @version, message, logger_name, thread_name, level, and level_value are added to the record.

> Would it be possible to check the message contents (JSON) for 'level'
> and use that as-is, or convert/normalize it if certain conditions are met?

Yes, I think that is the correct approach:

if level field exists
    if level.lower looks like one of our canonical error levels
        record.level = canonical error level
    else
        preserve existing value
    fi
else
    record.level = info or err from stream
fi

In this case, that means level: INFO would be converted to level: info.

(In reply to Rich Megginson from comment #5)
> In this case, that means level: INFO would be converted to level: info.

This seems sensible and would keep our users happy.

fixed in logging-fluentd-docker-3.5.0-47
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=630202

Failed when using logging-fluentd/images/3.5.0-48.
The log level is 6.
1) oc debug $kibana_pod
   $ echo '{"message":"anli_container_message_loglevel_warn", "level":"warn"}'
2) Check the log level:
"message" : "anli_container_message_loglevel_warn",
"docker" : {
"container_id" : "fb380bc05851283cee7901b6b7cf506064d844b08ea3c16a1d222917970ad84f"
},
"kubernetes" : {
"namespace_name" : "logging",
"pod_id" : "8c843f40-e579-11e7-86be-fa163e093800",
"pod_name" : "logging-kibana-1-dk4bs-debug",
"host" : "host-8-241-90.host.centralci.eng.rdu2.redhat.com",
"namespace_id" : "02d38e21-e4b9-11e7-9e06-fa163e093800",
"container_name" : "kibana"
},
"level" : "6",
koji_builds:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=636438

repositories:
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.5-rhel-7-docker-candidate-35837-20180104032742
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.5.0
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.5.0-51
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.5

Verified and passed with openshift3/logging-fluentd:3.5.0-51:
{
    "_index" : "project.install-test.504d5cc0-f426-11e7-bd8f-fa163e48c543.2018.01.08",
    "_type" : "com.redhat.viaq.common",
    "_id" : "AWDUS-5g36jSyNyQsN1o",
    "_score" : 4.052013,
    "_source" : {
        "message" : "anli_container_message_loglevel_warn",
        "level" : "warning",
        "docker" : {
            "container_id" : "2c91eacb8765335d528c1d025c009d24dd496eb4ec13464d46a5add921d9fd3b"
        },
        "kubernetes" : {
            "namespace_name" : "install-test",
            "pod_id" : "ddeef473-f436-11e7-aa21-fa163e48c543",
            "pod_name" : "nodejs-mongodb-example-1-3smvh-debug",
            "host" : "host-8-241-43.host.centralci.eng.rdu2.redhat.com",
            "namespace_id" : "504d5cc0-f426-11e7-bd8f-fa163e48c543",
            "container_name" : "nodejs-mongodb-example"
        },
Looking at our registry page at https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/logging-fluentd it looks like '3.5.0-48' still has the '3.5.0' and 'v3.5' tags. Any ETA on when '3.5.0-51' will be promoted to the '3.5.0'/'v3.5' tags?

Hi guys, do we have an ETA on this yet? Thanks, David.

Release engineering will be able to advise on the next release of the 3.5 code stream.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106