+++ This bug was initially created as a clone of Bug #1515447 +++
+++ This bug was initially created as a clone of Bug #1514110 +++

Description of problem:

After upgrading the cluster from 3.4 to 3.5, all message log levels (as seen in Kibana/Elasticsearch) are either '3' or '6'. This makes solving issues harder, as filtering on errors etc. is no longer possible in Kibana.

Version-Release number of selected component (if applicable):
openshift v3.5.5.31.36
kubernetes v1.5.2+43a9be4
openshift3/logging-elasticsearch:v3.5
openshift3/logging-fluentd:v3.5
openshift3/logging-kibana:v3.5

How reproducible:
Unconfirmed

Steps to Reproduce:
Unconfirmed

Actual results:
Messages contain 3 or 6 (numerals) as log levels.

Expected results:
Messages should contain (at least) the usual INFO, WARN, ERROR levels.

Additional info:

--- Additional comment from Rich Megginson on 2017-11-16 16:43:31 EST ---

In 3.6 and later we are strictly enforcing the use of the `level` field as defined in our data model:
https://github.com/ViaQ/elasticsearch-templates/blob/master/namespaces/_default_.yml#L63

"Logging level as provided by: rsyslog (severitytext property), python's logging module, etc.
Possible values are as listed here:
http://sourceware.org/git/?p=glibc.git;a=blob;f=misc/sys/syslog.h;h=ee01478c4b19a954426a96448577c5a76e6647c0;hb=HEAD#l74
plus `trace` and `unknown`. That is:
alert crit debug emerg err info notice trace unknown warning"

systemd PRIORITY levels are mapped here:
https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model_systemd.rb#L107
Basically, this maps 1, 2, 3, ... to alert, crit, err, ...

container json-file levels are mapped here:
https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model.rb#L247
This just maps stream=stdout to level=info and stream=stderr to level=err. It will overwrite any existing level field.
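The two mappings described above can be sketched roughly as follows. This is a minimal illustration, not the actual plugin code; the constant and method names are hypothetical.

```ruby
# Hypothetical sketch of the mappings described above; names are
# illustrative only (the real code is in fluent-plugin-viaq_data_model).

# systemd PRIORITY 0..7 in syslog order -> canonical level names
PRIORITY_TO_LEVEL = %w[emerg alert crit err warning notice info debug].freeze

# map a systemd PRIORITY value to a canonical level name
def level_from_priority(priority)
  p = Integer(priority)
  (0..7).cover?(p) ? PRIORITY_TO_LEVEL[p] : 'unknown'
end

# container json-file logs carry no level at all, only the stream name,
# so stdout becomes info and stderr becomes err
def level_from_stream(stream)
  stream == 'stdout' ? 'info' : 'err'
end
```

This also illustrates why the 3.5 behavior was so lossy: with only `stream` to go on, every container record collapses to one of two levels.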
The problem is that we have to strictly enforce the data model, which means we must force the `level` field to be one of the values listed above. If we don't strictly enforce the `level` field, then it just becomes a dumping ground for any value, and correlation of the `level` field across different applications becomes impossible.

We could do this: if the level field is set, convert it to lower case. If it matches one of the canonical values, use it. We can also convert "error" to "err", "panic" to "emerg", "warn" to "warning" - that is, do some simple normalization. If the value does not match, then we just use it as provided by the application, and the user is responsible for either doing some sort of post-processing, or otherwise determining the meaning of the `level` field across different applications.

--- Additional comment from on 2017-11-16 18:35:22 EST ---

Interesting. I also found this line in our fluentd configuration (/etc/fluent/configs.d/openshift/filter-k8s-record-transform.conf):

level ${record['stream'] == 'stdout' ? 6 : 3}

which seems to relate to the problem at hand. At the moment we have no plans for the use of custom values in the 'level' field. Getting the usual container logging levels working would be nice, and the matching/conversion sounds good. What is the proposed method and timeline for the fix?

--- Additional comment from Rich Megginson on 2017-11-16 20:22:39 EST ---

(In reply to antti.niskanen from comment #2)
> Interesting. I also found this line in our fluentd configuration
> (/etc/fluent/configs.d/openshift/filter-k8s-record-transform.conf):
>
> level ${record['stream'] == 'stdout' ? 6 : 3}
>
> which seems to relate to the problem at hand.

Yes. That is the problematic code in 3.5, which was replaced with https://bugzilla.redhat.com/show_bug.cgi?id=1514110#c1 in 3.6 and later.

> At the moment we have no plans for the use of custom values in the 'level'
> field.
> Getting the usual container logging levels working would be nice, and
> the matching/conversion sounds good. What is the proposed method and timeline
> for the fix?

The only level data we get for container logs is the "stream" field - "stdout" means "info" and "stderr" means "err". There is no other log level information. Are you ok if we did something like this?:

level ${record['stream'] == 'stdout' ? 'info' : 'err'}

--- Additional comment from on 2017-11-17 05:30:55 EST ---

(In reply to Rich Megginson from comment #3)
> The only level data we get for container logs is the "stream" field -
> "stdout" means "info" and "stderr" means "err". There is no other log level
> information.

Now I kind of understand what you mean. For example, our users' applications write logs like this:

{"log":"{\"@timestamp\":\"2017-11-08T21:43:16.802+00:00\",\"@version\":1,\"message\":\"Checking mq connection failed: java.util.concurrent.TimeoutException\",\"logger_name\":\"checkActiveMQ\",\"thread_name\":\"qtp1681838063-40\",\"level\":\"WARN\",\"level_value\":30000}\n","stream":"stdout","time":"2017-11-08T21:43:16.803202093Z"}

Previously, the 'level' inside the message has been shown in Kibana (how...?), but now the 'stream' is the only defining factor.

Using 'stream' as you suggested would still not work, as the warnings/errors seem to be mostly stdout.

Would it be possible to check the message contents (JSON) for 'level' and use that as-is, or convert/normalize it if certain conditions are met?

--- Additional comment from Rich Megginson on 2017-11-17 09:58:57 EST ---

(In reply to antti.niskanen from comment #4)
> (In reply to Rich Megginson from comment #3)
> > The only level data we get for container logs is the "stream" field -
> > "stdout" means "info" and "stderr" means "err". There is no other log level
> > information.
>
> Now I kind of understand what you mean.
> For example, our users' applications
> write logs like this:
>
> {"log":"{\"@timestamp\":\"2017-11-08T21:43:16.802+00:00\",\"@version\":1,
> \"message\":\"Checking mq connection failed:
> java.util.concurrent.TimeoutException\",\"logger_name\":\"checkActiveMQ\",
> \"thread_name\":\"qtp1681838063-40\",\"level\":\"WARN\",\"level_value\":
> 30000}\n","stream":"stdout","time":"2017-11-08T21:43:16.803202093Z"}
>
> Previously, the 'level' inside the message has been shown in Kibana
> (how...?)

Because the kubernetes metadata filter has a side effect - if the value of the "log" field looks like a JSON string, it will parse it and put the fields into the record:

https://github.com/openshift/origin-aggregated-logging/blob/release-1.5/fluentd/configs.d/openshift/filter-k8s-meta.conf
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter#configuration

merge_json_log - merge logs in JSON format as top level keys (default: true)
preserve_json_log - preserve JSON logs in raw form in the log key, only used if the previous option is true (default: true)

which means, in the above case, the fields @timestamp, @version, message, logger_name, thread_name, level and level_value are added to the record.

> but now the 'stream' is the only defining factor.
>
> Using 'stream' as you suggested would still not work as the warnings/errors
> seem to be mostly stdout.
>
> Would it be possible to check for the message contents (JSON) for 'level'
> and use that as-is or convert/normalize if certain conditions are met?

Yes, I think that is the correct approach:

if level field exists
    if level.lower looks like one of our canonical error levels
        record.level = canonical error level
    else
        preserve existing value
    fi
else
    record.level = info or err from stream
fi

In this case, that means level: INFO would be converted to level: info.
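The approach agreed above can be sketched in Ruby as follows. This is a minimal sketch under the assumption that the constant and method names are illustrative; the real implementation lives in the fluent-plugin-viaq_data_model filter plugin.

```ruby
require 'json'

# Canonical levels from the ViaQ data model (syslog levels + trace/unknown).
CANONICAL_LEVELS = %w[alert crit debug emerg err info notice trace unknown warning].freeze
# Simple normalization of common application-level spellings.
LEVEL_ALIASES = { 'error' => 'err', 'warn' => 'warning', 'panic' => 'emerg' }.freeze

# Mimic the merge_json_log side effect: if the 'log' field looks like a
# JSON string, parse it and merge its keys into the record as top-level
# keys (existing record keys win on conflict).
def merge_json_log(record)
  log = record['log']
  if log.is_a?(String) && log.strip.start_with?('{')
    record = JSON.parse(log).merge(record) rescue record
  end
  record
end

def normalize_level(record)
  if record['level']
    lower = record['level'].to_s.downcase
    lower = LEVEL_ALIASES.fetch(lower, lower)
    # use the canonical value if it matches; otherwise preserve the
    # application-provided value as-is
    record['level'] = lower if CANONICAL_LEVELS.include?(lower)
  else
    # no level at all: fall back to the stream-based default
    record['level'] = record['stream'] == 'stdout' ? 'info' : 'err'
  end
  record
end
```

Applied to the sample record from this thread, the embedded `"level":"WARN"` would be merged into the record and normalized to `level: warning`, instead of being overwritten by the stream-based default.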
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/ae9dcff602d92285a322674d18d4c42e990ad3e5
Bug 1515448 - [3.7] Aggregated Logging replacing all log levels with '3' and '6' after upgrade to 3.5 from 3.4
https://bugzilla.redhat.com/show_bug.cgi?id=1515448
Use fluent-plugin-viaq_data_model-0.0.12

https://github.com/openshift/origin-aggregated-logging/commit/c7235234d8fcc1eeedfff81725dbf4a15fb81b02
Merge pull request #791 from richm/bug-1515448
Automatic merge from submit-queue.
Bug 1515448 - [3.7] Aggregated Logging replacing all log levels with '3' and '6' after upgrade to 3.5 from 3.4
https://bugzilla.redhat.com/show_bug.cgi?id=1515448
Use fluent-plugin-viaq_data_model-0.0.12
/test
koji_builds:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=631910

repositories:
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.7-rhel-7-docker-candidate-10416-20171211201451
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.7.15
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.7.15-2
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.7
This is fixed in https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=631552 (logging-fluentd-docker-v3.7.14-2), which is included in https://errata.devel.redhat.com/advisory/31700 (RHBA-2017:31700-01, OpenShift Container Platform 3.7.z images update).
Verified with openshift3/logging-fluentd/images/v3.7.14-5. When 'level' is set in the embedded JSON log, its value is used instead of the stream-based default.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0636