Bug 1514110
| Summary: | [3.5] Aggregated Logging replacing all log levels with '3' and '6' after upgrade to 3.5 from 3.4 |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Version: | 3.5.0 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | unspecified |
| Reporter: | antti.niskanen |
| Assignee: | Rich Megginson <rmeggins> |
| QA Contact: | Anping Li <anli> |
| CC: | antti.niskanen, aos-bugs, dcaldwel, erich, george.goh, jcantril, nhosoi, rmeggins, smunilla, spaul, tkimura, trankin |
| Target Milestone: | --- |
| Target Release: | 3.5.z |
| Fixed In Version: | openshift3/logging-fluentd:3.5.0-51 |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| : | 1515447 (view as bug list) |
| Last Closed: | 2018-04-12 05:59:59 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
| Bug Blocks: | 1515447, 1515448 |

Doc Text:

Cause: Fluentd was adding the `level` field with a value of 3 or 6, overwriting any existing `level` field.

Consequence: The `level` field set by the application was being removed, and the '3' or '6' value was not useful.

Fix: If there is already a `level` field in the record, check whether it is a "close" match to one of the canonical `level` field values at https://github.com/ViaQ/elasticsearch-templates/blob/master/namespaces/_default_.yml#L63 - e.g. if `level` is "CRITICAL", convert it to "crit"; if `level` is "WARN", convert it to "warning"; etc. Otherwise, if the value cannot be used directly or normalized, convert it to its string representation (Ruby `to_s` method) and store the string value in the `level` field.

Result: If the record already has a `level` field, the value is normalized or preserved; otherwise, a value like `info` or `err` is used.
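The Fix described above can be sketched in Ruby. This is a minimal illustration under assumed names (the alias table and the `normalize_level` helper are hypothetical), not the actual fluent-plugin-viaq_data_model code:

```ruby
# Sketch of the level normalization described in the Doc Text.
# The alias table and method name are illustrative assumptions, not
# the actual fluent-plugin-viaq_data_model implementation.

CANONICAL_LEVELS = %w[alert crit debug emerg err info notice trace unknown warning].freeze

# Map "close" matches to their canonical spellings.
LEVEL_ALIASES = {
  'critical' => 'crit',
  'error'    => 'err',
  'panic'    => 'emerg',
  'warn'     => 'warning'
}.freeze

def normalize_level(record)
  if record.key?('level')
    value = record['level'].to_s.downcase
    if CANONICAL_LEVELS.include?(value)
      record['level'] = value
    elsif LEVEL_ALIASES.key?(value)
      record['level'] = LEVEL_ALIASES[value]
    else
      # Cannot use or normalize the value: store its string representation.
      record['level'] = record['level'].to_s
    end
  else
    # No level field at all: derive one from the container stream.
    record['level'] = record['stream'] == 'stdout' ? 'info' : 'err'
  end
  record
end

# Example: a record with level "WARN" is normalized to "warning".
puts normalize_level({ 'level' => 'WARN', 'stream' => 'stdout' })
```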
Description antti.niskanen 2017-11-16 16:43:36 UTC
In 3.6 and later we are strictly enforcing the use of the `level` field as defined in our data model: https://github.com/ViaQ/elasticsearch-templates/blob/master/namespaces/_default_.yml#L63

"Logging level as provided by: rsyslog(severitytext property), python's logging module, etc. Possible values are as listed here: http://sourceware.org/git/?p=glibc.git;a=blob;f=misc/sys/syslog.h;h=ee01478c4b19a954426a96448577c5a76e6647c0;hb=HEAD#l74 plus `trace` and `unknown`. That is: alert crit debug emerg err info notice trace unknown warning"

systemd PRIORITY levels are mapped here: https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model_systemd.rb#L107 - basically, this maps 1, 2, 3, ... to alert, crit, err, ...

Container json-file levels are mapped here: https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model.rb#L247 - this just maps stream=stdout to level=info and stream=stderr to level=err. It will overwrite any existing level field.

The problem is that we have to strictly enforce the data model, which means we must force the `level` field to be one of the values listed above. If we don't strictly enforce the `level` field, it just becomes a dumping ground for any value, and correlation of the `level` field across different applications becomes impossible.

We could do this: if the `level` field is set, convert it to lower case. If it matches one of the canonical values, use it. We can also convert "error" to "err", "panic" to "emerg", and "warn" to "warning" - that is, do some simple normalization. If the value does not match, we just use it as provided by the application, and the user is responsible for either doing some sort of post-processing or otherwise determining the meaning of the `level` field across different applications.

Interesting. I also found this line in our fluentd configuration (/etc/fluent/configs.d/openshift/filter-k8s-record-transform.conf):
level ${record['stream'] == 'stdout' ? 6 : 3}
which seems to relate to the problem at hand.
At the moment we have no plans for using custom values in the 'level' field. Getting the usual container logging levels working would be nice, and the matching/conversion sounds good. What is the proposed method and timeline for the fix?
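For context, the problematic line above sits inside a fluentd record_transformer filter. A rough sketch of what the surrounding 3.5 configuration looks like - the match pattern and surrounding directives here are abbreviated assumptions, not the verbatim shipped filter-k8s-record-transform.conf:

```
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    # 3.5 behavior: unconditionally set a numeric level from the stream,
    # clobbering any level field the application already provided.
    level ${record['stream'] == 'stdout' ? 6 : 3}
  </record>
</filter>
```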
(In reply to antti.niskanen from comment #2)
> Interesting. I also found this line in our fluentd configuration
> (/etc/fluent/configs.d/openshift/filter-k8s-record-transform.conf):
>
> level ${record['stream'] == 'stdout' ? 6 : 3}
>
> which seems to relate to the problem at hand.

Yes. That is the problematic code in 3.5, which was replaced with https://bugzilla.redhat.com/show_bug.cgi?id=1514110#c1 in 3.6 and later.

> At the moment we have no plans for using custom values in the 'level'
> field. Getting the usual container logging levels working would be nice,
> and the matching/conversion sounds good. What is the proposed method and
> timeline for the fix?

The only level data we get for container logs is the "stream" field - "stdout" means "info" and "stderr" means "err". There is no other log level information. Are you ok if we did something like this?:

level ${record['stream'] == 'stdout' ? 'info' : 'err'}

(In reply to Rich Megginson from comment #3)
> The only level data we get for container logs is the "stream" field -
> "stdout" means "info" and "stderr" means "err". There is no other log
> level information.

Now I kind of understand what you mean. For example, our users' applications write logs like this:

{"log":"{\"@timestamp\":\"2017-11-08T21:43:16.802+00:00\",\"@version\":1,\"message\":\"Checking mq connection failed: java.util.concurrent.TimeoutException\",\"logger_name\":\"checkActiveMQ\",\"thread_name\":\"qtp1681838063-40\",\"level\":\"WARN\",\"level_value\":30000}\n","stream":"stdout","time":"2017-11-08T21:43:16.803202093Z"}

Previously, the 'level' inside the message has been shown in Kibana (how...?), but now the 'stream' is the only defining factor.

Using 'stream' as you suggested would still not work, as the warnings/errors seem to be mostly stdout.

Would it be possible to check the message contents (JSON) for 'level' and use that as-is, or convert/normalize it if certain conditions are met?

(In reply to antti.niskanen from comment #4)
> Previously, the 'level' inside the message has been shown in Kibana
> (how...?)

Because the kubernetes metadata filter has a side effect - if the value of the "log" field looks like a JSON string, it will parse it and put the fields into the record:

https://github.com/openshift/origin-aggregated-logging/blob/release-1.5/fluentd/configs.d/openshift/filter-k8s-meta.conf
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter#configuration

merge_json_log - merge logs in JSON format as top level keys (default: true)
preserve_json_log - preserve JSON logs in raw form in the log key, only used if the previous option is true (default: true)

In the above case, this means the fields @timestamp, @version, message, logger_name, thread_name, level, and level_value are added to the record.

> Would it be possible to check the message contents (JSON) for 'level'
> and use that as-is, or convert/normalize it if certain conditions are met?

Yes, I think that is the correct approach:

if level field exists
    if level.lower looks like one of our canonical error levels
        record.level = canonical error level
    else
        preserve existing value
    fi
else
    record.level = info or err from stream
fi

In this case, that means level: INFO would be converted to level: info.

(In reply to Rich Megginson from comment #5)
> In this case, that means level: INFO would be converted to level: info.

This seems sensible and would keep our users happy.

fixed in logging-fluentd-docker-3.5.0-47
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=630202

Failed when using logging-fluentd/images/3.5.0-48.
The log level is 6.
1) oc debug $kibana_pod
   $ echo '{"message":"anli_container_message_loglevel_warn", "level":"warn"}'
2) Check the log level:
"message" : "anli_container_message_loglevel_warn",
"docker" : {
"container_id" : "fb380bc05851283cee7901b6b7cf506064d844b08ea3c16a1d222917970ad84f"
},
"kubernetes" : {
"namespace_name" : "logging",
"pod_id" : "8c843f40-e579-11e7-86be-fa163e093800",
"pod_name" : "logging-kibana-1-dk4bs-debug",
"host" : "host-8-241-90.host.centralci.eng.rdu2.redhat.com",
"namespace_id" : "02d38e21-e4b9-11e7-9e06-fa163e093800",
"container_name" : "kibana"
},
"level" : "6",
koji_builds:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=636438

repositories:
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.5-rhel-7-docker-candidate-35837-20180104032742
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.5.0
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:3.5.0-51
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.5

Verified and passed with openshift3/logging-fluentd:3.5.0-51:
{
    "_index" : "project.install-test.504d5cc0-f426-11e7-bd8f-fa163e48c543.2018.01.08",
    "_type" : "com.redhat.viaq.common",
    "_id" : "AWDUS-5g36jSyNyQsN1o",
    "_score" : 4.052013,
    "_source" : {
        "message" : "anli_container_message_loglevel_warn",
        "level" : "warning",
        "docker" : {
            "container_id" : "2c91eacb8765335d528c1d025c009d24dd496eb4ec13464d46a5add921d9fd3b"
        },
        "kubernetes" : {
            "namespace_name" : "install-test",
            "pod_id" : "ddeef473-f436-11e7-aa21-fa163e48c543",
            "pod_name" : "nodejs-mongodb-example-1-3smvh-debug",
            "host" : "host-8-241-43.host.centralci.eng.rdu2.redhat.com",
            "namespace_id" : "504d5cc0-f426-11e7-bd8f-fa163e48c543",
            "container_name" : "nodejs-mongodb-example"
        },
Looking at our registry page at https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/logging-fluentd it looks like '3.5.0-48' still has the '3.5.0' and 'v3.5' tags. Any ETA on when '3.5.0-51' will be promoted to the '3.5.0'/'v3.5' tags?

Hi guys, do we have an ETA on this yet? Thanks, David.

Release engineering will be able to advise on the next release of the 3.5 code stream.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106