Description of problem: In fluentd, there is no need to set CDM_USE_UNDEFINED=true when configuring CDM_UNDEFINED_TO_STRING, CDM_UNDEFINED_DOT_REPLACE_CHAR, or CDM_UNDEFINED_MAX_NUM_FIELDS. In fluentd, if you set CDM_UNDEFINED_MAX_NUM_FIELDS, the code assumes that you want to put the undefined fields in a JSON string blob in the field named "undefined" (or whatever CDM_UNDEFINED_NAME is set to). It doesn't matter whether CDM_USE_UNDEFINED is set.
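For illustration, here is a minimal, self-contained Go sketch of that behavior - the names (keepFields, maxNumFields, undefinedName) are assumed stand-ins for the CDM settings, not the actual undefined_field.go code:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Assumed stand-ins for the CDM settings discussed above.
	keepFields := map[string]bool{"message": true, "level": true}
	maxNumFields := 2            // CDM_UNDEFINED_MAX_NUM_FIELDS
	undefinedName := "undefined" // CDM_UNDEFINED_NAME

	record := map[string]interface{}{
		"message": "MERGE_JSON_LOG=true",
		"level":   "debug",
		"Layer1":  "layer1 0",
		"Number":  10,
		"foo.bar": "Dot Item",
	}

	// Partition the record into defined (kept) and undefined fields.
	undefined := map[string]interface{}{}
	for k, v := range record {
		if !keepFields[k] {
			undefined[k] = v
			delete(record, k)
		}
	}

	// Once the number of undefined fields exceeds the limit, serialize them
	// all into a single JSON string blob under the configured field name.
	if len(undefined) > maxNumFields {
		blob, _ := json.Marshal(undefined)
		record[undefinedName] = string(blob)
	} else {
		for k, v := range undefined {
			record[k] = v
		}
	}

	out, _ := json.Marshal(record)
	fmt.Println(string(out))
	// {"level":"debug","message":"MERGE_JSON_LOG=true","undefined":"{\"Layer1\":\"layer1 0\",\"Number\":10,\"foo.bar\":\"Dot Item\"}"}
}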
Corrections:

GOOD_1) In fluentd, if you set CDM_UNDEFINED_MAX_NUM_FIELDS, the code assumes that you want to put the undefined fields in a JSON string blob in the field named "undefined" (or whatever CDM_UNDEFINED_NAME is set to). It doesn't matter whether CDM_USE_UNDEFINED is set.

CDM ENV:
"CDM_USE_UNDEFINED": false,
"CDM_UNDEFINED_MAX_NUM_FIELDS": 2,
CDM_UNDEFINED_NAME=myname

Output is correct.
{
  "$!": {
    "openshift_logging_all": {
      "myname": "{\"Layer1\":\"layer1 0\",\"Number\":10,\"StringNumber\":\"10\",\"[foobar]\":\"Bracket Item\",\"foo bar\":\"Space Item\",\"foo.bar\":\"Dot Item\",\"foo:bar\":\"Colon Item\",\"layer2\":{\"name\":\"Layer2 1\",\"tips\":\"Decide by PRESERVE_JSON_LOG\"},\"level\":\"debug\",\"message\":\"MERGE_JSON_LOG=true\",\"{foobar}\":\"Brace Item\"}"
    }
  }
}

GOOD_2) CDM_UNDEFINED_DOT_REPLACE_CHAR

CDM ENV:
"CDM_USE_UNDEFINED": false,
"CDM_UNDEFINED_DOT_REPLACE_CHAR": "_",

Output is correct.
{
  "$!": {
    "openshift_logging_all": {
      ...
      "foo_bar": "Dot Item",
      ...
    }
  }
}

GOOD_3) CDM_UNDEFINED_TO_STRING

CDM ENV:
"CDM_USE_UNDEFINED": false,
"CDM_UNDEFINED_TO_STRING": true,

Output is correct.
{
  "$!": {
    "openshift_logging_all": {
      ...
      "layer2": "{\"name\":\"Layer2 1\",\"tips\":\"Decide by PRESERVE_JSON_LOG\"}",
      ...
    }
  }
}

BAD_1) CDM_UNDEFINED_TO_STRING with CDM_USE_UNDEFINED=true

CDM ENV:
"CDM_USE_UNDEFINED": true,
"CDM_UNDEFINED_TO_STRING": true,

Actual output:
{
  "$!": {
    "openshift_logging_all": {
      ...
      "layer2": "{\"name\":\"Layer2 1\",\"tips\":\"Decide by PRESERVE_JSON_LOG\"}",
      ...
    }
  }
}

Expected output: the same as in GOOD_1.
BAD_2) Regarding "In fluentd, if you set CDM_UNDEFINED_MAX_NUM_FIELDS, then the code assumes that you want to put the undefined fields in a JSON string blob in the field named "undefined" or whatever CDM_UNDEFINED_NAME is set to. It doesn't matter if CDM_USE_UNDEFINED is set": if CDM_USE_UNDEFINED=false, all the fields, including the defined ones, are moved to the "undefined" field as a JSON string - [1]. It should be the same as the result when CDM_USE_UNDEFINED=true - [2].

[1]
"CDM_USE_UNDEFINED": false,
"CDM_UNDEFINED_MAX_NUM_FIELDS": 2,

$ echo '{"$!": {"message": "MERGE_JSON_LOG=true", "level": "debug","Layer1": "layer1 0", "layer2": {"name":"Layer2 1", "tips":"Decide by PRESERVE_JSON_LOG"}, "StringNumber":"10", "Number": 10,"foo.bar":"Dot Item","{foobar}":"Brace Item","[foobar]":"Bracket Item", "foo:bar":"Colon Item","foo bar":"Space Item" }}' | go run undefined_field.go | jq
{
  "$!": {
    "openshift_logging_all": {
      "undefined": "{\"Layer1\":\"layer1 0\",\"Number\":10,\"StringNumber\":\"10\",\"[foobar]\":\"Bracket Item\",\"foo bar\":\"Space Item\",\"foo.bar\":\"Dot Item\",\"foo:bar\":\"Colon Item\",\"layer2\":{\"name\":\"Layer2 1\",\"tips\":\"Decide by PRESERVE_JSON_LOG\"},\"level\":\"debug\",\"message\":\"MERGE_JSON_LOG=true\",\"{foobar}\":\"Brace Item\"}"
    }
  }
}

[2]
"CDM_USE_UNDEFINED": true,
"CDM_UNDEFINED_MAX_NUM_FIELDS": 2,

$ echo '{"$!": {"message": "MERGE_JSON_LOG=true", "level": "debug","Layer1": "layer1 0", "layer2": {"name":"Layer2 1", "tips":"Decide by PRESERVE_JSON_LOG"}, "StringNumber":"10", "Number": 10,"foo.bar":"Dot Item","{foobar}":"Brace Item","[foobar]":"Bracket Item", "foo:bar":"Colon Item","foo bar":"Space Item" }}' | go run undefined_field.go | jq
{
  "$!": {
    "openshift_logging_all": {
      "level": "debug",
      "message": "MERGE_JSON_LOG=true",
      "undefined": "{\"Layer1\":\"layer1 0\",\"Number\":10,\"StringNumber\":\"10\",\"[foobar]\":\"Bracket Item\",\"foo bar\":\"Space Item\",\"foo.bar\":\"Dot Item\",\"foo:bar\":\"Colon Item\",\"layer2\":{\"name\":\"Layer2 1\",\"tips\":\"Decide by PRESERVE_JSON_LOG\"},\"{foobar}\":\"Brace Item\"}"
    }
  }
}
(In reply to Noriko Hosoi from comment #1)
> BAD_1) CDM_UNDEFINED_TO_STRING with CDM_USE_UNDEFINED=true
>
> CDM ENV:
> "CDM_USE_UNDEFINED": true,
> "CDM_UNDEFINED_TO_STRING": true,
>
> Actual output:
> {
>   "$!": {
>     "openshift_logging_all": {
>       ...
>       "layer2": "{\"name\":\"Layer2 1\",\"tips\":\"Decide by PRESERVE_JSON_LOG\"}",
>       ...
>     }
>   }
> }
>
> Expected output: the same as in GOOD_1.

No, the expected output is this:

{
  "$!": {
    "openshift_logging_all": {
      ...
      "undefined": {
        "layer2": "{\"name\":\"Layer2 1\",\"tips\":\"Decide by PRESERVE_JSON_LOG\"}",
        ...
      }
    }
  }
}

Since CDM_USE_UNDEFINED=true, the undefined fields will be placed under the top-level field named "undefined", and each field will have its value converted to its string representation.
(In reply to Rich Megginson from comment #3)
> Since CDM_USE_UNDEFINED=true, the undefined fields will be placed under the
> top-level field named "undefined", and each field will have its value
> converted to its string representation.

Ok. Got it. Thanks, Rich!
Bug verification blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1737286
Created attachment 1601214 [details]
ES pod logs

Tested with ose-logging-rsyslog-v4.2.0-201908041300. When only MERGE_JSON_LOG=true and CDM_UNDEFINED_TO_STRING=true were set (CDM_USE_UNDEFINED is false by default), the logs could not be collected. I checked the indices in ES: the docs.count was not increasing. I then deleted the indices project.* and .operations.*, and the docs.count remained 0.

$ oc exec elasticsearch-cdm-2dafwda5-2-848fb5d899-mbwq4 -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-2dafwda5-2-848fb5d899-mbwq4 -n openshift-logging' to see all of the containers in this pod.
Wed Aug 7 01:57:47 UTC 2019
health status index                                                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   project.json1.bc4093af-b8b2-11e9-a142-06f6f3925446.2019.08.07 n3MWdxbPQRuwZvMJkL-GZQ 2   1   799        0            1          0
green  open   .kibana                                                       fThOg1aDTOG8NT29OBzYdQ 1   1   5          0            0          0
green  open   .searchguard                                                  YYJ6X3hBR--LKUAMqWsUOg 1   1   5          0            0          0
green  open   project.test1.fd393550-b8b0-11e9-a142-06f6f3925446.2019.08.07 spztHNiJR3W--fyf1wctxQ 2   1   798        0            1          0
green  open   .kibana.af6615b8667e56f05cfd9d4bd371847d9ece3226              5aHKHKXyQgWs24EycPMCLA 1   1   5          2            0          0
green  open   .operations.2019.08.07                                        -fFBej2ETc6WEh_RclHBEA 2   1   75816      0            192        95
green  open   .kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac              aMfJXnzyQeyhJ037LGC4SA 1   1   3          0            0          0

$ oc exec elasticsearch-cdm-2dafwda5-2-848fb5d899-mbwq4 -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-2dafwda5-2-848fb5d899-mbwq4 -n openshift-logging' to see all of the containers in this pod.
Wed Aug 7 02:02:06 UTC 2019
health status index                                                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   project.json1.bc4093af-b8b2-11e9-a142-06f6f3925446.2019.08.07 n3MWdxbPQRuwZvMJkL-GZQ 2   1   799        0            1          0
green  open   .kibana                                                       fThOg1aDTOG8NT29OBzYdQ 1   1   5          0            0          0
green  open   .searchguard                                                  YYJ6X3hBR--LKUAMqWsUOg 1   1   5          0            0          0
green  open   project.test1.fd393550-b8b0-11e9-a142-06f6f3925446.2019.08.07 spztHNiJR3W--fyf1wctxQ 2   1   798        0            1          0
green  open   .kibana.af6615b8667e56f05cfd9d4bd371847d9ece3226              5aHKHKXyQgWs24EycPMCLA 1   1   5          2            0          0
green  open   .operations.2019.08.07                                        -fFBej2ETc6WEh_RclHBEA 2   1   75817      0            192        96
green  open   .kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac              aMfJXnzyQeyhJ037LGC4SA 1   1   3          0            0          0

I also tried creating a new project to generate logs; the new index was created, but no logs were collected.

$ oc exec elasticsearch-cdm-2dafwda5-1-75cdfcf6d-8hm58 -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-2dafwda5-1-75cdfcf6d-8hm58 -n openshift-logging' to see all of the containers in this pod.
Wed Aug 7 02:18:12 UTC 2019
health status index                                                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana                                                       fThOg1aDTOG8NT29OBzYdQ 1   1   5          0            0          0
green  open   .searchguard                                                  YYJ6X3hBR--LKUAMqWsUOg 1   1   5          0            0          0
green  open   project.test1.fd393550-b8b0-11e9-a142-06f6f3925446.2019.08.07 fWm4UkdwSQKEEkzHnu0XWQ 2   1   0          0            0          0
green  open   .operations.2019.08.07                                        DGxhL5zkS-6VdfsbWOdBZA 2   1   0          0            0          0
green  open   project.test2.ab4582a4-b8b7-11e9-a142-06f6f3925446.2019.08.07 yV6Yx-XSThOJDOzuEByecw 2   1   0          0            0          0
green  open   project.json1.bc4093af-b8b2-11e9-a142-06f6f3925446.2019.08.07 1IskDfb_QNyYdl-AiNMAlA 2   1   0          0            0          0
green  open   .kibana.af6615b8667e56f05cfd9d4bd371847d9ece3226              5aHKHKXyQgWs24EycPMCLA 1   1   5          5            0          0
green  open   .kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac              aMfJXnzyQeyhJ037LGC4SA 1   1   2          1            0          0

The ES pod logs are in the attachment.

Other parameters work as expected when CDM_USE_UNDEFINED is not set.
Besides, if CDM_USE_UNDEFINED=true, MERGE_JSON_LOG=true, and CDM_UNDEFINED_TO_STRING=true are all set, then the logs are collected and parsed as expected.
In your test data, do you have a field called "pipeline.metadata"?

[2019-08-07T02:25:22,093][DEBUG][o.e.a.b.TransportShardBulkAction] [elasticsearch-cdm-2dafwda5-2] [.operations.2019.08.07][0] failed to execute bulk item (create) BulkShardRequest [[.operations.2019.08.07][0]] containing [index {[.operations.2019.08.07][com.redhat.viaq.common][DD4B1DEA81694A68AB515EAAC39BFF6D], source[n/a, actual length: [2.2kb], max length: 2kb]}]
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [pipeline_metadata] tried to parse field [pipeline_metadata] as object, but found a concrete value
> In your test data, do you have a field called "pipeline.metadata"?

or "pipeline_metadata" - but I'm assuming what is happening is that you have a top-level field in your test data like {"pipeline_metadata":numbervalue} and you want to see if it gets turned into {"pipeline_metadata":"numbervalue"} in Elasticsearch.

The problem is that "pipeline_metadata" is one of the reserved field names. This includes "pipeline.metadata" if you are using MERGE_JSON_LOG=true and CDM_UNDEFINED_DOT_REPLACE_CHAR=_. I don't know if we have this documented, but any top-level field name in this list:

https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/configs.d/openshift/filter-viaq-data-model.conf#L3

CEE,time,@timestamp,aushape,ci_job,collectd,docker,fedora-ci,file,foreman,geoip,hostname,ipaddr4,ipaddr6,kubernetes,level,message,namespace_name,namespace_uuid,offset,openstack,ovirt,pid,pipeline_metadata,rsyslog,service,systemd,tags,testcase,tlog,viaq_msg_id

or here:

https://github.com/openshift/cluster-logging-operator/blob/master/files/rsyslog/rsyslog.sh#L53

You must be very careful if you use these fields - you must be able to guarantee that you use the exact same format as used by fluentd/rsyslog. For some of these fields, it's pretty easy and common for user applications to set the values - "level":"err", or setting @timestamp to an rfc3339 formatted string.

If the application cannot guarantee not to conflict with the reserved list of top-level fields, the user must use CDM_USE_UNDEFINED=true in order to first "move" the field into the "undefined" field:

{"pipeline_metadata":numbervalue} -> {"undefined":{"pipeline.metadata":numbervalue}}

then undefined_field.go will do the CDM_UNDEFINED_TO_STRING=true processing and do this:

{"undefined":{"pipeline.metadata":numbervalue}} -> {"undefined":{"pipeline_metadata":"numbervalue"}}
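A rough Go sketch of that two-step flow, under stated assumptions (the simplified logic and the variable names are illustrative, not the actual undefined_field.go implementation):

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	// App record whose field collides with the reserved "pipeline_metadata"
	// once dots are replaced (MERGE_JSON_LOG=true, DOT_REPLACE_CHAR=_).
	record := map[string]interface{}{"pipeline.metadata": 42}
	keepFields := map[string]bool{"pipeline_metadata": true} // reserved list, abridged

	// CDM_USE_UNDEFINED=true: move non-reserved fields under "undefined".
	undefined := map[string]interface{}{}
	for k, v := range record {
		if !keepFields[k] {
			undefined[k] = v
			delete(record, k)
		}
	}

	// CDM_UNDEFINED_DOT_REPLACE_CHAR=_ plus CDM_UNDEFINED_TO_STRING=true:
	// rename each undefined key and stringify its value.
	converted := map[string]interface{}{}
	for k, v := range undefined {
		b, _ := json.Marshal(v)
		converted[strings.ReplaceAll(k, ".", "_")] = string(b)
	}
	record["undefined"] = converted

	out, _ := json.Marshal(record)
	fmt.Println(string(out)) // {"undefined":{"pipeline_metadata":"42"}}
}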
There is a field called pipeline_metadata, but it's added by the collector; it is not in my app's logs. I tried not creating any project in the cluster, just collecting the logs in the openshift-*, kube-*, and default namespaces, and still hit this issue.

The pipeline_metadata is like:

"pipeline_metadata": {
  "collector": {
    "original_raw_message": "I0808 02:44:17.005096 1 controller.go:102] OpenAPI AggregationController: Processing item k8s_internal_local_delegation_chain_0000000002",
    "name": "rsyslog",
    "inputname": "imfile",
    "received_at": "2019-08-08T02:44:17.005224+00:00",
    "ipaddr4": "10.0.151.255",
    "version": "8.37.0-9.el7 "
  }
},

And I also found an error:

[2019-08-08T02:33:50,136][DEBUG][o.e.a.b.TransportShardBulkAction] [elasticsearch-cdm-284r1ole-2] [.operations.2019.08.08][1] failed to execute bulk item (create) BulkShardRequest [[.operations.2019.08.08][1]] containing [13] requests
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [docker] tried to parse field [docker] as object, but found a concrete value

The docker field is also added by the collector. Here is an example of the docker field:

"docker": {
  "container_id": "25093456c0a5cdfcb63f89b62452faa0484778027497c05d598c953ff03f8a07"
},
hmm - then maybe the bug is that when you enable CDM_UNDEFINED_TO_STRING=true, undefined_field.go somehow forgets that "docker" and "pipeline_metadata" are not undefined fields, converts them to string, and boom - error.
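A hedged sketch of what that hypothesized buggy path would do (a hand-written illustration, not the actual undefined_field.go code): with no keep-fields check, reserved object fields get stringified, producing exactly the concrete value that the MapperParsingException above complains about.

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// "docker" is a reserved field that Elasticsearch maps as an object.
	record := map[string]interface{}{
		"docker":  map[string]interface{}{"container_id": "25093456c0a5"},
		"message": "hello",
	}

	// Hypothesized buggy path: with CDM_USE_UNDEFINED unset, the keep-fields
	// map is never built, so CDM_UNDEFINED_TO_STRING=true stringifies every
	// non-string value, including reserved ones.
	for k, v := range record {
		if _, isString := v.(string); !isString {
			b, _ := json.Marshal(v)
			record[k] = string(b)
		}
	}

	out, _ := json.Marshal(record)
	// "docker" is now a concrete string value, which conflicts with its
	// object mapping in Elasticsearch.
	fmt.Println(string(out))
}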
(In reply to Rich Megginson from comment #11)
> hmm - then maybe the bug is that when you enable
> CDM_UNDEFINED_TO_STRING=true, undefined_field.go somehow forgets that
> "docker" and "pipeline_metadata" are not undefined fields, converts them to
> string, and boom - error.

Can I have the 2 sets of CDM env vars? I'd like to reproduce the problem.

Was the test scenario like this? Using the first set, I'd think the test had passed. Then CDM_UNDEFINED_TO_STRING=true was set and the test failed, didn't it? The test data are not artificially generated, but come from rsyslog pods.

Thanks!
It's easy to reproduce. I think the problem is that undefined_field.go assumes that you build the map of defined fields, and check for undefined fields, _only_ if USE_UNDEFINED=true or max num field checking is enabled. This is not the case. We have to build the map of defined fields, and check for undefined fields, also for the UNDEFINED_TO_STRING and UNDEFINED_DOT_REPLACE_CHAR cases.
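If that is the root cause, the fix amounts to widening the condition that triggers building the defined-fields map. A minimal sketch, assuming illustrative parameter names rather than the actual undefined_field.go symbols:

package main

import "fmt"

// needFieldCheck sketches the widened gating described above: the map of
// defined fields must be built (and undefined fields checked) whenever ANY
// of the undefined-field options is active, not only for USE_UNDEFINED or
// max-num-fields checking. Parameter names are assumptions.
func needFieldCheck(useUndefined bool, maxNumFields int, toString bool, dotReplaceChar string) bool {
	return useUndefined || maxNumFields >= 0 || toString || dotReplaceChar != "UNUSED"
}

func main() {
	// CDM_UNDEFINED_TO_STRING=true alone should now trigger the check.
	fmt.Println(needFieldCheck(false, -1, true, "UNUSED")) // true
}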
(In reply to Rich Megginson from comment #13)
> It's easy to reproduce. I think the problem is that undefined_field.go
> assumes that you build the map of defined fields, and check for undefined
> fields, _only_ if USE_UNDEFINED=true or max num field checking is enabled.
> This is not the case. We have to build the map of defined fields, and check
> for undefined fields, also for the UNDEFINED_TO_STRING and
> UNDEFINED_DOT_REPLACE_CHAR cases.

Hmmm, we are calling processUndefinedToString when UndefMap is empty, as well:
https://github.com/openshift/origin-aggregated-logging/blob/master/rsyslog/undefined_field/undefined_field.go#L453-L457

Then, eventually, we check the field "pipeline_metadata" against the keepFields map, and since it's not there, the value is converted to string.

I simplified the test case. The result looks good to me... (my undefined config - [1])

$ echo '{"$!": {"message": "MERGE_JSON_LOG=true", "pipeline_metadata" : { "collector": {"name": "rsyslog", "inputname":"imfile", "ipaddr4": "10.151.0.255", "version": "8.37.0-9.el7"}} }}' | go run undefined_field.go | jq
{
  "$!": {
    "openshift_logging_all": {
      "message": "MERGE_JSON_LOG=true",
      "pipeline_metadata": "{\"collector\":{\"inputname\":\"imfile\",\"ipaddr4\":\"10.151.0.255\",\"name\":\"rsyslog\",\"version\":\"8.37.0-9.el7\"}}"
    }
  }
}

[1]
{"CDM_USE_UNDEFINED" : false,
 "CDM_DEFAULT_KEEP_FIELDS" : "CEE,time,@timestamp,aushape,ci_job,collectd,docker,fedora-ci,file,foreman,geoip,hostname,ipaddr4,ipaddr6,kubernetes,level,message,namespace_name,namespace_uuid,offset,openstack,ovirt,pid,pipeline_metadata,rsyslog,service,systemd,tags,testcase,tlog,viaq_msg_id",
 "CDM_EXTRA_KEEP_FIELDS" : "",
 "CDM_UNDEFINED_NAME" : "undefined",
 "CDM_KEEP_EMPTY_FIELDS" : "",
 "CDM_UNDEFINED_TO_STRING" : true,
 "CDM_UNDEFINED_DOT_REPLACE_CHAR" : "UNUSED",
 "CDM_UNDEFINED_MAX_NUM_FIELDS" : -1,
 "MERGE_JSON_LOG" : false,
 "UNDEFINED_DEBUG" : true}
(In reply to Rich Megginson from comment #13)
> It's easy to reproduce. [...]

I'm having an issue running the series of test cases one after another. Could you help me, @Qiaoling?

First, my rsyslog setup is out of the box, with only one CDM env var, "CDM_KEEP_EMPTY_FIELDS=message". That keeps the pipeline_metadata value in JSON format.

Then I ran "oc set env ds rsyslog CDM_UNDEFINED_TO_STRING=true", which turns the pipeline_metadata value into a JSON string. The log record will not be accepted by Elasticsearch. How can I reset Elasticsearch? Can I do that by deleting .operations.2019.08.08? Actually, I did that, but I still cannot switch to "CDM_UNDEFINED_TO_STRING=true"...

Please note that at the undefined_field unit level, it looks to me like the config change is successfully applied...

Thanks for your help in advance.
(In reply to Noriko Hosoi from comment #15)
> Then I ran "oc set env ds rsyslog CDM_UNDEFINED_TO_STRING=true", which turns
> the pipeline_metadata value into a JSON string. The log record will not be
> accepted by Elasticsearch. How can I reset Elasticsearch? Can I do that by
> deleting .operations.2019.08.08? Actually, I did that, but I still cannot
> switch to "CDM_UNDEFINED_TO_STRING=true"...

Deleting the index in ES doesn't make sense; the logs will still be rejected by ES after the indices are deleted. I don't quite understand what `reset elasticsearch` means.

The issue here is that when CDM_UNDEFINED_TO_STRING=true is set and CDM_USE_UNDEFINED is not set, the logs are rejected by the ES. If you want to make the logging stack work, unset CDM_UNDEFINED_TO_STRING or set CDM_USE_UNDEFINED=true.

> Please note that at the undefined_field unit level, it looks to me like the
> config change is successfully applied...
>
> Thanks for your help in advance.
This is the problem with CDM_UNDEFINED_TO_STRING=true in general - if you need it, you can't really set it without getting a lot of errors. Customers will probably have to set it at 11:59pm UTC so that they only have 1 minute of syntax errors . . .

I suppose you could do this for testing purposes:

oc set env ds/rsyslog CDM_UNDEFINED_TO_STRING=true

then delete the index in Elasticsearch.
(In reply to Rich Megginson from comment #17)
> I suppose you could do this for testing purposes:
>
> oc set env ds/rsyslog CDM_UNDEFINED_TO_STRING=true
>
> then delete the index in Elasticsearch.

Hmmm, does "delete the index in Elasticsearch" mean deleting it like this? If so, I still cannot make my test run successfully... :(

oc exec $ESPOD -- curl [....] "https://localhost:9200/.operations.2019.08.08" -XDELETE

I know that if CDM_UNDEFINED_TO_STRING=true is set with CDM_USE_UNDEFINED=true, then we can control the top-level "undefined" field name (by default "undefined") to store the various value types belonging to the same undefined field, such as pipeline_metadata... (If it's ok, I'd love to disallow setting CDM_UNDEFINED_TO_STRING=true without CDM_USE_UNDEFINED=true, since troubleshooting is this hard... I'm scared that if a customer does it just for fun, they could lose the entire index...)
(In reply to Noriko Hosoi from comment #18)
> Hmmm, does "delete the index in Elasticsearch" mean deleting it like this?
> If so, I still cannot make my test run successfully... :(
>
> oc exec $ESPOD -- curl [....] "https://localhost:9200/.operations.2019.08.08" -XDELETE

oc set env ds/rsyslog CDM_UNDEFINED_TO_STRING=true
# wait for pods to restart
oc exec -c elasticsearch $ESPOD -- es_util --query=.operations.2019.08.08 -XDELETE

It writes the old, non-string data to your undefined fields? If so, then I guess it is buffered records. We get around this in the CI environment because we use a nodeSelector which allows us to completely stop rsyslog, so we can do something like this:

RSYSLOG_JOURNAL_READ_FROM_TAIL=on RSYSLOG_FILE_READ_FROM_TAIL=on RSYSLOG_USE_IMPSTATS_FILE=true MERGE_JSON_LOG=true
stop_rsyslog $rpod
# no rsyslog pods are running
oc set env $rsyslog_ds RSYSLOG_JOURNAL_READ_FROM_TAIL=on RSYSLOG_FILE_READ_FROM_TAIL=on RSYSLOG_USE_IMPSTATS_FILE=true
start_rsyslog cleanfirst

Using cleanfirst means to wipe out the entire contents of /var/log/rsyslog/* and /var/lib/rsyslog.pod/* before starting up rsyslog. Using RSYSLOG_JOURNAL_READ_FROM_TAIL=on RSYSLOG_FILE_READ_FROM_TAIL=on means we don't have to wait for rsyslog to read the entire journald and all of the container logs from the beginning - it will just skip to the end.

I suppose you could do something similar:

stop_rsyslog $rpod
# no rsyslog pods are running
oc set env $rsyslog_ds RSYSLOG_JOURNAL_READ_FROM_TAIL=on RSYSLOG_FILE_READ_FROM_TAIL=on RSYSLOG_USE_IMPSTATS_FILE=true
start_rsyslog cleanfirst
... after some time ...
stop_rsyslog $rpod
# no rsyslog pods are running
oc set env $rsyslog_ds CDM_UNDEFINED_TO_STRING=true
oc exec -c elasticsearch $ESPOD -- es_util --query=.operations.* -XDELETE
start_rsyslog cleanfirst

If _that_ doesn't work, then I'm not sure what's going on . . .

> I know that if CDM_UNDEFINED_TO_STRING=true is set with CDM_USE_UNDEFINED=true,
> then we can control the top-level "undefined" field name (by default
> "undefined") to store the various value types belonging to the same
> undefined field, such as pipeline_metadata... (If it's ok, I'd love to
> disallow setting CDM_UNDEFINED_TO_STRING=true without CDM_USE_UNDEFINED=true,
> since troubleshooting is this hard... I'm scared that if a customer does it
> just for fun, they could lose the entire index...)
@nhosoi - I think https://github.com/openshift/origin-aggregated-logging/pull/1721 fixes the problem with CDM_UNDEFINED_TO_STRING=true, that it wasn't working unless you set CDM_USE_UNDEFINED=true. The fix in the PR allows you to use CDM_UNDEFINED_TO_STRING=true (and CDM_UNDEFINED_DOT_REPLACE_CHAR) without having to set CDM_USE_UNDEFINED=true. So I think this BZ should be moved to POST.

As for the other issue, that if you set CDM_UNDEFINED_TO_STRING=true fluentd/rsyslog stop working, that is a separate bz: https://bugzilla.redhat.com/show_bug.cgi?id=1739462
Tested with ose-logging-rsyslog-v4.2.0-201908091819; logs could be parsed when CDM_UNDEFINED_TO_STRING=true and CDM_USE_UNDEFINED was not set.

" Layer1": "layer1 0",
"@timestamp": "2019-08-12T01:37:28.691837830+00:00",
"Number": "10",
"StringNumber": "10",
"[foobar]": "bracket string",
"docker": {
  "container_id": "f7994be28f370342064954db93b488c668df5708f9fcc500b56bdec308c8f40a"
},
"empty2": "{}",
"file": "/var/log/containers/log4j2-7bf9967879-lgbqn_test1_log4j2-f7994be28f370342064954db93b488c668df5708f9fcc500b56bdec308c8f40a.log",
"foo.bar": "dotstring",
"foo:bar": "colonstring",
"hostname": "ip-10-0-156-41.eu-west-2.compute.internal",
"kubernetes": {
  "container_name": "log4j2",
  "creation_timestamp": "2019-08-12T01:29:23Z",
  "host": "ip-10-0-156-41.eu-west-2.compute.internal",
  "labels": {
    "name": "log4j2",
    "pod-template-hash": "7bf9967879"
  },
  "master_url": "https://kubernetes.default.svc.cluster.local:443",
  "namespace_id": "9d5f58e3-bca0-11e9-bbe7-02d99903896a",
  "namespace_name": "test1",
  "pod_id": "a9671a01-bca0-11e9-a216-0a659dde22fa",
  "pod_name": "log4j2-7bf9967879-lgbqn"
},
"layer2": "{\"name\":\"Layer2 1\",\"tips\":\"decide by PRESERVE_JSON_LOG\"}",
"level": "debug",
"message": "MERGE_JSON_LOG=true",
"offset": "85428",
"pipeline_metadata": {
  "collector": {
    "inputname": "imfile",
    "ipaddr4": "10.0.156.41",
    "name": "rsyslog",
    "original_raw_message": "{\"message\": \"MERGE_JSON_LOG=true\", \"testcase\": \"logging-test\", \"level\": \"debug\",\" Layer1\": \"layer1 0\", \"layer2\": {\"name\":\"Layer2 1\", \"tips\":\"decide by PRESERVE_JSON_LOG\"}, \"StringNumber\":\"10\", \"Number\": 10,\"foo.bar\":\"dotstring\",\"{foobar}\":\"bracestring\",\"[foobar]\":\"bracket string\", \"foo:bar\":\"colonstring\", \"empty1\":\"\", \"empty2\":{}}",
    "received_at": "2019-08-12T01:37:28.691925+00:00",
    "version": "8.37.0-9.el7 "
  }
},
"testcase": "logging-test",
"{foobar}": "bracestring"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922