+++ This bug was initially created as a clone of Bug #1756920 +++ 1. Bug Overview: a) Description of bug report: [RHOCP4.1] fluentd pods doesn't process kubernetes events b) Bug Description: In OCP3.11, fluentd analyzes and transforms kubernetes events before sending them to ElasticSearch. On the other hand, in OCP4.11, fluentd doesn't analyze and sends kubernetes events to ElasticSearch without transforming them. OCP3.11: $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{ "query": { "bool" : { "must" : [ { "range" : { "@timestamp" : { "gte": "now-1d", "lt": "now", "time_zone": "+09:00" } } }, { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }} ] } } }' | python -m json.tool ... { ... "container_name": "kube-eventrouter", "event": { "count": 1, "firstTimestamp": "2019-09-29T18:30:12Z", "involvedObject": { "apiVersion": "batch/v1beta1", "kind": "CronJob", "name": "logging-curator", "namespace": "openshift-logging", "resourceVersion": "794794", "uid": "631314eb-e03e-11e9-abf0-fa163e6ee2bd" }, "lastTimestamp": "2019-09-29T18:30:12Z", "metadata": { "name": "logging-curator.15c8fc3dbd4b78b2", "namespace": "openshift-logging", "resourceVersion": "794799", "selfLink": "/api/v1/namespaces/openshift-logging/events/logging-curator.15c8fc3dbd4b78b2", "uid": "2c70ebce-e2e7-11e9-8fe6-fa163e694aa3" }, "reason": "SuccessfulDelete", "source": { "component": "cronjob-controller" }, "type": "Normal", "verb": "ADDED" }, "host": "hatosp311-node2", "labels": { "component": "eventrouter", "deployment": "logging-eventrouter-1", "deploymentconfig": "logging-eventrouter", "logging-infra": "eventrouter", "provider": "openshift" }, "master_url": "https://kubernetes.default.svc.cluster.local", "namespace_id": "51b29781-e03b-11e9-abf0-fa163e6ee2bd", "namespace_name": "default", "pod_id": "9177bc57-e03e-11e9-abf0-fa163e6ee2bd", "pod_name": "logging-eventrouter-1-jnjwn" }, "level": "info", "message": "Deleted job logging-curator-1569695400", "pipeline_metadata": { "collector": { "inputname": "fluent-plugin-systemd", "ipaddr4": "10.131.0.58", "ipaddr6": "fe80::10aa:c4ff:febf:4e6f", "name": "fluentd", "original_raw_message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"logging-curator.15c8fc3dbd4b78b2\",\"namespace\":\"openshift-logging\",\"selfLink\":\"/api/v1/namespaces/openshift-logging/events/logging-curator.15c8fc3dbd4b78b2\",\"uid\":\"2c70ebce-e2e7-11e9-8fe6-fa163e694aa3\",\"resourceVersion\":\"794799\",\"creationTimestamp\":\"2019-09-29T18:30:12Z\"},\"involvedObject\":{\"kind\":\"CronJob\",\"namespace\":\"openshift-logging\",\"name\":\"logging-curator\",\"uid\":\"631314eb-e03e-11e9-abf0-fa163e6ee2bd\",\"apiVersion\":\"batch/v1beta1\",\"resourceVersion\":\"794794\"},\"reason\":\"SuccessfulDelete\",\"message\":\"Deleted job logging-curator-1569695400\",\"source\":{\"component\":\"cronjob-controller\"},\"firstTimestamp\":\"2019-09-29T18:30:12Z\",\"lastTimestamp\":\"2019-09-29T18:30:12Z\",\"count\":1,\"type\":\"Normal\"}}\n", "received_at": "2019-09-29T18:30:13.842179+00:00", "version": "0.12.43 1.6.0" ... OCP4.1: $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{ "query": { "bool" : { "must" : [ { "range" : { "@timestamp" : { "gte": "now-1h", "lt": "now", "time_zone": "+09:00" } } }, { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }} ] } } }' | python -m json.tool ... { ... "container_name": "kube-eventrouter", "host": "rhocp410-worker4.rhocp410.cluster.sub.nec.test", "labels": { "component": "eventrouter", "logging-infra": "eventrouter", "pod-template-hash": "7b6c97b8b5", "provider": "openshift" }, "master_url": "https://kubernetes.default.svc", "namespace_id": "8aa7ba9d-df50-11e9-ba89-52540007a421", "namespace_labels": { "openshift_io/cluster-logging": "true", "openshift_io/cluster-monitoring": "true" }, "namespace_name": "openshift-logging", "pod_id": "6c389398-e34e-11e9-beed-525400f9452d", "pod_name": "cluster-logging-eventrouter-7b6c97b8b5-qtn4m" }, "level": "info", "message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"etcd-quorum-guard-884c9bc99-srjbd.15c8e6fa0258b305\",\"namespace\":\"openshift-machine-config-operator\",\"selfLink\":\"/api/v1/namespaces/openshift-machine-config-operator/events/etcd-quorum-guard-884c9bc99-srjbd.15c8e6fa0258b305\",\"uid\":\"2fce647c-e341-11e9-b4be-52540007a421\",\"resourceVersion\":\"32006573\",\"creationTimestamp\":\"2019-09-30T05:14:32Z\"},\"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"openshift-machine-config-operator\",\"name\":\"etcd-quorum-guard-884c9bc99-srjbd\",\"uid\":\"767b9d3e-df5e-11e9-aaae-52540007a421\",\"apiVersion\":\"v1\",\"resourceVersion\":\"29503451\",\"fieldPath\":\"spec.containers{guard}\"},\"reason\":\"Unhealthy\",\"message\":\"Readiness probe failed: \",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-master2.rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-09-29T12:00:31Z\",\"lastTimestamp\":\"2019-09-30T05:14:41Z\",\"count\":4,\"type\":\"Warning\"}}", ... 2. Bug Details: a) Architectures: 64-bit Intel EM64T/AMD64 x86_64 b) Bugzilla Dependencies: c) Drivers or hardware dependencies: d) Upstream acceptance information: e) External links: f) Severity (H,M,L): H g) How reproducible: Always h) Steps to Reproduce: Step1: Deploy EFK and EventRouter on OpenShift4.1 environment according to the following documents. EFK: https://docs.openshift.com/container-platform/4.1/logging/efk-logging-deploying.html EventRouter: https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html Step2: Call ElasticSearch API to check kubernetes events which are sent by EventRouter via fluentd $ oc create route passthrough --service=elasticsearch -n openshift-logging route.route.openshift.io/elasticsearch created $ URL=$(oc get route -n openshift-logging elasticsearch --no-headers | awk '{ print $2}') $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{ "query": { "bool" : { "must" : [ { "range" : { "@timestamp" : { "gte": "now-1h", "lt": "now", "time_zone": "+09:00" } } }, { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }} ] } } }' | python -m json.tool i) Actual results: ElasticSearch returns data which weren't transformed by fluentd j) Expected results: ElasticSearch returns data which were transformed by fluentd k) Additional information OpenShift version: OCP 4.1.17 ElasticSearch Operator version: elasticsearch-operator.4.1.18-201909201915 In RHOCP3.11, TRANSFORM_EVENTS is set to true in fluentd when eventrouter was deployed by ansible-playbook. On the other hand, in RHOCP4.1, TRANSFORM_EVENTS isn't set when deploying eventrouter. "process_kubernetes_events false" is set due to it. That's why fluentd doesn't transform kubernetes events in RHOCP4.1. $ oc exec -it fluentd-2k7tz -n openshift-logging -- grep -A 2 TRANSFORM_EVENTS run.sh # Disable process_kubernetes_events if TRANSFORM_EVENTS is false client. if [ "${TRANSFORM_EVENTS:-}" != true ] ; then sed -i 's/\(.*@type viaq_data_model.*\)/\1\n process_kubernetes_events false/' $CFG_DIR/openshift/filter-viaq-data-model.conf fi 3. Business impact: Users cannot monitor kubernetes events easily due to this problem. EFK is a core component of OCP and it's very important to monitor OCP clusters. It should be fixed as soon as possible. 4. Primary contact at Red Hat, email, phone (chat) mfuruta 5. Primary contact at Partner, email, phone (chat) mas-hatada.nec.com (Masaki Hatada) --- Additional comment from Jeff Cantrill on 2019-09-30 17:28:02 UTC --- This is not a blocker for 4.2. Pushing to 4.3 to get off the blocker list. Likely any fix will need to be pulled back into 4.2 --- Additional comment from Rich Megginson on 2019-09-30 18:23:16 UTC --- Looks like we need to add a field for transform_events to the clusterlogging CR. We're not likely to do this in 4.2. In the meantime, the workaround is the following. WARNING WARNING WARNING - your logging deployment will be unmanaged and will not get any updates if you follow these directions! So maybe the workaround is not for you. If you want to proceed, read on. oc edit clusterlogging instance change the managementState from Managed to Unmanaged then oc set env ds/fluentd TRANSFORM_EVENTS=true This will redeploy fluentd with the correct settings. You should now see kubernetes events in Elasticsearch/Kibana. --- Additional comment from Masaki Hatada on 2019-10-01 06:17:04 UTC --- > In the meantime, the workaround is the following. > WARNING WARNING WARNING - your logging deployment will be unmanaged and will not get any updates if you follow these directions! So maybe the workaround is not for you. If you want to proceed, read on. > > oc edit clusterlogging instance > > change the managementState from Managed to Unmanaged > > then > > oc set env ds/fluentd TRANSFORM_EVENTS=true > > This will redeploy fluentd with the correct settings. You should now see kubernetes events in Elasticsearch/Kibana. Thank you for the information. However, fluentd doesn't transform kubernetes events even after setting TRANSFORM_EVENTS=true. I noticed that /etc/fluent/configs.d/openshift/filter-retag-journal.conf expects ^k8s_[^_]+_logging-eventrouter-[^_]+_ as the container name of kube-eventrouter. $ oc exec -it fluentd-2g67p -n openshift-logging cat /etc/fluent/configs.d/openshift/filter-retag-journal.conf | grep -B1 -A1 eventrouter key CONTAINER_NAME pattern ^k8s_[^_]+_logging-eventrouter-[^_]+_ tag kubernetes.journal.container._default_.kubernetes-event In RHOCP3.11, it corresponds to the container name of kube-eventrouter. $ ssh ocp311-master2 docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES ... cc253662b503 fe6bf428d35d "/bin/sh -c '/usr/..." 3 weeks ago Up 3 weeks k8s_kube-eventrouter_logging-eventrouter-1-ntqg2_default_ce4d7b2d-c58f-11e9-8a67-fa163e80c746_1 On the other hand, it doesn't correspond to the container name of kube-eventrouter in RHOCP4.1. $ ssh core@rhocp410-worker2 sudo crictl ps CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT POD ID ac0e82117a2ad ee6f719c73b9d4149866eec45676a538ee30a14cad83e1da86b6ed75e41f8c56 8 minutes ago Running fluentd 0 8b4c5a1c5f181 2dd0478e341ca registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052 About an hour ago Running kube-eventrouter 0 c285022fc8934 $ ssh core@rhocp410-worker2 sudo crictl pods POD ID CREATED STATE NAME NAMESPACE ATTEMPT ... c285022fc8934 40 minutes ago Ready cluster-logging-eventrouter-7b6c97b8b5-v9xtg openshift-logging 0 /etc/fluent/configs.d/openshift/filter-retag-journal.conf have to be updated for RHOCP4.1. --- Additional comment from Rich Megginson on 2019-10-01 13:47:08 UTC --- > /etc/fluent/configs.d/openshift/filter-retag-journal.conf have to be updated for RHOCP4.1. We are not planning to fix this bug until OCP 4.3. --- Additional comment from Rich Megginson on 2019-10-01 14:09:03 UTC --- (In reply to Rich Megginson from comment #4) > > /etc/fluent/configs.d/openshift/filter-retag-journal.conf have to be updated for RHOCP4.1. > > We are not planning to fix this bug until OCP 4.3. filter-retag-journal.conf _only_ applies to logs read from systemd journald, using the docker --log-driver=journald. In 4.1, the only supported log driver is cri-o with the file log driver. fluentd is expecting eventrouter logs to come from a pod named "logging-eventrouter-*" running in namespace "default". How are you deploying the eventrouter in 4.1? eventrouter records are processed here: https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model.rb#L452 it is expected that the outcome of this processing is a field named "kubernetes.event.metadata.uid" which is the UUID of the event. This UUID is used as the elasticsearch ID - https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/configs.d/openshift/filter-post-genid.conf#L6 but _only_ if the record tag matches "kubernetes.var.log.containers.logging-eventrouter-*.** kubernetes.journal.container._default_.kubernetes-event" - the first pattern corresponds to a cri-o log driver log for a pod name matching "logging-eventrouter-*" (that is, something like "logging-eventrouter-xyzyx" not "logging-eventrouter-xyzyx-something"). There is no namespace name part of that match - however - the code was designed for the eventrouter running in the "default" namespace, and there may be other hidden assumptions elsewhere in the code. "default" is considered an "infra" log, like logs coming from other "openshift*" or "kube*" namespaces, and can only be seen by cluster admins. Using the event UUID as the elasticsearch record ID gives you an easy way to correlate records in elasticsearch back to the exact source event. If it is not possible anymore in 4.x to deploy the eventrouter with a specific name, or in the "default" namespace, then there is probably no easy remedy, and we will have to fix the code/configs to make it work in 4.x. --- Additional comment from Masaki Hatada on 2019-10-02 01:41:49 UTC --- > fluentd is expecting eventrouter logs to come from a pod named "logging-eventrouter-*" running in namespace "default". How are you deploying the eventrouter in 4.1? I deployed eventrouter just according to https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html. However, fluentd doesn't transform kubernetes events even if I deployed eventrouter on default namespace with pod name "logging-eventrouter-*". $ oc exec -it fluentd-6mn6j -n openshift-logging env | grep TRANS TRANSFORM_EVENTS=true $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{ ...(snip)... { "_id": "ZTk0YzViNDAtMGM2OC00M2E3LTliZjUtMTk5Y2I1Yzg3Nzk2", "_index": ".operations.2019.10.02", "_score": 2.6822762, "_source": { "@timestamp": "2019-10-02T01:28:51.929091+00:00", "docker": { "container_id": "c3e8b061aca97a6e0e82c73f6208b39fc2a00d1c168fe876f171f861d407e8e1" }, "hostname": "rhocp410-worker1.rhocp410.cluster.sub.nec.test", "kubernetes": { "container_image": "registry.redhat.io/openshift4/ose-logging-eventrouter:latest", "container_image_id": "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052", "container_name": "kube-eventrouter", "host": "rhocp410-worker1.rhocp410.cluster.sub.nec.test", "labels": { "component": "eventrouter", "logging-infra": "eventrouter", "pod-template-hash": "54d8598c85", "provider": "openshift" }, "master_url": "https://kubernetes.default.svc", "namespace_id": "12ea31ba-9bcf-11e9-9994-52540086c9e7", ★namespace "namespace_name": "default", "pod_id": "e122c94b-e4a8-11e9-80f2-52540007a421", ★pod name "pod_name": "logging-eventrouter-54d8598c85-njt2c" }, "level": "info", "message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example-1.15c9b03f81118baf\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/namespaces/test/events/django-psql-example-1.15c9b03f81118baf\",\"uid\":\"fdc19753-e4b3-11e9-8b57-525400631ae9\",\"resourceVersion\":\"32864698\",\"creationTimestamp\":\"2019-10-02T01:28:51Z\"},\"involvedObject\":{\"kind\":\"ReplicationController\",\"namespace\":\"test\",\"name\":\"django-psql-example-1\",\"uid\":\"24909921-e3f7-11e9-949b-52540007a421\",\"apiVersion\":\"v1\",\"resourceVersion\":\"32864560\"},\"reason\":\"SuccessfulCreate\",\"message\":\"Created pod: django-psql-example-1-mppsh\",\"source\":{\"component\":\"replication-controller\"},\"firstTimestamp\":\"2019-10-02T01:28:51Z\",\"lastTimestamp\":\"2019-10-02T01:28:51Z\",\"count\":1,\"type\":\"Normal\"}}", "pipeline_metadata": { "collector": { "inputname": "fluent-plugin-systemd", "ipaddr4": "10.128.1.121", "ipaddr6": "fe80::a8dc:e4ff:fe60:f2", "name": "fluentd", "received_at": "2019-10-02T01:28:52.649247+00:00", "version": "1.5.1 1.6.0" } }, "viaq_msg_id": "ZTk0YzViNDAtMGM2OC00M2E3LTliZjUtMTk5Y2I1Yzg3Nzk2" }, "_type": "com.redhat.viaq.common" } ], "max_score": 2.784227, "total": 22 }, "timed_out": false, "took": 499 > We are not planning to fix this bug until OCP 4.3. Eventrouter is one of important features for monitoring OCP cluster, and our customer, who is planning to migrate their system from OCP3.11 to OCP4.2, is using the feature. It looks like a regression for the customer. So we would like Red Hat to fix this problem in OCP4.2 at least. --- Additional comment from Rich Megginson on 2019-10-02 02:14:45 UTC --- I think the problem is that, in order to use the eventrouter, you must use MERGE_JSON_LOG=true. In 3.11, this was true by default, and it caused many, many problems. So in 4.x, we changed the default to MERGE_JSON_LOG=false. See https://docs.openshift.com/container-platform/4.1/logging/config/efk-logging-fluentd.html#efk-logging-fluentd-json_efk-logging-fluentd Please also file a doc bug because https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is not correct - it has the wrong pod name and namespace, and perhaps other problems. --- Additional comment from Masaki Hatada on 2019-10-02 04:16:03 UTC --- > I think the problem is that, in order to use the eventrouter, you must use MERGE_JSON_LOG=true. In 3.11, this was true by default, and it caused many, many problems. So in 4.x, we changed the default to MERGE_JSON_LOG=false. Thank you for the information. After I set MERGE_JSON_LOG=true and TRANSFORM_EVENTS=true, fluentd became able to transform kubernetes events. If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix fluentd so that it can transform kubernetes events without the parameter? > Please also file a doc bug because https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is not correct - it has the wrong pod name and namespace, and perhaps other problems. It seems that fluentd can transform kubernetes events even if eventrouter was deployed on "openshift-logging" namespace with a pod name "cluster-logging-eventrouter-*". Should I still change its namespace and pod name? I pasted the terminal log below. $ oc exec -it fluentd-ncn4n -n openshift-logging env | grep -e MERGE_JSON_LOG -e TRANSFORM_EVENTS MERGE_JSON_LOG=true TRANSFORM_EVENTS=true $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{ "query": { "bool" : { "must" : [ { "range" : { "@timestamp" : { "gte": "now-5m", "lt": "now", "time_zone": "+09:00" } } }, { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }} ] } } }' | python -m json.tool { ... "kubernetes": { "container_image": "registry.redhat.io/openshift4/ose-logging-eventrouter:latest", "container_image_id": "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052", "container_name": "kube-eventrouter", "event": { "count": 1, "firstTimestamp": "2019-10-02T03:51:03Z", "involvedObject": { "apiVersion": "v1", "fieldPath": "spec.containers{django-psql-example}", "kind": "Pod", "name": "django-psql-example-1-g2hxs", "namespace": "test", "resourceVersion": "32902882", "uid": "b424a806-e4c4-11e9-8b57-525400631ae9" }, "lastTimestamp": "2019-10-02T03:51:03Z", "metadata": { "name": "django-psql-example-1-g2hxs.15c9b801f4b62577", "namespace": "test", "resourceVersion": "32910091", "selfLink": "/api/v1/namespaces/test/events/django-psql-example-1-g2hxs.15c9b801f4b62577", "uid": "db129fc2-e4c7-11e9-ba59-525400f9452d" }, "reason": "Unhealthy", "source": { "component": "kubelet", "host": "rhocp410-worker1.rhocp410.cluster.sub.nec.test" }, "type": "Warning", "verb": "ADDED" }, "host": "rhocp410-worker1.rhocp410.cluster.sub.nec.test", "labels": { "component": "eventrouter", "logging-infra": "eventrouter", "pod-template-hash": "7b6c97b8b5", "provider": "openshift" }, "master_url": "https://kubernetes.default.svc", "namespace_id": "8aa7ba9d-df50-11e9-ba89-52540007a421", "namespace_labels": { "openshift_io/cluster-logging": "true", "openshift_io/cluster-monitoring": "true" }, "namespace_name": "openshift-logging", "pod_id": "140ad1e1-e4c2-11e9-8b57-525400631ae9", "pod_name": "cluster-logging-eventrouter-7b6c97b8b5-cdkbq" }, "level": "info", "message": "Liveness probe failed: Get http://10.128.1.157:8080/health: read tcp 10.128.0.1:33156->10.128.1.157:8080: read: connection reset by peer", "pipeline_metadata": { "collector": { "inputname": "fluent-plugin-systemd", "ipaddr4": "10.128.1.158", "ipaddr6": "fe80::8a8:81ff:fe9c:d0c", "name": "fluentd", "original_raw_message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example-1-g2hxs.15c9b801f4b62577\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/namespaces/test/events/django-psql-example-1-g2hxs.15c9b801f4b62577\",\"uid\":\"db129fc2-e4c7-11e9-ba59-525400f9452d\",\"resourceVersion\":\"32910091\",\"creationTimestamp\":\"2019-10-02T03:51:03Z\"},\"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"test\",\"name\":\"django-psql-example-1-g2hxs\",\"uid\":\"b424a806-e4c4-11e9-8b57-525400631ae9\",\"apiVersion\":\"v1\",\"resourceVersion\":\"32902882\",\"fieldPath\":\"spec.containers{django-psql-example}\"},\"reason\":\"Unhealthy\",\"message\":\"Liveness probe failed: Get http://10.128.1.157:8080/health: read tcp 10.128.0.1:33156-\\u003e10.128.1.157:8080: read: connection reset by peer\",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-worker1.rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-10-02T03:51:03Z\",\"lastTimestamp\":\"2019-10-02T03:51:03Z\",\"count\":1,\"type\":\"Warning\"}}", "received_at": "2019-10-02T03:51:03.759610+00:00", "version": "1.5.1 1.6.0" } }, "viaq_msg_id": "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4" }, "_type": "com.redhat.viaq.common" } --- Additional comment from Masaki Hatada on 2019-10-02 07:33:38 UTC --- Hi Furuta-san, I would like to know what information I have to provide. --- Additional comment from Rich Megginson on 2019-10-02 12:56:25 UTC --- (In reply to Masaki Hatada from comment #8) > > I think the problem is that, in order to use the eventrouter, you must use MERGE_JSON_LOG=true. In 3.11, this was true by default, and it caused many, many problems. So in 4.x, we changed the default to MERGE_JSON_LOG=false. > > Thank you for the information. > After I set MERGE_JSON_LOG=true and TRANSFORM_EVENTS=true, fluentd became > able to transform kubernetes events. > > If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix > fluentd so that it can transform kubernetes events without the parameter? We don't recommend it to be used by default for all customers. However, if you need to use it, and it works for you in your use case, and doesn't cause any problems, then please continue to use it. > > > Please also file a doc bug because https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is not correct - it has the wrong pod name and namespace, and perhaps other problems. > > It seems that fluentd can transform kubernetes events even if eventrouter > was deployed on "openshift-logging" namespace with a pod name > "cluster-logging-eventrouter-*". Yes, it "works" but this: > "viaq_msg_id": > "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4" That is, it is using an auto-generated random value for the Elasticsearch document ID rather than the UUID from the record - kubernetes.event.metadata.uid "db129fc2-e4c7-11e9-ba59-525400f9452d" If this "works" for you, that's fine. > Should I still change its namespace and pod name? > > I pasted the terminal log below. > > $ oc exec -it fluentd-ncn4n -n openshift-logging env | grep -e > MERGE_JSON_LOG -e TRANSFORM_EVENTS > MERGE_JSON_LOG=true > TRANSFORM_EVENTS=true > > $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H > "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d > '{ > "query": { > "bool" : { > "must" : [ > { "range" : { "@timestamp" : { "gte": "now-5m", "lt": > "now", "time_zone": "+09:00" } } }, > { "match_phrase" : { "kubernetes.container_name": > "kube-eventrouter" }} > ] > } > } > }' | python -m json.tool > { > ... > > "kubernetes": { > "container_image": > "registry.redhat.io/openshift4/ose-logging-eventrouter:latest", > "container_image_id": > "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256: > ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052", > "container_name": "kube-eventrouter", > "event": { > "count": 1, > "firstTimestamp": "2019-10-02T03:51:03Z", > "involvedObject": { > "apiVersion": "v1", > "fieldPath": > "spec.containers{django-psql-example}", > "kind": "Pod", > "name": "django-psql-example-1-g2hxs", > "namespace": "test", > "resourceVersion": "32902882", > "uid": > "b424a806-e4c4-11e9-8b57-525400631ae9" > }, > "lastTimestamp": "2019-10-02T03:51:03Z", > "metadata": { > "name": > "django-psql-example-1-g2hxs.15c9b801f4b62577", > "namespace": "test", > "resourceVersion": "32910091", > "selfLink": > "/api/v1/namespaces/test/events/django-psql-example-1-g2hxs. > 15c9b801f4b62577", > "uid": > "db129fc2-e4c7-11e9-ba59-525400f9452d" > }, > "reason": "Unhealthy", > "source": { > "component": "kubelet", > "host": > "rhocp410-worker1.rhocp410.cluster.sub.nec.test" > }, > "type": "Warning", > "verb": "ADDED" > }, > "host": > "rhocp410-worker1.rhocp410.cluster.sub.nec.test", > "labels": { > "component": "eventrouter", > "logging-infra": "eventrouter", > "pod-template-hash": "7b6c97b8b5", > "provider": "openshift" > }, > "master_url": "https://kubernetes.default.svc", > "namespace_id": > "8aa7ba9d-df50-11e9-ba89-52540007a421", > "namespace_labels": { > "openshift_io/cluster-logging": "true", > "openshift_io/cluster-monitoring": "true" > }, > "namespace_name": "openshift-logging", > "pod_id": "140ad1e1-e4c2-11e9-8b57-525400631ae9", > "pod_name": > "cluster-logging-eventrouter-7b6c97b8b5-cdkbq" > }, > "level": "info", > "message": "Liveness probe failed: Get > http://10.128.1.157:8080/health: read tcp > 10.128.0.1:33156->10.128.1.157:8080: read: connection reset by peer", > "pipeline_metadata": { > "collector": { > "inputname": "fluent-plugin-systemd", > "ipaddr4": "10.128.1.158", > "ipaddr6": "fe80::8a8:81ff:fe9c:d0c", > "name": "fluentd", > "original_raw_message": > "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example- > 1-g2hxs.15c9b801f4b62577\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/ > namespaces/test/events/django-psql-example-1-g2hxs.15c9b801f4b62577\", > \"uid\":\"db129fc2-e4c7-11e9-ba59-525400f9452d\",\"resourceVersion\": > \"32910091\",\"creationTimestamp\":\"2019-10-02T03:51:03Z\"}, > \"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"test\",\"name\": > \"django-psql-example-1-g2hxs\",\"uid\":\"b424a806-e4c4-11e9-8b57- > 525400631ae9\",\"apiVersion\":\"v1\",\"resourceVersion\":\"32902882\", > \"fieldPath\":\"spec.containers{django-psql-example}\"},\"reason\": > \"Unhealthy\",\"message\":\"Liveness probe failed: Get > http://10.128.1.157:8080/health: read tcp > 10.128.0.1:33156-\\u003e10.128.1.157:8080: read: connection reset by > peer\",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-worker1. > rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-10-02T03:51:03Z\", > \"lastTimestamp\":\"2019-10-02T03:51:03Z\",\"count\":1,\"type\": > \"Warning\"}}", > "received_at": > "2019-10-02T03:51:03.759610+00:00", > "version": "1.5.1 1.6.0" > } > }, > "viaq_msg_id": > "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4" > }, > "_type": "com.redhat.viaq.common" > } --- Additional comment from Masaki Hatada on 2019-10-03 09:53:10 UTC --- >> If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix >> fluentd so that it can transform kubernetes events without the parameter? > > We don't recommend it to be used by default for all customers. > However, if you need to use it, and it works for you in your use case, and doesn't cause any problems, then please continue to use it. Although I know you doesn't recommend setting MERGE_JSON_LOG=true, EFK cannot handle kubernetes events correctly without this parameter. Doesn't Red Hat have a plan to fix fluentd so that it becomes able to handle kubernetes events without this parameter? >> "viaq_msg_id": >> "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4" > >That is, it is using an auto-generated random value for the Elasticsearch document ID rather than the UUID from the record - kubernetes.event.metadata.uid "db129fc2-e4c7-11e9-ba59-525400f9452d" > >If this "works" for you, that's fine. I verified that viaq_msg_id is set to kubernetes.event.metadata.uid if eventrouter was deployed on default namespace with the pod name "logging-eventrouter". So I hope that https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is updated. --- Additional comment from Rich Megginson on 2019-10-03 15:28:07 UTC --- (In reply to Masaki Hatada from comment #11) > >> If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix > >> fluentd so that it can transform kubernetes events without the parameter? > > > > We don't recommend it to be used by default for all customers. > > However, if you need to use it, and it works for you in your use case, and doesn't cause any problems, then please continue to use it. > > Although I know you doesn't recommend setting MERGE_JSON_LOG=true, EFK > cannot handle kubernetes events correctly without this parameter. > Doesn't Red Hat have a plan to fix fluentd so that it becomes able to handle > kubernetes events without this parameter? The plan is the fix for this bz. This bz is targeted for 4.3, where it will be triaged and prioritized along with all of the other bugs for 4.3. Since there is a workaround, I don't think this bug needs to be urgently fixed in 4.2 or 4.1. > > >> "viaq_msg_id": > >> "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4" > > > >That is, it is using an auto-generated random value for the Elasticsearch document ID rather than the UUID from the record - kubernetes.event.metadata.uid "db129fc2-e4c7-11e9-ba59-525400f9452d" > > > >If this "works" for you, that's fine. > > I verified that viaq_msg_id is set to kubernetes.event.metadata.uid if > eventrouter was deployed on default namespace with the pod name > "logging-eventrouter". > So I hope that > https://docs.openshift.com/container-platform/4.1/logging/efk-logging- > eventrouter.html is updated. https://bugzilla.redhat.com/show_bug.cgi?id=1758222 - please add anything to that BZ that you feel I may have missed --- Additional comment from Masaki Hatada on 2019-10-04 00:31:33 UTC --- >> Although I know you doesn't recommend setting MERGE_JSON_LOG=true, EFK >> cannot handle kubernetes events correctly without this parameter. >> Doesn't Red Hat have a plan to fix fluentd so that it becomes able to handle >> kubernetes events without this parameter? > >The plan is the fix for this bz. This bz is targeted for 4.3, where it will be triaged and prioritized along with all of the other bugs for 4.3. Since there is a workaround, I don't think this bug needs to be urgently fixed in 4.2 or 4.1. I understood that. I'll text here if I received any big business impact from customer about this issue. >> I verified that viaq_msg_id is set to kubernetes.event.metadata.uid if >> eventrouter was deployed on default namespace with the pod name >> "logging-eventrouter". >> So I hope that >> https://docs.openshift.com/container-platform/4.1/logging/efk-logging- >> eventrouter.html is updated. > >https://bugzilla.redhat.com/show_bug.cgi?id=1758222 - please add anything to that BZ that you feel I may have missed Thank you for creating a new Bugzilla. Although it hasn't mentioned changing namespace from "openshift-logging" to "default", I believe it's not needed because viaq_msg_id is set to kubernetes.event.metadata.uid even when logging-eventrouter pod is running on "openshift-logging" namespace. $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{ "query": { "bool" : { "must" : [ { "range" : { "@timestamp" : { "gte": "now-15m", "lt": "now", "time_zone": "+09:00" } } }, { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }} ] } } }' | python -m json.tool ... { "_id": "bd035e15-e63a-11e9-8f60-525400631ae9", "_index": ".operations.2019.10.04", "_score": 3.502013, "_source": { "@timestamp": "2019-10-04T00:05:56Z", "docker": { "container_id": "4aca9d9a8cbb2f6f8cd057075064c745c684f81b4b1fbcc27e8856733fee8e36" }, "hostname": "rhocp410-worker2.rhocp410.cluster.sub.nec.test", "kubernetes": { "container_image": "registry.redhat.io/openshift4/ose-logging-eventrouter:latest", "container_image_id": "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052", "container_name": "kube-eventrouter", "event": { "count": 3, "firstTimestamp": "2019-10-04T00:05:56Z", "involvedObject": { "apiVersion": "v1", "fieldPath": "spec.containers{django-psql-example}", "kind": "Pod", "name": "django-psql-example-1-74gd6", "namespace": "test", "resourceVersion": "33746648", "uid": "b7e3e9fb-e63a-11e9-8f60-525400631ae9" }, "lastTimestamp": "2019-10-04T00:06:28Z", "metadata": { "name": "django-psql-example-1-74gd6.15ca48e236bc9e32", "namespace": "test", "resourceVersion": "33746880", "selfLink": "/api/v1/namespaces/test/events/django-psql-example-1-74gd6.15ca48e236bc9e32", "uid": "bd035e15-e63a-11e9-8f60-525400631ae9" }, "reason": "Created", "source": { "component": "kubelet", "host": "rhocp410-worker2.rhocp410.cluster.sub.nec.test" }, "type": "Normal", "verb": "ADDED" }, "host": "rhocp410-worker2.rhocp410.cluster.sub.nec.test", "labels": { "component": "eventrouter", "logging-infra": "eventrouter", "pod-template-hash": "54d8598c85", "provider": "openshift" }, "master_url": "https://kubernetes.default.svc", "namespace_id": "8aa7ba9d-df50-11e9-ba89-52540007a421", "namespace_labels": { "openshift_io/cluster-logging": "true", "openshift_io/cluster-monitoring": "true" }, ★ "namespace_name": "openshift-logging", "pod_id": "fb6e289f-e63a-11e9-8f60-525400631ae9", ★ "pod_name": "logging-eventrouter-54d8598c85-f7tcn" }, "level": "info", "message": "Created container django-psql-example", "pipeline_metadata": { "collector": { "inputname": "fluent-plugin-systemd", "ipaddr4": "10.129.1.160", "ipaddr6": "fe80::6864:91ff:fe96:a6b5", "name": "fluentd", "original_raw_message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example-1-74gd6.15ca48e236bc9e32\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/namespaces/test/events/django-psql-example-1-74gd6.15ca48e236bc9e32\",\"uid\":\"bd035e15-e63a-11e9-8f60-525400631ae9\",\"resourceVersion\":\"33746880\",\"creationTimestamp\":\"2019-10-04T00:05:56Z\"},\"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"test\",\"name\":\"django-psql-example-1-74gd6\",\"uid\":\"b7e3e9fb-e63a-11e9-8f60-525400631ae9\",\"apiVersion\":\"v1\",\"resourceVersion\":\"33746648\",\"fieldPath\":\"spec.containers{django-psql-example}\"},\"reason\":\"Created\",\"message\":\"Created container django-psql-example\",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-worker2.rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-10-04T00:05:56Z\",\"lastTimestamp\":\"2019-10-04T00:06:28Z\",\"count\":3,\"type\":\"Normal\"}}", "received_at": "2019-10-04T00:08:38.171034+00:00", "version": "1.5.1 1.6.0" } }, "viaq_msg_id": "bd035e15-e63a-11e9-8f60-525400631ae9" }, "_type": "com.redhat.viaq.common" } ], "max_score": 3.502013, "total": 138 }, "timed_out": false, "took": 493 } --- Additional comment from Rich Megginson on 2019-10-04 17:21:58 UTC --- > Flags: needinfo?(rmeggins) What info do you need? --- Additional comment from Masaki Hatada on 2019-10-08 02:15:39 UTC --- > What info do you need? Sorry, nothing in particular.
https://github.com/openshift/origin-aggregated-logging/pull/1778
Verified in 4.2.9. The eventrouter are gather and transform to json
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:4093