Bug 1763282 - fluentd pods doesn't process kubernetes events
Summary: fluentd pods doesn't process kubernetes events
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.2.z
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: 4.2.z
Assignee: Rich Megginson
QA Contact: Anping Li
URL:
Whiteboard:
Depends On: 1756920
Blocks: 1723620
TreeView+ depends on / blocked
 
Reported: 2019-10-18 16:39 UTC by Rich Megginson
Modified: 2019-12-11 22:36 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In OCP 4.x you cannot set the environment variable TRANSFORM_EVENTS=true for fluentd in order to make it process and transform eventrouter events, without setting the cluster to Unmanaged. Consequence: In OCP 4.x there is no supported way to transform and store eventrouter events. Fix: Allow fluentd to always transform events, assuming the user has followed the documentation about how to deploy the eventrouter for use with logging. That is, the eventrouter pod must be deployed with a name matching "eventrouter-*". Fluentd will assume any pod with a name matching this is an eventrouter and will process the records as if they come from an eventrouter. WARNING: Do not use the name "eventrouter-*" for ANY OTHER REASON! Result: OpenShift logging in 4.x can process and store eventrouter records in Managed mode.
Clone Of: 1756920
Environment:
Last Closed: 2019-12-11 22:36:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin-aggregated-logging pull 1778 0 'None' closed Bug 1763282: fluentd pods doesn't process kubernetes events 2020-10-19 11:08:40 UTC
Red Hat Product Errata RHBA-2019:4093 0 None None None 2019-12-11 22:36:16 UTC

Description Rich Megginson 2019-10-18 16:39:37 UTC
+++ This bug was initially created as a clone of Bug #1756920 +++

1. Bug Overview:
a) Description of bug report: 

  [RHOCP4.1] fluentd pods doesn't process kubernetes events

b) Bug Description: 

  In OCP3.11, fluentd analyzes and transforms kubernetes events before sending them to ElasticSearch.
  On the other hand, in OCP4.11, fluentd doesn't analyze and sends kubernetes events to ElasticSearch without transforming them.

  OCP3.11:

    $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{
         "query": {
           "bool" : {
             "must" : [
                { "range" : { "@timestamp" : { "gte": "now-1d", "lt": "now", "time_zone": "+09:00" } } },
                { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }}
              ]
           }
         }
       }' | python -m json.tool
    ...
            {
    ...
                        "container_name": "kube-eventrouter",
                        "event": {
                            "count": 1,
                            "firstTimestamp": "2019-09-29T18:30:12Z",
                            "involvedObject": {
                                "apiVersion": "batch/v1beta1",
                                "kind": "CronJob",
                                "name": "logging-curator",
                                "namespace": "openshift-logging",
                                "resourceVersion": "794794",
                                "uid": "631314eb-e03e-11e9-abf0-fa163e6ee2bd" 
                            },
                            "lastTimestamp": "2019-09-29T18:30:12Z",
                            "metadata": {
                                "name": "logging-curator.15c8fc3dbd4b78b2",
                                "namespace": "openshift-logging",
                                "resourceVersion": "794799",
                                "selfLink": "/api/v1/namespaces/openshift-logging/events/logging-curator.15c8fc3dbd4b78b2",
                                "uid": "2c70ebce-e2e7-11e9-8fe6-fa163e694aa3" 
                            },
                            "reason": "SuccessfulDelete",
                            "source": {
                                "component": "cronjob-controller" 
                            },
                            "type": "Normal",
                            "verb": "ADDED" 
                        },
                        "host": "hatosp311-node2",
                        "labels": {
                            "component": "eventrouter",
                            "deployment": "logging-eventrouter-1",
                            "deploymentconfig": "logging-eventrouter",
                            "logging-infra": "eventrouter",
                            "provider": "openshift" 
                        },
                        "master_url": "https://kubernetes.default.svc.cluster.local",
                        "namespace_id": "51b29781-e03b-11e9-abf0-fa163e6ee2bd",
                        "namespace_name": "default",
                        "pod_id": "9177bc57-e03e-11e9-abf0-fa163e6ee2bd",
                        "pod_name": "logging-eventrouter-1-jnjwn" 
                    },
                    "level": "info",
                    "message": "Deleted job logging-curator-1569695400",
                    "pipeline_metadata": {
                        "collector": {
                            "inputname": "fluent-plugin-systemd",
                            "ipaddr4": "10.131.0.58",
                            "ipaddr6": "fe80::10aa:c4ff:febf:4e6f",
                            "name": "fluentd",
                            "original_raw_message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"logging-curator.15c8fc3dbd4b78b2\",\"namespace\":\"openshift-logging\",\"selfLink\":\"/api/v1/namespaces/openshift-logging/events/logging-curator.15c8fc3dbd4b78b2\",\"uid\":\"2c70ebce-e2e7-11e9-8fe6-fa163e694aa3\",\"resourceVersion\":\"794799\",\"creationTimestamp\":\"2019-09-29T18:30:12Z\"},\"involvedObject\":{\"kind\":\"CronJob\",\"namespace\":\"openshift-logging\",\"name\":\"logging-curator\",\"uid\":\"631314eb-e03e-11e9-abf0-fa163e6ee2bd\",\"apiVersion\":\"batch/v1beta1\",\"resourceVersion\":\"794794\"},\"reason\":\"SuccessfulDelete\",\"message\":\"Deleted job logging-curator-1569695400\",\"source\":{\"component\":\"cronjob-controller\"},\"firstTimestamp\":\"2019-09-29T18:30:12Z\",\"lastTimestamp\":\"2019-09-29T18:30:12Z\",\"count\":1,\"type\":\"Normal\"}}\n",
                            "received_at": "2019-09-29T18:30:13.842179+00:00",
                            "version": "0.12.43 1.6.0" 
    ...

  OCP4.1:

    $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{
         "query": {
           "bool" : {
             "must" : [
                { "range" : { "@timestamp" : { "gte": "now-1h", "lt": "now", "time_zone": "+09:00" } } },
                { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }}
              ]
           }
         }
       }' | python -m json.tool
    ...
                {
    ...
                            "container_name": "kube-eventrouter",
                            "host": "rhocp410-worker4.rhocp410.cluster.sub.nec.test",
                            "labels": {
                                "component": "eventrouter",
                                "logging-infra": "eventrouter",
                                "pod-template-hash": "7b6c97b8b5",
                                "provider": "openshift" 
                            },
                            "master_url": "https://kubernetes.default.svc",
                            "namespace_id": "8aa7ba9d-df50-11e9-ba89-52540007a421",
                            "namespace_labels": {
                                "openshift_io/cluster-logging": "true",
                                "openshift_io/cluster-monitoring": "true" 
                            },
                            "namespace_name": "openshift-logging",
                            "pod_id": "6c389398-e34e-11e9-beed-525400f9452d",
                            "pod_name": "cluster-logging-eventrouter-7b6c97b8b5-qtn4m" 
                        },
                        "level": "info",
                        "message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"etcd-quorum-guard-884c9bc99-srjbd.15c8e6fa0258b305\",\"namespace\":\"openshift-machine-config-operator\",\"selfLink\":\"/api/v1/namespaces/openshift-machine-config-operator/events/etcd-quorum-guard-884c9bc99-srjbd.15c8e6fa0258b305\",\"uid\":\"2fce647c-e341-11e9-b4be-52540007a421\",\"resourceVersion\":\"32006573\",\"creationTimestamp\":\"2019-09-30T05:14:32Z\"},\"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"openshift-machine-config-operator\",\"name\":\"etcd-quorum-guard-884c9bc99-srjbd\",\"uid\":\"767b9d3e-df5e-11e9-aaae-52540007a421\",\"apiVersion\":\"v1\",\"resourceVersion\":\"29503451\",\"fieldPath\":\"spec.containers{guard}\"},\"reason\":\"Unhealthy\",\"message\":\"Readiness probe failed: \",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-master2.rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-09-29T12:00:31Z\",\"lastTimestamp\":\"2019-09-30T05:14:41Z\",\"count\":4,\"type\":\"Warning\"}}",
    ...

2. Bug Details:

a) Architectures: 64-bit Intel EM64T/AMD64
  x86_64

b) Bugzilla Dependencies:

c) Drivers or hardware dependencies:

d) Upstream acceptance information: 

e) External links:

f) Severity (H,M,L):
  H

g) How reproducible: 
  Always

h) Steps to Reproduce: 

  Step1: Deploy EFK and EventRouter on OpenShift4.1 environment according to the following documents.

    EFK: https://docs.openshift.com/container-platform/4.1/logging/efk-logging-deploying.html
    EventRouter: https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html

  Step2: Call ElasticSearch API to check kubernetes events which are sent by EventRouter via fluentd

    $ oc create route passthrough --service=elasticsearch -n openshift-logging
    route.route.openshift.io/elasticsearch created

    $ URL=$(oc get route -n openshift-logging elasticsearch --no-headers | awk '{ print $2}')
    $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{
         "query": {
           "bool" : {
             "must" : [
                { "range" : { "@timestamp" : { "gte": "now-1h", "lt": "now", "time_zone": "+09:00" } } },
                { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }}
              ]
           }
         }
       }' | python -m json.tool

i) Actual results:

  ElasticSearch returns data which weren't transformed by fluentd

j) Expected results:

  ElasticSearch returns data which were transformed by fluentd

k) Additional information

  OpenShift version: OCP 4.1.17
  ElasticSearch Operator version: elasticsearch-operator.4.1.18-201909201915

  In RHOCP3.11, TRANSFORM_EVENTS is set to true in fluentd when eventrouter was deployed by ansible-playbook.
  On the other hand, in RHOCP4.1, TRANSFORM_EVENTS isn't set when deploying eventrouter.
  "process_kubernetes_events false" is set due to it.
  That's why fluentd doesn't transform kubernetes events in RHOCP4.1.

    $ oc exec -it fluentd-2k7tz -n openshift-logging -- grep -A 2 TRANSFORM_EVENTS run.sh
    # Disable process_kubernetes_events if TRANSFORM_EVENTS is false client.
    if [ "${TRANSFORM_EVENTS:-}" != true ] ; then
        sed -i 's/\(.*@type viaq_data_model.*\)/\1\n  process_kubernetes_events false/' $CFG_DIR/openshift/filter-viaq-data-model.conf
    fi

3. Business impact:

  Users cannot monitor kubernetes events easily due to this problem.
  EFK is a core component of OCP and it's very important to monitor OCP clusters.
  It should be fixed as soon as possible.

4. Primary contact at Red Hat, email, phone (chat)
  mfuruta

5. Primary contact at Partner, email, phone (chat)
  mas-hatada.nec.com (Masaki Hatada)

--- Additional comment from Jeff Cantrill on 2019-09-30 17:28:02 UTC ---

This is not a blocker for 4.2.  Pushing to 4.3 to get off the blocker list.  Likely any fix will need to be pulled back into 4.2

--- Additional comment from Rich Megginson on 2019-09-30 18:23:16 UTC ---

Looks like we need to add a field for transform_events to the clusterlogging CR.  We're not likely to do this in 4.2.

In the meantime, the workaround is the following.
WARNING WARNING WARNING - your logging deployment will be unmanaged and will not get any updates if you follow these directions!  So maybe the workaround is not for you.  If you want to proceed, read on.

oc edit clusterlogging instance

change the managementState from Managed to Unmanaged

then

oc set env ds/fluentd TRANSFORM_EVENTS=true

This will redeploy fluentd with the correct settings.  You should now see kubernetes events in Elasticsearch/Kibana.

--- Additional comment from Masaki Hatada on 2019-10-01 06:17:04 UTC ---

> In the meantime, the workaround is the following.
> WARNING WARNING WARNING - your logging deployment will be unmanaged and will not get any updates if you follow these directions!  So maybe the workaround is not for you.  If you want to proceed, read on.
> 
> oc edit clusterlogging instance
> 
> change the managementState from Managed to Unmanaged
> 
> then
> 
> oc set env ds/fluentd TRANSFORM_EVENTS=true
> 
> This will redeploy fluentd with the correct settings.  You should now see kubernetes events in Elasticsearch/Kibana.

Thank you for the information.
However, fluentd doesn't transform kubernetes events even after setting TRANSFORM_EVENTS=true.

I noticed that /etc/fluent/configs.d/openshift/filter-retag-journal.conf expects ^k8s_[^_]+_logging-eventrouter-[^_]+_ as the container name of kube-eventrouter.

  $ oc exec -it fluentd-2g67p -n openshift-logging cat /etc/fluent/configs.d/openshift/filter-retag-journal.conf | grep -B1 -A1 eventrouter
      key CONTAINER_NAME
      pattern ^k8s_[^_]+_logging-eventrouter-[^_]+_
      tag kubernetes.journal.container._default_.kubernetes-event

In RHOCP3.11, it corresponds to the container name of kube-eventrouter.

 $ ssh ocp311-master2 docker ps
  CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS               NAMES
  ...
  cc253662b503        fe6bf428d35d                                         "/bin/sh -c '/usr/..."   3 weeks ago         Up 3 weeks                              k8s_kube-eventrouter_logging-eventrouter-1-ntqg2_default_ce4d7b2d-c58f-11e9-8a67-fa163e80c746_1

On the other hand, it doesn't correspond to the container name of kube-eventrouter in RHOCP4.1.

  $ ssh core@rhocp410-worker2 sudo crictl ps
  CONTAINER ID        IMAGE                                                                                                                              CREATED             STATE               NAME                                 ATTEMPT             POD ID
  ac0e82117a2ad       ee6f719c73b9d4149866eec45676a538ee30a14cad83e1da86b6ed75e41f8c56                                                                   8 minutes ago       Running             fluentd                              0                   8b4c5a1c5f181
  2dd0478e341ca       registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052      About an hour ago   Running             kube-eventrouter                     0                   c285022fc8934

  $ ssh core@rhocp410-worker2 sudo crictl pods
  POD ID              CREATED             STATE               NAME                                                  NAMESPACE                                ATTEMPT
  ...
  c285022fc8934       40 minutes ago      Ready               cluster-logging-eventrouter-7b6c97b8b5-v9xtg          openshift-logging                        0

/etc/fluent/configs.d/openshift/filter-retag-journal.conf have to be updated for RHOCP4.1.

--- Additional comment from Rich Megginson on 2019-10-01 13:47:08 UTC ---

> /etc/fluent/configs.d/openshift/filter-retag-journal.conf have to be updated for RHOCP4.1.

We are not planning to fix this bug until OCP 4.3.

--- Additional comment from Rich Megginson on 2019-10-01 14:09:03 UTC ---

(In reply to Rich Megginson from comment #4)
> > /etc/fluent/configs.d/openshift/filter-retag-journal.conf have to be updated for RHOCP4.1.
> 
> We are not planning to fix this bug until OCP 4.3.

filter-retag-journal.conf _only_ applies to logs read from systemd journald, using the docker --log-driver=journald.

In 4.1, the only supported log driver is cri-o with the file log driver.

fluentd is expecting eventrouter logs to come from a pod named "logging-eventrouter-*" running in namespace "default".  How are you deploying the eventrouter in 4.1?

eventrouter records are processed here: https://github.com/ViaQ/fluent-plugin-viaq_data_model/blob/master/lib/fluent/plugin/filter_viaq_data_model.rb#L452

it is expected that the outcome of this processing is a field named "kubernetes.event.metadata.uid" which is the UUID of the event.

This UUID is used as the elasticsearch ID - https://github.com/openshift/origin-aggregated-logging/blob/master/fluentd/configs.d/openshift/filter-post-genid.conf#L6

but _only_ if the record tag matches "kubernetes.var.log.containers.logging-eventrouter-*.** kubernetes.journal.container._default_.kubernetes-event" - the first pattern corresponds to a cri-o log driver log for a pod name matching "logging-eventrouter-*" (that is, something like "logging-eventrouter-xyzyx" not "logging-eventrouter-xyzyx-something").  There is no namespace name part of that match - however - the code was designed for the eventrouter running in the "default" namespace, and there may be other hidden assumptions elsewhere in the code.  "default" is considered an "infra" log, like logs coming from other "openshift*" or "kube*" namespaces, and can only be seen by cluster admins.

Using the event UUID as the elasticsearch record ID gives you an easy way to correlate records in elasticsearch back to the exact source event.

If it is not possible anymore in 4.x to deploy the eventrouter with a specific name, or in the "default" namespace, then there is probably no easy remedy, and we will have to fix the code/configs to make it work in 4.x.

--- Additional comment from Masaki Hatada on 2019-10-02 01:41:49 UTC ---

> fluentd is expecting eventrouter logs to come from a pod named "logging-eventrouter-*" running in namespace "default". How are you deploying the eventrouter in 4.1?

I deployed eventrouter just according to https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html.
However, fluentd doesn't transform kubernetes events even if I deployed eventrouter on default namespace with pod name "logging-eventrouter-*".

  $ oc exec -it fluentd-6mn6j -n openshift-logging env | grep TRANS
  TRANSFORM_EVENTS=true

  $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{
  ...(snip)...
              {
                  "_id": "ZTk0YzViNDAtMGM2OC00M2E3LTliZjUtMTk5Y2I1Yzg3Nzk2",
                  "_index": ".operations.2019.10.02",
                  "_score": 2.6822762,
                  "_source": {
                      "@timestamp": "2019-10-02T01:28:51.929091+00:00",
                      "docker": {
                          "container_id": "c3e8b061aca97a6e0e82c73f6208b39fc2a00d1c168fe876f171f861d407e8e1"
                      },
                      "hostname": "rhocp410-worker1.rhocp410.cluster.sub.nec.test",
                      "kubernetes": {
                          "container_image": "registry.redhat.io/openshift4/ose-logging-eventrouter:latest",
                          "container_image_id": "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052",
                          "container_name": "kube-eventrouter",
                          "host": "rhocp410-worker1.rhocp410.cluster.sub.nec.test",
                          "labels": {
                              "component": "eventrouter",
                              "logging-infra": "eventrouter",
                              "pod-template-hash": "54d8598c85",
                              "provider": "openshift"
                          },
                          "master_url": "https://kubernetes.default.svc",
                          "namespace_id": "12ea31ba-9bcf-11e9-9994-52540086c9e7",
   ★namespace            "namespace_name": "default",
                          "pod_id": "e122c94b-e4a8-11e9-80f2-52540007a421",
   ★pod name             "pod_name": "logging-eventrouter-54d8598c85-njt2c"
                      },
                      "level": "info",
                      "message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example-1.15c9b03f81118baf\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/namespaces/test/events/django-psql-example-1.15c9b03f81118baf\",\"uid\":\"fdc19753-e4b3-11e9-8b57-525400631ae9\",\"resourceVersion\":\"32864698\",\"creationTimestamp\":\"2019-10-02T01:28:51Z\"},\"involvedObject\":{\"kind\":\"ReplicationController\",\"namespace\":\"test\",\"name\":\"django-psql-example-1\",\"uid\":\"24909921-e3f7-11e9-949b-52540007a421\",\"apiVersion\":\"v1\",\"resourceVersion\":\"32864560\"},\"reason\":\"SuccessfulCreate\",\"message\":\"Created pod: django-psql-example-1-mppsh\",\"source\":{\"component\":\"replication-controller\"},\"firstTimestamp\":\"2019-10-02T01:28:51Z\",\"lastTimestamp\":\"2019-10-02T01:28:51Z\",\"count\":1,\"type\":\"Normal\"}}",
                      "pipeline_metadata": {
                          "collector": {
                              "inputname": "fluent-plugin-systemd",
                              "ipaddr4": "10.128.1.121",
                              "ipaddr6": "fe80::a8dc:e4ff:fe60:f2",
                              "name": "fluentd",
                              "received_at": "2019-10-02T01:28:52.649247+00:00",
                              "version": "1.5.1 1.6.0"
                          }
                      },
                      "viaq_msg_id": "ZTk0YzViNDAtMGM2OC00M2E3LTliZjUtMTk5Y2I1Yzg3Nzk2"
                  },
                  "_type": "com.redhat.viaq.common"
              }
          ],
          "max_score": 2.784227,
          "total": 22
      },
      "timed_out": false,
      "took": 499

> We are not planning to fix this bug until OCP 4.3.

Eventrouter is one of important features for monitoring OCP cluster,
and our customer, who is planning to migrate their system from OCP3.11 to OCP4.2, is using the feature.

It looks like a regression for the customer.
So we would like Red Hat to fix this problem in OCP4.2 at least.

--- Additional comment from Rich Megginson on 2019-10-02 02:14:45 UTC ---

I think the problem is that, in order to use the eventrouter, you must use MERGE_JSON_LOG=true.  In 3.11, this was true by default, and it caused many, many problems.  So in 4.x, we changed the default to MERGE_JSON_LOG=false.  See 
https://docs.openshift.com/container-platform/4.1/logging/config/efk-logging-fluentd.html#efk-logging-fluentd-json_efk-logging-fluentd

Please also file a doc bug because https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is not correct - it has the wrong pod name and namespace, and perhaps other problems.

--- Additional comment from Masaki Hatada on 2019-10-02 04:16:03 UTC ---

> I think the problem is that, in order to use the eventrouter, you must use MERGE_JSON_LOG=true.  In 3.11, this was true by default, and it caused many, many problems.  So in 4.x, we changed the default to MERGE_JSON_LOG=false.

Thank you for the information.
After I set MERGE_JSON_LOG=true and TRANSFORM_EVENTS=true, fluentd became able to transform kubernetes events.

If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix fluentd so that it can transform kubernetes events without the parameter?

> Please also file a doc bug because https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is not correct - it has the wrong pod name and namespace, and perhaps other problems.

It seems that fluentd can transform kubernetes events even if eventrouter was deployed on "openshift-logging" namespace with a pod name "cluster-logging-eventrouter-*".
Should I still change its namespace and pod name?

I pasted the terminal log below.

  $ oc exec -it fluentd-ncn4n -n openshift-logging env | grep -e MERGE_JSON_LOG -e TRANSFORM_EVENTS
  MERGE_JSON_LOG=true
  TRANSFORM_EVENTS=true
  
  $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{
            "query": {
              "bool" : {
                "must" : [
                   { "range" : { "@timestamp" : { "gte": "now-5m", "lt": "now", "time_zone": "+09:00" } } },
                   { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }}
                 ]
              }
            }
          }' | python -m json.tool
  {
  ...

                      "kubernetes": {
                          "container_image": "registry.redhat.io/openshift4/ose-logging-eventrouter:latest",
                          "container_image_id": "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052",
                          "container_name": "kube-eventrouter",
                          "event": {
                              "count": 1,
                              "firstTimestamp": "2019-10-02T03:51:03Z",
                              "involvedObject": {
                                  "apiVersion": "v1",
                                  "fieldPath": "spec.containers{django-psql-example}",
                                  "kind": "Pod",
                                  "name": "django-psql-example-1-g2hxs",
                                  "namespace": "test",
                                  "resourceVersion": "32902882",
                                  "uid": "b424a806-e4c4-11e9-8b57-525400631ae9"
                              },
                              "lastTimestamp": "2019-10-02T03:51:03Z",
                              "metadata": {
                                  "name": "django-psql-example-1-g2hxs.15c9b801f4b62577",
                                  "namespace": "test",
                                  "resourceVersion": "32910091",
                                  "selfLink": "/api/v1/namespaces/test/events/django-psql-example-1-g2hxs.15c9b801f4b62577",
                                  "uid": "db129fc2-e4c7-11e9-ba59-525400f9452d"
                              },
                              "reason": "Unhealthy",
                              "source": {
                                  "component": "kubelet",
                                  "host": "rhocp410-worker1.rhocp410.cluster.sub.nec.test"
                              },
                              "type": "Warning",
                              "verb": "ADDED"
                          },
                          "host": "rhocp410-worker1.rhocp410.cluster.sub.nec.test",
                          "labels": {
                              "component": "eventrouter",
                              "logging-infra": "eventrouter",
                              "pod-template-hash": "7b6c97b8b5",
                              "provider": "openshift"
                          },
                          "master_url": "https://kubernetes.default.svc",
                          "namespace_id": "8aa7ba9d-df50-11e9-ba89-52540007a421",
                          "namespace_labels": {
                              "openshift_io/cluster-logging": "true",
                              "openshift_io/cluster-monitoring": "true"
                          },
                          "namespace_name": "openshift-logging",
                          "pod_id": "140ad1e1-e4c2-11e9-8b57-525400631ae9",
                          "pod_name": "cluster-logging-eventrouter-7b6c97b8b5-cdkbq"
                      },
                      "level": "info",
                      "message": "Liveness probe failed: Get http://10.128.1.157:8080/health: read tcp 10.128.0.1:33156->10.128.1.157:8080: read: connection reset by peer",
                      "pipeline_metadata": {
                          "collector": {
                              "inputname": "fluent-plugin-systemd",
                              "ipaddr4": "10.128.1.158",
                              "ipaddr6": "fe80::8a8:81ff:fe9c:d0c",
                              "name": "fluentd",
                              "original_raw_message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example-1-g2hxs.15c9b801f4b62577\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/namespaces/test/events/django-psql-example-1-g2hxs.15c9b801f4b62577\",\"uid\":\"db129fc2-e4c7-11e9-ba59-525400f9452d\",\"resourceVersion\":\"32910091\",\"creationTimestamp\":\"2019-10-02T03:51:03Z\"},\"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"test\",\"name\":\"django-psql-example-1-g2hxs\",\"uid\":\"b424a806-e4c4-11e9-8b57-525400631ae9\",\"apiVersion\":\"v1\",\"resourceVersion\":\"32902882\",\"fieldPath\":\"spec.containers{django-psql-example}\"},\"reason\":\"Unhealthy\",\"message\":\"Liveness probe failed: Get http://10.128.1.157:8080/health: read tcp 10.128.0.1:33156-\\u003e10.128.1.157:8080: read: connection reset by peer\",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-worker1.rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-10-02T03:51:03Z\",\"lastTimestamp\":\"2019-10-02T03:51:03Z\",\"count\":1,\"type\":\"Warning\"}}",
                              "received_at": "2019-10-02T03:51:03.759610+00:00",
                              "version": "1.5.1 1.6.0"
                          }
                      },
                      "viaq_msg_id": "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4"
                  },
                  "_type": "com.redhat.viaq.common"
              }

--- Additional comment from Masaki Hatada on 2019-10-02 07:33:38 UTC ---

Hi Furuta-san,

I would like to know what information I have to provide.

--- Additional comment from Rich Megginson on 2019-10-02 12:56:25 UTC ---

(In reply to Masaki Hatada from comment #8)
> > I think the problem is that, in order to use the eventrouter, you must use MERGE_JSON_LOG=true.  In 3.11, this was true by default, and it caused many, many problems.  So in 4.x, we changed the default to MERGE_JSON_LOG=false.
> 
> Thank you for the information.
> After I set MERGE_JSON_LOG=true and TRANSFORM_EVENTS=true, fluentd became
> able to transform kubernetes events.
> 
> If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix
> fluentd so that it can transform kubernetes events without the parameter?

We don't recommend it to be used by default for all customers.  However, if you need to use it, and it works for you in your use case, and doesn't cause any problems, then please continue to use it.

> 
> > Please also file a doc bug because https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is not correct - it has the wrong pod name and namespace, and perhaps other problems.
> 
> It seems that fluentd can transform kubernetes events even if eventrouter
> was deployed on "openshift-logging" namespace with a pod name
> "cluster-logging-eventrouter-*".

Yes, it "works" but this:

>                       "viaq_msg_id":
> "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4"

That is, it is using an auto-generated random value for the Elasticsearch document ID rather than the UUID from the record - kubernetes.event.metadata.uid "db129fc2-e4c7-11e9-ba59-525400f9452d"

If this "works" for you, that's fine. 

> Should I still change its namespace and pod name?
> 
> I pasted the terminal log below.
> 
>   $ oc exec -it fluentd-ncn4n -n openshift-logging env | grep -e
> MERGE_JSON_LOG -e TRANSFORM_EVENTS
>   MERGE_JSON_LOG=true
>   TRANSFORM_EVENTS=true
>   
>   $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H
> "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d
> '{
>             "query": {
>               "bool" : {
>                 "must" : [
>                    { "range" : { "@timestamp" : { "gte": "now-5m", "lt":
> "now", "time_zone": "+09:00" } } },
>                    { "match_phrase" : { "kubernetes.container_name":
> "kube-eventrouter" }}
>                  ]
>               }
>             }
>           }' | python -m json.tool
>   {
>   ...
> 
>                       "kubernetes": {
>                           "container_image":
> "registry.redhat.io/openshift4/ose-logging-eventrouter:latest",
>                           "container_image_id":
> "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:
> ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052",
>                           "container_name": "kube-eventrouter",
>                           "event": {
>                               "count": 1,
>                               "firstTimestamp": "2019-10-02T03:51:03Z",
>                               "involvedObject": {
>                                   "apiVersion": "v1",
>                                   "fieldPath":
> "spec.containers{django-psql-example}",
>                                   "kind": "Pod",
>                                   "name": "django-psql-example-1-g2hxs",
>                                   "namespace": "test",
>                                   "resourceVersion": "32902882",
>                                   "uid":
> "b424a806-e4c4-11e9-8b57-525400631ae9"
>                               },
>                               "lastTimestamp": "2019-10-02T03:51:03Z",
>                               "metadata": {
>                                   "name":
> "django-psql-example-1-g2hxs.15c9b801f4b62577",
>                                   "namespace": "test",
>                                   "resourceVersion": "32910091",
>                                   "selfLink":
> "/api/v1/namespaces/test/events/django-psql-example-1-g2hxs.
> 15c9b801f4b62577",
>                                   "uid":
> "db129fc2-e4c7-11e9-ba59-525400f9452d"
>                               },
>                               "reason": "Unhealthy",
>                               "source": {
>                                   "component": "kubelet",
>                                   "host":
> "rhocp410-worker1.rhocp410.cluster.sub.nec.test"
>                               },
>                               "type": "Warning",
>                               "verb": "ADDED"
>                           },
>                           "host":
> "rhocp410-worker1.rhocp410.cluster.sub.nec.test",
>                           "labels": {
>                               "component": "eventrouter",
>                               "logging-infra": "eventrouter",
>                               "pod-template-hash": "7b6c97b8b5",
>                               "provider": "openshift"
>                           },
>                           "master_url": "https://kubernetes.default.svc",
>                           "namespace_id":
> "8aa7ba9d-df50-11e9-ba89-52540007a421",
>                           "namespace_labels": {
>                               "openshift_io/cluster-logging": "true",
>                               "openshift_io/cluster-monitoring": "true"
>                           },
>                           "namespace_name": "openshift-logging",
>                           "pod_id": "140ad1e1-e4c2-11e9-8b57-525400631ae9",
>                           "pod_name":
> "cluster-logging-eventrouter-7b6c97b8b5-cdkbq"
>                       },
>                       "level": "info",
>                       "message": "Liveness probe failed: Get
> http://10.128.1.157:8080/health: read tcp
> 10.128.0.1:33156->10.128.1.157:8080: read: connection reset by peer",
>                       "pipeline_metadata": {
>                           "collector": {
>                               "inputname": "fluent-plugin-systemd",
>                               "ipaddr4": "10.128.1.158",
>                               "ipaddr6": "fe80::8a8:81ff:fe9c:d0c",
>                               "name": "fluentd",
>                               "original_raw_message":
> "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example-
> 1-g2hxs.15c9b801f4b62577\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/
> namespaces/test/events/django-psql-example-1-g2hxs.15c9b801f4b62577\",
> \"uid\":\"db129fc2-e4c7-11e9-ba59-525400f9452d\",\"resourceVersion\":
> \"32910091\",\"creationTimestamp\":\"2019-10-02T03:51:03Z\"},
> \"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"test\",\"name\":
> \"django-psql-example-1-g2hxs\",\"uid\":\"b424a806-e4c4-11e9-8b57-
> 525400631ae9\",\"apiVersion\":\"v1\",\"resourceVersion\":\"32902882\",
> \"fieldPath\":\"spec.containers{django-psql-example}\"},\"reason\":
> \"Unhealthy\",\"message\":\"Liveness probe failed: Get
> http://10.128.1.157:8080/health: read tcp
> 10.128.0.1:33156-\\u003e10.128.1.157:8080: read: connection reset by
> peer\",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-worker1.
> rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-10-02T03:51:03Z\",
> \"lastTimestamp\":\"2019-10-02T03:51:03Z\",\"count\":1,\"type\":
> \"Warning\"}}",
>                               "received_at":
> "2019-10-02T03:51:03.759610+00:00",
>                               "version": "1.5.1 1.6.0"
>                           }
>                       },
>                       "viaq_msg_id":
> "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4"
>                   },
>                   "_type": "com.redhat.viaq.common"
>               }

--- Additional comment from Masaki Hatada on 2019-10-03 09:53:10 UTC ---

>> If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix
>> fluentd so that it can transform kubernetes events without the parameter?
>
> We don't recommend it to be used by default for all customers.
> However, if you need to use it, and it works for you in your use case, and doesn't cause any problems, then please continue to use it.

Although I know you doesn't recommend setting MERGE_JSON_LOG=true, EFK cannot handle kubernetes events correctly without this parameter.
Doesn't Red Hat have a plan to fix fluentd so that it becomes able to handle kubernetes events without this parameter?

>>                       "viaq_msg_id":
>> "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4"
>
>That is, it is using an auto-generated random value for the Elasticsearch document ID rather than the UUID from the record - kubernetes.event.metadata.uid "db129fc2-e4c7-11e9-ba59-525400f9452d"
>
>If this "works" for you, that's fine. 

I verified that viaq_msg_id is set to kubernetes.event.metadata.uid if eventrouter was deployed on default namespace with the pod name "logging-eventrouter".
So I hope that https://docs.openshift.com/container-platform/4.1/logging/efk-logging-eventrouter.html is updated.

--- Additional comment from Rich Megginson on 2019-10-03 15:28:07 UTC ---

(In reply to Masaki Hatada from comment #11)
> >> If Red Hat does not recommend to set MERGE_JSON_LOG=true, could Red Hat fix
> >> fluentd so that it can transform kubernetes events without the parameter?
> >
> > We don't recommend it to be used by default for all customers.
> > However, if you need to use it, and it works for you in your use case, and doesn't cause any problems, then please continue to use it.
> 
> Although I know you doesn't recommend setting MERGE_JSON_LOG=true, EFK
> cannot handle kubernetes events correctly without this parameter.
> Doesn't Red Hat have a plan to fix fluentd so that it becomes able to handle
> kubernetes events without this parameter?

The plan is the fix for this bz.  This bz is targeted for 4.3, where it will be triaged and prioritized along with all of the other bugs for 4.3.  Since there is a workaround, I don't think this bug needs to be urgently fixed in 4.2 or 4.1.

> 
> >>                       "viaq_msg_id":
> >> "MThmMjIxNDItNDgyZi00YmRmLWIwN2ItZWNkMDM0NzYyYmQ4"
> >
> >That is, it is using an auto-generated random value for the Elasticsearch document ID rather than the UUID from the record - kubernetes.event.metadata.uid "db129fc2-e4c7-11e9-ba59-525400f9452d"
> >
> >If this "works" for you, that's fine. 
> 
> I verified that viaq_msg_id is set to kubernetes.event.metadata.uid if
> eventrouter was deployed on default namespace with the pod name
> "logging-eventrouter".
> So I hope that
> https://docs.openshift.com/container-platform/4.1/logging/efk-logging-
> eventrouter.html is updated.

https://bugzilla.redhat.com/show_bug.cgi?id=1758222 - please add anything to that BZ that you feel I may have missed

--- Additional comment from Masaki Hatada on 2019-10-04 00:31:33 UTC ---

>> Although I know you doesn't recommend setting MERGE_JSON_LOG=true, EFK
>> cannot handle kubernetes events correctly without this parameter.
>> Doesn't Red Hat have a plan to fix fluentd so that it becomes able to handle
>> kubernetes events without this parameter?
>
>The plan is the fix for this bz.  This bz is targeted for 4.3, where it will be triaged and prioritized along with all of the other bugs for 4.3.  Since there is a workaround, I don't think this bug needs to be urgently fixed in 4.2 or 4.1.

I understood that.
I'll text here if I received any big business impact from customer about this issue.

>> I verified that viaq_msg_id is set to kubernetes.event.metadata.uid if
>> eventrouter was deployed on default namespace with the pod name
>> "logging-eventrouter".
>> So I hope that
>> https://docs.openshift.com/container-platform/4.1/logging/efk-logging-
>> eventrouter.html is updated.
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1758222 - please add anything to that BZ that you feel I may have missed

Thank you for creating a new Bugzilla.

Although it hasn't mentioned changing namespace from "openshift-logging" to "default", I believe it's not needed because viaq_msg_id is set to kubernetes.event.metadata.uid even when logging-eventrouter pod is running on "openshift-logging" namespace.

  $ curl -s -k -H "Authorization: Bearer $(oc whoami -t)" -H "X-Proxy-Remote-User: $(oc whoami)" "https://$URL/.operations.*/_search" -d '{
              "query": {
                "bool" : {
                  "must" : [
                     { "range" : { "@timestamp" : { "gte": "now-15m", "lt": "now", "time_zone": "+09:00" } } },
                     { "match_phrase" : { "kubernetes.container_name": "kube-eventrouter" }}
                   ]
                }
              }
            }' | python -m json.tool
  ...
              {
                  "_id": "bd035e15-e63a-11e9-8f60-525400631ae9",
                  "_index": ".operations.2019.10.04",
                  "_score": 3.502013,
                  "_source": {
                      "@timestamp": "2019-10-04T00:05:56Z",
                      "docker": {
                          "container_id": "4aca9d9a8cbb2f6f8cd057075064c745c684f81b4b1fbcc27e8856733fee8e36"
                      },
                      "hostname": "rhocp410-worker2.rhocp410.cluster.sub.nec.test",
                      "kubernetes": {
                          "container_image": "registry.redhat.io/openshift4/ose-logging-eventrouter:latest",
                          "container_image_id": "registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:ffacf9a0d5931752708abeee2ceaedb6a80f2f5ec5294fe9af7d85f859ab7052",
                          "container_name": "kube-eventrouter",
                          "event": {
                              "count": 3,
                              "firstTimestamp": "2019-10-04T00:05:56Z",
                              "involvedObject": {
                                  "apiVersion": "v1",
                                  "fieldPath": "spec.containers{django-psql-example}",
                                  "kind": "Pod",
                                  "name": "django-psql-example-1-74gd6",
                                  "namespace": "test",
                                  "resourceVersion": "33746648",
                                  "uid": "b7e3e9fb-e63a-11e9-8f60-525400631ae9"
                              },
                              "lastTimestamp": "2019-10-04T00:06:28Z",
                              "metadata": {
                                  "name": "django-psql-example-1-74gd6.15ca48e236bc9e32",
                                  "namespace": "test",
                                  "resourceVersion": "33746880",
                                  "selfLink": "/api/v1/namespaces/test/events/django-psql-example-1-74gd6.15ca48e236bc9e32",
                                  "uid": "bd035e15-e63a-11e9-8f60-525400631ae9"
                              },
                              "reason": "Created",
                              "source": {
                                  "component": "kubelet",
                                  "host": "rhocp410-worker2.rhocp410.cluster.sub.nec.test"
                              },
                              "type": "Normal",
                              "verb": "ADDED"
                          },
                          "host": "rhocp410-worker2.rhocp410.cluster.sub.nec.test",
                          "labels": {
                              "component": "eventrouter",
                              "logging-infra": "eventrouter",
                              "pod-template-hash": "54d8598c85",
                              "provider": "openshift"
                          },
                          "master_url": "https://kubernetes.default.svc",
                          "namespace_id": "8aa7ba9d-df50-11e9-ba89-52540007a421",
                          "namespace_labels": {
                              "openshift_io/cluster-logging": "true",
                              "openshift_io/cluster-monitoring": "true"
                          },
              ★          "namespace_name": "openshift-logging",
                          "pod_id": "fb6e289f-e63a-11e9-8f60-525400631ae9",
              ★          "pod_name": "logging-eventrouter-54d8598c85-f7tcn"
                      },
                      "level": "info",
                      "message": "Created container django-psql-example",
                      "pipeline_metadata": {
                          "collector": {
                              "inputname": "fluent-plugin-systemd",
                              "ipaddr4": "10.129.1.160",
                              "ipaddr6": "fe80::6864:91ff:fe96:a6b5",
                              "name": "fluentd",
                              "original_raw_message": "{\"verb\":\"ADDED\",\"event\":{\"metadata\":{\"name\":\"django-psql-example-1-74gd6.15ca48e236bc9e32\",\"namespace\":\"test\",\"selfLink\":\"/api/v1/namespaces/test/events/django-psql-example-1-74gd6.15ca48e236bc9e32\",\"uid\":\"bd035e15-e63a-11e9-8f60-525400631ae9\",\"resourceVersion\":\"33746880\",\"creationTimestamp\":\"2019-10-04T00:05:56Z\"},\"involvedObject\":{\"kind\":\"Pod\",\"namespace\":\"test\",\"name\":\"django-psql-example-1-74gd6\",\"uid\":\"b7e3e9fb-e63a-11e9-8f60-525400631ae9\",\"apiVersion\":\"v1\",\"resourceVersion\":\"33746648\",\"fieldPath\":\"spec.containers{django-psql-example}\"},\"reason\":\"Created\",\"message\":\"Created container django-psql-example\",\"source\":{\"component\":\"kubelet\",\"host\":\"rhocp410-worker2.rhocp410.cluster.sub.nec.test\"},\"firstTimestamp\":\"2019-10-04T00:05:56Z\",\"lastTimestamp\":\"2019-10-04T00:06:28Z\",\"count\":3,\"type\":\"Normal\"}}",
                              "received_at": "2019-10-04T00:08:38.171034+00:00",
                              "version": "1.5.1 1.6.0"
                          }
                      },
                      "viaq_msg_id": "bd035e15-e63a-11e9-8f60-525400631ae9"
                  },
                  "_type": "com.redhat.viaq.common"
              }
          ],
          "max_score": 3.502013,
          "total": 138
      },
      "timed_out": false,
      "took": 493
  }

--- Additional comment from Rich Megginson on 2019-10-04 17:21:58 UTC ---

> Flags: needinfo?(rmeggins)

What info do you need?

--- Additional comment from Masaki Hatada on 2019-10-08 02:15:39 UTC ---

> What info do you need?

Sorry, nothing in particular.

Comment 3 Anping Li 2019-12-03 02:41:34 UTC
Verified in 4.2.9. The eventrouter are gather and transform to json

Comment 5 errata-xmlrpc 2019-12-11 22:36:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4093


Note You need to log in before you can comment on or make changes to this bug.