Description of problem:
Eventrouter produces duplicated events every 30 min with verb UPDATED. Looking at the log, there is no diff between the produced "event" and "old_event" contents. The same event appears multiple times with different timestamps in ElasticSearch, which is confusing.

> 2018-08-06T08:08:02.641161894Z {"verb":"UPDATED","event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"},"old_event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"}}
> 2018-08-06T08:38:02.639198331Z {"verb":"UPDATED","event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"},"old_event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"}}
> 2018-08-06T09:08:02.640141857Z {"verb":"UPDATED","event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"},"old_event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"}}
> 2018-08-06T09:38:02.640618367Z {"verb":"UPDATED","event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"},"old_event":{"metadata":{"name":"nginx-host-2.15483c87846c7c08","namespace":"test-nginx","selfLink":"/api/v1/namespaces/test-nginx/events/nginx-host-2.15483c87846c7c08","uid":"4fb640d6-994e-11e8-a346-001a4a1601d0","resourceVersion":"49254008","creationTimestamp":"2018-08-06T07:57:05Z"},"involvedObject":{"kind":"ReplicationController","namespace":"test-nginx","name":"nginx-host-2","uid":"4c837dab-994e-11e8-a346-001a4a1601d0","apiVersion":"v1","resourceVersion":"49254003"},"reason":"SuccessfulCreate","message":"Created pod: nginx-host-2-hg8sl","source":{"component":"replication-controller"},"firstTimestamp":"2018-08-06T07:57:05Z","lastTimestamp":"2018-08-06T07:57:05Z","count":1,"type":"Normal"}}

"oc get events -w" does not show duplicated events while eventrouter is producing them, so some kind of filtering is likely missing on the eventrouter side.

To sum up:
- "oc get events" shows the event A message 1 time
- The event A message appears multiple times, every 30 min, for a while, both in ElasticSearch and in the eventrouter logs

This is not the known duplicate-messages issue that occurs when eventrouter is restarted; in this test, eventrouter was not restarted.

Version-Release number of selected component (if applicable):
v3.9.33

How reproducible:
Always

Steps to Reproduce:
1. Set up some deployments
2. Monitor "oc get events -o yaml --all-namespaces -w" and "oc logs --timestamps logging-eventrouter"
3. Find duplicated logs in the logging-eventrouter log

Actual results:
- "oc get events" shows the event A message 1 time
- The event A message appears multiple times, every 30 min, for a while, both in ElasticSearch and in the eventrouter logs

Expected results:
- Each event message appears once in "oc get events", in the eventrouter logs, and in ElasticSearch.

Additional info:
It could be related to the "resync-interval" option, which defaults to 30m. Testing it with 2h now.
It seems the duplicated events are related to the resync-interval of 30m. With 2h I don't see duplicated events. We can check whether there are any duplicated events with this command:

> oc logs logging-eventrouter-XXXXX | sort | uniq -d

Now the question is: is 30m an appropriate default value? Should it be synced with the event lifetime, which defaults to 2h? Do we need to set a value shorter than 2h for the watch timeout and/or the API server load balancer timeout? If the duplicated events are a current limitation, we should put them in the known issues section.
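For context, here is a minimal client-go sketch, not the eventrouter's actual code, of how a SharedInformer with a 30m resync period re-delivers every cached Event through UpdateFunc with identical old/new objects, and how such no-op deliveries could be skipped by comparing ResourceVersion. It assumes a kubeconfig at the default location and a client-go vintage in which AddEventHandler returns no values (as in the client-go vendored at the time).

package main

import (
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: running outside the cluster with the default kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// A 30m resync period re-delivers every cached Event to the update
	// handler, even when nothing changed on the API server.
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	eventInformer := factory.Core().V1().Events().Informer()

	eventInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldEv := oldObj.(*corev1.Event)
			newEv := newObj.(*corev1.Event)
			// On a periodic resync the old and new objects carry the same
			// ResourceVersion; skipping them would suppress the spurious
			// UPDATED lines every 30 minutes.
			if oldEv.ResourceVersion == newEv.ResourceVersion {
				return
			}
			log.Printf("UPDATED %s/%s count=%d", newEv.Namespace, newEv.Name, newEv.Count)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {}
}

In other words, an unchanged event is re-emitted once per resync period, which matches the 30-minute cadence seen in the logs above.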
Also, eventrouter currently produces duplicated events on "watch is too old" retries:

> 1 reflector.go:323] github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:69: watch of *v1.Event ended with: The resourceVersion for the provided watch is too old

We need to fix eventrouter so that it does not produce duplicated events on watch retries, or document the current behavior (it produces a lot of duplicate events on every watch retry, every 30m, and/or on "watch is too old").
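As an illustration only, not something eventrouter does today, duplicates re-delivered after a watch retry or relist could be suppressed by remembering which (UID, ResourceVersion) pairs have already been emitted; a genuine update changes the ResourceVersion and still passes. A minimal, unbounded sketch (a real implementation would need to expire old entries):

package main

import (
	"fmt"
	"sync"
)

// seenKey identifies one concrete revision of an event: the same UID at the
// same ResourceVersion re-delivered after a relist is a duplicate, while a
// genuine update bumps the ResourceVersion and passes through.
type seenKey struct {
	UID             string
	ResourceVersion string
}

// dedup remembers which event revisions were already emitted.
type dedup struct {
	mu   sync.Mutex
	seen map[seenKey]struct{}
}

func newDedup() *dedup {
	return &dedup{seen: make(map[seenKey]struct{})}
}

// shouldEmit reports whether this (uid, resourceVersion) pair is new.
func (d *dedup) shouldEmit(uid, rv string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	k := seenKey{uid, rv}
	if _, ok := d.seen[k]; ok {
		return false
	}
	d.seen[k] = struct{}{}
	return true
}

func main() {
	d := newDedup()
	fmt.Println(d.shouldEmit("4fb640d6", "49254008")) // true: first delivery
	fmt.Println(d.shouldEmit("4fb640d6", "49254008")) // false: relist re-delivered the same revision
	fmt.Println(d.shouldEmit("4fb640d6", "49254100")) // true: a real update
}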
Noriko, do you know if the bump in the eventrouter version resolves the issues here?
This PR [1] introduced a unique id to remove duplicates of all log messages. We could resolve this specifically for the eventrouter if we instead used the UID produced by Kubernetes as the id indexed into Elasticsearch; duplicates would then be ignored. This would require changes to fluentd to handle an 'event' pipeline differently.

[1] https://github.com/openshift/origin-aggregated-logging/pull/974/files
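To illustrate the idea, here is a small Go sketch that indexes an event document using the Kubernetes event UID as the Elasticsearch document id, so re-indexing the same event overwrites the existing document instead of creating a duplicate. The endpoint shape, index name, and document body are placeholders and not the actual fluentd/ViaQ pipeline.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// indexEvent writes one eventrouter record to Elasticsearch using the
// Kubernetes event UID as the document id. Re-indexing the same event
// (e.g. after a resync or watch retry) then overwrites the existing
// document instead of creating a duplicate.
func indexEvent(esURL, index, uid string, doc []byte) error {
	// Placeholder path; the real index name and URL shape depend on the
	// logging stack and the Elasticsearch version in use.
	url := fmt.Sprintf("%s/%s/_doc/%s", esURL, index, uid)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(doc))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("index request failed: %s", resp.Status)
	}
	return nil
}

func main() {
	doc := []byte(`{"verb":"UPDATED","reason":"SuccessfulCreate"}`)
	// Indexing twice with the same UID leaves a single document behind.
	for i := 0; i < 2; i++ {
		if err := indexEvent("http://localhost:9200", "eventrouter", "4fb640d6-994e-11e8-a346-001a4a1601d0", doc); err != nil {
			fmt.Println("error:", err)
		}
	}
}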
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/a995992b85ce4fd53e1ea7354ebd95a7dd992174
Bug 1613722 - Eventrouter creates duplicated events every 30 min with verb UPDATE

Adding a plugin filter_elasticsearch_genid_ext.rb, which extends filter_elasticsearch_genid.rb from fluent-plugin-elasticsearch. The extended plugin takes the config parameter alt_key as shown in filter-post-genid.conf:

<filter **>
  @type elasticsearch_genid_ext
  hash_id_key viaq_msg_id
  alt_key kubernetes.event.metadata.uid
  alt_tags "#{ENV['GENID_ALT_TAG'] || 'kubernetes.var.log.containers.kube-eventrouter-*.** kubernetes.journal.container._default_.kubernetes.event'}"
</filter>

The plugin does the following:
1. Checks whether the tag matches the alt_tags list.
2. If it does, checks whether the alt_key exists in the record hash. As the key is kubernetes.event.metadata.uid, it checks whether record["kubernetes"]["event"]["metadata"]["uid"] exists.
3. If it exists, viaq_msg_id is set to that value.
4. Otherwise, the generated hash value is set; this is the default behaviour.

Note: the value of kubernetes.event.metadata.uid is shared by all event logs originating from the same event. By using that value as the Elasticsearch primary key, duplicated logs are not indexed into Elasticsearch.

For testing, code that checks there is no duplicated value of kubernetes.event.metadata.uid is added to test/eventrouter.sh. Additionally, the tag of the event logs via journald, which used to be kubernetes.journal.container._kube-eventrouter_, is changed to kubernetes.journal.container._default_.kubernetes.event.

https://github.com/openshift/origin-aggregated-logging/commit/e3a7e7aef90f13a7ecad1ba1ab8caef5e4bb376a
Merge pull request #1424 from nhosoi/bz1613722
Bug 1613722 - Eventrouter creates duplicated events every 30 min with…
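For readers unfamiliar with the fluentd plugin, here is a rough Go rendering of the id-selection logic described above; the real plugin is Ruby, the tag matching against alt_tags is omitted, and the random fallback merely stands in for the plugin's default hash.

package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// record mirrors the nested structure of an eventrouter log record as
// described in the commit message; only the fields needed here are modeled.
type record map[string]interface{}

// digAltKey walks a dotted path such as "kubernetes.event.metadata.uid"
// through nested maps and returns the string value if every level exists.
func digAltKey(r record, path []string) (string, bool) {
	var cur interface{} = map[string]interface{}(r)
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false
		}
		cur, ok = m[key]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// genID mimics the decision described for elasticsearch_genid_ext: when the
// record carries the Kubernetes event UID, use it as viaq_msg_id; otherwise
// fall back to a freshly generated value.
func genID(r record) string {
	if uid, ok := digAltKey(r, []string{"kubernetes", "event", "metadata", "uid"}); ok && uid != "" {
		return uid
	}
	buf := make([]byte, 12)
	rand.Read(buf)
	return base64.RawURLEncoding.EncodeToString(buf)
}

func main() {
	eventRecord := record{
		"kubernetes": map[string]interface{}{
			"event": map[string]interface{}{
				"metadata": map[string]interface{}{"uid": "4fb640d6-994e-11e8-a346-001a4a1601d0"},
			},
		},
	}
	plainRecord := record{"message": "not an event"}
	fmt.Println(genID(eventRecord)) // stable UID: duplicates collapse in Elasticsearch
	fmt.Println(genID(plainRecord)) // generated id: default behaviour
}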
Additional PR. https://github.com/openshift/origin-aggregated-logging/pull/1434
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/51b68c15eed1c6637d4ec998cf566411775212fc
Bug 1613722 - Eventrouter creates duplicated events every 30 min with verb UPDATE

Replacing the eventrouter journald tag kubernetes.journal.container._default_.kubernetes.event with kubernetes.journal.container._default_.kubernetes-event. Plus bug fixes.

https://github.com/openshift/origin-aggregated-logging/commit/a36d32a554f695959e6f6ab47cf509922eec6935
Merge pull request #1434 from nhosoi/bz1613722
Bug 1613722 - Eventrouter creates duplicated events every 30 min with…
https://github.com/openshift/origin-aggregated-logging/pull/1439 is the [3.11] PR for https://github.com/openshift/origin-aggregated-logging/pull/1434
Tested on ose-logging-fluentd-v3.11.36-1; waited for about 2 hours and no duplicated events were created, so moving this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3537