Bug 1441765 - OpenShift provider event storm POD_FAILEDSYNC
Summary: OpenShift provider event storm POD_FAILEDSYNC
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: GA
: 5.9.0
Assignee: Federico Simoncelli
QA Contact: Einat Pacifici
URL:
Whiteboard:
Depends On:
Blocks: 1441854 1441855
 
Reported: 2017-04-12 16:17 UTC by Adam Grare
Modified: 2020-06-11 13:35 UTC (History)

Fixed In Version: 5.9.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1441854 1441855 (view as bug list)
Environment:
Last Closed: 2018-03-06 14:49:01 UTC
Category: ---
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:



Description Adam Grare 2017-04-12 16:17:30 UTC
The FailedSync event coming from the OpenShift provider is being delivered extremely frequently, leading to approximately 11 million records in the EventStream table and ballooning that table to 35 GiB.

These events should be ignored by CFME as they do not provide any actionable information.

Comment 2 Adam Grare 2017-04-12 16:18:42 UTC
https://github.com/ManageIQ/manageiq/pull/14633

Comment 5 Pavel Zagalsky 2017-06-11 11:20:16 UTC
Need more info on how to simulate such an event (FailedSync).

Comment 6 Adam Grare 2017-06-11 15:43:33 UTC
Pavel, it is my understanding that these events are normally sent by Kubernetes and shouldn't need to be simulated.

Maybe Federico can assist with how to get these events sent?

Comment 7 Federico Simoncelli 2017-10-10 07:57:15 UTC
I couldn't find a real occurrence of the event on my clusters.

This is what I could recreate by looking at other events and the kubernetes FailedSync code:

# oc create -f - <<EOF
apiVersion: v1
count: 1
firstTimestamp: 2017-10-09T08:00:00Z
involvedObject:
  apiVersion: v1
  kind: Pod
  name: failed-sync-test-1
  namespace: default
  resourceVersion: "1000"
  uid: e4dcb3cc-1fe6-4e12-bad3-ce90699076b1
kind: Event
lastTimestamp: 2017-10-09T08:00:00Z
message: 'Error syncing pod'
metadata:
  generateName: failed-sync-test-1-
  namespace: default
reason: FailedSync
source:
  component: kubelet
  host: failed-sync-test-node-1
EOF


Except for a few fields (timestamp, names, etc.), this should be enough.
Given that what we're testing here is that the event is ignored based on its reason, we probably don't need to strive for correctness in the other fields (e.g. name of the pod, name of the node, etc.).
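
Since the check is purely on the reason field, a test script can do the same match locally on the event YAML before posting it. This is only a minimal sketch of a reason-based match, not CFME's actual filter: the IGNORED_REASONS variable and both function names are hypothetical, and the list contains only the one reason discussed in this bug.

```shell
# Hypothetical ignore list for the test script (space separated).
# Assumption: only FailedSync, the reason discussed in this bug.
IGNORED_REASONS="FailedSync"

# Print the top-level "reason:" value of an Event YAML read from stdin.
# Indented keys (e.g. under involvedObject) do not match because the
# pattern is anchored at the start of the line.
event_reason() {
  sed -n 's/^reason:[[:space:]]*//p' | head -n 1
}

# Return 0 when the given reason is on the ignore list, 1 otherwise.
# Padding with spaces makes this an exact word match, so "Failed"
# does not match "FailedSync".
is_ignored_reason() {
  case " $IGNORED_REASONS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, with the YAML above saved as event.yaml (a hypothetical file name), `is_ignored_reason "$(event_reason < event.yaml)" && echo "would be ignored"` confirms that the test event carries the reason we expect to be dropped.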

Anyway, Scott, can you check whether you have a real occurrence of FailedSync somewhere, so we can refine the command above?

Comment 8 Scott Weiss 2017-10-10 18:15:04 UTC
yaml output of a real FailedSync event:

apiVersion: v1
count: 1
firstTimestamp: 2017-10-10T18:03:04Z
involvedObject:
  apiVersion: v1
  kind: Pod
  name: nginx-pod-1-deploy
  namespace: default
  resourceVersion: "476119"
  uid: eac7681d-ade4-11e7-8b4f-001a4a16264a
kind: Event
lastTimestamp: 2017-10-10T18:03:04Z
message: Error syncing pod
metadata:
  creationTimestamp: 2017-10-10T18:03:04Z
  name: nginx-pod-1-deploy.14ec477eb4561e4a
  namespace: default
  resourceVersion: "476269"
  selfLink: /api/v1/namespaces/default/events/nginx-pod-1-deploy.14ec477eb4561e4a
  uid: 43645658-ade5-11e7-8b4f-001a4a16264a
reason: FailedSync
source:
  component: kubelet
  host: ocp-compute02.10.35.48.78.nip.io
type: Warning

