Bug 1441765

Summary: OpenShift provider event storm POD_FAILEDSYNC
Product: Red Hat CloudForms Management Engine
Reporter: Adam Grare <agrare>
Component: Providers
Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED CURRENTRELEASE
QA Contact: Einat Pacifici <epacific>
Severity: high
Priority: unspecified
Version: 5.7.0
CC: agrare, fsimonce, izapolsk, jfrey, jhardy, obarenbo, simaishi
Target Milestone: GA
Keywords: TestOnly
Target Release: 5.9.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 5.9.0.1
Last Closed: 2018-03-06 14:49:01 UTC
Type: Bug
Cloudforms Team: Container Management
Bug Blocks: 1441854, 1441855

Description Adam Grare 2017-04-12 16:17:30 UTC
The FailedSync event from the OpenShift provider is delivered extremely frequently, leading to approximately 11 million records in the EventStream table and ballooning that table to 35 GiB.

These events should be ignored by CFME as they do not provide any actionable information.
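The actual fix is the ManageIQ pull request linked in Comment 2 and is not reproduced here. As a rough illustration of dropping events by type, here is a minimal POSIX shell sketch. It assumes that the provider names events as KIND_REASON in upper case, as the POD_FAILEDSYNC name in the summary suggests. The function names `event_type` and `is_ignored` and the `IGNORED_EVENT_TYPES` list are hypothetical.

```shell
# Hypothetical sketch, not the ManageIQ implementation.
# Assumption: event types are built as KIND_REASON, upper-cased.
IGNORED_EVENT_TYPES="POD_FAILEDSYNC"

# event_type KIND REASON: prints e.g. POD_FAILEDSYNC for "Pod FailedSync"
event_type() {
  printf '%s_%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}

# is_ignored KIND REASON: exit status 0 if the event type is in the ignore list
is_ignored() {
  type_name=$(event_type "$1" "$2")
  for ignored in $IGNORED_EVENT_TYPES; do
    [ "$type_name" = "$ignored" ] && return 0
  done
  return 1
}
```

For example, `is_ignored Pod FailedSync && echo drop` prints `drop`, while `is_ignored Pod Scheduled` returns a non-zero status. Matching on the derived type rather than on the message text keeps the filter independent of how the kubelet words the event.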

Comment 2 Adam Grare 2017-04-12 16:18:42 UTC
https://github.com/ManageIQ/manageiq/pull/14633

Comment 5 Pavel Zagalsky 2017-06-11 11:20:16 UTC
Need more info on how to simulate such an event (FailedSync).

Comment 6 Adam Grare 2017-06-11 15:43:33 UTC
Pavel, my understanding is that these events are normally sent by Kubernetes and shouldn't need to be simulated.

Maybe Federico can assist with how to get these events sent?

Comment 7 Federico Simoncelli 2017-10-10 07:57:15 UTC
I couldn't find a real occurrence of the event on my clusters.

This is what I could recreate by looking at other events and the Kubernetes FailedSync code:

# oc create -f - <<EOF
apiVersion: v1
count: 1 
firstTimestamp: 2017-10-09T08:00:00Z
involvedObject:
  apiVersion: v1
  kind: Pod
  name: failed-sync-test-1
  namespace: default
  resourceVersion: "1000"
  uid: e4dcb3cc-1fe6-4e12-bad3-ce90699076b1
kind: Event
lastTimestamp: 2017-10-09T08:00:00Z
message: 'Error syncing pod'
metadata:
  generateName: failed-sync-test-1-
  namespace: default
reason: FailedSync
source:
  component: kubelet
  host: failed-sync-test-node-1
EOF


Apart from a few fields (timestamps, names, etc.), this should be enough.
Given that what we're testing here is that the event is ignored based on its reason, we probably don't need to strive for correctness in the other fields (e.g. the pod name, the node name, etc.).
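Because only the reason matters for this test, repeated FailedSync events can be approximated by emitting the manifest from the `oc create` command above once per iteration with a varying pod name. This is a minimal sketch. The helper name `failed_sync_event` is hypothetical, and the pod naming scheme is arbitrary.

```shell
# Hypothetical helper: prints the FailedSync Event manifest from above,
# with the pod name suffixed by the first argument.
failed_sync_event() {
  cat <<EOF
apiVersion: v1
count: 1
firstTimestamp: 2017-10-09T08:00:00Z
involvedObject:
  apiVersion: v1
  kind: Pod
  name: failed-sync-test-$1
  namespace: default
  resourceVersion: "1000"
  uid: e4dcb3cc-1fe6-4e12-bad3-ce90699076b1
kind: Event
lastTimestamp: 2017-10-09T08:00:00Z
message: 'Error syncing pod'
metadata:
  generateName: failed-sync-test-$1-
  namespace: default
reason: FailedSync
source:
  component: kubelet
  host: failed-sync-test-node-1
EOF
}
```

Usage against a cluster could look like `for i in $(seq 1 50); do failed_sync_event "$i" | oc create -f -; done`. Since `generateName` is used, each `oc create` call gets a server-assigned unique event name.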

Anyway, Scott, can you check whether you have a real occurrence of FailedSync somewhere, so we can refine the command above?

Comment 8 Scott Weiss 2017-10-10 18:15:04 UTC
YAML output of a real FailedSync event:

apiVersion: v1
count: 1
firstTimestamp: 2017-10-10T18:03:04Z
involvedObject:
  apiVersion: v1
  kind: Pod
  name: nginx-pod-1-deploy
  namespace: default
  resourceVersion: "476119"
  uid: eac7681d-ade4-11e7-8b4f-001a4a16264a
kind: Event
lastTimestamp: 2017-10-10T18:03:04Z
message: Error syncing pod
metadata:
  creationTimestamp: 2017-10-10T18:03:04Z
  name: nginx-pod-1-deploy.14ec477eb4561e4a
  namespace: default
  resourceVersion: "476269"
  selfLink: /api/v1/namespaces/default/events/nginx-pod-1-deploy.14ec477eb4561e4a
  uid: 43645658-ade5-11e7-8b4f-001a4a16264a
reason: FailedSync
source:
  component: kubelet
  host: ocp-compute02.10.35.48.78.nip.io
type: Warning
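When checking a real event like the one above, the two fields that determine its type are `involvedObject.kind` and the top-level `reason`. The following minimal sketch reads an event in this YAML layout on stdin and prints KIND_REASON. It assumes the upper-case KIND_REASON naming implied by POD_FAILEDSYNC in the summary. The function name is hypothetical, and the line-oriented awk program is not a general YAML parser.

```shell
# Hypothetical helper: reads an Event as YAML on stdin and prints KIND_REASON.
# Assumes top-level "reason:" and a two-space-indented "kind:" under
# "involvedObject:", as in `oc get event NAME -o yaml` output.
event_type_from_yaml() {
  awk '
    /^involvedObject:/     { in_obj = 1; next }   # enter the involvedObject block
    /^[^ ]/                { in_obj = 0 }         # any top-level key leaves it
    in_obj && /^  kind:/   { kind = $2 }          # involvedObject.kind
    /^reason:/             { reason = $2 }        # top-level reason
    END { print toupper(kind "_" reason) }
  '
}
```

For the event above, `oc get event nginx-pod-1-deploy.14ec477eb4561e4a -n default -o yaml | event_type_from_yaml` would be expected to print `POD_FAILEDSYNC`. The top-level `kind: Event` line is skipped because it is not indented under `involvedObject:`.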