1489616 – Cannot see OpenShift events

Summary: Cannot see OpenShift events

Keywords:
Status:	CLOSED DUPLICATE of bug 1367114
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	5.8.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	GA
Target Release:	5.8.2
Assignee:	Beni Paskin-Cherniavsky
QA Contact:	Einat Pacifici
Docs Contact:
URL:
Whiteboard:	container
Depends On:
Blocks:	1503797

Reported:	2017-09-07 22:54 UTC by Saif Ali
Modified:	2020-12-14 09:56 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-10-19 10:37:56 UTC
Category:	Bug
Cloudforms Team:	Container Management
Target Upstream Version:
Embargoed:

Description Saif Ali 2017-09-07 22:54:41 UTC

Description of problem:
We have connected to an OpenShift provider and are trying to create a control policy; however, the policy actions are not being executed. I do not see any OpenShift events in the evm logs on any of the Event Monitor appliances.

Version-Release number of selected component (if applicable):
5.8.1.5-20170725160636_e433fc0

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Beni Paskin-Cherniavsky 2017-09-11 15:17:58 UTC

The first thing that stands out is that workers were stopped with "evm_worker_memory_exceeded" thousands of times.

[----] W, [2017-09-05T03:12:45.635906 #14767:1073130]  WARN -- : MIQ(MiqServer#validate_worker) Worker [MiqPriorityWorker] with ID: [101000000120629], PID: [30391], GUID: [6b21d240-9208-11e7-a7fe-005056962079] process memory usage [734983000] exceeded limit [629145600], requesting worker to exit

> grep 'WARN.*requesting worker to exit' */log/evm.log | egrep -o 'Worker \S+' | sort | uniq -c
   2632 Worker [ManageIQ::Providers::Openshift::ContainerManager::MetricsCollectorWorker]
   2272 Worker [MiqGenericWorker]
    360 Worker [MiqPriorityWorker]
    133 Worker [MiqReportingWorker]

I don't think this explains the lack of events, but it is not a healthy situation.  This system needs more RAM.
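
The same tally can be sketched in Python, which also makes it easy to see how far over the limit each worker type went. This is a minimal sketch and not part of the original analysis: the regular expression is inferred from the single WARN line quoted above, and the function name `tally_memory_exits` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the one WARN line quoted above; other evm.log
# versions may format this message differently (assumption).
EXIT_RE = re.compile(
    r"WARN.*Worker \[(?P<worker>[^\]]+)\].*"
    r"process memory usage \[(?P<used>\d+)\] exceeded limit \[(?P<limit>\d+)\], "
    r"requesting worker to exit"
)

def tally_memory_exits(lines):
    """Count memory-limit exits per worker type and record the worst overshoot in bytes."""
    counts = Counter()
    worst = {}
    for line in lines:
        m = EXIT_RE.search(line)
        if m is None:
            continue  # not a memory-exceeded warning
        worker = m.group("worker")
        over = int(m.group("used")) - int(m.group("limit"))
        counts[worker] += 1
        worst[worker] = max(worst.get(worker, 0), over)
    return counts, worst
```

Passing an open file handle for evm.log as `lines` should work, since the function only iterates over lines. For the line quoted above, the limit of 629145600 bytes is exactly 600 MiB.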

Looking deeper...

Comment 14 Beni Paskin-Cherniavsky 2017-09-19 08:16:10 UTC

The good news from evm_current_densba3osclf01.qic.tiaa-cref.org_20170914_124432/log/evm.log
is that the appliance did receive events from OpenShift.

      2 event_type=>"CONTAINER_CREATED"
    525 event_type=>"CONTAINER_FAILED"
      1 event_type=>"CONTAINER_KILLING"
      2 event_type=>"CONTAINER_STARTED"
     31 event_type=>"CONTAINER_UNHEALTHY"
      2 event_type=>"POD_SCHEDULED"
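
The command that produced these counts is not shown in this comment. A minimal Python sketch of one way to get such a tally follows, assuming the events appear in evm.log as Ruby-hash fragments of the form `event_type=>"NAME"` as in the counts above. The function name `count_event_types` is hypothetical.

```python
import re
from collections import Counter

# Matches the fragment shown in the counts above, e.g. event_type=>"POD_SCHEDULED".
# That this is how every event is logged is an assumption.
EVENT_TYPE_RE = re.compile(r'event_type=>"([A-Z_]+)"')

def count_event_types(lines):
    """Count every event_type=>"..." occurrence across the given log lines."""
    counts = Counter()
    for line in lines:
        counts.update(EVENT_TYPE_RE.findall(line))
    return counts
```

Note that this counts occurrences, not distinct events: if the same event is logged on more than one line, it is counted once per line.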

Checking whether they made it any further through automate/policy...

Comment 16 Beni Paskin-Cherniavsky 2017-09-26 11:51:21 UTC

Found another problem in the logs: the customer has several Node alerts defined ("OSE Node CPU > 0", "OSE Node Datawarehouse Alerts", "OSE Node Mem > 0"),
and they don't work.

This never worked; it is a mistake that the UI allows defining them.
RFE to implement this: bug 1494599 (I added the stack trace from this log there).

Comment 19 Beni Paskin-Cherniavsky 2017-10-19 10:37:56 UTC

For this BZ, the problem the customer is seeing is specifically that Pod policies don't work most of the time.  Node and Image events do trigger policies reliably.

I suspected, but wasn't certain, that this is bug 1367114, and I have now finally found the evidence in the logs.  It is bug 1367114: policies don't work when an event arrives before the pod has been seen by an inventory refresh.
(Such an event may not even be logged to policy.log, but in evm.log we can see that it was processed and hit "Unable to find target".)
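
To look for the same evidence in another evm.log, a minimal Python sketch follows. Only the marker string "Unable to find target" comes from this comment; the rest of the log line format is not assumed, and the function name `find_missing_target_lines` is hypothetical.

```python
def find_missing_target_lines(lines, marker="Unable to find target"):
    """Return (line_number, text) pairs for log lines containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]
```

Matching lines would still need to be correlated by hand with the pod's first appearance in an inventory refresh to confirm the ordering.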

Work continues on implementing workarounds for the customer's use case, and several RFEs were filed, but I'm going to close this BZ as a duplicate.

*** This bug has been marked as a duplicate of bug 1367114 ***
