Bug 1281746

Summary: InfraManager::EventCatcher worker keeps getting restarted
Product: Red Hat CloudForms Management Engine
Reporter: Marius Cornea <mcornea>
Component: Providers
Assignee: Joe Vlcek <jvlcek>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: high
Docs Contact:
Priority: high
Version: 5.5.0
CC: akrzos, cpelland, dkorn, gblomqui, gmccullo, jfrey, jhardy, mfeifer, mkanoor, obarenbo, rananda, simaishi, tfitzger
Target Milestone: GA
Target Release: 5.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 5.5.0.11
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-08 13:47:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
evm.log (flags: none)
policy log (flags: none)

Description Marius Cornea 2015-11-13 11:25:25 UTC
Created attachment 1093590
evm.log

Description of problem:
The ManageIQ::Providers::Openstack::InfraManager::EventCatcher worker keeps getting restarted and no events show up in the Openstack Platform Director provider.

Version-Release number of selected component (if applicable):
5.5.0.10-beta2.1.20151110134042_d6f5459

How reproducible:
100%

Steps to Reproduce:
1. Add an Openstack Platform Director provider.
2. Scale out with an additional compute node.


Actual results:
No events show up in the Timelines

Expected results:
Events would be captured.

Additional info:
Attaching evm.log and policy.log.

Comment 2 Marius Cornea 2015-11-13 11:26:49 UTC
Created attachment 1093592
policy log

Comment 3 Greg McCullough 2015-11-13 14:49:01 UTC
https://github.com/ManageIQ/manageiq/pull/5415

Comment 4 Alex Krzos 2015-11-13 18:10:25 UTC
My tests show that this also affects both the VMware and RHEVM provider EventCatchers on a 5.5.0.10 appliance. I do not see this behavior on 5.5.0.9.

In my logs I am seeing:

[----] E, [2015-11-13T12:43:27.235603 #43235:6d3990] ERROR -- : MIQ(MiqServer#validate_worker) Worker [ManageIQ::Providers::Redhat::InfraManager::EventCatcher] with ID: [471], PID: [39176], GUID: [b907b2b6-8a2d-11e5-8ab5-001a4a223904] has not responded in 132.774980603 seconds, restarting worker


[----] E, [2015-11-13T12:45:30.153841 #43235:6d3990] ERROR -- : MIQ(MiqServer#validate_worker) Worker [ManageIQ::Providers::Vmware::InfraManager::EventCatcher] with ID: [472], PID: [39309], GUID: [02556b52-8a2e-11e5-8ab5-001a4a223904] has not responded in 132.259265315 seconds, restarting worker


Thus an eventcatcher worker is restarting about every 2m15s in the environments I have.
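
To check the restart cadence on another appliance, the validate_worker errors can be pulled out of evm.log with a short script. The following is only a sketch: the regular expression is inferred from the two log lines quoted above, and the names parse_restarts and restart_intervals are hypothetical helpers, not part of ManageIQ.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern inferred from the two sample log lines in this comment (an assumption,
# not an official evm.log format specification).
RESTART_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) #\d+:\w+\]\s+ERROR -- : "
    r"MIQ\(MiqServer#validate_worker\) Worker \[(?P<worker>[^\]]+)\] "
    r"with ID: \[(?P<id>\d+)\], PID: \[(?P<pid>\d+)\], GUID: \[[^\]]+\] "
    r"has not responded in (?P<secs>[\d.]+) seconds, restarting worker"
)


def parse_restarts(lines):
    """Map worker class name -> list of (timestamp, unresponsive_seconds)."""
    restarts = defaultdict(list)
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            restarts[match.group("worker")].append((ts, float(match.group("secs"))))
    return dict(restarts)


def restart_intervals(restarts):
    """Map worker class name -> seconds between consecutive restarts."""
    intervals = {}
    for worker, events in restarts.items():
        times = sorted(ts for ts, _ in events)
        intervals[worker] = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return intervals


# The two lines quoted above, used here as sample input.
SAMPLE = [
    "[----] E, [2015-11-13T12:43:27.235603 #43235:6d3990] ERROR -- : "
    "MIQ(MiqServer#validate_worker) Worker "
    "[ManageIQ::Providers::Redhat::InfraManager::EventCatcher] with ID: [471], "
    "PID: [39176], GUID: [b907b2b6-8a2d-11e5-8ab5-001a4a223904] has not responded "
    "in 132.774980603 seconds, restarting worker",
    "[----] E, [2015-11-13T12:45:30.153841 #43235:6d3990] ERROR -- : "
    "MIQ(MiqServer#validate_worker) Worker "
    "[ManageIQ::Providers::Vmware::InfraManager::EventCatcher] with ID: [472], "
    "PID: [39309], GUID: [02556b52-8a2e-11e5-8ab5-001a4a223904] has not responded "
    "in 132.259265315 seconds, restarting worker",
]

for worker, events in parse_restarts(SAMPLE).items():
    for ts, secs in events:
        print(worker, ts.isoformat(), secs)
```

To run it against a full log, pass an open file handle for evm.log to parse_restarts instead of SAMPLE; restart_intervals then gives the gap between successive restarts of each worker class.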

Comment 5 Joe Vlcek 2015-11-13 18:52:36 UTC
(In reply to Alex Krzos from comment #4)
> My tests have found that this also affects both VMware and RHEVM provider
> Eventcatchers as well on a 5.5.0.10 appliance.  I do not see this behavior
> on 5.5.0.9.
> 
> In my logs I am seeing:
> 
> [----] E, [2015-11-13T12:43:27.235603 #43235:6d3990] ERROR -- :
> MIQ(MiqServer#validate_worker) Worker
> [ManageIQ::Providers::Redhat::InfraManager::EventCatcher] with ID: [471],
> PID: [39176], GUID: [b907b2b6-8a2d-11e5-8ab5-001a4a223904] has not responded
> in 132.774980603 seconds, restarting worker
> 
> 
> [----] E, [2015-11-13T12:45:30.153841 #43235:6d3990] ERROR -- :
> MIQ(MiqServer#validate_worker) Worker
> [ManageIQ::Providers::Vmware::InfraManager::EventCatcher] with ID: [472],
> PID: [39309], GUID: [02556b52-8a2e-11e5-8ab5-001a4a223904] has not responded
> in 132.259265315 seconds, restarting worker
> 
> 
> Thus an eventcatcher worker is restarting about every 2m15s in the
> environments I have.
Correct, Alex, and a fix is on the way. JoeV

Comment 6 Joe Vlcek 2015-11-18 16:20:23 UTC
*** Bug 1283205 has been marked as a duplicate of this bug. ***

Comment 7 Marius Cornea 2015-11-20 18:25:06 UTC
Verified in 5.5.0.11:

 ManageIQ::Providers::Openstack::InfraManager::EventCatcher           | started | 36 | 31203 | 31228 | ems_1                 | 2015-11-20T12:29:30Z | 2015-11-20T18:23:21Z

Comment 8 Joe Vlcek 2015-11-30 15:33:18 UTC
*** Bug 1285341 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2015-12-08 13:47:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551