Bug 1233798

Summary:	OpenStack Event Catcher Thread Constantly Failing and Restarting
Product:	Red Hat CloudForms Management Engine	Reporter:	John Prause <jprause>
Component:	Providers	Assignee:	Greg Blomquist <gblomqui>
Status:	CLOSED ERRATA	QA Contact:	Milan Falešník <mfalesni>
Severity:	high	Docs Contact:
Priority:	high
Version:	5.4.0	CC:	cpelland, dajohnso, david.costakos, dclarizi, gblomqui, jfrey, jhardy, jrafanie, mfalesni, nachandr, obarenbo
Target Milestone:	GA	Keywords:	ZStream
Target Release:	5.4.1
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	5.4.1.0	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	1232484	Environment:
Last Closed:	2015-07-30 13:10:21 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1232484
Bug Blocks:

Comment 1 Greg Blomquist 2015-06-24 20:10:50 UTC

http://gitlab.cloudforms.lab.eng.rdu2.redhat.com/cloudforms/cfme/merge_requests/145

Comment 2 CFME Bot 2015-06-24 20:39:18 UTC

New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=bdc878afed77d8dcb539191dadb1d05dbb875421

commit bdc878afed77d8dcb539191dadb1d05dbb875421
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jun 24 16:05:37 2015 -0400

    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues
    work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the
    appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection.  And, threads (i.e.,
    where event catcher workers do all their work) that attempt to run queries are
    opening a new DB connection.
    
    The original fix never actually tried opening the a new connection.  Instead, it
    was perfectly happy to get back a nil value for the appliance and try to lookup
    Nil#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server,
    actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to
    the MiqServer#ipaddress.
    
    Original PR:
    https://github.com/ManageIQ/manageiq/pull/3050
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

Comment 3 CFME Bot 2015-06-24 20:39:31 UTC

New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=36efb1c837f179d701422ea3a4aae77e84c8d6bd

commit 36efb1c837f179d701422ea3a4aae77e84c8d6bd
Merge: 3ae711d bdc878a
Author:     Joe Rafaniello <jrafanie>
AuthorDate: Wed Jun 24 16:37:18 2015 -0400
Commit:     Joe Rafaniello <jrafanie>
CommitDate: Wed Jun 24 16:37:18 2015 -0400

    Merge branch 'bz1233798-backport_worker_base_for_bz1232484' into '5.4.z'
    
    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection.  And, threads (i.e., where event catcher workers do all their work) that attempt to run queries are
    opening a new DB connection.
    
    The original fix never actually tried opening the a new connection.  Instead, it was perfectly happy to get back a nil value for the appliance and try to lookup
    Nil#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server, actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to the MiqServer#ipaddress.
    
    Upstream PR:
    https://github.com/ManageIQ/manageiq/pull/3244
    
    No cherry-pick conflicts
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178
    
    See merge request !145

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

Comment 5 Milan Falešník 2015-07-24 10:12:46 UTC

Hello Greg,

what is the best way to reproduce this issue? I tried wrong credentials for AMQP as dajo suggested on the original bug but that made it just restart the thread, I did not see the ERROR. I tried to reach the RHOS provider by the address in the bug but it is no longer available.

Comment 6 Greg Blomquist 2015-07-28 13:11:33 UTC

Hi Milan,

I think if this fix were failing, then you'd see the following error in the evm.log:

> [----] E, [2015-06-16T17:02:06.067987 #3752:416df9c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread aborted because [undefined method `ipaddress' for nil:NilClass]
[----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass  Method:[rescue in block in start_event_monitor]

Based on what we were seeing before, no EventCatcherOpenstack process would run without this error.

Comment 7 Milan Falešník 2015-07-28 15:53:12 UTC

I don't see such errors in log so moving to verified.

Comment 9 errata-xmlrpc 2015-07-30 13:10:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1511.html