1233798 – OpenStack Event Catcher Thread Constantly Failing and Restarting

Summary: OpenStack Event Catcher Thread Constantly Failing and Restarting

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	5.4.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.4.1
Assignee:	Greg Blomquist
QA Contact:	Milan Falešník
Docs Contact:
URL:
Whiteboard:
Depends On:	1232484
Blocks:

Reported:	2015-06-19 13:27 UTC by John Prause
Modified:	2015-07-30 13:10 UTC
CC List:	11 users
Fixed In Version:	5.4.1.0
Doc Type:	Bug Fix
Doc Text:
Clone Of:	1232484
Environment:
Last Closed:	2015-07-30 13:10:21 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:1511	0	normal	SHIPPED_LIVE	CFME 5.4.1 bug fixes, and enhancement update	2015-07-30 17:10:05 UTC

Comment 1 Greg Blomquist 2015-06-24 20:10:50 UTC

http://gitlab.cloudforms.lab.eng.rdu2.redhat.com/cloudforms/cfme/merge_requests/145

Comment 2 CFME Bot 2015-06-24 20:39:18 UTC

New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=bdc878afed77d8dcb539191dadb1d05dbb875421

commit bdc878afed77d8dcb539191dadb1d05dbb875421
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jun 24 16:05:37 2015 -0400

    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues
    work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the
    appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection.  And, threads (i.e.,
    where event catcher workers do all their work) that attempt to run queries are
    opening a new DB connection.
    
    The original fix never actually tried opening a new connection.  Instead, it
    was perfectly happy to get back a nil value for the appliance and try to look up
    NilClass#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server,
    actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to
    the MiqServer#ipaddress.
    
    Original PR:
    https://github.com/ManageIQ/manageiq/pull/3050
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)
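
For readers without access to the diff, here is a minimal Ruby sketch of the approach the commit message describes: resolve the server record once, keep it in an instance variable, and let the thread read it instead of querying. This is not the actual ManageIQ code. `FakeServer`, `FakeWorker`, `queue_name_from_thread`, `broken_queue_name`, and the queue name format are all hypothetical names made up for illustration.

```ruby
# Hypothetical stand-in for the appliance record; not the real MiqServer model.
FakeServer = Struct.new(:ipaddress)

# Hypothetical worker that receives the server record up front.
class FakeWorker
  attr_reader :server

  def initialize(server)
    # Resolve the record once, outside the thread, and keep it in an ivar.
    @server = server
  end

  # The thread reads the ivar, so it never has to run a query of its own.
  def queue_name_from_thread
    Thread.new { "miq-#{@server.ipaddress}-events" }.value
  end
end

# Simulates the failure mode: the lookup inside the thread yields nil.
def broken_queue_name
  Thread.new do
    Thread.current.report_on_exception = false # keep stderr quiet for the demo
    server = nil # stands in for a lookup that silently returned nil
    "miq-#{server.ipaddress}-events"
  end.value # Thread#value re-raises the NoMethodError in the caller
end
```

Calling `FakeWorker.new(FakeServer.new("10.0.0.5")).queue_name_from_thread` builds the name from the cached record, while `broken_queue_name` raises `NoMethodError`, the same class of error reported in this bug.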

Comment 3 CFME Bot 2015-06-24 20:39:31 UTC

New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=36efb1c837f179d701422ea3a4aae77e84c8d6bd

commit 36efb1c837f179d701422ea3a4aae77e84c8d6bd
Merge: 3ae711d bdc878a
Author:     Joe Rafaniello <jrafanie>
AuthorDate: Wed Jun 24 16:37:18 2015 -0400
Commit:     Joe Rafaniello <jrafanie>
CommitDate: Wed Jun 24 16:37:18 2015 -0400

    Merge branch 'bz1233798-backport_worker_base_for_bz1232484' into '5.4.z'
    
    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection.  And, threads (i.e., where event catcher workers do all their work) that attempt to run queries are
    opening a new DB connection.
    
    The original fix never actually tried opening a new connection.  Instead, it was perfectly happy to get back a nil value for the appliance and try to look up
    NilClass#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server, actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to the MiqServer#ipaddress.
    
    Upstream PR:
    https://github.com/ManageIQ/manageiq/pull/3244
    
    No cherry-pick conflicts
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178
    
    See merge request !145

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

Comment 5 Milan Falešník 2015-07-24 10:12:46 UTC

Hello Greg,

What is the best way to reproduce this issue? I tried using wrong AMQP credentials, as dajo suggested on the original bug, but that only made the thread restart; I did not see the ERROR. I also tried to reach the RHOS provider at the address given in the bug, but it is no longer available.

Comment 6 Greg Blomquist 2015-07-28 13:11:33 UTC

Hi Milan,

I think if this fix were failing, then you'd see the following error in the evm.log:

> [----] E, [2015-06-16T17:02:06.067987 #3752:416df9c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread aborted because [undefined method `ipaddress' for nil:NilClass]
> [----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass  Method:[rescue in block in start_event_monitor]

Based on what we were seeing before, no EventCatcherOpenstack process would run without this error.
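
One way to run this check without reading the log by eye is a small Ruby filter over the log lines. This is only a sketch: `regression_lines` is a hypothetical helper, and the pattern is derived from the two lines quoted above rather than from any documented log format. It deliberately matches loosely on the quoting around `ipaddress`, since the exact wording of this message can vary between Ruby versions.

```ruby
# Pattern derived from the quoted evm.log lines; the dots tolerate either quote style.
SIGNATURE = /undefined method .ipaddress. for nil/

# Returns the ERROR lines that carry the nil-ipaddress signature.
# Accepts any enumerable of strings (an Array, or File.foreach on a log path).
def regression_lines(log_lines)
  log_lines.select { |line| line.include?("ERROR") && line.match?(SIGNATURE) }
end
```

For example, `regression_lines(File.foreach(path))`, where `path` points at the appliance's evm.log, returns an empty array when the fix is holding.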

Comment 7 Milan Falešník 2015-07-28 15:53:12 UTC

I don't see such errors in the log, so I am moving this to VERIFIED.

Comment 9 errata-xmlrpc 2015-07-30 13:10:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1511.html
