Bug 1233798 - OpenStack Event Catcher Thread Constantly Failing and Restarting
Summary: OpenStack Event Catcher Thread Constantly Failing and Restarting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.4.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.4.1
Assignee: Greg Blomquist
QA Contact: Milan Falešník
URL:
Whiteboard:
Depends On: 1232484
Blocks:
Reported: 2015-06-19 13:27 UTC by John Prause
Modified: 2015-07-30 13:10 UTC

Fixed In Version: 5.4.1.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1232484
Environment:
Last Closed: 2015-07-30 13:10:21 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2015:1511
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: CFME 5.4.1 bug fixes, and enhancement update
Last Updated: 2015-07-30 17:10:05 UTC

Comment 2 CFME Bot 2015-06-24 20:39:18 UTC
New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=bdc878afed77d8dcb539191dadb1d05dbb875421

commit bdc878afed77d8dcb539191dadb1d05dbb875421
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jun 24 16:05:37 2015 -0400

    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues
    work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the
    appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection.  And, threads (i.e.,
    where event catcher workers do all their work) that attempt to run queries are
    opening a new DB connection.
    
    The original fix never actually tried opening a new connection.  Instead, it
    was perfectly happy to get back a nil value for the appliance and try to look up
    NilClass#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server,
    actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to
    the MiqServer#ipaddress.
    
    Original PR:
    https://github.com/ManageIQ/manageiq/pull/3050
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)
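
The pattern described in the commit message above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the code from the commit: `FakeServer`, `SketchWorker`, both method names, and the `miq-` queue-name prefix are hypothetical stand-ins, and the lookup lambda only imitates a query that comes back nil inside the thread.

```ruby
# Hypothetical stand-in for the appliance record; not the real MiqServer model.
FakeServer = Struct.new(:ipaddress)

class SketchWorker
  # The record is obtained once, outside the thread, and kept in an ivar.
  def initialize(server)
    @server = server
  end

  # Before: the thread performs its own lookup, which may return nil.
  def queue_name_with_lookup(lookup)
    Thread.new { "miq-#{lookup.call.ipaddress}" }.value
  end

  # After: the thread only reads the ivar, so no lookup happens inside it.
  def queue_name_from_ivar
    Thread.new { "miq-#{@server.ipaddress}" }.value
  end
end

worker = SketchWorker.new(FakeServer.new("192.0.2.10"))
puts worker.queue_name_from_ivar

begin
  # Thread#value re-raises the exception that terminated the thread.
  worker.queue_name_with_lookup(-> { nil })
rescue NoMethodError => e
  puts "lookup variant failed: #{e.class}"
end
```

The lookup variant fails the same way the event monitor thread did, with a NoMethodError on nil, while the ivar variant never performs a lookup inside the thread at all.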

Comment 3 CFME Bot 2015-06-24 20:39:31 UTC
New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=36efb1c837f179d701422ea3a4aae77e84c8d6bd

commit 36efb1c837f179d701422ea3a4aae77e84c8d6bd
Merge: 3ae711d bdc878a
Author:     Joe Rafaniello <jrafanie>
AuthorDate: Wed Jun 24 16:37:18 2015 -0400
Commit:     Joe Rafaniello <jrafanie>
CommitDate: Wed Jun 24 16:37:18 2015 -0400

    Merge branch 'bz1233798-backport_worker_base_for_bz1232484' into '5.4.z'
    
    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection.  And, threads (i.e., where event catcher workers do all their work) that attempt to run queries are
    opening a new DB connection.
    
    The original fix never actually tried opening a new connection.  Instead, it was perfectly happy to get back a nil value for the appliance and try to look up
    NilClass#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server, actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to the MiqServer#ipaddress.
    
    Upstream PR:
    https://github.com/ManageIQ/manageiq/pull/3244
    
    No cherry-pick conflicts
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178
    
    See merge request !145

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

Comment 5 Milan Falešník 2015-07-24 10:12:46 UTC
Hello Greg,

What is the best way to reproduce this issue? I tried using wrong AMQP credentials, as dajo suggested on the original bug, but that only made the thread restart; I did not see the ERROR. I also tried to reach the RHOS provider at the address given in the bug, but it is no longer available.

Comment 6 Greg Blomquist 2015-07-28 13:11:33 UTC
Hi Milan,

I think if this fix were failing, then you'd see the following error in the evm.log:

> [----] E, [2015-06-16T17:02:06.067987 #3752:416df9c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread aborted because [undefined method `ipaddress' for nil:NilClass]
> [----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass  Method:[rescue in block in start_event_monitor]

Based on what we were seeing before, no EventCatcherOpenstack process would run without this error.
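
A quick way to check a log for that signature is sketched below in Ruby. It is only an illustration: the helper name `ipaddress_failures` is made up, the second sample line is invented for contrast, and the location of evm.log on a given appliance is left to the reader.

```ruby
# Signature copied from the error quoted above.
SIGNATURE = "undefined method `ipaddress' for nil:NilClass"

# Hypothetical helper: returns the lines that contain the failure signature.
# Accepts any enumerable of strings, e.g. File.foreach(path_to_evm_log).
def ipaddress_failures(lines)
  lines.select { |line| line.include?(SIGNATURE) }
end

sample = [
  # First line is taken from the report; the second is a made-up unrelated entry.
  "[----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass  Method:[rescue in block in start_event_monitor]",
  "[----] I, [2015-06-16T17:02:07.000000 #3752:416df9c]  INFO -- : unrelated entry"
]

puts ipaddress_failures(sample).size
```

An empty result on an appliance with an active OpenStack event catcher would be consistent with the fix working, since the failure previously appeared on every run.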

Comment 7 Milan Falešník 2015-07-28 15:53:12 UTC
I don't see such errors in the log, so moving to verified.

Comment 9 errata-xmlrpc 2015-07-30 13:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1511.html

