Bug 1233798

Summary: OpenStack Event Catcher Thread Constantly Failing and Restarting
Product: Red Hat CloudForms Management Engine
Reporter: John Prause <jprause>
Component: Providers
Assignee: Greg Blomquist <gblomqui>
Status: CLOSED ERRATA
QA Contact: Milan Falešník <mfalesni>
Severity: high
Docs Contact:
Priority: high
Version: 5.4.0
CC: cpelland, dajohnso, david.costakos, dclarizi, gblomqui, jfrey, jhardy, jrafanie, mfalesni, nachandr, obarenbo
Target Milestone: GA
Keywords: ZStream
Target Release: 5.4.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: 5.4.1.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1232484
Environment:
Last Closed: 2015-07-30 13:10:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1232484
Bug Blocks:

Comment 2 CFME Bot 2015-06-24 20:39:18 UTC
New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=bdc878afed77d8dcb539191dadb1d05dbb875421

commit bdc878afed77d8dcb539191dadb1d05dbb875421
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jun 24 16:05:37 2015 -0400

    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues
    work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the
    appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection, so threads (i.e.,
    where event catcher workers do all their work) that attempt to run queries
    open a new DB connection.
    
    The original fix never actually tried opening a new connection.  Instead, it
    was perfectly happy to get back a nil value for the appliance and try to look up
    Nil#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server,
    actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to
    the MiqServer#ipaddress.
    
    Original PR:
    https://github.com/ManageIQ/manageiq/pull/3050
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)
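
The ivar approach described in the commit message above can be illustrated with a minimal Ruby sketch. This is not the actual change to worker_base.rb: `ServerRecord`, `SketchWorker`, the lookup lambda, and the `miq-` queue name prefix are all hypothetical names chosen for illustration. It only contrasts a thread that performs its own lookup (and may get nil back) with a thread that reads a record stored on the worker beforehand.

```ruby
# Hypothetical stand-in for the appliance record; not the real MiqServer model.
ServerRecord = Struct.new(:ipaddress)

class SketchWorker
  def initialize(server)
    # The record is fetched once, outside the thread, and kept in an ivar.
    @server = server
  end

  # Failing pattern: the thread performs its own lookup, which may return nil.
  # Thread#value re-raises any exception raised inside the thread.
  def queue_name_with_lookup(lookup)
    Thread.new { "miq-#{lookup.call.ipaddress}" }.value
  end

  # Fixed pattern: the thread reads the ivar and never queries for the record.
  def queue_name_from_ivar
    Thread.new { "miq-#{@server.ipaddress}" }.value
  end
end
```

With a lookup that returns nil, `queue_name_with_lookup` raises NoMethodError, which matches the class of failure reported in this bug; `queue_name_from_ivar` has no lookup to fail.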

Comment 3 CFME Bot 2015-06-24 20:39:31 UTC
New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=36efb1c837f179d701422ea3a4aae77e84c8d6bd

commit 36efb1c837f179d701422ea3a4aae77e84c8d6bd
Merge: 3ae711d bdc878a
Author:     Joe Rafaniello <jrafanie>
AuthorDate: Wed Jun 24 16:37:18 2015 -0400
Commit:     Joe Rafaniello <jrafanie>
CommitDate: Wed Jun 24 16:37:18 2015 -0400

    Merge branch 'bz1233798-backport_worker_base_for_bz1232484' into '5.4.z'
    
    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection, so threads (i.e., where event catcher workers do all their work) that attempt to run queries
    open a new DB connection.
    
    The original fix never actually tried opening a new connection.  Instead, it was perfectly happy to get back a nil value for the appliance and try to look up
    Nil#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server, actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to the MiqServer#ipaddress.
    
    Upstream PR:
    https://github.com/ManageIQ/manageiq/pull/3244
    
    No cherry-pick conflicts
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178
    
    See merge request !145

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

Comment 5 Milan Falešník 2015-07-24 10:12:46 UTC
Hello Greg,

What is the best way to reproduce this issue? I tried wrong credentials for AMQP, as dajo suggested on the original bug, but that only made the thread restart; I did not see the ERROR. I also tried to reach the RHOS provider at the address in the bug, but it is no longer available.

Comment 6 Greg Blomquist 2015-07-28 13:11:33 UTC
Hi Milan,

I think if this fix were failing, then you'd see the following error in the evm.log:

> [----] E, [2015-06-16T17:02:06.067987 #3752:416df9c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread aborted because [undefined method `ipaddress' for nil:NilClass]
> [----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass  Method:[rescue in block in start_event_monitor]

Based on what we were seeing before, no EventCatcherOpenstack process would run without this error.
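
As a rough aid for checking a log for this failure, here is a small Ruby sketch that filters log lines for the error text quoted above. The helper name `nil_ipaddress_errors` and the constant name are hypothetical, and the location of evm.log on a given appliance is left to the caller, since it is not stated in this bug.

```ruby
# Pattern copied from the error text quoted in this comment.
NIL_IPADDRESS_ERROR = /undefined method `ipaddress' for nil:NilClass/

# Returns the lines that contain the nil ipaddress error.
# Accepts any Enumerable of strings, e.g. an Array or File.foreach(path).
def nil_ipaddress_errors(lines)
  lines.grep(NIL_IPADDRESS_ERROR)
end
```

For example, `nil_ipaddress_errors(File.foreach(log_path)).any?` would report whether the error appears in the file at `log_path` (a path you supply). An empty result only shows that this specific message is absent, not that the event catcher is otherwise healthy.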

Comment 7 Milan Falešník 2015-07-28 15:53:12 UTC
I don't see such errors in the log, so moving to verified.

Comment 9 errata-xmlrpc 2015-07-30 13:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1511.html