Bug 1232484

Summary: OpenStack Event Catcher Thread Constantly Failing and Restarting
Product: Red Hat CloudForms Management Engine
Reporter: david.costakos
Component: Providers
Assignee: Greg Blomquist <gblomqui>
Status: CLOSED ERRATA
QA Contact: Ramesh A <rananda>
Severity: high
Priority: high
Version: 5.4.0
CC: clasohm, cpelland, dclarizi, gblomqui, jfrey, jhardy, jrafanie, kmorey, mfeifer, obarenbo
Target Milestone: GA
Keywords: ZStream
Target Release: 5.5.0
Flags: rananda: automate_bug-
Hardware: x86_64
OS: Linux
Fixed In Version: 5.5.0.1
Doc Type: Bug Fix
Clones: 1233798
Last Closed: 2015-12-08 13:14:04 UTC
Type: Bug
Bug Blocks: 1233798

Description david.costakos 2015-06-16 21:04:30 UTC
Description of problem:
The Event Catcher thread in the latest CloudForms 3.2 build is constantly failing and restarting.

Version-Release number of selected component (if applicable):
5.4.0.5.20150605150206_7daa1a8

How reproducible:
always

Steps to Reproduce:
1. Deploy a CloudForms 3.2 appliance.
2. Add an OpenStack cloud provider.
3. Check the EVM logs.

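Actual results:
The Event Monitor Thread aborts with [undefined method `ipaddress' for nil:NilClass] and is restarted over and over (see the log messages below).

Expected results:
A functional event catcher thread.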

Additional info:
Relevant Log Messages:
[----] I, [2015-06-16T16:59:36.047117 #3752:1099eac]  INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Starting Event Monitor Thread
[----] I, [2015-06-16T16:59:36.047213 #3752:1099eac]  INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Started Event Monitor Thread
[----] I, [2015-06-16T16:59:51.047863 #3752:1099eac]  INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread gone. Restarting...
[----] I, [2015-06-16T16:59:51.048172 #3752:1099eac]  INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Validating Connection/Credentials
[----] I, [2015-06-16T16:59:51.048600 #3752:1099eac]  INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Starting Event Monitor Thread
[----] I, [2015-06-16T16:59:51.048774 #3752:1099eac]  INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Started Event Monitor Thread
[----] E, [2015-06-16T17:02:06.067987 #3752:416df9c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread aborted because [undefined method `ipaddress' for nil:NilClass]
[----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass  Method:[rescue in block in start_event_monitor]
[----] I, [2015-06-16T17:02:06.067582 #3752:1099eac]  INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Started Event Monitor Thread
[----] E, [2015-06-16T17:02:06.068188 #3752:416df9c] ERROR -- : /var/www/miq/vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb:17:in `event_monitor_handle'
/var/www/miq/vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb:41:in `monitor_events'
/var/www/miq/vmdb/lib/workers/event_catcher.rb:99:in `block in start_event_monitor'
/var/www/miq/vmdb/lib/extensions/ar_thread.rb:11:in `block in new_with_release'
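
The log pattern above (thread started, thread aborted with NoMethodError, "Event Monitor Thread gone. Restarting...") is what a simple supervise-and-restart loop produces when the monitored thread dies on every run. The following is a minimal, self-contained Ruby sketch of that cycle, not the ManageIQ code: `lookup_server`, `monitor_events`, and `supervise` are hypothetical names, and the lookup is assumed to return nil, as it did in the failing worker.

```ruby
# Hypothetical stand-in for the server lookup performed inside the event
# monitor thread; it returns nil, as the lookup did in the failing worker.
def lookup_server
  nil
end

# Hypothetical monitor body: it needs the appliance IP address, so calling
# #ipaddress on the nil lookup result raises NoMethodError.
def monitor_events
  server = lookup_server
  "binding-queue-#{server.ipaddress}" # never reached while server is nil
end

# Minimal supervisor: whenever the monitor thread is missing or no longer
# alive, start a new one. Stops after max_starts for the purposes of the
# sketch and returns the error message from each failed run.
def supervise(max_starts)
  errors = []
  thread = nil
  starts = 0
  while starts < max_starts
    if thread.nil? || !thread.alive?
      puts "Event Monitor Thread gone. Restarting..." if thread
      thread = Thread.new do
        begin
          monitor_events
        rescue NoMethodError => e
          errors << e.message # the thread ends here, so it is no longer alive
        end
      end
      starts += 1
    end
    thread.join # sketch only: wait for the run instead of polling on a timer
  end
  errors
end
```

Calling `supervise(3)` prints the restart message twice and returns three NoMethodError messages that mention `ipaddress`; the exact wording of that message varies between Ruby versions.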


The error can be seen today on the appliance at 10.11.164.150. It is a currently unused internal appliance; feel free to log in with the default credentials to view the error.

I am unsure whether this is related, but this appliance consistently runs out of memory after about 24 hours. kswapd starts using 40% of the CPU in IOWAIT, and the machine has to be rebooted every day.

Comment 5 Greg Blomquist 2015-06-23 18:31:46 UTC
https://github.com/ManageIQ/manageiq/pull/3244

Comment 6 CFME Bot 2015-06-24 19:41:28 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/3649eb07a5e8b1b9a5f56dab11eb205e66758ef5

commit 3649eb07a5e8b1b9a5f56dab11eb205e66758ef5
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Tue Jun 23 16:00:12 2015 -0400

    Include miq_server when retrieving worker
    
    To try to make the way the OpenStack event catcher creates binding queues
    work a little better, the appliance's IP address was looked up and used as part
    of the binding queue's name.
    
    However, there were a couple of things working against this fix.  First, the
    appliance's IP address was not readily available to the worker process.  Second,
    ManageIQ has a DB connection pool with only one connection, and threads (i.e.,
    where event catcher workers do all their work) that attempt to run queries
    must open a new DB connection.
    
    The original fix never actually tried opening a new connection.  Instead, it
    was perfectly happy to get back a nil value for the appliance and try to look
    up Nil#ipaddress.
    
    This fix gets around this problem by throwing the appliance record (miq_server,
    actually) into an ivar and making that available to the thread.  This keeps the
    thread from having to query for the miq_server, while still giving it access to
    the MiqServer#ipaddress.
    
    Original PR:
    https://github.com/ManageIQ/manageiq/pull/3050
    
    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1232484
    
    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1224389
    https://bugzilla.redhat.com/show_bug.cgi?id=1223976

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)
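
The approach described in the commit message (look up the miq_server record once in the worker, keep it in an instance variable, and let the event monitor thread read `ipaddress` from that object instead of querying for it) can be illustrated with a small self-contained Ruby sketch. Everything here is hypothetical: `FakeServer` stands in for the MiqServer record, `EventCatcherSketch` for the worker, `@miq_server` for the ivar, and the queue-name format is an assumption, not the name actually used by the OpenStack event catcher.

```ruby
# Hypothetical stand-in for the MiqServer record; only #ipaddress matters here.
FakeServer = Struct.new(:ipaddress)

class EventCatcherSketch
  # The server record is looked up once, in the worker's main thread (which
  # owns the single DB connection), and kept in an instance variable.
  def initialize(server_lookup)
    @miq_server = server_lookup.call
  end

  # Starts the monitor thread; the thread reads @miq_server instead of
  # running its own query, so it never needs a second DB connection.
  def start_event_monitor
    Thread.new { binding_queue_name }
  end

  private

  # Assumed naming scheme for illustration only: a prefix plus the appliance IP.
  def binding_queue_name
    "miq-event-catcher-#{@miq_server.ipaddress}"
  end
end
```

For example, `EventCatcherSketch.new(-> { FakeServer.new("10.11.164.150") }).start_event_monitor.value` returns `"miq-event-catcher-10.11.164.150"`. The design point is that the lookup happens before the thread exists, so the thread only touches an object already in memory.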

Comment 7 Ramesh A 2015-10-30 06:31:48 UTC
Good to go.  Verified and working fine in 5.5.0.8-beta1.4.20151027164951_4ab7fea.

Did not see the error message after performing the following operations
1. Added an RHOS provider and waited for some time
2. Added valid / invalid AMQP credentials and waited for some time

Hence moving it to verified state.

Comment 9 errata-xmlrpc 2015-12-08 13:14:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551