Description of problem:
The Event Catcher thread in CloudForms 3.2 latest is constantly failing and restarting.

Version-Release number of selected component (if applicable):
5.4.0.5.20150605150206_7daa1a8

How reproducible:
always

Steps to Reproduce:
1. deploy 3.2 appliance
2. add openstack cloud provider
3. look in evm logs

Actual results:
functional event catcher thread

Expected results:

Additional info:

Relevant Log Messages:
[----] I, [2015-06-16T16:59:36.047117 #3752:1099eac] INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Starting Event Monitor Thread
[----] I, [2015-06-16T16:59:36.047213 #3752:1099eac] INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Started Event Monitor Thread
[----] I, [2015-06-16T16:59:51.047863 #3752:1099eac] INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread gone. Restarting...
[----] I, [2015-06-16T16:59:51.048172 #3752:1099eac] INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Validating Connection/Credentials
[----] I, [2015-06-16T16:59:51.048600 #3752:1099eac] INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Starting Event Monitor Thread
[----] I, [2015-06-16T16:59:51.048774 #3752:1099eac] INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Started Event Monitor Thread
[----] E, [2015-06-16T17:02:06.067987 #3752:416df9c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread aborted because [undefined method `ipaddress' for nil:NilClass]
[----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass Method:[rescue in block in start_event_monitor]
[----] I, [2015-06-16T17:02:06.067582 #3752:1099eac] INFO -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Started Event Monitor Thread
[----] E, [2015-06-16T17:02:06.068188 #3752:416df9c] ERROR -- : /var/www/miq/vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb:17:in `event_monitor_handle'
/var/www/miq/vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb:41:in `monitor_events'
/var/www/miq/vmdb/lib/workers/event_catcher.rb:99:in `block in start_event_monitor'
/var/www/miq/vmdb/lib/extensions/ar_thread.rb:11:in `block in new_with_release'

You can see this on appliance at 10.11.164.150 today. This is a currently unused internal appliance, feel free to login to it with default credentials to view the error.

I am unsure if this is related or not, but this appliance is consistently running out of memory after about 24 hours. kswapd starts using 40% of the CPU for IOWAIT, and the machine has to be rebooted each day.
https://github.com/ManageIQ/manageiq/pull/3244
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/3649eb07a5e8b1b9a5f56dab11eb205e66758ef5

commit 3649eb07a5e8b1b9a5f56dab11eb205e66758ef5
Author:     Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Tue Jun 23 16:00:12 2015 -0400

    Include miq_server when retrieving worker

    To try to make the way the OpenStack event catcher creates binding
    queues work a little better, the appliance's IP address was looked up
    and used as part of the binding queue's name.

    However, there were a couple of things working against this fix.
    First, the appliance's IP address was not readily available to the
    worker process. Second, ManageIQ has a DB connection pool with only
    one connection. And, threads (i.e., where event catcher workers do all
    their work) that attempt to run queries are opening a new DB
    connection.

    The original fix never actually tried opening a new connection.
    Instead, it was perfectly happy to get back a nil value for the
    appliance and try to look up Nil#ipaddress.

    This fix gets around this problem by throwing the appliance record
    (miq_server, actually) into an ivar and making that available to the
    thread. This keeps the thread from having to query for the miq_server,
    while still giving it access to the MiqServer#ipaddress.

    Original PR:
    https://github.com/ManageIQ/manageiq/pull/3050

    Fixes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1232484

    References:
    https://bugzilla.redhat.com/show_bug.cgi?id=1224389
    https://bugzilla.redhat.com/show_bug.cgi?id=1223976

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb |  2 +-
 vmdb/lib/workers/worker_base.rb                          | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)
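
For readers following the commit message, the pattern it describes can be sketched roughly as below. This is not the ManageIQ code: `FakeServer`, `SketchWorker`, `broken_queue_name`, the `miq-...-events` queue name format, and the 192.0.2.10 address are all hypothetical stand-ins chosen for illustration. The sketch only shows the general idea of caching a record in an ivar before spawning a thread, and how an in-thread lookup that returns nil produces the NoMethodError seen in the log.

```ruby
# Hypothetical stand-in for the appliance record; not the real MiqServer.
FakeServer = Struct.new(:ipaddress)

# Hypothetical worker illustrating the approach described in the commit:
# the record is fetched once, outside the thread, and kept in an ivar.
class SketchWorker
  def initialize(server)
    @server = server
  end

  # The thread reads the cached record instead of running its own lookup.
  # The queue name format here is an assumption made for this example.
  def queue_name_from_thread
    Thread.new { "miq-#{@server.ipaddress}-events" }.value
  end
end

# The failure mode from the log: the lookup done inside the thread
# returns nil, and calling ipaddress on nil raises NoMethodError.
def broken_queue_name(lookup)
  Thread.new do
    # Silence the default per-thread exception report; Thread#value
    # re-raises the error in the caller anyway.
    Thread.current.report_on_exception = false
    "miq-#{lookup.call.ipaddress}-events"
  end.value
end

worker = SketchWorker.new(FakeServer.new("192.0.2.10"))
puts worker.queue_name_from_thread

begin
  broken_queue_name(-> { nil })
rescue NoMethodError => e
  puts "thread aborted: #{e.class}"
end
```

The design point is that the thread only touches data that was already loaded, so it never depends on obtaining a database connection of its own.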
Good to go. Verified and working fine in 5.5.0.8-beta1.4.20151027164951_4ab7fea. Did not see the error message after performing the following operations:
1. Added RHOS provider and waited for some time
2. Added valid / invalid AMQP credentials and waited for some time

Hence moving it to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2551