Bug 1233798
| Summary: | OpenStack Event Catcher Thread Constantly Failing and Restarting | | |
|---|---|---|---|
| Product: | Red Hat CloudForms Management Engine | Reporter: | John Prause <jprause> |
| Component: | Providers | Assignee: | Greg Blomquist <gblomqui> |
| Status: | CLOSED ERRATA | QA Contact: | Milan Falešník <mfalesni> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.4.0 | CC: | cpelland, dajohnso, david.costakos, dclarizi, gblomqui, jfrey, jhardy, jrafanie, mfalesni, nachandr, obarenbo |
| Target Milestone: | GA | Keywords: | ZStream |
| Target Release: | 5.4.1 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | 5.4.1.0 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1232484 | Environment: | |
| Last Closed: | 2015-07-30 13:10:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1232484 | | |
| Bug Blocks: | | | |

Comment 1
Greg Blomquist
2015-06-24 20:10:50 UTC
New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=bdc878afed77d8dcb539191dadb1d05dbb875421

commit bdc878afed77d8dcb539191dadb1d05dbb875421
Author: Greg Blomquist <gblomqui>
AuthorDate: Tue Jun 23 13:30:07 2015 -0400
Commit: Greg Blomquist <gblomqui>
CommitDate: Wed Jun 24 16:05:37 2015 -0400

    Include miq_server when retrieving worker

    To try to make the way the OpenStack event catcher creates binding queues work a little better, the appliance's IP address was looked up and used as part of the binding queue's name.

    However, there were a couple of things working against this fix. First, the appliance's IP address was not readily available to the worker process. Second, ManageIQ has a DB connection pool with only one connection. And, threads (i.e., where event catcher workers do all their work) that attempt to run queries are opening a new DB connection. The original fix never actually tried opening a new connection. Instead, it was perfectly happy to get back a nil value for the appliance and try to look up Nil#ipaddress.

    This fix gets around this problem by throwing the appliance record (miq_server, actually) into an ivar and making that available to the thread. This keeps the thread from having to query for the miq_server, while still giving it access to the MiqServer#ipaddress.

    Original PR: https://github.com/ManageIQ/manageiq/pull/3050
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    References: https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb | 2 +-
 vmdb/lib/workers/worker_base.rb | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

New commit detected on cfme/5.4.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=36efb1c837f179d701422ea3a4aae77e84c8d6bd

commit 36efb1c837f179d701422ea3a4aae77e84c8d6bd
Merge: 3ae711d bdc878a
Author: Joe Rafaniello <jrafanie>
AuthorDate: Wed Jun 24 16:37:18 2015 -0400
Commit: Joe Rafaniello <jrafanie>
CommitDate: Wed Jun 24 16:37:18 2015 -0400

    Merge branch 'bz1233798-backport_worker_base_for_bz1232484' into '5.4.z'

    Include miq_server when retrieving worker

    To try to make the way the OpenStack event catcher creates binding queues work a little better, the appliance's IP address was looked up and used as part of the binding queue's name.

    However, there were a couple of things working against this fix. First, the appliance's IP address was not readily available to the worker process. Second, ManageIQ has a DB connection pool with only one connection. And, threads (i.e., where event catcher workers do all their work) that attempt to run queries are opening a new DB connection. The original fix never actually tried opening a new connection. Instead, it was perfectly happy to get back a nil value for the appliance and try to look up Nil#ipaddress.

    This fix gets around this problem by throwing the appliance record (miq_server, actually) into an ivar and making that available to the thread. This keeps the thread from having to query for the miq_server, while still giving it access to the MiqServer#ipaddress.

    Upstream PR: https://github.com/ManageIQ/manageiq/pull/3244
    No cherry-pick conflicts
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1233798
    References: https://bugzilla.redhat.com/show_bug.cgi?id=1225173
    https://bugzilla.redhat.com/show_bug.cgi?id=1225178

    See merge request !145

 vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb | 2 +-
 vmdb/lib/workers/worker_base.rb | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

Hello Greg, what is the best way to reproduce this issue? I tried wrong credentials for AMQP as dajo suggested on the original bug, but that just made it restart the thread; I did not see the ERROR. I tried to reach the RHOS provider by the address in the bug but it is no longer available.

Hi Milan,
I think if this fix were failing, then you'd see the following error in the evm.log:
> [----] E, [2015-06-16T17:02:06.067987 #3752:416df9c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [10.11.165.108] as [rdu-ospadmin] Event Monitor Thread aborted because [undefined method `ipaddress' for nil:NilClass]
> [----] E, [2015-06-16T17:02:06.068082 #3752:416df9c] ERROR -- : [NoMethodError]: undefined method `ipaddress' for nil:NilClass Method:[rescue in block in start_event_monitor]
Based on what we were seeing before, no EventCatcherOpenstack process would run without this error.
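
For readers following the thread, the failure and the fix described in the commit message can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the actual ManageIQ code: ServerRecord, FakeRegistry, BrokenCatcher, FixedCatcher, and the "miq-<ip>" queue name are all hypothetical stand-ins. FakeRegistry simply returns nil when asked from any thread other than the one that created it, which imitates the nil result the commit message attributes to the in-thread lookup.

```ruby
# Hypothetical stand-in for the appliance record (miq_server in the commit message).
ServerRecord = Struct.new(:ipaddress)

# Hypothetical lookup that only succeeds on the thread that created it.
# It imitates the in-thread lookup returning nil, as described above.
class FakeRegistry
  def initialize(server)
    @server = server
    @owner  = Thread.current
  end

  def my_server
    Thread.current.equal?(@owner) ? @server : nil
  end
end

# Looks the server up inside the worker thread, so it gets nil
# and fails on nil.ipaddress.
class BrokenCatcher
  def initialize(registry)
    @registry = registry
  end

  def queue_name
    Thread.new do
      begin
        "miq-#{@registry.my_server.ipaddress}"
      rescue NoMethodError => e
        "aborted: #{e.class}"
      end
    end.value
  end
end

# Captures the server in an instance variable before the thread starts,
# so the thread never has to perform the lookup itself.
class FixedCatcher
  def initialize(registry)
    @server = registry.my_server
  end

  def queue_name
    Thread.new { "miq-#{@server.ipaddress}" }.value
  end
end

registry = FakeRegistry.new(ServerRecord.new("192.0.2.10"))
puts BrokenCatcher.new(registry).queue_name  # prints "aborted: NoMethodError"
puts FixedCatcher.new(registry).queue_name   # prints "miq-192.0.2.10"
```

The point of the sketch is only the ordering: the record is fetched once, outside the thread, and the thread reads it from the instance variable rather than querying for it.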
I don't see such errors in the log, so moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1511.html