Bug 1247200 - OpenStack Event Catcher doesn't reconnect if RabbitMQ server restarted
Summary: OpenStack Event Catcher doesn't reconnect if RabbitMQ server restarted
Keywords:
Status: CLOSED DUPLICATE of bug 1222005
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: GA
Target Release: 5.6.0
Assignee: Greg Blomquist
QA Contact: Pete Savage
URL:
Whiteboard: openstack:event
Depends On:
Blocks: 1291721
Reported: 2015-07-27 14:37 UTC by Pete Savage
Modified: 2017-12-05 15:12 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1291721
Environment:
Last Closed: 2016-02-19 20:51:14 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:



Description Pete Savage 2015-07-27 14:37:35 UTC
Description of problem: If the RabbitMQ server is restarted while the event catcher is running, the event catcher loops continually and never reconnects until either the worker or the service is restarted.


Version-Release number of selected component (if applicable): 5.4.1.0


How reproducible: 100%


Steps to Reproduce:
1. Add an OpenStack provider with RabbitMQ as the message queue
2. Restart the RabbitMQ server

Actual results: The event catcher stops catching events.


Expected results: The event catcher should reconnect after the RabbitMQ restart and resume catching events.


Additional info:
[----] E, [2015-07-27T10:36:43.033493 #3261:3298b0c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [xx.xx.xx.xx] as [admin] Event Monitor Thread aborted because [Connection reset by peer]
[----] E, [2015-07-27T10:36:43.033693 #3261:3298b0c] ERROR -- : [Errno::ECONNRESET]: Connection reset by peer  Method:[rescue in block in start_event_monitor]
[----] E, [2015-07-27T10:36:43.033815 #3261:3298b0c] ERROR -- : /opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:41:in `read_nonblock'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:41:in `block in read_fully'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:40:in `loop'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:40:in `read_fully'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/transport.rb:196:in `read_next_frame'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/session.rb:876:in `init_connection'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/session.rb:247:in `start'
/var/www/miq/lib/openstack/amqp/openstack_rabbit_event_monitor.rb:58:in `start'
/var/www/miq/vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb:41:in `monitor_events'
/var/www/miq/vmdb/lib/workers/event_catcher.rb:99:in `block in start_event_monitor'
/var/www/miq/vmdb/lib/extensions/ar_thread.rb:11:in `block in new_with_release'
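A minimal sketch of the expected behavior: instead of letting `Errno::ECONNRESET` (the error in the log above) abort the monitor thread, the connection attempt is retried with exponential backoff. `start_with_retry`, its parameters, and the list of retryable errors are hypothetical illustrations, not ManageIQ or Bunny API, and this is not the approach taken by the commits in this bug.

```ruby
# Hypothetical list of connection-level errors worth retrying (an assumption,
# not taken from ManageIQ or Bunny).
RETRYABLE_ERRORS = [Errno::ECONNRESET, Errno::ECONNREFUSED].freeze

# Hypothetical helper: runs the connect block, and on a retryable error sleeps
# with exponential backoff and tries again, up to max_attempts in total.
# The sleeper is injectable so callers (and tests) can skip real sleeping.
def start_with_retry(max_attempts: 5, base_delay: 0.5, sleeper: ->(secs) { sleep(secs) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue *RETRYABLE_ERRORS
    # Give up and re-raise the original error once the attempt budget is spent.
    raise if attempt >= max_attempts

    # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

In an event catcher, the block would wrap whatever opens the AMQP session; if the broker is still down after the last attempt, the original error surfaces instead of being swallowed.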

Comment 2 Greg Blomquist 2015-07-27 20:45:41 UTC
https://github.com/ManageIQ/manageiq/pull/3616

Comment 3 CFME Bot 2015-07-30 14:30:01 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/81cd635b10e46765c613fb31426c18ae1f1678db

commit 81cd635b10e46765c613fb31426c18ae1f1678db
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Jul 27 16:20:01 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jul 29 18:20:06 2015 -0400

    Change caching for OpenstackEventMonitor
    
    OpenstackEventMonitor implementation classes were cached to be sure that we
    didn't do expensive connection tests each time events were gathered for an
    openstack provider.
    
    However, the cache was a permanent cache and was only cleared when the appliance
    was restarted.  The new cache will invalidate every 5 minutes.
    
    This cache invalidation will allow the OpenstackEventMonitor to recover from
    communication failures with the AMQP service.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1247200

 gems/pending/openstack/openstack_event_monitor.rb | 47 +++++++++++------------
 1 file changed, 22 insertions(+), 25 deletions(-)
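The time-based invalidation described in this commit can be sketched as a small TTL cache. This is a minimal illustration under assumptions: `ExpiringCache` and its `fetch` interface are hypothetical and are not the code in `openstack_event_monitor.rb`. The clock is injectable so expiry can be checked without waiting five minutes.

```ruby
# Hypothetical TTL cache illustrating the commit message; not the code from
# openstack_event_monitor.rb. Entries that have reached ttl seconds are rebuilt.
class ExpiringCache
  def initialize(ttl: 300, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl        # 300 seconds = the 5 minutes mentioned in the commit
    @clock = clock    # injectable so expiry can be tested without waiting
    @entries = {}
    @lock = Mutex.new # the event monitor runs in its own thread, so guard the hash
  end

  # Returns the cached value for key; computes it with the block when the key
  # is missing or its entry has reached the ttl.
  def fetch(key)
    @lock.synchronize do
      now = @clock.call
      entry = @entries[key]
      if entry.nil? || now - entry[:stored_at] >= @ttl
        entry = { value: yield, stored_at: now }
        @entries[key] = entry
      end
      entry[:value]
    end
  end
end
```

With a permanent cache, a monitor object built while the AMQP service was unreachable would be reused forever; with a TTL it is rebuilt on the next fetch after expiry. Holding the lock while the block runs serializes expensive connection tests, a simplicity trade-off in this sketch.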

Comment 4 CFME Bot 2015-07-30 14:30:04 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/728aa7b01221d4ef3426f1bf751eb1cade0471d2

commit 728aa7b01221d4ef3426f1bf751eb1cade0471d2
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Jul 27 16:22:34 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jul 29 18:20:07 2015 -0400

    Implement OpenstackNullEventMonitor methods
    
    Originally, OpenstackNullEventMonitor raised NotImplementedErrors when the
    standard start, stop, and each_batch methods were called.  It turns out that
    this was killing the OpenstackEventCatcher worker thread.  In turn, this
    resulted in tons of messages in the logs showing the event catcher dying and
    restarting.
    
    Implementing these methods as empty no-ops allows the event
    catcher thread to do nothing when the event monitor is the
    OpenstackNullEventMonitor.
    
    This, coupled with better cache invalidation, will allow the OpenstackEventMonitor
    to recover from communication failures with the AMQP service.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1247200

 gems/pending/openstack/amqp/openstack_null_event_monitor.rb | 13 +++++--------
 lib/workers/mixins/event_catcher_openstack_mixin.rb         |  6 ++++--
 2 files changed, 9 insertions(+), 10 deletions(-)
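This commit applies the null object pattern: a monitor whose `start`, `stop`, and `each_batch` methods exist but do nothing lets the caller's normal code path run without special cases. A minimal sketch, assuming simplified signatures; `NullEventMonitor` and `monitor_events` are hypothetical names, not the ManageIQ code. One Ruby detail worth knowing: `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it, which may be part of why the old behavior killed the worker thread.

```ruby
# Hypothetical null monitor mirroring the change in commit 728aa7b: start,
# stop, and each_batch exist but do nothing (the real class is
# OpenstackNullEventMonitor; these names are illustrative).
class NullEventMonitor
  def start; end

  def stop; end

  # Never yields, so a caller's batch loop performs no work.
  def each_batch; end
end

# Hypothetical stand-in for the event catcher's monitoring step: it starts the
# monitor, collects every event from every batch, and always stops the monitor.
def monitor_events(monitor)
  events = []
  monitor.start
  monitor.each_batch { |batch| events.concat(batch) }
  events
ensure
  monitor.stop
end
```

Because the null monitor responds to the same methods as a real one, `monitor_events` needs no `nil` checks or type tests; it simply returns an empty list.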

Comment 6 John Prause 2016-02-19 20:51:14 UTC

*** This bug has been marked as a duplicate of bug 1222005 ***

