Bug 1247200

Summary: OpenStack Event Catcher doesn't reconnect if RabbitMQ server restarted
Product: Red Hat CloudForms Management Engine
Reporter: Pete Savage <psavage>
Component: Providers
Assignee: Greg Blomquist <gblomqui>
Status: CLOSED DUPLICATE
QA Contact: Pete Savage <psavage>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.4.0
CC: abellott, gblomqui, jfrey, jhardy, jprause, mfeifer, obarenbo
Target Milestone: GA
Target Release: 5.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: openstack:event
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1291721
Environment:
Last Closed: 2016-02-19 20:51:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1291721

Description Pete Savage 2015-07-27 14:37:35 UTC
Description of problem: If the RabbitMQ server is restarted while the event catcher is running, the event catcher loops continually and never reconnects properly until either the worker or the service is restarted.


Version-Release number of selected component (if applicable): 5.4.1.0


How reproducible: 100%


Steps to Reproduce:
1. Add an OpenStack provider with RabbitMQ as the message queue
2. Restart the RabbitMQ server
3.

Actual results: Event Catcher doesn't catch any more events


Expected results: Event Catcher should handle the RabbitMQ restart


Additional info:
[----] E, [2015-07-27T10:36:43.033493 #3261:3298b0c] ERROR -- : MIQ(EventCatcherOpenstack) EMS [xx.xx.xx.xx] as [admin] Event Monitor Thread aborted because [Connection reset by peer]
[----] E, [2015-07-27T10:36:43.033693 #3261:3298b0c] ERROR -- : [Errno::ECONNRESET]: Connection reset by peer  Method:[rescue in block in start_event_monitor]
[----] E, [2015-07-27T10:36:43.033815 #3261:3298b0c] ERROR -- : /opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:41:in `read_nonblock'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:41:in `block in read_fully'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:40:in `loop'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/cruby/socket.rb:40:in `read_fully'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/transport.rb:196:in `read_next_frame'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/session.rb:876:in `init_connection'
/opt/rh/cfme-gemset/gems/bunny-1.0.7/lib/bunny/session.rb:247:in `start'
/var/www/miq/lib/openstack/amqp/openstack_rabbit_event_monitor.rb:58:in `start'
/var/www/miq/vmdb/lib/workers/mixins/event_catcher_openstack_mixin.rb:41:in `monitor_events'
/var/www/miq/vmdb/lib/workers/event_catcher.rb:99:in `block in start_event_monitor'
/var/www/miq/vmdb/lib/extensions/ar_thread.rb:11:in `block in new_with_release'

Comment 2 Greg Blomquist 2015-07-27 20:45:41 UTC
https://github.com/ManageIQ/manageiq/pull/3616

Comment 3 CFME Bot 2015-07-30 14:30:01 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/81cd635b10e46765c613fb31426c18ae1f1678db

commit 81cd635b10e46765c613fb31426c18ae1f1678db
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Jul 27 16:20:01 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jul 29 18:20:06 2015 -0400

    Change caching for OpenstackEventMonitor
    
    OpenstackEventMonitor implementation classes were cached to be sure that we
    didn't do expensive connection tests each time events were gathered for an
    openstack provider.
    
    However, the cache was a permanent cache and was only cleared when the appliance
    was restarted.  The new cache will invalidate every 5 minutes.
    
    This cache invalidation will allow the OpenstackEventMonitor to recover from
    communication failures with the AMQP service.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1247200

 gems/pending/openstack/openstack_event_monitor.rb | 47 +++++++++++------------
 1 file changed, 22 insertions(+), 25 deletions(-)
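
The time-based invalidation described in this commit message can be illustrated with a minimal sketch. This is not the code from the commit: the class name `ExpiringCache`, the `fetch` method, and the injectable `clock` parameter are all hypothetical, chosen only to show how an entry older than a fixed lifetime (5 minutes in the commit message) gets rebuilt instead of being reused forever.

```ruby
# Hypothetical sketch, not the ManageIQ implementation.
# Entries older than ttl_seconds are rebuilt on the next fetch, so a stale
# result (for example, a monitor chosen while the AMQP service was down)
# cannot outlive the TTL.
class ExpiringCache
  # clock is injectable so the expiry logic can be exercised without sleeping.
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl     = ttl_seconds
    @clock   = clock
    @entries = {}
  end

  # Returns the cached value for key. Runs the block to rebuild the value
  # when the entry is missing or at least @ttl seconds old.
  def fetch(key)
    now   = @clock.call
    entry = @entries[key]
    if entry.nil? || (now - entry[:stored_at]) >= @ttl
      value = yield
      entry = { value: value, stored_at: now }
      @entries[key] = entry
    end
    entry[:value]
  end
end

# Usage: the block stands in for the expensive connection test.
cache = ExpiringCache.new(5 * 60)
monitor_kind = cache.fetch("ems-1") { :rabbit }
```

If the block raises, nothing is stored, so the next call retries the expensive check rather than caching a failure.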

Comment 4 CFME Bot 2015-07-30 14:30:04 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/728aa7b01221d4ef3426f1bf751eb1cade0471d2

commit 728aa7b01221d4ef3426f1bf751eb1cade0471d2
Author:     Greg Blomquist <gblomqui>
AuthorDate: Mon Jul 27 16:22:34 2015 -0400
Commit:     Greg Blomquist <gblomqui>
CommitDate: Wed Jul 29 18:20:07 2015 -0400

    Implement OpenstackNullEventMonitor methods
    
    Originally, OpenstackNullEventMonitor raised NotImplementedErrors when the
    standard start, stop, and each_batch methods were called.  It turns out that
    this was killing the OpenstackEventCatcher worker thread.  In turn, this
    resulted in tons of messages in the logs showing the event catcher dying and
    restarting.
    
    By changing these methods to be implemented and empty, it will allow the event
    catcher thread to do nothing when the event monitor is the
    OpenstackNullEventMonitor.
    
    This coupled with better cache invalidation will allow the OpenstackEventMonitor
    to recover from communication failures with the AMQP service.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1247200

 gems/pending/openstack/amqp/openstack_null_event_monitor.rb | 13 +++++--------
 lib/workers/mixins/event_catcher_openstack_mixin.rb         |  6 ++++--
 2 files changed, 9 insertions(+), 10 deletions(-)
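
The change described in this commit message follows the null object pattern. The sketch below is an illustration under assumed names (`NullEventMonitor`, `RaisingEventMonitor`, and `run_monitor` are hypothetical, not the ManageIQ classes): a monitor whose `start`, `stop`, and `each_batch` methods are implemented but empty lets the calling loop finish quietly, while one that raises `NotImplementedError` aborts it.

```ruby
# Hypothetical sketch, not the ManageIQ implementation.

# Null object: every method exists and deliberately does nothing.
class NullEventMonitor
  def start; end

  def stop; end

  # Never yields, so the caller's block is simply not invoked.
  def each_batch; end
end

# The earlier behaviour as described in the commit message: calling the
# standard methods raises. Note that in Ruby NotImplementedError descends
# from ScriptError, not StandardError, so a bare `rescue` does not catch it.
class RaisingEventMonitor
  def start
    raise NotImplementedError, "start is not implemented"
  end
end

# Stand-in for the worker thread body: start, drain batches, stop.
def run_monitor(monitor)
  batches = []
  monitor.start
  monitor.each_batch { |events| batches << events }
  monitor.stop
  batches
end

run_monitor(NullEventMonitor.new) # completes without raising
```

With the null object, the caller needs no special case for "no usable monitor"; it runs the same code path and collects no events.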

Comment 6 John Prause 2016-02-19 20:51:14 UTC

*** This bug has been marked as a duplicate of bug 1222005 ***