Bug 1222005

Summary: AMQP connection doesn't recover after rabbitmq-server restart
Product: Red Hat CloudForms Management Engine Reporter: Marius Cornea <mcornea>
Component: Providers Assignee: Ladislav Smola <lsmola>
Status: CLOSED ERRATA QA Contact: Ronnie Rasouli <rrasouli>
Severity: high Docs Contact:
Priority: medium    
Version: 5.4.0 CC: brant.evans, clasohm, cpelland, gblomqui, gfa, jfrey, jhardy, jocarter, jprause, mfeifer, mfuruta, obarenbo, psavage, vstinner
Target Milestone: GA Keywords: ZStream
Target Release: 5.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: retest:openstack:event
Fixed In Version: 5.6.0.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1310245 1310248 Environment:
Last Closed: 2016-06-29 10:55:22 EDT Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1291721, 1310245, 1310248    
Attachments:
Description Flags
logs on server and cfme none

Description Marius Cornea 2015-05-15 09:02:32 EDT
Created attachment 1025839
logs on server and cfme

Description of problem:
I hit this with an OpenStack Infra provider when restarting rabbitmq-server on the undercloud. The AMQP connection does not recover after the rabbitmq-server restart.

Version-Release number of selected component (if applicable):
5.4.0.1.20150512111354_4368716

How reproducible:
Add a new OpenStack Infrastructure provider.

Steps to Reproduce:
1. systemctl restart rabbitmq-server on the undercloud node
2. Check evm.log

Actual results:
The AMQP connection fails with a {frame_too_large,1342177289,131064} error.
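
Note on the numbers in this error: 1342177289 is 0x50000009, which is what the bytes "P", 0, 0, 9 decode to as a 32-bit big-endian integer. Those are the fourth through seventh bytes of the AMQP 0-9-1 protocol header ("AMQP" followed by 0, 0, 9, 1), the position where a frame header carries its payload size. One plausible reading, not confirmed anywhere in this report, is that the broker received a protocol header where it expected a frame; 131064 would then presumably be the maximum frame size in effect. A minimal Ruby sketch of that decoding (the constant and method names are illustrative only):

```ruby
# AMQP 0-9-1 protocol header: the literal "AMQP" followed by 0, 0, 9, 1.
PROTOCOL_HEADER = "AMQP\x00\x00\x09\x01".b

# Interpret the first 7 bytes as if they were a frame header:
# type (1 octet), channel (2 octets), payload size (4 octets), big-endian.
def decode_as_frame_header(bytes)
  type, channel, size = bytes.unpack("CnN")
  { type: type, channel: channel, size: size }
end

header = decode_as_frame_header(PROTOCOL_HEADER)
puts header[:size]                    # → 1342177289
puts format("0x%08X", header[:size])  # → 0x50000009
```

The size field matches the first number in the error tuple, which is why the protocol-header reading looks likely; the attached logs would be needed to confirm it.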

Expected results:
AMQP connection is re-established after server restart.

Additional info:

I am attaching all relevant logs.
Comment 2 Marius Cornea 2015-05-15 09:56:23 EDT
I forgot to mention the rabbitmq-server version I tested against:

rabbitmq-server-3.3.5-3.el7ost.noarch
rabbitmq-server-3.3.5-4.el7.noarch
Comment 5 Greg Blomquist 2016-02-10 14:29:18 EST
@Ladas,

can you work with @Marius to reproduce this?

John Eckersberg found this: https://github.com/ruby-amqp/bunny/pull/251 (released in Bunny 1.5.0+). So we might just want to bump the Bunny version to see if that fixes this problem.

According to John, the libraries should be handling the reconnect for us. At least, that's his experience on the Python side.
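
To illustrate what "handling the reconnect for us" amounts to, here is a minimal sketch of a client-side retry loop in plain Ruby. It does not use Bunny's API; FlakyBroker and connect_with_recovery are hypothetical names, and the simulated broker simply refuses a fixed number of connection attempts, as a restarting rabbitmq-server might.

```ruby
# Hypothetical stand-in for a broker connection; this is not Bunny's API.
class FlakyBroker
  class ConnectionRefused < StandardError; end

  def initialize(failures_before_up)
    @failures_left = failures_before_up
  end

  # Raises until the simulated broker has finished restarting.
  def connect
    if @failures_left > 0
      @failures_left -= 1
      raise ConnectionRefused, "broker not accepting connections yet"
    end
    :connected
  end
end

# Retry the connection with a fixed pause between attempts.
# Re-raises the last error once max_attempts is reached.
def connect_with_recovery(broker, max_attempts:, interval: 0.0)
  attempts = 0
  begin
    attempts += 1
    state = broker.connect
    [state, attempts]
  rescue FlakyBroker::ConnectionRefused
    raise if attempts >= max_attempts
    sleep(interval)
    retry
  end
end

state, attempts = connect_with_recovery(FlakyBroker.new(2), max_attempts: 5)
puts "#{state} after #{attempts} attempts"
```

A real client library would also have to reopen channels and re-register consumers after reconnecting, which this sketch leaves out.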
Comment 6 Ladislav Smola 2016-02-11 03:40:41 EST
@Greg but the gem update can't be backported, right? So should we target this to 5.6?
Comment 7 Greg Blomquist 2016-02-18 09:53:34 EST
@ladas, I'll check with Jason and John Prause regarding the backport of the gem.

In the meantime, work with Marius to see if bumping the gem has any impact.
Comment 8 John Prause 2016-02-19 15:51:14 EST
*** Bug 1247200 has been marked as a duplicate of this bug. ***
Comment 9 John Prause 2016-02-19 15:51:55 EST
*** Bug 1291721 has been marked as a duplicate of this bug. ***
Comment 10 Ladislav Smola 2016-02-22 09:49:56 EST
https://github.com/ManageIQ/manageiq/pull/6857

It seems the Bunny update fixes this. After systemctl restart rabbitmq-server, I no longer see the frame-related error and ManageIQ continues to receive new events.
Comment 11 Ladislav Smola 2016-03-01 09:09:48 EST
We need to use Bunny 2.1.0 because of https://github.com/ruby-amqp/bunny/issues/383
Comment 12 CFME Bot 2016-03-03 09:15:29 EST
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/0b3328db34a5cca44124a5496fac1c092e407eda

commit 0b3328db34a5cca44124a5496fac1c092e407eda
Author:     Ladislav Smola <lsmola@redhat.com>
AuthorDate: Mon Feb 22 15:43:07 2016 +0100
Commit:     Ladislav Smola <lsmola@redhat.com>
CommitDate: Tue Mar 1 15:01:49 2016 +0100

    Update Bunny gem
    
    Update Bunny gem, the old gem couldn't handle reconnect when
    amqp service got restarted.
    
    Seems like new bunny works the same, so no changes in using it
    are needed.
    
    Fixes BZ:
    https://bugzilla.redhat.com/show_bug.cgi?id=1222005

 gems/pending/Gemfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 13 CFME Bot 2016-03-11 09:59:56 EST
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=b1ab1307cb92e0ccff433ac4a394e8174e3f574c

commit b1ab1307cb92e0ccff433ac4a394e8174e3f574c
Author:     Ladislav Smola <lsmola@redhat.com>
AuthorDate: Mon Feb 22 15:43:07 2016 +0100
Commit:     Ladislav Smola <lsmola@redhat.com>
CommitDate: Mon Mar 7 13:02:02 2016 +0100

    Update Bunny gem
    
    Update Bunny gem, the old gem couldn't handle reconnect when
    amqp service got restarted.
    
    Seems like new bunny works the same, so no changes in using it
    are needed.
    
    Fixes BZ:
    https://bugzilla.redhat.com/show_bug.cgi?id=1222005

 gems/pending/Gemfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 14 CFME Bot 2016-03-11 10:00:16 EST
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=564108ee801d389b433777936eb9824b1dad81c2

commit 564108ee801d389b433777936eb9824b1dad81c2
Merge: 934d6fa b1ab130
Author:     Greg Blomquist <gblomqui@redhat.com>
AuthorDate: Fri Mar 11 09:41:15 2016 -0500
Commit:     Greg Blomquist <gblomqui@redhat.com>
CommitDate: Fri Mar 11 09:41:15 2016 -0500

    Merge branch 'bz1310245' into '5.5.z'
    
    Update Bunny gem
    
    Update Bunny gem, the old gem couldn't handle reconnect when
    amqp service got restarted.
    
    Seems like new bunny works the same, so no changes in using it
    are needed.
    
    Fixes BZ:
    https://bugzilla.redhat.com/show_bug.cgi?id=1222005
    
    Clean cherry-pick of:
    https://github.com/ManageIQ/manageiq/pull/6857
    
    Fixes 5.5.z BZ:
    https://bugzilla.redhat.com/show_bug.cgi?id=1310245
    
    
    
    See merge request !839

 gems/pending/Gemfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 15 Ronnie Rasouli 2016-05-30 04:36:50 EDT
Verified on 5.6.0.8-rc1-nightly
The errors did not appear, and the AMQP connection was re-established.
Comment 17 errata-xmlrpc 2016-06-29 10:55:22 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348
Comment 18 Victor Stinner 2017-02-22 10:10:26 EST
> External Bug ID: OpenStack gerrit 436958

Sorry, I commented on the wrong Bugzilla issue :-/