Bug 975882

Summary: Nova doesn't close qpid connections after certain error conditions
Product: Red Hat OpenStack
Component: openstack-nova
Version: 3.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: yfried
Assignee: Xavier Queralt <xqueralt>
QA Contact: yfried
CC: ajeain, apevec, dallan, ndipanov, oblaut, ohochman, sclewis, xqueralt, yfried
Target Milestone: z2
Target Release: 3.0
Keywords: Rebase, Reopened, ZStream
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Rebase: Bug Fixes Only
Doc Text: Rebase package(s) to version: 2013.1.3. Highlights and important bug fixes: https://launchpad.net/nova/+milestone/2013.1.3
Last Closed: 2013-09-03 20:19:13 UTC
Bug Depends On: 993100

Description yfried 2013-06-19 14:14:07 UTC
Version-Release number of selected component (if applicable):
openstack-nova-compute-2013.1.2-2.el6ost.noarch


When nova gets a connection to qpid but there is a problem at the application layer (such as a bad qpid password being specified), nova continuously retries the connection without closing the previous connections.

Steps to reproduce:
- Preconditions: At least 1 nova compute node using qpid for messaging, with authentication turned on.
- Specify a wrong qpid password in nova.conf on the compute node
- nova will continuously retry with the wrong password and print errors such as this to the compute log:
2013-04-25 16:37:52.269 ERROR nova.openstack.common.rpc.impl_qpid [req-1d15f33c-5b2d-4ee1-aaa1-ab0140a56608 None None] Unable to connect to AMQP server: connection-forced: Authentication failed(320). Sleeping 60 seconds
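
To see how often nova is retrying, the error lines can be counted from the compute log. The following is only a rough sketch and is not part of nova: the regular expression is inferred from the single log line quoted above (an assumption, not a documented log format), and `parse_retry` and `count_retries` are hypothetical helper names.

```python
import re

# Pattern inferred from the one log line quoted in this report.
# This is an assumption, not an official nova log format.
RETRY_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) ERROR "
    r"nova\.openstack\.common\.rpc\.impl_qpid .*"
    r"Unable to connect to AMQP server: (?P<reason>.*?)\. "
    r"Sleeping (?P<sleep>\d+) seconds"
)


def parse_retry(line):
    """Return (timestamp, reason, sleep_seconds) for a retry line, else None."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    return (
        match.group("timestamp"),
        match.group("reason"),
        int(match.group("sleep")),
    )


def count_retries(lines):
    """Count the failed connection attempts in an iterable of log lines."""
    return sum(1 for line in lines if parse_retry(line) is not None)
```

Feeding the lines of the compute log to `count_retries` gives the number of failed attempts, which can then be compared with the number of connections still open on the broker side.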

Actual results:
- Each time nova retries the connection, it will create another connection to qpid and not close the previous connections.

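Expected results:
- Nova closes the previous connection before each retry, so the number of open connections to qpid does not keep growing.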

Additional info:

Comment 1 yfried 2013-06-20 06:30:59 UTC
Link to the equivalent Quantum bug (closed):
https://bugzilla.redhat.com/show_bug.cgi?id=962385#c9

Comment 2 Perry Myers 2013-06-20 12:52:34 UTC
Since this only occurs when an incorrect user/password is used, it implies that the issue only occurs before you've got a functional cloud infrastructure (can't have a working cloud w/o having qpid connectivity, etc)

So, I think this is worth release noting and fixing in the next release.

Comment 3 Alan Pevec 2013-06-20 16:04:28 UTC
(In reply to Perry Myers from comment #2)
> Since this only occurs when an incorrect user/password is used

A reconnect can happen any time there is a connection problem during operation, e.g. a temporary network outage.

https://github.com/openstack/nova/blob/2013.1.2/nova/openstack/common/rpc/impl_qpid.py#L372
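
For illustration only, the leak pattern described in this report can be modelled with a toy reconnect loop. This is a sketch under stated assumptions, not the code in impl_qpid.py and not the upstream fix: `FakeConnection` is a hypothetical stand-in (it is not the qpid.messaging API), and `reconnect` and its `close_on_error` flag are invented for the example.

```python
class FakeConnection:
    """Hypothetical stand-in for a messaging connection (not qpid.messaging)."""

    open_count = 0  # number of fake connections currently open

    def __init__(self, password):
        self.password = password
        self.is_open = False

    def open(self):
        # The transport comes up first; authentication can still fail afterwards.
        self.is_open = True
        FakeConnection.open_count += 1
        if self.password != "guest":
            raise ConnectionError("Authentication failed")

    def close(self):
        if self.is_open:
            self.is_open = False
            FakeConnection.open_count -= 1


def reconnect(password, attempts, close_on_error):
    """Try to connect up to `attempts` times.

    With close_on_error=False every failed attempt leaves a connection open,
    which is the behaviour reported in this bug.
    """
    for _ in range(attempts):
        conn = FakeConnection(password)
        try:
            conn.open()
            return conn
        except ConnectionError:
            if close_on_error:
                conn.close()  # release the connection before retrying
            # A real loop would sleep here before the next attempt.
    return None
```

In this model, five failed attempts with `close_on_error=False` leave `FakeConnection.open_count` at 5, while the same run with `close_on_error=True` leaves it at 0.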

Comment 5 Alan Pevec 2013-07-09 10:31:03 UTC
Fixed in Nova havana-1 https://github.com/openstack/nova/commit/b4826d85c25a56ad95ffb76c467cdb459daba0c4

Comment 6 Alan Pevec 2013-07-09 11:33:19 UTC
Will be included in stable/grizzly 2013.1.3 release.

Comment 11 Omri Hochman 2013-08-19 13:53:00 UTC
Steps to reproduce: 
-------------------
1) In qpidd.conf, set auth=yes

2) Restart qpidd: '/etc/init.d/qpidd restart'

3) Create a qpidd user and password (enter guest/guest) by running:
'saslpasswd2 -f /var/lib/qpidd/qpidd.sasldb -u QPID guest'

4) Check the created qpidd user/password by running:
'sasldblistusers2 -f /var/lib/qpidd/qpidd.sasldb'

5) Attempt to boot an instance (should work).

6) In nova.conf, change qpid_password=guest --> qpid_password=badguest

7) Attempt to boot an instance (should hang).

8) While the boot command is hung, check the number of open sessions by running: "watch -d 'netstat -n |grep 5672 | wc -l' "

The number of open sessions should not increase constantly; it should also drop as connections are closed.

Note:
-------
When the bug reproduces, 'netstat -n |grep 5672' shows a growing number of open sessions, which are closed only when the nova command (i.e. "boot") is stopped.

Some sessions are closed only when the password is restored.
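
The netstat check above can also be scripted so that successive counts are compared automatically. This is a minimal sketch, assuming the usual Linux `netstat -n` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `count_connections` and `is_leaking` are hypothetical helper names. Unlike a plain `grep 5672`, it only counts TCP lines whose local or foreign address ends in `:5672`.

```python
def count_connections(netstat_output, port=5672):
    """Count TCP lines whose local or foreign address uses `port`."""
    suffix = ":%d" % port
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address [State]
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(suffix) or fields[4].endswith(suffix):
            count += 1
    return count


def is_leaking(samples):
    """Return True if the counts never drop and the last exceeds the first."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]
```

Calling `count_connections` on the output of `netstat -n` at regular intervals and passing the resulting list to `is_leaking` flags the situation described in the note: a count that only ever grows while the boot command is hung.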


More info: 
-----------
https://docspace.corp.redhat.com/docs/DOC-148763

Comment 16 Omri Hochman 2013-08-26 08:54:11 UTC
This issue does not reproduce once the cinder service is stopped: the number of connections decreases and it works as it should.

Based on that, this nova bug will be closed and copied to cinder Bz#1000972.

Comment 17 Omri Hochman 2013-08-26 11:24:55 UTC
Verified - with openstack-nova-2013.1.3-2.
> This issue is not reproducible when stopping cinder service - the number of
> connections reduces and it work as it should. 
> 
> According that info - this nova bug will be closed and copied to cinder
> Bz#1000972.

Comment 19 errata-xmlrpc 2013-09-03 20:19:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1199.html