
Bug 1060687

Summary: DB connection error code 2013 not handled
Product: Red Hat OpenStack
Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: openstack-cinder
Assignee: Flavio Percoco <fpercoco>
Status: CLOSED ERRATA
QA Contact: Dafna Ron <dron>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.0
CC: dnavale, eharney, fdinitto, fpercoco, gfidente, yeylon
Target Milestone: z4
Keywords: ZStream
Target Release: 4.0
Flags: pm-rhel: internal-review+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-cinder-2013.2.3-1.el6ost
Doc Type: Bug Fix
Doc Text:
Previously, because one connection error status code was not handled, the reconnect operation for the Block Storage service was not triggered. This was a problem from a high-availability standpoint, as the nodes would fail to reconnect once the database was back up. As a fix, this status code has been added to the list of connection error status codes in the database library.
Story Points: ---
Clone Of:
: 1060771 1060783
Environment:
Last Closed: 2014-05-29 19:57:27 UTC
Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1060771, 1060783    

Description Fabio Massimo Di Nitto 2014-02-03 11:14:28 UTC
As part of RHOS High Availability, we need to tune connections to the database and qpid.

Some of those options control daemon startup behavior, specifically:

max_retries = -1
retry_interval = 1

to allow the daemon to keep retrying the connection.

Flavio has checked, and those options, while present in the example config file, are completely ignored by cinder.

This is a blocker for deploying cinder in HA environments and must be fixed ASAP.
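
For readers unfamiliar with the two options, the intended behavior can be sketched as below. This is only an illustration and not cinder's code: `connect_with_retry`, its `connect` callable, and the use of `ConnectionError` are hypothetical names chosen for the example. It assumes that `max_retries = -1` means "retry forever" and that `retry_interval` is the pause in seconds between attempts.

```python
import time


def connect_with_retry(connect, max_retries=-1, retry_interval=1,
                       sleep=time.sleep):
    """Call connect() until it succeeds (hypothetical helper).

    max_retries == -1 retries forever; otherwise the call gives up
    after max_retries failed retries following the first attempt.
    """
    remaining = max_retries
    while True:
        try:
            return connect()
        except ConnectionError:
            if remaining == 0:
                # Retry budget exhausted: let the error propagate.
                raise
            if remaining > 0:
                remaining -= 1
            # Wait before the next attempt.
            sleep(retry_interval)
```

With `max_retries=-1` the daemon keeps polling once per `retry_interval` until the database comes back, which is the behavior wanted for HA.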

Comment 1 Eric Harney 2014-02-03 15:07:55 UTC
It appears that RHOS 4 may support this under the [database] section with names max_retries/retry_interval or sql_max_retries/sql_retry_interval, but I haven't tried this yet.

http://git.openstack.org/cgit/openstack/cinder/tree/cinder/openstack/common/db/sqlalchemy/session.py?h=stable/havana#n301

http://git.openstack.org/cgit/openstack/cinder/tree/cinder/openstack/common/db/sqlalchemy/session.py?h=stable/havana#n332


This seems to be a similar issue to bug 1060685: Havana Cinder's sample conf generation isn't accurate.

Comment 2 Fabio Massimo Di Nitto 2014-02-03 15:11:20 UTC
Flavio did the debugging, and those options appear not to be read at all.

Comment 3 Flavio Percoco 2014-02-03 15:43:58 UTC
I double-checked on your box, and it seems they are indeed filed under the database section. Somehow I missed that when I was debugging it this morning.

./openstack/common/db/sqlalchemy/session.py:    cfg.IntOpt('max_retries',
./openstack/common/db/sqlalchemy/session.py:               deprecated_name='sql_max_retries',
./openstack/common/db/sqlalchemy/session.py:        remaining = CONF.database.max_retries

Could you please retry by setting those values under the database section?
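
As a rough illustration of why the section matters, the sketch below reads the two options only from a [database] section and falls back to the deprecated sql_* names. It uses the standard-library configparser rather than the option machinery cinder actually uses. `read_retry_options` and the default values passed to `lookup` are hypothetical, picked for the example.

```python
import configparser


def read_retry_options(text):
    """Return (max_retries, retry_interval) from the [database] section.

    Hypothetical helper: the defaults below are placeholders, not
    values taken from cinder.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)

    def lookup(name, deprecated_name, default):
        # Prefer the current name, then the deprecated one.
        for key in (name, deprecated_name):
            if parser.has_option("database", key):
                return parser.getint("database", key)
        return default

    return (lookup("max_retries", "sql_max_retries", 10),
            lookup("retry_interval", "sql_retry_interval", 10))
```

In this sketch, values placed in any other section are silently ignored and the defaults apply, which matches the symptom of options that appear not to be read.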

Comment 4 Fabio Massimo Di Nitto 2014-02-03 15:52:57 UTC
[database]
max_retries = -1
retry_interval = 1

mysql down:

==> scheduler.log <==
2014-02-03 16:51:50.956 16184 CRITICAL cinder [-] (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None

so it still doesn't work :)

Comment 5 Flavio Percoco 2014-02-03 18:37:29 UTC
Erm, I think the issue is that 2013 is not among the connection error codes listed here [0]

[0] http://git.openstack.org/cgit/openstack/cinder/tree/cinder/openstack/common/db/sqlalchemy/session.py?h=stable/havana#n548
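
A minimal sketch of the kind of check being described, assuming the library decides whether to retry by looking for known MySQL client error codes in the exception text. The function name and the pre-fix code list (2002, 2003, 2006) are assumptions made for this example and are not copied from session.py. The error message is the one from the scheduler log in comment 4.

```python
# MySQL client error codes for connection problems:
# 2002/2003 = cannot connect, 2006 = server has gone away,
# 2013 = lost connection to the server.
CODES_BEFORE_FIX = ("2002", "2003", "2006")  # assumed pre-fix list
CODES_AFTER_FIX = CODES_BEFORE_FIX + ("2013",)


def is_db_connection_error(message, codes):
    """Return True if the error text mentions any known code."""
    # Plain substring match, so it only suits short driver messages.
    return any(code in message for code in codes)


LOG_MESSAGE = (
    "(OperationalError) (2013, \"Lost connection to MySQL server at "
    "'reading initial communication packet', system error: 0\")"
)

# Without 2013 the error is not treated as retryable.
print(is_db_connection_error(LOG_MESSAGE, CODES_BEFORE_FIX))
# With 2013 in the list the reconnect logic would be triggered.
print(is_db_connection_error(LOG_MESSAGE, CODES_AFTER_FIX))
```

Under these assumptions, an unrecognized code means the error is raised as a normal failure instead of entering the retry loop, regardless of what max_retries is set to.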

Comment 12 Giulio Fidente 2014-04-17 13:31:08 UTC
verified using openstack-cinder-2013.2.3-1.el6ost.noarch

Comment 14 errata-xmlrpc 2014-05-29 19:57:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0577.html