
Bug 1060687

Summary:	DB connection error code 2013 not handled
Product:	Red Hat OpenStack	Reporter:	Fabio Massimo Di Nitto <fdinitto>
Component:	openstack-cinder	Assignee:	Flavio Percoco <fpercoco>
Status:	CLOSED ERRATA	QA Contact:	Dafna Ron <dron>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	4.0	CC:	dnavale, eharney, fdinitto, fpercoco, gfidente, yeylon
Target Milestone:	z4	Keywords:	ZStream
Target Release:	4.0	Flags:	pm-rhel: internal-review+
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-cinder-2013.2.3-1.el6ost	Doc Type:	Bug Fix
Doc Text:	Previously, because one connection error status code was not handled, the reconnect operation for the Block Storage service was not triggered. This was a problem from a High Availability standpoint, as the nodes would fail to reconnect once the database was back up. As a fix, this status code has been added to the list of connection error status codes in the database library.	Story Points:	---
Clone Of:
Clones:	1060771 1060783		Environment:
Last Closed:	2014-05-29 19:57:27 UTC	Type:	Bug
Bug Blocks:	1060771, 1060783

Description Fabio Massimo Di Nitto 2014-02-03 11:14:28 UTC

As part of RHOS High Availability, we need to tune connections to the database and qpid.

Some of those options control daemon startup behavior, specifically:

max_retries = -1
retry_interval = 1

to allow the daemon to keep retrying the connection.

Flavio has checked and those options, while part of the example config file, are completely ignored by cinder.

This is a blocker to deploy cinder in HA environments and must be fixed asap.

Comment 1 Eric Harney 2014-02-03 15:07:55 UTC

It appears that RHOS 4 may support this under the [database] section with names max_retries/retry_interval or sql_max_retries/sql_retry_interval, but I haven't tried this yet.

http://git.openstack.org/cgit/openstack/cinder/tree/cinder/openstack/common/db/sqlalchemy/session.py?h=stable/havana#n301

http://git.openstack.org/cgit/openstack/cinder/tree/cinder/openstack/common/db/sqlalchemy/session.py?h=stable/havana#n332


This seems to be a similar issue to bug 1060685: Havana Cinder's sample conf generation isn't accurate.

Comment 2 Fabio Massimo Di Nitto 2014-02-03 15:11:20 UTC

Flavio did the debugging, and those options appear not to be read at all.

Comment 3 Flavio Percoco 2014-02-03 15:43:58 UTC

I double-checked on your box, and it seems they are indeed filed under the database section. Somehow, I missed that when I was debugging it this morning.

./openstack/common/db/sqlalchemy/session.py:    cfg.IntOpt('max_retries',
./openstack/common/db/sqlalchemy/session.py:               deprecated_name='sql_max_retries',
./openstack/common/db/sqlalchemy/session.py:        remaining = CONF.database.max_retries

Could you please retry by setting those values under the database section?
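
To illustrate why the section matters, here is a minimal sketch using only Python's standard `configparser`. It is not cinder's actual option handling: `load_retry_options`, the fallback to the `sql_`-prefixed names inside [database] (following Comment 1), and the default value of 10 are all assumptions made for the example.

```python
import configparser

# Placeholder default for this sketch, not necessarily cinder's real default.
ASSUMED_DEFAULT = 10


def load_retry_options(text):
    """Hypothetical helper: read retry options from the [database] section only."""
    # Renaming default_section makes [DEFAULT] an ordinary section, so its
    # keys do not leak into [database] the way configparser defaults would.
    parser = configparser.ConfigParser(default_section="__unused__")
    parser.read_string(text)

    def lookup(name, deprecated_name):
        # Try the current name first, then the deprecated one.
        for key in (name, deprecated_name):
            if parser.has_option("database", key):
                return parser.getint("database", key)
        return ASSUMED_DEFAULT

    return {
        "max_retries": lookup("max_retries", "sql_max_retries"),
        "retry_interval": lookup("retry_interval", "sql_retry_interval"),
    }


# Options placed outside [database] are silently ignored in this sketch.
misplaced = "[DEFAULT]\nmax_retries = -1\nretry_interval = 1\n"
print(load_retry_options(misplaced))

# The same options under [database] are picked up.
correct = "[database]\nmax_retries = -1\nretry_interval = 1\n"
print(load_retry_options(correct))
```

In this sketch a misplaced option produces no warning and simply falls back to the default, which matches the "completely ignored" symptom in the description.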

Comment 4 Fabio Massimo Di Nitto 2014-02-03 15:52:57 UTC

[database]
max_retries = -1
retry_interval = 1

mysql down:

==> scheduler.log <==
2014-02-03 16:51:50.956 16184 CRITICAL cinder [-] (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None

so it still doesn't work :)
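
For reference, the behavior these two options are expected to give can be sketched as a small retry loop. This is an illustration under stated assumptions, not the code in session.py: `connect_with_retries`, `ConnectionLost`, and the `is_connection_error` predicate are hypothetical names, and "N retries after the first attempt" is one possible reading of max_retries.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a database driver's OperationalError."""


def connect_with_retries(connect, is_connection_error, max_retries,
                         retry_interval, sleep=time.sleep):
    """Call connect() until it succeeds or the retry budget is exhausted.

    max_retries == -1 means retry forever; otherwise up to max_retries
    retries are made after the first failed attempt.
    """
    remaining = max_retries
    while True:
        try:
            return connect()
        except Exception as exc:
            # Errors not recognized as connection failures are never
            # retried, regardless of max_retries.
            if not is_connection_error(exc):
                raise
            if remaining == 0:
                raise
            if remaining > 0:
                remaining -= 1
            sleep(retry_interval)


# Usage: a fake connect() that fails three times before succeeding.
attempts = []


def flaky_connect():
    attempts.append(1)
    if len(attempts) < 4:
        raise ConnectionLost("database is down")
    return "connected"


result = connect_with_retries(flaky_connect, lambda exc: True,
                              max_retries=-1, retry_interval=0)
print(result, "after", len(attempts), "attempts")
```

Note that in a loop of this shape, an error the predicate does not recognize propagates on the first attempt even with max_retries = -1, so the daemon would exit instead of retrying.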

Comment 5 Flavio Percoco 2014-02-03 18:37:29 UTC

Erm, I think the issue is that 2013 is not part of the listed codes here[0]

[0] http://git.openstack.org/cgit/openstack/cinder/tree/cinder/openstack/common/db/sqlalchemy/session.py?h=stable/havana#n548
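
A minimal sketch of the kind of check involved, for illustration only. The "before" list of codes is an assumption about what the library listed at the time, and `extract_error_code` / `is_db_connection_error` are hypothetical helpers, not the functions in session.py. The code meanings themselves are standard MySQL client errors: 2002 (cannot connect through socket), 2003 (cannot connect to host), 2006 (server has gone away), 2013 (lost connection during query).

```python
import re

# Assumed contents of the list before the fix; the real list may differ.
CODES_BEFORE_FIX = (2002, 2003, 2006)
# The fix described in this bug adds 2013 to the recognized codes.
CODES_AFTER_FIX = CODES_BEFORE_FIX + (2013,)

# The error text from the scheduler.log excerpt in Comment 4.
LOGGED_ERROR = (
    "(OperationalError) (2013, \"Lost connection to MySQL server at "
    "'reading initial communication packet', system error: 0\") None None"
)


def extract_error_code(message):
    """Pull the numeric driver error code out of a '(code, "text")' message."""
    match = re.search(r"\((\d+),", message)
    return int(match.group(1)) if match else None


def is_db_connection_error(message, codes):
    """Return True if the message carries one of the given error codes."""
    return extract_error_code(message) in codes


print(is_db_connection_error(LOGGED_ERROR, CODES_BEFORE_FIX))
print(is_db_connection_error(LOGGED_ERROR, CODES_AFTER_FIX))
```

With the assumed "before" list, the logged error is not classified as a connection error, so no reconnect is attempted; once 2013 is in the list, it is.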

Comment 12 Giulio Fidente 2014-04-17 13:31:08 UTC

Verified using openstack-cinder-2013.2.3-1.el6ost.noarch

Comment 14 errata-xmlrpc 2014-05-29 19:57:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0577.html